The UCS-2 sequence "A<NOT IDENTICAL TO><ALPHA>." (0041, 2262, 0391,
002E) may be encoded in UTF-8 as follows:
41 E2 89 A2 CE 91 2E
The UCS-2 sequence representing the Hangul characters for the Korean
word "hangugo" (D55C, AD6D, C5B4) may be encoded as follows:
ED 95 9C EA B5 AD EC 96 B4
The UCS-2 sequence representing the Han characters for the Japanese
word "nihongo" (65E5, 672C, 8A9E) may be encoded as follows:
E6 97 A5 E6 9C AC E8 AA 9E
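The three examples above can be reproduced with a short sketch. The
function name utf8_encode_ucs2 is hypothetical and not part of this
memo; it assumes its input is restricted to UCS-2 values that are not
surrogates, so it only needs the one-, two-, and three-octet forms.

```python
def utf8_encode_ucs2(code_points):
    """Encode UCS-2 values (0x0000-0xFFFF, no surrogates) as UTF-8 octets."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            # One octet: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:
            # Two octets: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        else:
            # Three octets: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


# The UCS-2 sequences from the examples, paired with the expected octets.
EXAMPLES = [
    ([0x0041, 0x2262, 0x0391, 0x002E], "41 E2 89 A2 CE 91 2E"),
    ([0xD55C, 0xAD6D, 0xC5B4], "ED 95 9C EA B5 AD EC 96 B4"),
    ([0x65E5, 0x672C, 0x8A9E], "E6 97 A5 E6 9C AC E8 AA 9E"),
]

for code_points, expected_hex in EXAMPLES:
    encoded = utf8_encode_ucs2(code_points)
    # Cross-check against the expected octets and Python's built-in codec.
    assert encoded == bytes.fromhex(expected_hex)
    assert encoded == "".join(map(chr, code_points)).encode("utf-8")
```

In practice the built-in str.encode("utf-8") would be used directly; the
hand-written version is only there to show where each octet comes from.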
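5. MIME registration
This memo is intended to serve as the basis for registering a MIME
character set parameter (charset) [CHARSET-REG]. The proposed charset
parameter value is "UTF-8". This string labels media types containing
text made up of characters from the ISO/IEC 10646 repertoire, including
all amendments at least up to Amendment 5 (Korean block), encoded as a
sequence of octets using the encoding scheme outlined above. UTF-8 is
suitable for use in MIME content types under the "text" top-level type.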
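As an illustration of the label in use, the following sketch builds a
text/plain entity whose charset parameter is "utf-8", using Python's
standard email package. The sample text and variable names are
assumptions chosen for the example, and the content transfer encoding
is left to the library's defaults.

```python
from email.message import EmailMessage

# Sample text: "A<NOT IDENTICAL TO><ALPHA>." from the encoding examples.
text = "A\u2262\u0391."

msg = EmailMessage()
# Label the body as text/plain with charset="utf-8".
msg.set_content(text, subtype="plain", charset="utf-8")

content_type = msg.get_content_type()   # media type without parameters
charset = msg.get_content_charset()     # charset parameter, lowercased

# The receiver uses the label to turn the octets back into characters.
# The library appends a line ending to the body, so strip it first.
round_tripped = msg.get_content().rstrip("\n")

assert content_type == "text/plain"
assert charset == "utf-8"
assert round_tripped == text
```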
Note that the label "UTF-8" carries no version identification; it
refers generically to ISO/IEC 10646. This is intentional, for the
following reasons:
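A MIME charset label is designed to give only the information needed to
interpret a sequence of bytes received on the wire as a sequence of
characters, and nothing more (see RFC 2045, section 2.2, in [MIME]). As
long as a character set standard does not change incompatibly, version
numbers serve no purpose: a receiver gains nothing by learning from the
label that it may receive newly assigned characters it does not know.
Those characters will be received anyway, and the label itself says
nothing about them.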
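Hence, as long as the standards evolve compatibly, the advantage of
labels that identify versions is only apparent. Version-dependent
labels also have a real disadvantage: when an older application
receives data carrying a newer label it does not know, it may fail to
recognize the label and be unable to process the data at all, whereas a
generic, known label would have triggered mostly correct processing of
data that may well contain no new characters.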
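The disadvantage described above can be sketched with Python's codec
registry standing in for an older application's table of known labels.
The function decode_labelled and the versioned label
"UTF-8-ISO10646-AMD5" are hypothetical and exist only for this
illustration; no such label is registered.

```python
import codecs


def decode_labelled(octets, label):
    """Decode octets using a charset label; return None if it is unknown."""
    try:
        codec = codecs.lookup(label)
    except LookupError:
        # Unknown label: the application cannot process the data at all.
        return None
    text, _consumed = codec.decode(octets)
    return text


data = bytes.fromhex("41 E2 89 A2 CE 91 2E")

# The generic, known label leads to correct processing.
assert decode_labelled(data, "UTF-8") == "A\u2262\u0391."

# The same octets under a hypothetical versioned label are rejected,
# even though they contain nothing the receiver could not handle.
assert decode_labelled(data, "UTF-8-ISO10646-AMD5") is None
```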
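The "Korean mess" (ISO/IEC 10646 Amendment 5) is an incompatible
change, which in principle contradicts the case made above for a
version-independent MIME charset label. However, the compatibility
problem can only arise with data containing Korean Hangul characters
encoded according to Unicode 1.1 (or, equivalently, ISO/IEC 10646
before Amendment 5). There is arguably no such data to worry about,
which is the very reason the incompatible change was deemed acceptable.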
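In practice, then, a version-independent label is warranted, provided
that the label is understood to refer to all versions after Amendment 5
and that no incompatible change actually occurs. Should an incompatible
change occur in a later version of ISO/IEC 10646, the MIME charset
label defined here will remain aligned with the previous version unless
and until the IETF specifically decides otherwise.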
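It is also proposed to register the charset parameter value
"UNICODE-1-1-UTF-8", for the exclusive purpose of labelling text data
containing Hangul syllables encoded in UTF-8 without taking Amendment 5
of ISO/IEC 10646 into account (that is, using the pre-Amendment 5 code
point assignments). Any other UTF-8 data SHOULD NOT use this label, in
particular data that contains no Hangul syllables. It is strongly
recommended that no new data containing Hangul be created without
taking Amendment 5 of ISO/IEC 10646 into account.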
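6. Security Considerations
Implementors of UTF-8 need to consider the security aspects of how they
handle illegal UTF-8 sequences. It is conceivable that in some
circumstances an attacker could exploit an incautious UTF-8 parser by
sending it an octet sequence that the UTF-8 syntax does not permit.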
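A particularly subtle form of this attack targets a parser that
performs security-critical validity checks on the UTF-8 encoded form of
its input but interprets certain illegal octet sequences as characters.
For example, a parser might prohibit the NUL character when it is
encoded as the single-octet sequence 00, yet accept the illegal
two-octet sequence C0 80 and interpret it as NUL. Another example is a
parser that prohibits the octet sequence 2F 2E 2E 2F ("/../") yet
permits the illegal octet sequence 2F C0 AE 2E 2F.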
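The following sketch reproduces both examples. The function
naive_decode is a deliberately flawed, hypothetical decoder that does
not reject non-shortest forms; is_strictly_valid relies on Python's
built-in strict UTF-8 codec, which rejects the octet C0 as a lead
octet. Neither function comes from this memo.

```python
def naive_decode(octets):
    """Flawed decoder: accepts overlong (non-shortest) UTF-8 forms."""
    chars = []
    i = 0
    while i < len(octets):
        lead = octets[i]
        if lead < 0x80:
            cp, length = lead, 1
        elif lead & 0xE0 == 0xC0:
            cp, length = lead & 0x1F, 2
        elif lead & 0xF0 == 0xE0:
            cp, length = lead & 0x0F, 3
        else:
            raise ValueError("unsupported lead octet")
        # Bug on purpose: trailing octets are merged without checking
        # that the result needed this many octets.
        for trail in octets[i + 1:i + length]:
            cp = (cp << 6) | (trail & 0x3F)
        chars.append(chr(cp))
        i += length
    return "".join(chars)


def is_strictly_valid(octets):
    """Return True only if octets are well-formed UTF-8."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


evil_path = bytes.fromhex("2F C0 AE 2E 2F")
evil_nul = bytes.fromhex("C0 80")

# A check on the encoded form does not see "/../" or NUL ...
assert b"/../" not in evil_path
assert b"\x00" not in evil_nul
# ... but the flawed decoder produces them anyway.
assert naive_decode(evil_path) == "/../"
assert naive_decode(evil_nul) == "\x00"
# A strict decoder rejects both inputs outright.
assert not is_strictly_valid(evil_path)
assert not is_strictly_valid(evil_nul)
```

One way to avoid the mismatch is to decode strictly first and run the
security checks on the resulting characters rather than on the octets.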
Acknowledgments
The following have participated in the drafting and discussion of this
memo:
James E. Agenbroad Andries Brouwer
Martin J. Dürst Ned Freed
David Goldsmith Edwin F. Hart
Kent Karlsson Markus Kuhn
Michael Kung Alain LaBonte
John Gardiner Myers Murray Sargent
Keld Simonsen Arnold Winkler
References
[CHARSET-REG] Freed, N., and J. Postel, "IANA Charset Registration
Procedures", BCP 19, RFC 2278, January 1998.
[FSS_UTF] X/Open CAE Specification C501 ISBN 1-85912-082-2 28cm.
22p. pbk. 172g. 4/95, X/Open Company Ltd., "File
System Safe UCS Transformation Format (FSS_UTF)",
X/Open Preliminary Specification, Document Number
P316. Also published in Unicode Technical Report #4.
[ISO-10646] ISO/IEC 10646-1:1993. International Standard --
Information technology -- Universal Multiple-Octet
Coded Character Set (UCS) -- Part 1: Architecture and
Basic Multilingual Plane. Five amendments and a
technical corrigendum have been published up to now.
UTF-8 is described in Annex R, published as Amendment
2. UTF-16 is described in Annex Q, published as
Amendment 1. 17 other amendments are currently at
various stages of standardization.
[MIME] Freed, N., and N. Borenstein, "Multipurpose Internet
Mail Extensions (MIME) Part One: Format of Internet
Message Bodies", RFC 2045. N. Freed, N. Borenstein,
"Multipurpose Internet Mail Extensions (MIME) Part
Two: Media Types", RFC 2046. K. Moore, "MIME
(Multipurpose Internet Mail Extensions) Part Three:
Message Header Extensions for Non-ASCII Text", RFC
2047. N. Freed, J. Klensin, J. Postel, "Multipurpose
Internet Mail Extensions (MIME) Part Four:
Registration Procedures", RFC 2048. N. Freed, N.
Borenstein, "Multipurpose Internet Mail Extensions
(MIME) Part Five: Conformance Criteria and Examples",
RFC 2049. All November 1996.
[RFC2152] Goldsmith, D., and M. Davis, "UTF-7: A Mail-safe
Transformation Format of Unicode", RFC 2152, Taligent
inc., May 1997. (Obsoletes RFC1642)
[UNICODE] The Unicode Consortium, "The Unicode Standard --
Version 2.0", Addison-Wesley, 1996.
[US-ASCII] Coded Character Set--7-bit American Standard Code for
Information Interchange, ANSI X3.4-1986.
Author's Address
Francois Yergeau
Alis Technologies
100, boul. Alexis-Nihon
Suite 600
Montreal QC H4M 2P2
Canada
Phone: +1 (514) 747-2547
Fax: +1 (514) 747-2561
EMail: fyergeau@alis.com
Full Copyright Statement
Copyright (C) The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
RFC 2279: UTF-8, a transformation format of ISO 10646