Network Working Group                                            HF. Zhu
Request for Comments: 1922                                    Tsinghua U
Category: Informational                                           DY. Hu
                                                              Tsinghua U
                                                                ZG. Wang
                                                                    CITS
                                                                 TC. Kao
                                                                     III
                                                              WCH. Chang
                                                                     III
                                                              M. Crispin
                                                            U Washington
                                                              March 1996


            Chinese Character Encoding for Internet Messages

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Abstract

   This memo describes methods of transporting Chinese characters in
   Internet services which transport text, such as electronic mail
   [RFC-822], network news [RFC-1036], telnet [RFC-854] and the World
   Wide Web [RFC-1866].

Introduction

   As the use of Internet covers more and more Chinese people in the
   world, the need has increased for the ability to send documents
   containing Chinese characters on the Internet.  The methods described
   in this document provide means of transporting existing Chinese
   character sets as well as leaving space for future extension.

   This document describes two encodings, ISO-2022-CN and
   ISO-2022-CN-EXT.
   These are designed with interoperability in mind
   and are encouraged in this document for current Chinese interchange;
   they are 7-bit, support both simplified and traditional characters
   using both GB and CNS/Big5, and do not impose any unusual quoting
   requirements on ASCII characters.

   As important related issues, this document gives detailed
   descriptions of the two encodings CN-GB and CN-Big5, and a brief
   description of ISO/IEC 10646 [ISO-10646].  CN-GB and CN-Big5 are
   currently used as the internal codes for Chinese documents.
   ISO-10646 is the universal multi-octet character set defined by ISO;
   we feel that in the future it may become the preferred technology for
   Chinese documents and electronic mail when it is widely available.

Specification

1.    7-bit Chinese encodings: ISO-2022-CN and ISO-2022-CN-EXT

1.1.  Description

   ISO-2022-CN is based on ISO 2022 [ISO-2022], similar to earlier work
   on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for the Japanese
   and Korean languages respectively.  It is 7-bit, and supports both
   simplified Chinese characters using GB 2312-80 [GB-2312] and
   traditional Chinese characters using the first two planes of CNS
   11643 [CNS-11643], as well as ASCII [ASCII] characters.

   ISO-2022-CN-EXT is a superset of ISO-2022-CN that additionally
   supports other GB character sets and planes of CNS 11643.

   Since ISO-2022-CN and ISO-2022-CN-EXT are 7-bit encodings, they do
   not require the 8-bit SMTP extensions.  ISO-2022-CN supports all the
   Chinese characters that appear in Big5 [BIG5].

1.2.  ISO-2022-CN

   The starting code of ISO-2022-CN is ASCII.  ASCII and Chinese
   characters are distinguished by designations (ESC sequences) and
   shift functions.

   Designations define the Chinese character sets used in the text.
   There are three kinds of designations: SOdesignation, SS2designation
   and SS3designation.

   The SOdesignation is in the form ESC $ ) <F>, where <F> is the "final
   character" assigned to the character set by ISO (refer to the ISO
   registry [ISOREG] for more details).  The SS2designation is in the
   form ESC $ * <F>, and the SS3designation is in the form ESC $ + <F>.
   A designation overrides any previous designation for subsequent bytes
   in the text.

   There are four kinds of shifts: SI, SO, SS2 and SS3.  Shift functions
   specify how to interpret the subsequent bytes.

   The shift SI (one byte with hexadecimal value 0F) declares that
   subsequent bytes are interpreted in ASCII.

   The shift SO (one byte with hexadecimal value 0E) declares that
   subsequent bytes are interpreted in the character set defined by
   SOdesignation.

   The shift SS2 (two bytes with hexadecimal values 1B 4E) declares that
   the subsequent TWO bytes are interpreted in the character set defined
   by SS2designation, after which the previous interpretation (from SI
   or SO) is restored.

   The shift SS3 (two bytes with hexadecimal values 1B 4F) declares that
   the subsequent TWO bytes are interpreted in the character set defined
   by SS3designation, after which the previous interpretation (from SI
   or SO) is restored.
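
   The designation and shift rules above can be read as a small state
   machine.  The following is a minimal sketch in Python, not a complete
   or validated decoder.  The function name "segment" and the run labels
   such as "SO:A" are illustrative assumptions, and no mapping to actual
   characters is attempted: the sketch only splits a byte string into
   ASCII runs and two-byte runs tagged with the final character of the
   designation in force.  The bytes in the usage lines are the hex
   example this memo gives for ISO-2022-CN.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Third byte of a designation (ESC $ <byte> <F>) -> the shift it configures.
DESIGNATES = {0x29: "SO", 0x2A: "SS2", 0x2B: "SS3"}   # ')', '*', '+'
# Second byte of a single shift (ESC <byte>) -> the shift it invokes.
SINGLE_SHIFT = {0x4E: "SS2", 0x4F: "SS3"}             # 'N', 'O'


def segment(data: bytes) -> list:
    """Split ISO-2022-CN style bytes into (label, payload) runs.

    Labels are "ASCII" or "<shift>:<final character>" (hypothetical naming).
    """
    final = {"SO": None, "SS2": None, "SS3": None}  # current designations
    shifted_out = False                             # False means SI (ASCII)
    runs = []

    def emit(label, payload):
        if label != "ASCII" and len(payload) != 2:
            raise ValueError("truncated two-byte character")
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + payload)   # extend current run
        else:
            runs.append((label, payload))

    def label_for(shift):
        if final[shift] is None:
            raise ValueError(shift + " used before its designation")
        return shift + ":" + final[shift]

    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == ESC and nxt == 0x24:                    # ESC $ ) <F> etc.
            if i + 3 >= len(data) or data[i + 2] not in DESIGNATES:
                raise ValueError("malformed designation")
            final[DESIGNATES[data[i + 2]]] = chr(data[i + 3])
            i += 4
        elif b == ESC and nxt in SINGLE_SHIFT:          # SS2 or SS3
            emit(label_for(SINGLE_SHIFT[nxt]), data[i + 2:i + 4])
            i += 4                                      # previous state resumes
        elif b == ESC:
            raise ValueError("unsupported escape sequence")
        elif b == SO:
            label_for("SO")                 # raises if nothing is designated
            shifted_out = True
            i += 1
        elif b == SI:
            shifted_out = False
            i += 1
        elif shifted_out:
            emit(label_for("SO"), data[i:i + 2])        # one two-byte character
            i += 2
        else:
            emit("ASCII", data[i:i + 1])
            i += 1
    return runs


example = bytes.fromhex("1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f")
for label, payload in segment(example):
    print(label, payload.hex(" "))
# prints:
#   SO:A 3d 3b 3b 3b
#   SO:G 47 28 5f 50
```

   Keeping one designation slot per shift mirrors the rule that a new
   designation overrides only the earlier designation of the same kind.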
   The escape sequences, shift functions and character sets used in an
   ISO-2022-CN text are as follows:

    Character sets                                       Shift in with
   --------------------------------------------------------------------
     ASCII                                                     SI
     GB 2312, CNS 11643-plane-1                                SO
              CNS 11643-plane-2                                SS2

      ESC $ ) A         Indicates the bytes following SO are Chinese
                        characters as defined in GB 2312-80, until
                        another SOdesignation appears

      ESC $ ) G         Indicates the bytes following SO are as defined
                        in CNS 11643-plane-1, until another
                        SOdesignation appears

      ESC $ * H         Indicates the two bytes immediately following
                        SS2 are a Chinese character as defined in CNS
                        11643-plane-2, until another SS2designation
                        appears

   If there are any GB or CNS characters on a line, a designation for
   the corresponding character set must be used so that each line has
   its own character set information and the text can be displayed
   correctly when scrolling back in a window.  Also, there must be a
   shift to ASCII (SI) before the end of the line (i.e., before the
   CRLF).  In other words, each line starts in ASCII, and ends in ASCII.

      Example: the hex sequence

         1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f

      represents the Chinese word for "Interchange" (jiao huan) twice;
      the first time in simplified form using GB-2312 (the 3d 3b 3b 3b
      sequence above), and the second time in traditional form using
      CNS-11643 (the 47 28 5f 50 sequence above).
      The sequence 1b 24 29 41 is the SOdesignation for GB-2312, the
      0e is SO to switch to Chinese from ASCII, the 1b 24 29 47 is the
      SOdesignation for CNS-11643 plane 1, and finally the 0f is the SI
      to return to ASCII at the end of the line.

   The name given to this character encoding is "ISO-2022-CN". This name
   is intended to be used as the "charset" parameter in MIME [MIME-1,
   MIME-2] messages.

      Content-Type: text/plain; charset=iso-2022-cn

   The ISO-2022-CN encoding is already in 7-bit form, so it is not
   necessary to use a Content-Transfer-Encoding header.

   Other restrictions are given in the "Formal Syntax of ISO-2022-CN"
   (Section 7.1 of this document).

1.3.  ISO-2022-CN-EXT

   ISO-2022-CN-EXT supports all characters in existing GB, Big5 and CNS
   11643 character sets.

   The escape sequences, shift functions and character sets used in an
   ISO-2022-CN-EXT text are as follows:

    Character sets                                       Shift in with
   --------------------------------------------------------------------
     ASCII                                                    SI
     GB 2312, GB 12345, CNS 11643-plane-1, ISO-IR-165         SO
     GB 7589, GB 13131, CNS 11643-plane-2                     SS2
     GB 7590, GB 13132 or other new GBs, CNS 11643-plane-3 or SS3
      higher planes of CNS 11643

      Note: Currently, there are some GB sets that have not been
      registered in ISO. Here <X7589>, <X7590>, <X12345>, <X13131> and
      <X13132> represent the final character that will be assigned by
      ISO for those sets.
      These GB sets shall only be used once these final characters are
      assigned.

      ESC $ ) A         Indicates the bytes following SO are Chinese
                        characters as defined in GB 2312-80, until
                        another SOdesignation appears

      ESC $ * <X7589>   Indicates the two bytes immediately following
                        SS2 are a Chinese character as defined in GB
                        7589-87 [GB-7589], until another SS2designation
                        appears

      ESC $ + <X7590>   Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in GB
                        7590-87 [GB-7590], until another SS3designation
                        appears

      ESC $ ) <X12345>  Indicates the bytes following SO are as defined
                        in GB 12345-90 [GB-12345], until another
                        SOdesignation appears

      ESC $ * <X13131>  Indicates the two bytes immediately following
                        SS2 are a Chinese character as defined in GB
                        13131-91 [GB-13131], until another
                        SS2designation appears

      ESC $ + <X13132>  Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in GB
                        13132-91 [GB-13132], until another
                        SS3designation appears

      ESC $ ) E         Indicates the bytes following SO are as defined
                        in ISO-IR-165 (for details, see section 2.1),
                        until another SOdesignation appears

      ESC $ ) G         Indicates the bytes following SO are as defined
                        in CNS 11643-plane-1, until another
                        SOdesignation appears

      ESC $ * H         Indicates the two bytes immediately following
                        SS2 are a Chinese character as defined in CNS
                        11643-plane-2, until another SS2designation
                        appears

      ESC $ + I         Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in CNS
                        11643-plane-3, until another SS3designation
                        appears

      ESC $ + J         Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in CNS
                        11643-plane-4, until another SS3designation
                        appears

      ESC $ + K         Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in CNS
                        11643-plane-5, until another SS3designation
                        appears

      ESC $ + L         Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in CNS
                        11643-plane-6, until another SS3designation
                        appears

      ESC $ + M         Indicates the two bytes immediately following
                        SS3 are a Chinese character as defined in CNS
                        11643-plane-7, until another SS3designation
                        appears

   As in ISO-2022-CN, each line starts in ASCII, and ends in ASCII, and
   has its own designation information before any Chinese characters
   appear.

   The name given to this character encoding is "ISO-2022-CN-EXT". This
   name is intended to be used as the "charset" parameter in MIME
   messages.

      Content-Type: text/plain; charset=ISO-2022-CN-EXT

   The ISO-2022-CN-EXT encoding is also in 7-bit form, so it is not
   necessary to use a Content-Transfer-Encoding header.
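
   The per-line rule above (start in ASCII, end in ASCII, designate
   before use, 7-bit only) lends itself to a mechanical check.  The
   sketch below is one possible reading of that rule in Python, not the
   memo's formal syntax.  The function name "line_problems" and the
   message strings are assumptions made for illustration, and escape
   sequences other than the designations and single shifts described in
   section 1.2 are simply skipped rather than diagnosed.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
DESIGNATES = {0x29: "SO", 0x2A: "SS2", 0x2B: "SS3"}   # ')', '*', '+'
SINGLE_SHIFT = {0x4E: "SS2", 0x4F: "SS3"}             # ESC N, ESC O


def line_problems(line: bytes) -> list:
    """Return the rule violations found in one line (given without CRLF)."""
    problems = []
    if any(b > 0x7F for b in line):
        problems.append("contains 8-bit bytes")
    designated = set()      # shifts that have a designation on this line
    shifted_out = False     # every line is taken to start in ASCII
    i = 0
    while i < len(line):
        b = line[i]
        nxt = line[i + 1] if i + 1 < len(line) else None
        is_designation = (b == ESC and nxt == 0x24 and i + 3 < len(line)
                          and line[i + 2] in DESIGNATES)
        if is_designation:
            designated.add(DESIGNATES[line[i + 2]])
            i += 4
        elif b == ESC and nxt in SINGLE_SHIFT:
            shift = SINGLE_SHIFT[nxt]
            if shift not in designated:
                problems.append(shift + " used without a designation on this line")
            i += 4          # skip the shift and the two bytes it covers
        elif b == SO:
            if "SO" not in designated:
                problems.append("SO used without a designation on this line")
            shifted_out = True
            i += 1
        elif b == SI:
            shifted_out = False
            i += 1
        else:
            i += 2 if shifted_out else 1    # two-byte character or one ASCII byte
    if shifted_out:
        problems.append("line does not end in ASCII (missing SI)")
    return problems


good = bytes.fromhex("1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f")
print(line_problems(good))         # prints: []
print(line_problems(good[:-1]))    # prints: ['line does not end in ASCII (missing SI)']
```

   Resetting the set of designations for every line reflects the
   requirement that each line carry its own character set information.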
   Other restrictions are given in the "Formal Syntax of
   ISO-2022-CN-EXT" (Section 7.2 of this document).

1.4.  How to Support Big5 or other internal codesets with ISO-2022-CN
      and ISO-2022-CN-EXT

   Since there are many different Chinese internal coding systems
   [CJKINF], such as EUC GB, Big5, CCCII (an encoding for library
   systems mainly used in Taiwan), GBK (the new standard specification
   for Chinese internal code, which is also the code page for Microsoft
   Simplified Chinese Windows 95) etc., ISO-2022-CN and ISO-2022-CN-EXT,
   which are 7-bit and will not lose information during communication
   among different codesets, facilitate interchange between the various
   Chinese coding systems in the Internet.

   For instance, ISO-2022-CN and ISO-2022-CN-EXT can be used to support
   the popular Big5 codeset, because the first two planes of CNS-11643
   contain the same Chinese characters as Big5's "common part" except
   for two duplicate characters.  By the "common part" we mean the part
   that is not specific to any Big5 vendor, consisting of 5401 more
   frequently used characters in Big5 range 0xA440-0xC67E, 7652 less
   frequently used characters in Big5 range 0xC940-0xF9D5, and 441 other
   symbols in Big5 range 0xA140-0xA3E0, as defined in Institute for
   Information Industry's (III) technical report C-26 (see also [BIG5]).

   The appendix of this document presents a conversion table for
   converting Big5 into CNS-11643, including specific extensions of some
   popular vendors.  For other extensions, vendors and implementors of
   Big5 products are ENCOURAGED to create detailed conversion tables, in
   order to increase interoperability between different coding systems.

   Public domain software (binary or C source code) for conversion
   between Big5 and CNS-11643 is available on many Internet sites.
   At the time of this writing, the following FTP sites and software are
   advertised:

   1) Beijing:
      ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/convert/big5cns.zip
      (IP address: 166.111.1.6)

   2) Xi'an:
      ftp://ftp.xanet.edu.cn
      /pub/chinese-soft/unix/convert/BeTTY-1.534.tar.gz
      (IP address: 202.112.11.131)

   3) Taiwan:
      ftp://ftp.seed.net.tw/Pub/Chinese/DOS/code-convert/chcode.zip
      (IP address: 140.92.1.65)

   4) US:
      ftp://ftp.ifcss.org/pub/software/unix/convert/BeTTY-1.534.tar.gz
      (IP address: 128.123.1.55)
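
   The three Big5 "common part" ranges named in section 1.4 can be
   written down as a small classifier.  This is only a sketch in Python
   under stated assumptions: the function name "big5_common_part" and
   the description strings are hypothetical, and the trail-byte check
   (0x40-0x7E and 0xA1-0xFE) is a general property of Big5 that this
   memo does not itself spell out.  It performs no conversion to
   CNS 11643 and knows nothing about vendor extensions.

```python
from typing import Optional

# (first code, last code, description), taken from the ranges in section 1.4.
COMMON_PART = [
    (0xA140, 0xA3E0, "symbols"),
    (0xA440, 0xC67E, "frequently used characters"),
    (0xC940, 0xF9D5, "less frequently used characters"),
]


def big5_common_part(code: int) -> Optional[str]:
    """Name the common-part range holding a 16-bit Big5 code, or None."""
    trail = code & 0xFF
    # Assumption: valid Big5 trail bytes are 0x40-0x7E and 0xA1-0xFE.
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    for first, last, description in COMMON_PART:
        if first <= code <= last:
            return description
    return None


print(big5_common_part(0xA440))   # prints: frequently used characters
print(big5_common_part(0xC6A1))   # prints: None
```

   A code for which the function returns None is either outside the
   common part or not a well-formed Big5 code, which is where the
   vendor-specific conversion tables mentioned above would come in.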
