📄 rfc1642.ps
RFC 1642                         UTF-7                         July 1994
Goldsmith & Davis

      Rationale. The pad character "=" is excluded because UTF-7 is
      designed for use within header fields as set forth in RFC 1522.
      Since the only readable encoding in RFC 1522 is "Q" (based on
      RFC 1521's Quoted-Printable), the "=" character is not available
      for use (without a lot of escape sequences). This was very
      unfortunate but unavoidable. The "=" character could otherwise
      have been used as the UTF-7 escape character as well (rather
      than using "+").

   Note that all characters in US-ASCII have the same value in Unicode
   when zero-extended to 16 bits.

UTF-7 Definition

   A UTF-7 stream represents 16-bit Unicode characters in 7-bit
   US-ASCII as follows:

      Rule 1: (direct encoding) Unicode characters in set D above may
      be encoded directly as their ASCII equivalents. Unicode
      characters in Set O may optionally be encoded directly as their
      ASCII equivalents, bearing in mind that many of these characters
      are illegal in header fields, or may not pass correctly through
      some mail gateways.

      Rule 2: (Unicode shifted encoding) Any Unicode character
      sequence may be encoded using a sequence of characters in set B,
      when preceded by the shift character "+" (US-ASCII character
      value decimal 43). The "+" signals that subsequent octets are to
      be interpreted as elements of the Modified Base64 alphabet until
      a character not in that alphabet is encountered. Such characters
      include control characters such as carriage returns and line
      feeds; thus, a Unicode shifted sequence always terminates at the
      end of a line. As a special case, if the sequence terminates
      with the character "-" (US-ASCII decimal 45) then that character
      is absorbed; other terminating characters are not absorbed and
      are processed normally.

      Rationale. A terminating character is necessary for cases where
      the next character after the Modified Base64 sequence is part of
      character set B. It can also enhance readability by delimiting
      encoded sequences.

      Also as a special case, the sequence "+-" may be used to encode
      the character "+". A "+" character followed immediately by any
      character other than members of set B or "-" is an ill-formed
      sequence.

      Unicode is encoded using Modified Base64 by first converting
      Unicode 16-bit quantities to an octet stream (with the most
      significant octet first). Text with an odd number of octets is
      ill-formed.

      Rationale. ISO/IEC 10646-1:1993(E) specifies that when
      characters in the UCS-2 form are serialized as octets, the most
      significant octet appears first. This is also in keeping with
      common network practice of choosing a canonical format for
      transmission.

      Next, the octet stream is encoded by applying the Base64 content
      transfer encoding algorithm as defined in RFC 1521, modified to
      omit the "=" pad character. Instead, when encoding, zero bits
      are added to pad to a Base64 character boundary. When decoding,
      any bits at the end of the Modified Base64 sequence that do not
      constitute a complete 16-bit Unicode character are discarded. If
      such discarded bits are non-zero the sequence is ill-formed.

      Rationale. The pad character "=" is not used when encoding
      Modified Base64 because of the conflict with its use as an
      escape character for the Q content transfer encoding in RFC 1522
      header fields, as mentioned above.

      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
      (decimal 13), and line feed (decimal 10) characters may be
      directly represented by their ASCII equivalents. However, note
      that MIME content transfer encodings have rules concerning the
      use of such characters. Usage that does not conform to the
      restrictions of RFC 822, for example, would have to be encoded
      using MIME content transfer encodings other than 7bit or 8bit,
      such as quoted-printable, binary, or base64.

   Given this set of rules, Unicode characters which may be encoded
   via rules 1 or 3 take one octet per character, and other Unicode
   characters are encoded on average with 2 2/3 octets per character
   plus one octet to switch into Modified Base64 and an optional octet
   to switch out.

      Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>."
      (hexadecimal 0041,2262,0391,002E) may be encoded as follows:

            A+ImIDkQ.

      Example. The Unicode sequence "Hi Mom <WHITE SMILING FACE>!"
      (hexadecimal 0048,0069,0020,004D,006F,006D,0020,263A,0021) may
      be encoded as follows:

            Hi Mom +Jjo-!

      Example. The Unicode sequence representing the Han characters
      for the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may
      be encoded as follows:

            +ZeVnLIqe-

Use of Character Set UTF-7 Within MIME

   Character set UTF-7 is safe for mail transmission and therefore may
   be used with any content transfer encoding in MIME (except where
   line length and line break restrictions are violated).
   Specifically, the 7 bit encoding for bodies and the Q encoding for
   headers are both acceptable. The MIME character set identifier is
   UNICODE-1-1-UTF-7.

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (hexadecimal
      0048, 0069, 0020, 004D, 006F, 006D, 0020, 263A, 0021).

         Content-Type: text/plain; charset=UNICODE-1-1-UTF-7

         Hi Mom +Jjo-!

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence representing the Han characters for the
      Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E).

         Content-Type: text/plain; charset=UNICODE-1-1-UTF-7

         +ZeVnLIqe-

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal
      0041,2262,0391,002E).

         Content-Type: text/plain; charset=UNICODE-1-1-UTF-7

         A+ImIDkQ.

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "Item 3 is <POUND SIGN>1." (hexadecimal 0049,
      0074, 0065, 006D, 0020, 0033, 0020, 0069, 0073, 0020, 00A3,
      0031, 002E).

         Content-Type: text/plain; charset=UNICODE-1-1-UTF-7

         Item 3 is +AKM-1.

   Note that to achieve the best interoperability with systems that
   may not support Unicode or MIME, when preparing text for mail
   transmission line breaks should follow Internet conventions. This
   means that lines should be short and terminated with the proper
   SMTP CRLF sequence. Unicode LINE SEPARATOR (hexadecimal 2028) and
   PARAGRAPH SEPARATOR (hexadecimal 2029) should be converted to SMTP
   line breaks. Ideally, this would be handled transparently by a
   Unicode-aware user agent.

   This preparation is not absolutely necessary, since UTF-7 and the
   appropriate MIME content transfer encoding can handle text that
   does not follow Internet conventions, but readability by systems
   without Unicode or MIME will be impaired. See RFC 1521 for an
   in-depth discussion of mail interoperability issues.

   Lines should never be broken in the middle of a UTF-7 shifted
   sequence, since such sequences may not cross line breaks.
   Therefore, UTF-7 encoding should take place after line breaking. If
   a line containing a shifted sequence is too long after encoding, a
   MIME content transfer encoding such as Quoted Printable can be used
   to encode the text. Another possibility is to perform line breaking
   and UTF-7 encoding at the same time, so that lines containing
   shifted sequences already conform to length restrictions.

Discussion

   In this section we will motivate the introduction of UTF-7 as
   opposed to the alternative of using the existing transformation
   formats of Unicode (e.g., UTF-8) with MIME's content transfer
   encodings. Before discussing this, it will be useful to list some
   assumptions about character frequency within typical natural
   language text strings that we use to estimate typical storage
   requirements:

      1. Most Western European languages use roughly 7/8 of their
      letters from US-ASCII and 1/8 from Latin 1 (ISO-8859-1).

      2. Most non-European alphabet-based languages (e.g., Greek) use
      about 1/6 of their letters from ASCII (since white space is in
      the 7-bit area) and the rest from their alphabets.

      3. East Asian ideographic-based languages (including Japanese)
      use essentially all of their characters from the Han or CJK
      syllabary area.

      4. Non-directly encoded punctuation characters do not occur
      frequently enough to affect the results.

   Notice that current 8 bit standards, such as ISO-8859-x, require
   use of a content transfer encoding. For comparison with the
   subsequent discussion, the costs break down as follows (note that
   many of these figures are approximate since they depend on the
   exact composition of the text):

      8859-x in Base64

            Text type              Average octets/character
            All                    1.33

      8859-x in Quoted Printable

            Text type              Average octets/character
            US-ASCII               1
            Western European       1.25
            Other                  2.67

   Note also that Unicode encoded in Base64 takes a constant 2.67
   octets per character. For purposes of comparison, we will look at
   UTF-8 in Base64 and Quoted Printable, and UTF-7. UTF-1 gives
   results substantially similar to UTF-8. Also note that fixed
   overhead for long strings is relative to 1/n, where n is the
   encoded string length in octets.

      UTF-8 in Base64

            Text type              Average octets/character
            US-ASCII               1.33
            Western European       1.5
            Some Alphabetics       2.44
            All others             4

      UTF-8 in Quoted Printable

            Text type              Average octets/character
            US-ASCII               1
            Western European       1.63
            Some Alphabetics       5.17
            All others             7-9

      UTF-7

            Text type              Average octets/character
            Most US-ASCII          1
            Western European       1.5
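
   The rules in the UTF-7 Definition section can be tried out with a
   short Python sketch. It is an illustration under stated
   assumptions, not a reference implementation. The function names
   are hypothetical, and the contents of set D and set B are written
   out here as assumed from the part of the RFC that precedes this
   excerpt. The encoder never uses the optional direct encoding of
   Set O and always emits the optional "-" terminator, so its output
   can differ from the examples above while decoding to the same
   text.

```python
import base64

# Assumed from earlier in the RFC (not part of this excerpt):
# set B is the Base64 alphabet, set D is letters, digits and '(),-./:?
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
SET_B = set(B64)
SET_D = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789'(),-./:?")
DIRECT = SET_D | set(" \t\r\n")  # Rule 1 (set D only) plus Rule 3


def encode_utf7(text: str) -> str:
    """Encode 16-bit Unicode text; Set O is always shift-encoded here."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in DIRECT:
            out.append(ch)
            i += 1
        elif ch == "+":
            out.append("+-")  # special case for a literal "+"
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in DIRECT and text[j] != "+":
                j += 1
            octets = text[i:j].encode("utf-16-be")  # most significant octet first
            run = base64.b64encode(octets).decode("ascii").rstrip("=")
            out.append("+" + run + "-")  # "-" is optional; always emitted here
            i = j
    return "".join(out)


def _decode_run(run: str) -> str:
    """Turn a Modified Base64 run into 16-bit characters (Rule 2)."""
    bits = 0
    for c in run:
        bits = (bits << 6) | B64.index(c)
    extra = (6 * len(run)) % 16  # bits that do not complete a character
    if bits & ((1 << extra) - 1):
        raise ValueError("ill-formed: discarded bits are non-zero")
    bits >>= extra
    count = (6 * len(run)) // 16
    return "".join(chr((bits >> (16 * (count - 1 - k))) & 0xFFFF)
                   for k in range(count))


def decode_utf7(data: str) -> str:
    out, i = [], 0
    while i < len(data):
        if data[i] != "+":
            out.append(data[i])  # directly encoded character
            i += 1
            continue
        j = i + 1
        while j < len(data) and data[j] in SET_B:
            j += 1
        run = data[i + 1:j]
        if not run:
            if j < len(data) and data[j] == "-":
                out.append("+")  # "+-" encodes "+"
                i = j + 1
                continue
            raise ValueError("ill-formed: '+' not followed by set B or '-'")
        out.append(_decode_run(run))
        if j < len(data) and data[j] == "-":
            j += 1  # a terminating "-" is absorbed
        i = j
    return "".join(out)


print(decode_utf7("Hi Mom +Jjo-!") == "Hi Mom \u263a!")
print(encode_utf7("\u65e5\u672c\u8a9e"))
```

   In the "nihongo" example the three Han characters occupy eight
   Modified Base64 characters (3 x 2 2/3) plus one octet to shift in
   and one to shift out, ten octets in total, which matches the
   per-character cost stated in the definition section.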