Network Working Group                                           K. Moore
Request for Comments: 2047                       University of Tennessee
Obsoletes: 1521, 1522, 1590                                November 1996
Category: Standards Track


        MIME (Multipurpose Internet Mail Extensions) Part Three:
              Message Header Extensions for Non-ASCII Text

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   STD 11, RFC 822, defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, and leaves the
   message content, or message body, as flat US-ASCII text.  This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

   (1) textual message bodies in character sets other than US-ASCII,

   (2) an extensible set of different formats for non-textual message
       bodies,

   (3) multi-part message bodies, and

   (4) textual header information in character sets other than US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise them.  Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

   This particular document is the third document in the series.  It
   describes extensions to RFC 822 to allow non-US-ASCII text data in
   Internet mail header fields.









Moore                       Standards Track                     [Page 1]

RFC 2047               Message Header Extensions           November 1996


   Other documents in this series include:

   + RFC 2045, which specifies the various headers used to describe
     the structure of MIME messages,

   + RFC 2046, which defines the general structure of the MIME media
     typing system and defines an initial set of media types,

   + RFC 2048, which specifies various IANA registration procedures
     for MIME-related facilities, and

   + RFC 2049, which describes MIME conformance criteria and
     provides some illustrative examples of MIME message formats,
     acknowledgements, and the bibliography.

   These documents are revisions of RFCs 1521, 1522, and 1590, which
   themselves were revisions of RFCs 1341 and 1342.  An appendix in RFC
   2049 describes differences and changes from previous versions.

1. Introduction

   RFC 2045 describes a mechanism for denoting textual body parts which
   are coded in various character sets, as well as methods for encoding
   such body parts as sequences of printable US-ASCII characters.  This
   memo describes similar techniques to allow the encoding of non-ASCII
   text in various portions of an RFC 822 [2] message header, in a manner
   which is unlikely to confuse existing message handling software.

   Like the encoding techniques described in RFC 2045, the techniques
   outlined here were designed to allow the use of non-ASCII characters
   in message headers in a way which is unlikely to be disturbed by the
   quirks of existing Internet mail handling programs.  In particular,
   some mail relaying programs are known to (a) delete some message
   header fields while retaining others, (b) rearrange the order of
   addresses in To or Cc fields, (c) rearrange the (vertical) order of
   header fields, and/or (d) "wrap" message headers at different places
   than those in the original message.  In addition, some mail reading
   programs are known to have difficulty correctly parsing message
   headers which, while legal according to RFC 822, make use of
   backslash-quoting to "hide" special characters such as "<", ",", or
   ":", or which exploit other infrequently-used features of that
   specification.

   While it is unfortunate that these programs do not correctly
   interpret RFC 822 headers, to "break" these programs would cause
   severe operational problems for the Internet mail system.  The
   extensions described in this memo therefore do not rely on little-
   used features of RFC 822.

   Instead, certain sequences of "ordinary" printable ASCII characters
   (known as "encoded-words") are reserved for use as encoded data.  The
   syntax of encoded-words is such that they are unlikely to
   "accidentally" appear as normal text in message headers.
   Furthermore, the characters used in encoded-words are restricted to
   those which do not have special meanings in the context in which the
   encoded-word appears.

   Generally, an "encoded-word" is a sequence of printable ASCII
   characters that begins with "=?", ends with "?=", and has two "?"s in
   between.  It specifies a character set and an encoding method, and
   also includes the original text encoded as graphic ASCII characters,
   according to the rules for that encoding method.
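   As a rough illustration of this general shape, the Python sketch
   below scans a header value for substrings that begin with "=?", end
   with "?=", and contain exactly two further "?"s with no white space.
   The function name find_encoded_words is hypothetical, and the pattern
   is deliberately looser than the full grammar given in section 2.

```python
import re

# General shape only: "=?" field "?" field "?" field "?=", where no field
# may contain "?" or white space.  This is NOT the full section 2 grammar.
CANDIDATE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")


def find_encoded_words(header_value):
    """Return every substring with the general shape of an encoded-word."""
    return CANDIDATE.findall(header_value)


print(find_encoded_words("Subject: =?iso-8859-1?q?caf=E9?= au lait"))
# ['=?iso-8859-1?q?caf=E9?=']
```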

   A mail composer that implements this specification will provide a
   means of inputting non-ASCII text in header fields, but will
   translate these fields (or appropriate portions of these fields) into
   encoded-words before inserting them into the message header.
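   On the composing side, Python's standard email.header.Header class
   performs this translation.  The sketch below assumes an ISO-8859-1
   subject; compose_subject is a hypothetical helper name, and the
   exact encoded output (for example, how spaces are represented) is
   chosen by the library rather than dictated by this sketch.

```python
from email.header import Header


def compose_subject(text, charset="iso-8859-1"):
    """Translate a non-ASCII subject into encoded-word form for the header."""
    # Header.encode() emits encoded-words for text that is not plain ASCII.
    return Header(text, charset).encode()


print(compose_subject("café au lait"))
```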

   A mail reader that implements this specification will recognize
   encoded-words when they appear in certain portions of the message
   header.  Instead of displaying the encoded-word "as is", it will
   reverse the encoding and display the original text in the designated
   character set.
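   The reverse step can be sketched with the standard library as well.
   decode_header() splits a header value into (bytes-or-str, charset)
   pieces, and make_header() reassembles them into decoded text; the
   wrapper name display_header is hypothetical.

```python
from email.header import decode_header, make_header


def display_header(raw_value):
    """Decode any encoded-words in a header value back to readable text."""
    return str(make_header(decode_header(raw_value)))


print(display_header("=?iso-8859-1?q?this=20is=20some=20text?="))  # this is some text
print(display_header("=?utf-8?b?Y2Fmw6k=?="))                       # café
```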

NOTES

   This memo relies heavily on notation and terms defined in RFC 822 and
   RFC 2045.  In particular, the syntax for the ABNF used in this memo
   is defined in RFC 822, and many of the terminal and nonterminal
   symbols from RFC 822 are used in the grammar for the header
   extensions defined here.  Among the symbols defined in RFC 822 and
   referenced in this memo are: 'addr-spec', 'atom', 'CHAR', 'comment',
   'CTLs', 'ctext', 'linear-white-space', 'phrase', 'quoted-pair',
   'quoted-string', 'SPACE', and 'word'.  Successful implementation of
   this protocol extension requires careful attention to the RFC 822
   definitions of these terms.

   When the term "ASCII" appears in this memo, it refers to the "7-Bit
   American Standard Code for Information Interchange", ANSI X3.4-1986.
   The MIME charset name for this character set is "US-ASCII".  When not
   specifically referring to the MIME charset name, this document uses
   the term "ASCII", both for brevity and for consistency with RFC 822.
   However, implementors are warned that the character set name must be
   spelled "US-ASCII" in MIME message and body part headers.

   This memo specifies a protocol for the representation of non-ASCII
   text in message headers.  It specifically DOES NOT define any
   translation between "8-bit headers" and pure ASCII headers, nor is
   any such translation assumed to be possible.

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token    ; see section 3

   encoding = token   ; see section 4

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = 1*<Any printable ASCII character other than "?"
                     or SPACE>
                  ; (but see "Use of encoded-words in message
                  ; headers", section 5)

   Both 'encoding' and 'charset' names are case-independent.  Thus the
   charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
   encoding named "Q" may be spelled either "Q" or "q".
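   The grammar above can be turned into a strict validator.  The Python
   sketch below builds 'token' from printable ASCII minus the
   especials (which also excludes SPACE and CTLs), builds
   'encoded-text' from printable ASCII minus "?", and folds the
   case-independent names to a canonical case.  parse_encoded_word is a
   hypothetical name, and the sketch checks syntax only; it does not
   decode the text.

```python
import re

ESPECIALS = '()<>@,;:"/[]?.='
# Printable ASCII (0x21-0x7E) excludes SPACE and CTLs; then drop especials.
TOKEN_CHARS = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in ESPECIALS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"
# encoded-text: printable ASCII other than "?" (0x3F) or SPACE.
ENCODED_TEXT = r"[!->@-~]+"
ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)


def parse_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if word is invalid."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset, encoding, text = m.groups()
    # Charset and encoding names are case-independent, so normalise them.
    return charset.lower(), encoding.upper(), text


print(parse_encoded_word("=?ISO-8859-1?Q?caf=E9?="))  # ('iso-8859-1', 'Q', 'caf=E9')
```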

   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters.  If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.
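   A composer can respect both limits by splitting long text into
   several encoded-words joined by CRLF SPACE.  The Python sketch below
   assumes UTF-8 and the "B" encoding; split_into_b_words is a
   hypothetical helper.  It encodes one character at a time so that a
   multi-byte character is never cut in half, which is valid only for a
   stateless charset such as UTF-8.

```python
import base64


def split_into_b_words(text, charset="utf-8", max_word=75):
    """Encode text as B-encoded encoded-words of at most max_word characters,
    joined by CRLF SPACE so each continuation line starts with white space."""
    prefix = "=?" + charset + "?B?"
    suffix = "?="
    # Base64 turns every 3 input bytes into 4 output characters, so round
    # the room left for encoded-text down to a multiple of 4.
    room = (max_word - len(prefix) - len(suffix)) // 4 * 4
    max_bytes = room // 4 * 3
    words, chunk = [], b""
    for ch in text:
        piece = ch.encode(charset)  # one whole character, never split
        if chunk and len(chunk) + len(piece) > max_bytes:
            words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
            chunk = b""
        chunk += piece
    if chunk:
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
    return "\r\n ".join(words)


print(split_into_b_words("é" * 100))
```

   With a 75-character word limit, each continuation line (SPACE plus
   one encoded-word) stays within 76 characters.  The sketch does not
   reserve room for the field name (e.g. "Subject: ") on the first
   line; a real composer would need to.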

   The length restrictions are included both to ease interoperability
   through internetwork mail gateways, and to impose a limit on the
   amount of lookahead a header parser must employ (while looking for a
   final ?= delimiter) before it can decide whether a token is an
   "encoded-word" or something else.

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=
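   The effect can be reproduced in Python.  Splitting on white space is
   used here only as a crude stand-in for RFC 822 atom tokenization (an
   assumption, not a real RFC 822 parser).  q_encode_conservative is a
   hypothetical helper that escapes every byte except ASCII letters and
   digits as =XX, which over-escapes compared with the full "Q" rules
   but always yields a single atom.

```python
def q_encode_conservative(text, charset="iso-8859-1"):
    """Q-encode text, escaping every byte that is not an ASCII letter or digit."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("=%02X" % byte)  # SPACE becomes =20
    return "=?" + charset + "?q?" + "".join(out) + "?="


bad = "=?iso-8859-1?q?this is some text?="
print(len(bad.split()))  # 4 pieces, not one encoded-word
print(q_encode_conservative("this is some text"))
# =?iso-8859-1?q?this=20is=20some=20text?=
```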

   The characters which may appear in 'encoded-text' are further
   restricted by the rules in section 5.

3. Character sets

   The 'charset' portion of an 'encoded-word' specifies the character
   set associated with the unencoded text.  A 'charset' can be any of
   the character set names allowed in a MIME "charset" parameter of a
   "text/plain" body part, or any character set name registered with
   IANA for use with the MIME text/plain content-type.

   Some character sets use code-switching techniques to switch between
   "ASCII mode" and other modes.  If unencoded text in an 'encoded-word'
   contains a sequence which causes the charset interpreter to switch
   out of ASCII mode, it MUST contain additional control codes such that
   ASCII mode is again selected at the end of the 'encoded-word'.  (This
   rule applies separately to each 'encoded-word', including adjacent
   'encoded-word's within a single header field.)
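   ISO-2022-JP is one such code-switching charset: ESC $ B switches to
   JIS X 0208 and ESC ( B switches back to ASCII.  The Python sketch
   below encodes each chunk separately with the standard iso2022_jp
   codec and, as a safeguard, appends ESC ( B if the last escape
   sequence in a chunk did not select ASCII.  The helper name and the
   caller-chosen chunk boundaries are assumptions for illustration.

```python
import base64

ESC_ASCII = b"\x1b(B"  # ISO-2022-JP escape sequence that selects ASCII


def encode_iso2022jp_words(chunks):
    """B-encode each chunk separately so every encoded-word ends in ASCII mode."""
    words = []
    for chunk in chunks:
        raw = chunk.encode("iso2022_jp")
        # If the last escape sequence switched away from ASCII, switch back.
        last_esc = raw.rfind(b"\x1b")
        if last_esc != -1 and not raw.startswith(ESC_ASCII, last_esc):
            raw += ESC_ASCII
        words.append("=?ISO-2022-JP?B?" + base64.b64encode(raw).decode("ascii") + "?=")
    return words


print(encode_iso2022jp_words(["日本", "語abc"]))
```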

   When there is a possibility of using more than one character set to
   represent the text in an 'encoded-word', and in the absence of
   private agreements between sender and recipients of a message, it is
   recommended that members of the ISO-8859-* series be used in
   preference to other character sets.
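   One way to follow this recommendation is to try ISO-8859-* charsets
   first and fall back to another registered charset only when none of
   them can represent the text.  In the Python sketch below, the
   candidate list, its order, and the UTF-8 fallback are assumptions
   chosen for illustration, and choose_charset is a hypothetical name.

```python
PREFERRED = ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "utf-8")


def choose_charset(text, candidates=PREFERRED):
    """Return the first candidate charset that can represent text."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can represent the text")


print(choose_charset("café"))  # iso-8859-1
print(choose_charset("Łódź"))  # iso-8859-2
print(choose_charset("日本"))  # utf-8
```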

4. Encodings

   Initially, the legal values for "encoding" are "Q" and "B".  These
   encodings are described below.  The "Q" encoding is recommended for
   use when most of the characters to be encoded are in the ASCII
   character set; otherwise, the "B" encoding should be used.
   Nevertheless, a mail reader which claims to recognize 'encoded-word's
   MUST be able to accept either encoding for any character set which it
   supports.
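   The composer-side recommendation can be expressed as a simple
   heuristic over the charset-encoded bytes.  The Python sketch below
   assumes "most" means more than half of the bytes are ASCII;
   choose_encoding and that threshold are illustrative assumptions.  A
   reader, by contrast, must still decode both "Q" and "B" regardless of
   which one a composer preferred.

```python
def choose_encoding(raw):
    """Pick "Q" when most bytes are ASCII, otherwise "B".

    Assumption: "most" means strictly more than half of the bytes.
    """
    ascii_bytes = sum(1 for b in raw if b < 0x80)
    return "Q" if ascii_bytes * 2 > len(raw) else "B"


print(choose_encoding("café au lait".encode("iso-8859-1")))  # Q
print(choose_encoding("日本語".encode("utf-8")))              # B
```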