   value 10).  (The carriage-return/line-feed pair is usually written in
   this document as "CRLF".)

   A message consists of header fields (collectively called "the header
   of the message") followed, optionally, by a body.  The header is a
   sequence of lines of characters with special syntax as defined in
   this standard. The body is simply a sequence of characters that
   follows the header and is separated from the header by an empty line
   (i.e., a line with nothing preceding the CRLF).

2.1.1. Line Length Limits

   There are two limits that this standard places on the number of
   characters in a line. Each line of characters MUST be no more than
   998 characters, and SHOULD be no more than 78 characters, excluding
   the CRLF.

   The 998 character limit is due to limitations in many implementations
   which send, receive, or store Internet Message Format messages that
   simply cannot handle more than 998 characters on a line. Receiving
   implementations would do well to handle an arbitrarily large number
   of characters in a line for robustness' sake. However, there are so
   many implementations which (in compliance with the transport
   requirements of [RFC2821]) do not accept messages containing more
   than 1000 characters per line, including the CR and LF, that it is
   important for implementations not to create such messages.

   The more conservative 78 character recommendation is to accommodate
   the many implementations of user interfaces that display these
   messages which may truncate, or disastrously wrap, the display of
   more than 78 characters per line, in spite of the fact that such
   implementations are non-conformant to the intent of this
   specification (and that of [RFC2821] if they actually cause
   information to be lost). Again, even though this limitation is put on
   messages, it is incumbent upon implementations which display messages





Resnick                     Standards Track                     [Page 6]

RFC 2822                Internet Message Format               April 2001


   to handle an arbitrarily large number of characters in a line
   (certainly at least up to the 998 character limit) for the sake of
   robustness.
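
   The two limits can be checked mechanically.  The following is a
   minimal sketch in Python, not part of this standard; the function
   name and return shape are illustrative assumptions.  It assumes the
   message is already available as a string whose lines are terminated
   by CRLF, and it counts characters excluding the CRLF.

```python
MAX_LINE = 998   # MUST limit per line, excluding the CRLF
SOFT_LINE = 78   # SHOULD limit per line, excluding the CRLF


def check_line_lengths(message: str):
    """Return (too_long, over_soft): lists of 1-based line numbers.

    Hypothetical helper: 'too_long' holds lines that exceed 998
    characters, 'over_soft' holds lines that exceed 78 characters but
    stay within 998.
    """
    too_long, over_soft = [], []
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line) > MAX_LINE:
            too_long.append(number)
        elif len(line) > SOFT_LINE:
            over_soft.append(number)
    return too_long, over_soft
```

   A generator would normally reject or re-fold anything reported in
   the first list and treat the second list as a warning only.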

2.2. Header Fields

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF.  A field
   name MUST be composed of printable US-ASCII characters (i.e.,
   characters that have values between 33 and 126, inclusive), except
   colon.  A field body may be composed of any US-ASCII characters,
   except for CR and LF.  However, a field body may contain CRLF when
   used in header "folding" and "unfolding" as described in section
   2.2.3.  All field bodies MUST conform to the syntax described in
   sections 3 and 4 of this standard.
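
   As an illustration of the rules in this section, the sketch below
   splits one header line into its field name and field body and
   checks the character constraints stated above.  It is written in
   Python and is not part of this standard; the function name is a
   hypothetical choice, and the input is assumed to be a single
   unfolded line with the terminating CRLF already removed.  It does
   not check the per-field syntax of sections 3 and 4.

```python
def split_header_field(line: str):
    """Split an unfolded header line into (field_name, field_body).

    Hypothetical helper; raises ValueError when the constraints of
    section 2.2 are not met.
    """
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("header field has no colon")
    # Field names: printable US-ASCII (33..126), colon excluded by the split.
    if not name or not all(33 <= ord(ch) <= 126 for ch in name):
        raise ValueError("invalid character in field name")
    # Field bodies: any US-ASCII character except CR and LF.
    if any(ch in "\r\n" or ord(ch) > 127 for ch in body):
        raise ValueError("invalid character in field body")
    return name, body
```

   The field body is returned exactly as it appears after the colon,
   including any leading white space.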

2.2.1. Unstructured Header Field Bodies

   Some field bodies in this standard are defined simply as
   "unstructured" (which is specified below as any US-ASCII characters,
   except for CR and LF) with no further restrictions.  These are
   referred to as unstructured field bodies.  Semantically, unstructured
   field bodies are simply to be treated as a single line of characters
   with no further processing (except for header "folding" and
   "unfolding" as described in section 2.2.3).

2.2.2. Structured Header Field Bodies

   Some field bodies in this standard have specific syntactical
   structure more restrictive than the unstructured field bodies
   described above. These are referred to as "structured" field bodies.
   Structured field bodies are sequences of specific lexical tokens as
   described in sections 3 and 4 of this standard.  Many of these tokens
   are allowed (according to their syntax) to be introduced or end with
   comments (as described in section 3.2.3) as well as the space (SP,
   ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters
   (together known as the white space characters, WSP), and those WSP
   characters are subject to header "folding" and "unfolding" as
   described in section 2.2.3.  Semantic analysis of structured field
   bodies is given along with their syntax.

2.2.3. Long Header Fields

   Each header field is logically a single line of characters comprising
   the field name, the colon, and the field body.  For convenience
   however, and to deal with the 998/78 character limitations per line,
   the field body portion of a header field can be split into a multiple
   line representation; this is called "folding".  The general rule is
   that wherever this standard allows for folding white space (not
   simply WSP characters), a CRLF may be inserted before any WSP.  For
   example, the header field:

           Subject: This is a test

   can be represented as:

           Subject: This
            is a test

   Note: Though structured field bodies are defined in such a way that
   folding can take place between many of the lexical tokens (and even
   within some of the lexical tokens), folding SHOULD be limited to
   placing the CRLF at higher-level syntactic breaks.  For instance, if
   a field body is defined as comma-separated values, it is recommended
   that folding occur after the comma separating the structured items in
   preference to other places where the field could be folded, even if
   it is allowed elsewhere.
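
   A simple folding routine might look like the following Python
   sketch, which is not part of this standard.  The function name and
   the greedy strategy are illustrative assumptions.  It treats every
   space or tab as a legal folding point, which matches unstructured
   field bodies; it does not look for higher-level syntactic breaks,
   and it does not guard against producing a line that consists only
   of white space.

```python
def fold(line: str, limit: int = 78) -> str:
    """Fold an unfolded header line by inserting CRLF before WSP.

    Hypothetical helper: greedy, breaks at the last space or tab that
    keeps each line within 'limit' characters (CRLF not counted).
    """
    parts = []
    while len(line) > limit:
        # Search from index 1 so a leading WSP is never the break point.
        cut = max(line.rfind(" ", 1, limit + 1),
                  line.rfind("\t", 1, limit + 1))
        if cut == -1:
            break  # no WSP available; the remainder stays on one line
        parts.append(line[:cut])
        line = line[cut:]  # the WSP begins the continuation line
    parts.append(line)
    return "\r\n".join(parts)
```

   With a deliberately small limit of 13, the "Subject: This is a
   test" field is split after "This", giving the same two-line
   representation as the example in this section.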

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding". Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.
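
   Unfolding as described here reduces to a single substitution.  The
   Python sketch below is an illustration, not part of this standard,
   and the function name is an assumption.  It removes each CRLF that
   is immediately followed by a space or a horizontal tab and leaves
   the white space itself in place.

```python
import re

# A CRLF immediately followed by WSP (space or horizontal tab).
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(raw: str) -> str:
    """Return 'raw' with every folding CRLF removed (hypothetical helper)."""
    return _FOLD.sub("", raw)
```

   A CRLF that is not followed by white space marks the end of the
   header field and is left untouched.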

2.3. Body

   The body of a message is simply lines of US-ASCII characters.  The
   only two limitations on the body are as follows:

   - CR and LF MUST only occur together as CRLF; they MUST NOT appear
     independently in the body.

   - Lines of characters in the body MUST be limited to 998 characters,
     and SHOULD be limited to 78 characters, excluding the CRLF.
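
   The two limitations above, together with the US-ASCII requirement,
   can be checked as in the following Python sketch.  It is an
   illustration only, not part of this standard; the function name and
   the wording of the returned messages are assumptions.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def body_problems(body: str):
    """Return a list of MUST-level violations found in 'body'.

    Hypothetical helper; an empty list means no violation was found.
    """
    problems = []
    if _BARE_CR_OR_LF.search(body):
        problems.append("bare CR or LF")
    if any(ord(ch) > 127 for ch in body):
        problems.append("non-US-ASCII character")
    if any(len(line) > 998 for line in body.split("\r\n")):
        problems.append("line longer than 998 characters")
    return problems
```

   The 78 character recommendation is a SHOULD and is deliberately not
   reported by this sketch.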

   Note: As was stated earlier, there are other standards documents,
   specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049]
   that extend this standard to allow for different sorts of message
   bodies.  Again, these mechanisms are beyond the scope of this
   document.

3. Syntax

3.1. Introduction

   The syntax as given in this section defines the legal syntax of
   Internet messages.  Messages that are conformant to this standard
   MUST conform to the syntax in this section.  If there are options in
   this section where one option SHOULD be generated, that is indicated
   either in the prose or in a comment next to the syntax.

   For the defined expressions, a short description of the syntax and
   use is given, followed by the syntax in ABNF, followed by a semantic
   analysis.  Primitive tokens that are used but otherwise unspecified
   come from [RFC2234].

   In some of the definitions, there will be nonterminals whose names
   start with "obs-".  These "obs-" elements refer to tokens defined in
   the obsolete syntax in section 4.  In all cases, these productions
   are to be ignored for the purposes of generating legal Internet
   messages and MUST NOT be used as part of such a message.  However,
   when interpreting messages, these tokens MUST be honored as part of
   the legal syntax.  In this sense, section 3 defines a grammar for
   generation of messages, with "obs-" elements that are to be ignored,
   while section 4 adds grammar for interpretation of messages.

3.2. Lexical Tokens

   The following rules are used to define an underlying lexical
   analyzer, which feeds tokens to the higher-level parsers.  This
   section defines the tokens used in structured header field bodies.

   Note: Readers of this standard need to pay special attention to how
   these lexical tokens are used in both the lower-level and
   higher-level syntax later in the document.  Particularly, the white
   space tokens and the comment tokens defined in section 3.2.3 get used
   in the lower-level tokens defined here, and those lower-level tokens
   are in turn used as parts of the higher-level tokens defined later.
   Therefore, the white space and comments may be allowed in the
   higher-level tokens even though they may not explicitly appear in a
   particular definition.

3.2.1. Primitive Tokens

   The following are primitive tokens referred to elsewhere in this
   standard, but not otherwise defined in [RFC2234].  Some of them will
   not appear anywhere else in the syntax, but they are convenient to
   refer to in other parts of this document.

   Note: The "specials" below are just such an example.  Though the
   specials token does not appear anywhere else in this standard, it is
   useful for implementers who use tools that lexically analyze
   messages.  Each of the characters in specials can be used to indicate
   a tokenization point in lexical analysis.

NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                        %d11 /          ;  that do not include the
                        %d12 /          ;  carriage return, line feed,
                        %d14-31 /       ;  and white space characters
                        %d127

text            =       %d1-9 /         ; Characters excluding CR and LF
                        %d11 /
                        %d12 /
                        %d14-127 /
                        obs-text

specials        =       "(" / ")" /     ; Special characters used in
                        "<" / ">" /     ;  other parts of the syntax
                        "[" / "]" /
                        ":" / ";" /
                        "@" / "\" /
                        "," / "." /
                        DQUOTE

   No special semantics are attached to these tokens.  They are simply
   single characters.
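
   For implementers, the three rules above translate directly into
   character sets.  The Python sketch below is an illustration, not
   part of this standard; all names are assumptions, and the obs-text
   alternative of "text" is left out.  The splitting function shows
   specials used as tokenization points and ignores quoting and
   comments.

```python
import re

# NO-WS-CTL: %d1-8, %d11, %d12, %d14-31, %d127
NO_WS_CTL = frozenset(
    chr(c) for c in [*range(1, 9), 11, 12, *range(14, 32), 127]
)

# text (without obs-text): %d1-9, %d11, %d12, %d14-127
TEXT = frozenset(chr(c) for c in [*range(1, 10), 11, 12, *range(14, 128)])

# specials, including DQUOTE
SPECIALS = frozenset('()<>[]:;@\\,."')

_SPECIALS_RE = re.compile(
    "([" + re.escape("".join(sorted(SPECIALS))) + "])"
)


def split_at_specials(value: str):
    """Split 'value' at every special, keeping the specials as items.

    Hypothetical helper; it does not interpret quoted-strings or comments.
    """
    return [piece for piece in _SPECIALS_RE.split(value) if piece]
```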

3.2.2. Quoted characters

   Some characters are reserved for special interpretation, such as
   delimiting lexical tokens.  To permit use of these characters as
   uninterpreted data, a quoting mechanism is provided.

quoted-pair     =       ("\" text) / obs-qp

   Where any quoted-pair appears, it is to be interpreted as the text
   character alone.  That is to say, the "\" character that appears as
   part of a quoted-pair is semantically "invisible".
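
   The interpretation of a quoted-pair can be sketched in Python as
   follows.  This is an illustration, not part of this standard; the
   function name is an assumption, and the obs-qp alternative is not
   handled.  The caller is assumed to pass only content taken from a
   construct in which quoted-pair is allowed, since a "\" elsewhere is
   an ordinary character.

```python
import re

# "\" followed by one 'text' character: %d1-9, %d11, %d12, %d14-127
_QUOTED_PAIR = re.compile(r"\\([\x01-\x09\x0b\x0c\x0e-\x7f])")


def unquote_pairs(content: str) -> str:
    """Replace each quoted-pair by its text character (hypothetical helper)."""
    return _QUOTED_PAIR.sub(r"\1", content)
```

   A "\" followed by CR or LF does not form a quoted-pair under this
   rule and is left unchanged.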

   Note: The "\" character may appear in a message where it is not part
   of a quoted-pair.  A "\" character that does not appear in a
   quoted-pair is not semantically invisible.  The only places in this
   standard where quoted-pair currently appears are ccontent, qcontent,
   dcontent, no-fold-quote, and no-fold-literal.

3.2.3. Folding white space and comments

   White space characters, including white space used in folding
   (described in section 2.2.3), may appear between many elements in
   header field bodies.  Also, strings of characters that are treated as
   comments may be included in structured field bodies as characters
   enclosed in parentheses.  The following defines the folding white
   space (FWS) and comment constructs.

   Strings of characters enclosed in parentheses are considered comments
   so long as they do not appear within a "quoted-string", as defined in
   section 3.2.5.  Comments may nest.

   There are several places in this standard where comments and FWS may
   be freely inserted.  To accommodate that syntax, an additional token
   for "CFWS" is defined for places where comments and/or FWS can occur.
   However, where CFWS occurs in this standard, it MUST NOT be inserted
   in such a way that any line of a folded header field is made up
   entirely of WSP characters and nothing else.

FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS

ctext           =       NO-WS-CTL /     ; Non white space controls

                        %d33-39 /       ; The rest of the US-ASCII
                        %d42-91 /       ;  characters not including "(",
                        %d93-126        ;  ")", or "\"
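
   The FWS and ctext rules above can be expressed as a regular
   expression and a character set.  The Python sketch below is an
   illustration, not part of this standard; the names are assumptions,
   and the obs-FWS alternative is not covered.

```python
import re

# FWS = ([*WSP CRLF] 1*WSP): optional "WSP* CRLF", then at least one WSP.
FWS = re.compile(r"(?:[ \t]*\r\n)?[ \t]+")

# ctext = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126
CTEXT = frozenset(
    chr(c)
    for c in [
        *range(1, 9), 11, 12, *range(14, 32), 127,   # NO-WS-CTL
        *range(33, 40), *range(42, 92), *range(93, 127),
    ]
)


def is_fws(value: str) -> bool:
    """Return True if 'value' is exactly one FWS (hypothetical helper)."""
    return FWS.fullmatch(value) is not None
```

   Note that a CRLF on its own is not FWS, because the rule requires
   at least one WSP character after it.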