📄 rfc2822.txt
字号:
sequence of lines of characters with special syntax as defined in this standard. The body is simply a sequence of characters that follows the header and is separated from the header by an empty line (i.e., a line with nothing preceding the CRLF).2.1.1. Line Length Limits There are two limits that this standard places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF. The 998 character limit is due to limitations in many implementations which send, receive, or store Internet Message Format messages that simply cannot handle more than 998 characters on a line. Receiving implementations would do well to handle an arbitrarily large number of characters in a line for robustness sake. However, there are so many implementations which (in compliance with the transport requirements of [RFC2821]) do not accept messages containing more than 1000 character including the CR and LF per line, it is important for implementations not to create such messages. The more conservative 78 character recommendation is to accommodate the many implementations of user interfaces that display these messages which may truncate, or disastrously wrap, the display of more than 78 characters per line, in spite of the fact that such implementations are non-conformant to the intent of this specification (and that of [RFC2821] if they actually cause information to be lost). Again, even though this limitation is put on messages, it is encumbant upon implementations which display messagesResnick Standards Track [Page 6]RFC 2822 Internet Message Format April 2001 to handle an arbitrarily large number of characters in a line (certainly at least up to the 998 character limit) for the sake of robustness.2.2. Header Fields Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.2.2.1. Unstructured Header Field Bodies Some field bodies in this standard are defined simply as "unstructured" (which is specified below as any US-ASCII characters, except for CR and LF) with no further restrictions. These are referred to as unstructured field bodies. Semantically, unstructured field bodies are simply to be treated as a single line of characters with no further processing (except for header "folding" and "unfolding" as described in section 2.2.3).2.2.2. Structured Header Field Bodies Some field bodies in this standard have specific syntactical structure more restrictive than the unstructured field bodies described above. These are referred to as "structured" field bodies. Structured field bodies are sequences of specific lexical tokens as described in sections 3 and 4 of this standard. Many of these tokens are allowed (according to their syntax) to be introduced or end with comments (as described in section 3.2.3) as well as the space (SP, ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters (together known as the white space characters, WSP), and those WSP characters are subject to header "folding" and "unfolding" as described in section 2.2.3. Semantic analysis of structured field bodies is given along with their syntax.2.2.3. Long Header Fields Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule isResnick Standards Track [Page 7]RFC 2822 Internet Message Format April 2001 that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field: Subject: This is a test can be represented as: Subject: This is a test Note: Though structured field bodies are defined in such a way that folding can take place between many of the lexical tokens (and even within some of the lexical tokens), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a field body is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items in preference to other places where the field could be folded, even if it is allowed elsewhere. The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP. Each header field should be treated in its unfolded form for further syntactic and semantic evaluation.2.3. Body The body of a message is simply lines of US-ASCII characters. The only two limitations on the body are as follows: - CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body. - Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF. Note: As was stated earlier, there are other standards documents, specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049] that extend this standard to allow for different sorts of message bodies. Again, these mechanisms are beyond the scope of this document.Resnick Standards Track [Page 8]RFC 2822 Internet Message Format April 20013. Syntax3.1. Introduction The syntax as given in this section defines the legal syntax of Internet messages. Messages that are conformant to this standard MUST conform to the syntax in this section. If there are options in this section where one option SHOULD be generated, that is indicated either in the prose or in a comment next to the syntax. For the defined expressions, a short description of the syntax and use is given, followed by the syntax in ABNF, followed by a semantic analysis. Primitive tokens that are used but otherwise unspecified come from [RFC2234]. In some of the definitions, there will be nonterminals whose names start with "obs-". These "obs-" elements refer to tokens defined in the obsolete syntax in section 4. In all cases, these productions are to be ignored for the purposes of generating legal Internet messages and MUST NOT be used as part of such a message. However, when interpreting messages, these tokens MUST be honored as part of the legal syntax. In this sense, section 3 defines a grammar for generation of messages, with "obs-" elements that are to be ignored, while section 4 adds grammar for interpretation of messages.3.2. Lexical Tokens The following rules are used to define an underlying lexical analyzer, which feeds tokens to the higher-level parsers. This section defines the tokens used in structured header field bodies. Note: Readers of this standard need to pay special attention to how these lexical tokens are used in both the lower-level and higher-level syntax later in the document. Particularly, the white space tokens and the comment tokens defined in section 3.2.3 get used in the lower-level tokens defined here, and those lower-level tokens are in turn used as parts of the higher-level tokens defined later. Therefore, the white space and comments may be allowed in the higher-level tokens even though they may not explicitly appear in a particular definition.3.2.1. Primitive Tokens The following are primitive tokens referred to elsewhere in this standard, but not otherwise defined in [RFC2234]. Some of them will not appear anywhere else in the syntax, but they are convenient to refer to in other parts of this document.Resnick Standards Track [Page 9]RFC 2822 Internet Message Format April 2001 Note: The "specials" below are just such an example. Though the specials token does not appear anywhere else in this standard, it is useful for implementers who use tools that lexically analyze messages. Each of the characters in specials can be used to indicate a tokenization point in lexical analysis.NO-WS-CTL = %d1-8 / ; US-ASCII control characters %d11 / ; that do not include the %d12 / ; carriage return, line feed, %d14-31 / ; and white space characters %d127text = %d1-9 / ; Characters excluding CR and LF %d11 / %d12 / %d14-127 / obs-textspecials = "(" / ")" / ; Special characters used in "<" / ">" / ; other parts of the syntax "[" / "]" / ":" / ";" / "@" / "\" / "," / "." / DQUOTE No special semantics are attached to these tokens. They are simply single characters.3.2.2. Quoted characters Some characters are reserved for special interpretation, such as delimiting lexical tokens. To permit use of these characters as uninterpreted data, a quoting mechanism is provided.quoted-pair = ("\" text) / obs-qp Where any quoted-pair appears, it is to be interpreted as the text character alone. That is to say, the "\" character that appears as part of a quoted-pair is semantically "invisible". Note: The "\" character may appear in a message where it is not part of a quoted-pair. A "\" character that does not appear in a quoted-pair is not semantically invisible. The only places in this standard where quoted-pair currently appears are ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.Resnick Standards Track [Page 10]RFC 2822 Internet Message Format April 20013.2.3. Folding white space and comments White space characters, including white space used in folding (described in section 2.2.3), may appear between many elements in header field bodies. Also, strings of characters that are treated as comments may be included in structured field bodies as characters enclosed in parentheses. The following defines the folding white space (FWS) and comment constructs. Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string", as defined in section 3.2.5. Comments may nest. There are several places in this standard where comments and FWS may be freely inserted. To accommodate that syntax, an additional token for "CFWS" is defined for places where comments and/or FWS can occur. However, where CFWS occurs in this standard, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space obs-FWSctext = NO-WS-CTL / ; Non white space controls %d33-39 / ; The rest of the US-ASCII %d42-91 / ; characters not including "(", %d93-126 ; ")", or "\"ccontent = ctext / quoted-pair / commentcomment = "(" *([FWS] ccontent) [FWS] ")"CFWS = *([FWS] comment) (([FWS] comment) / FWS) Throughout this standard, where FWS (the folding white space token) appears, it indicates a place where header folding, as discussed in section 2.2.3, may take place. Wherever header folding appears in a
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -