value 10). (The carriage-return/line-feed pair is usually written in
this document as "CRLF".)
A message consists of header fields (collectively called "the header
of the message") followed, optionally, by a body. The header is a
sequence of lines of characters with special syntax as defined in
this standard. The body is simply a sequence of characters that
follows the header and is separated from the header by an empty line
(i.e., a line with nothing preceding the CRLF).
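As an illustration of the header/body separation described above, the
following is a minimal sketch in Python. The function name
split_message is hypothetical and not part of this standard, and the
sketch assumes the message is already available as bytes that use
CRLF line endings.

```python
from typing import Optional, Tuple


def split_message(raw: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Split a raw message into (header, body) at the first empty line.

    The body is None when no empty line is present, i.e. the message
    consists of header fields only.
    """
    # An empty line at the very start means there are no header lines.
    if raw.startswith(b"\r\n"):
        return b"", raw[2:]
    # The first CRLF CRLF is the end of the last header line followed
    # by the empty separator line.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return raw, None
    # Give the last header line its terminating CRLF back.
    return head + b"\r\n", body
```

The separator line itself belongs to neither part, so it is dropped
from both return values.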
2.1.1. Line Length Limits
There are two limits that this standard places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.
The 998 character limit is due to limitations in many implementations
which send, receive, or store Internet Message Format messages that
simply cannot handle more than 998 characters on a line. Receiving
implementations would do well to handle an arbitrarily large number
of characters in a line for robustness' sake. However, there are so
many implementations which (in compliance with the transport
requirements of [RFC2821]) do not accept messages containing more
than 1000 characters per line, including the CR and LF, that it is
important for implementations not to create such messages.
The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line, in spite of the fact that such
implementations are non-conformant to the intent of this
specification (and that of [RFC2821] if they actually cause
information to be lost). Again, even though this limitation is put on
messages, it is incumbent upon implementations which display messages
Resnick Standards Track [Page 6]
RFC 2822 Internet Message Format April 2001
to handle an arbitrarily large number of characters in a line
(certainly at least up to the 998 character limit) for the sake of
robustness.
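The two limits can be checked mechanically. The sketch below is one
possible way to do so in Python; the names MAX_LINE, SOFT_LINE, and
check_line_lengths are assumptions made for this example, and the
input is assumed to be US-ASCII bytes with CRLF line endings, so that
one byte corresponds to one character.

```python
MAX_LINE = 998   # MUST NOT be exceeded (CRLF not counted)
SOFT_LINE = 78   # SHOULD NOT be exceeded (CRLF not counted)


def check_line_lengths(raw: bytes):
    """Return (errors, warnings) as lists of 1-based line numbers.

    errors   -- lines longer than 998 characters
    warnings -- lines longer than 78 but at most 998 characters
    """
    errors, warnings = [], []
    lines = raw.split(b"\r\n")
    # split() leaves an empty piece after the final CRLF; it is not a line.
    if lines and lines[-1] == b"":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE:
            errors.append(number)
        elif len(line) > SOFT_LINE:
            warnings.append(number)
    return errors, warnings
```

A generator would typically refuse to emit anything listed in errors,
while a receiver would still try to process such lines for the sake
of robustness.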
2.2. Header Fields
Header fields are lines composed of a field name, followed by a colon
(":"), followed by a field body, and terminated by CRLF. A field
name MUST be composed of printable US-ASCII characters (i.e.,
characters that have values between 33 and 126, inclusive), except
colon. A field body may be composed of any US-ASCII characters,
except for CR and LF. However, a field body may contain CRLF when
used in header "folding" and "unfolding" as described in section
2.2.3. All field bodies MUST conform to the syntax described in
sections 3 and 4 of this standard.
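The character rules for field names and field bodies can be sketched
as follows. This is an illustration only: is_valid_field_name and
parse_field are hypothetical names, the input is assumed to be a
single, already unfolded header field without its terminating CRLF,
and no check against the syntax of sections 3 and 4 is attempted.

```python
def is_valid_field_name(name: str) -> bool:
    """True if name is non-empty and made only of printable US-ASCII
    characters (values 33 to 126, inclusive) other than colon."""
    return bool(name) and all(
        33 <= ord(c) <= 126 and c != ":" for c in name
    )


def parse_field(line: str):
    """Split an unfolded header field into (field name, field body).

    Raises ValueError when the colon is missing, the field name is
    invalid, or the body contains CR, LF, or a non-US-ASCII character.
    """
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("header field has no colon")
    if not is_valid_field_name(name):
        raise ValueError("invalid field name: %r" % name)
    if any(ord(c) > 127 or c in "\r\n" for c in body):
        raise ValueError("invalid character in field body")
    return name, body
```

The body is returned exactly as it follows the colon, including any
leading white space, since its interpretation depends on the field.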
2.2.1. Unstructured Header Field Bodies
Some field bodies in this standard are defined simply as
"unstructured" (which is specified below as any US-ASCII characters,
except for CR and LF) with no further restrictions. These are
referred to as unstructured field bodies. Semantically, unstructured
field bodies are simply to be treated as a single line of characters
with no further processing (except for header "folding" and
"unfolding" as described in section 2.2.3).
2.2.2. Structured Header Field Bodies
Some field bodies in this standard have specific syntactical
structure more restrictive than the unstructured field bodies
described above. These are referred to as "structured" field bodies.
Structured field bodies are sequences of specific lexical tokens as
described in sections 3 and 4 of this standard. Many of these tokens
are allowed (according to their syntax) to begin or end with
comments (as described in section 3.2.3) as well as the space (SP,
ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters
(together known as the white space characters, WSP), and those WSP
characters are subject to header "folding" and "unfolding" as
described in section 2.2.3. Semantic analysis of structured field
bodies is given along with their syntax.
2.2.3. Long Header Fields
Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a
multiple-line representation; this is called "folding". The general rule is
that wherever this standard allows for folding white space (not
simply WSP characters), a CRLF may be inserted before any WSP. For
example, the header field:
        Subject: This is a test
can be represented as:
        Subject: This
         is a test
Note: Though structured field bodies are defined in such a way that
folding can take place between many of the lexical tokens (and even
within some of the lexical tokens), folding SHOULD be limited to
placing the CRLF at higher-level syntactic breaks. For instance, if
a field body is defined as comma-separated values, it is recommended
that folding occur after the comma separating the structured items in
preference to other places where the field could be folded, even if
it is allowed elsewhere.
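A simple folding routine for an unstructured field might look like
the sketch below. It is an assumption-laden illustration: the name
fold and the limit parameter are invented for this example, the input
is an unfolded field without its terminating CRLF, and every space or
tab is treated as a legal fold point. That is adequate for
unstructured field bodies only; a folder for structured fields would
need to know the field's syntax in order to prefer the higher-level
breaks recommended above.

```python
def fold(field: str, limit: int = 78) -> str:
    """Fold an unfolded header field so that lines are at most `limit`
    characters long where a fold point exists.

    A CRLF is inserted before a space or tab. A fold is never placed
    where it would leave a line made up of white space only.
    """
    lines = []
    rest = field
    while len(rest) > limit:
        # Search backwards from the limit for a usable space or tab.
        cut = next(
            (i for i in range(limit, 0, -1)
             if rest[i] in " \t"
             and rest[:i].strip(" \t")      # line before fold not blank
             and rest[i:].strip(" \t")),    # remainder not blank
            None,
        )
        if cut is None:
            break  # no fold point available; leave the long line as is
        lines.append(rest[:cut])
        rest = rest[cut:]  # continuation starts with the space or tab
    lines.append(rest)
    return "\r\n".join(lines)
```

With a limit of 13, the field "Subject: This is a test" folds into
the same two lines as the example above.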
The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation.
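Unfolding as described above amounts to a single substitution. The
sketch below shows it in Python; unfold and header_lines are
hypothetical helper names, and the header is assumed to be a string
that uses CRLF line endings.

```python
import re

# A CRLF that is immediately followed by a space or horizontal tab.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(text: str) -> str:
    """Remove every CRLF that is immediately followed by WSP."""
    return _FOLD.sub("", text)


def header_lines(header: str) -> list:
    """Return the header fields of `header` in unfolded form,
    one string per field, without the terminating CRLF."""
    return [line for line in unfold(header).split("\r\n") if line]
```

Only the CRLF is removed; the white space that followed it stays in
the field body.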
2.3. Body
The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:
- CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.
- Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.
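The first limitation can be tested with a short pattern. The
following Python sketch is illustrative only; the name
has_bare_cr_or_lf is an assumption of this example, and the body is
assumed to be available as bytes.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_bare_cr_or_lf(body: bytes) -> bool:
    """True if CR or LF occurs anywhere other than as part of a CRLF."""
    return _BARE.search(body) is not None
```

The second limitation is the same line length rule as in section
2.1.1 and can be checked in the same way.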
Note: As was stated earlier, there are other standards documents,
specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049]
that extend this standard to allow for different sorts of message
bodies. Again, these mechanisms are beyond the scope of this
document.
3. Syntax
3.1. Introduction
The syntax as given in this section defines the legal syntax of
Internet messages. Messages that are conformant to this standard
MUST conform to the syntax in this section. If there are options in
this section where one option SHOULD be generated, that is indicated
either in the prose or in a comment next to the syntax.
For the defined expressions, a short description of the syntax and
use is given, followed by the syntax in ABNF, followed by a semantic
analysis. Primitive tokens that are used but otherwise unspecified
come from [RFC2234].
In some of the definitions, there will be nonterminals whose names
start with "obs-". These "obs-" elements refer to tokens defined in
the obsolete syntax in section 4. In all cases, these productions
are to be ignored for the purposes of generating legal Internet
messages and MUST NOT be used as part of such a message. However,
when interpreting messages, these tokens MUST be honored as part of
the legal syntax. In this sense, section 3 defines a grammar for
generation of messages, with "obs-" elements that are to be ignored,
while section 4 adds grammar for interpretation of messages.
3.2. Lexical Tokens
The following rules are used to define an underlying lexical
analyzer, which feeds tokens to the higher-level parsers. This
section defines the tokens used in structured header field bodies.
Note: Readers of this standard need to pay special attention to how
these lexical tokens are used in both the lower-level and
higher-level syntax later in the document. Particularly, the white
space tokens and the comment tokens defined in section 3.2.3 get used
in the lower-level tokens defined here, and those lower-level tokens
are in turn used as parts of the higher-level tokens defined later.
Therefore, the white space and comments may be allowed in the
higher-level tokens even though they may not explicitly appear in a
particular definition.
3.2.1. Primitive Tokens
The following are primitive tokens referred to elsewhere in this
standard, but not otherwise defined in [RFC2234]. Some of them will
not appear anywhere else in the syntax, but they are convenient to
refer to in other parts of this document.
Note: The "specials" below are just such an example. Though the
specials token does not appear anywhere else in this standard, it is
useful for implementers who use tools that lexically analyze
messages. Each of the characters in specials can be used to indicate
a tokenization point in lexical analysis.
NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                        %d11 /          ;  that do not include the
                        %d12 /          ;  carriage return, line feed,
                        %d14-31 /       ;  and white space characters
                        %d127

text            =       %d1-9 /         ; Characters excluding CR and LF
                        %d11 /
                        %d12 /
                        %d14-127 /
                        obs-text

specials        =       "(" / ")" /     ; Special characters used in
                        "<" / ">" /     ;  other parts of the syntax
                        "[" / "]" /
                        ":" / ";" /
                        "@" / "\" /
                        "," / "." /
                        DQUOTE
No special semantics are attached to these tokens. They are simply
single characters.
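For implementers who prefer lookup tables to grammar rules, the three
primitive tokens can be written as sets of character values. The
Python sketch below is one possible rendering; the names NO_WS_CTL,
TEXT, SPECIALS, and is_text are chosen for this example, and TEXT
deliberately leaves out the obs-text alternative, which belongs to
the obsolete syntax of section 4.

```python
from itertools import chain

# NO-WS-CTL: %d1-8 / %d11 / %d12 / %d14-31 / %d127
NO_WS_CTL = frozenset(chain(range(1, 9), (11, 12), range(14, 32), (127,)))

# text: %d1-9 / %d11 / %d12 / %d14-127 (obs-text not included here)
TEXT = frozenset(chain(range(1, 10), (11, 12), range(14, 128)))

# specials: ( ) < > [ ] : ; @ \ , . and DQUOTE
SPECIALS = frozenset('()<>[]:;@\\,."')


def is_text(s: str) -> bool:
    """True if every character of s matches the `text` token."""
    return all(ord(c) in TEXT for c in s)
```

A lexical analyzer could, for instance, start a new token whenever it
meets a character that is a member of SPECIALS.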
3.2.2. Quoted characters
Some characters are reserved for special interpretation, such as
delimiting lexical tokens. To permit use of these characters as
uninterpreted data, a quoting mechanism is provided.
quoted-pair     =       ("\" text) / obs-qp
Where any quoted-pair appears, it is to be interpreted as the text
character alone. That is to say, the "\" character that appears as
part of a quoted-pair is semantically "invisible".
Note: The "\" character may appear in a message where it is not part
of a quoted-pair. A "\" character that does not appear in a
quoted-pair is not semantically invisible. The only places in this
standard where quoted-pair currently appears are ccontent, qcontent,
dcontent, no-fold-quote, and no-fold-literal.
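The interpretation of quoted-pair can be sketched as below. The
function name unquote_pairs is hypothetical. The sketch assumes that
its argument is the content of a construct in which quoted-pair is
permitted (for example, the inside of a quoted-string), and it does
not verify that the escaped character matches the text token.

```python
def unquote_pairs(s: str) -> str:
    """Replace each quoted-pair in s by its text character alone."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            # The backslash is semantically invisible; keep only the
            # character that follows it.
            out.append(s[i + 1])
            i += 2
        else:
            # Ordinary character, or a trailing backslash with nothing
            # after it, which is kept unchanged.
            out.append(s[i])
            i += 1
    return "".join(out)
```

Applying the function to text outside such constructs would be wrong,
because a "\" found there is not part of a quoted-pair.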
3.2.3. Folding white space and comments
White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies. Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses. The following defines the folding white
space (FWS) and comment constructs.
Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.5. Comments may nest.
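The two rules above (comments nest, and parentheses inside a
quoted-string do not start a comment) can be illustrated with a small
scanner. The Python sketch below is a simplification under stated
assumptions: strip_comments is a hypothetical name, the input is an
unfolded structured field body, and other constructs in which
parentheses may appear literally are not considered.

```python
def strip_comments(body: str) -> str:
    """Return body with all (possibly nested) comments removed.

    Text inside a quoted-string is copied unchanged, including any
    parentheses it contains.
    """
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string?
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and (in_quotes or depth) and i + 1 < len(body):
            # quoted-pair: the next character has no special meaning.
            if in_quotes:
                out.append(body[i:i + 2])
            i += 2
            continue
        if in_quotes:
            if c == '"':
                in_quotes = False
            out.append(c)
        elif depth:
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
        elif c == '"':
            in_quotes = True
            out.append(c)
        elif c == "(":
            depth = 1
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

For example, the input 'a (b (c) d) e' yields 'a  e', while
'"not (a comment)" x' is returned unchanged.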
There are several places in this standard where comments and FWS may
be freely inserted. To accommodate that syntax, an additional token
for "CFWS" is defined for places where comments and/or FWS can occur.
However, where CFWS occurs in this standard, it MUST NOT be inserted
in such a way that any line of a folded header field is made up
entirely of WSP characters and nothing else.
FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS

ctext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-39 /       ; The rest of the US-ASCII
                        %d42-91 /       ;  characters not including "(",
                        %d93-126        ;  ")", or "\"