The following rules are used to define an underlying lexical analyzer, which feeds tokens to higher level parsers. See the ANSI references, in the Bibliography.

                                                 ; (  Octal, Decimal.)
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     ALPHA       =  <any ASCII alphabetic character>
                                                 ; (101-132, 65.- 90.)
                                                 ; (141-172, 97.-122.)
     DIGIT       =  <any ASCII decimal digit>    ; ( 60- 71, 48.- 57.)
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     <">         =  <ASCII quote mark>           ; (     42,      34.)
     CRLF        =  CR LF

     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE

     linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                 ; CRLF => folding

     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

     delimiters  =  specials / linear-white-space / comment

     text        =  <any CHAR, including bare    ; => atoms, specials,
                     CR & bare LF, but NOT       ;  comments and
                     including CRLF>             ;  quoted-strings are
                                                 ;  NOT recognized.

     atom        =  1*<any CHAR except specials, SPACE and CTLs>

     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

     domain-literal =  "[" *(dtext / quoted-pair) "]"

August 13, 1982              - 10 -                      RFC #822

Standard for ARPA Internet Text Messages

     dtext       =  <any CHAR excluding "[",     ; => may be folded
                     "]", "\" & CR, & including
                     linear-white-space>

     comment     =  "(" *(ctext / quoted-pair / comment) ")"

     ctext       =  <any CHAR excluding "(",     ; => may be folded
                     ")", "\" & CR, & including
                     linear-white-space>

     quoted-pair =  "\" CHAR                     ; may quote any char

     phrase      =  1*word                       ; Sequence of words

     word        =  atom / quoted-string

3.4. CLARIFICATIONS

3.4.1. QUOTING

Some characters are reserved for special interpretation, such as delimiting lexical tokens.
To permit use of these characters as uninterpreted data, a quoting mechanism is provided. To quote a character, precede it with a backslash ("\").

This mechanism is not fully general. Characters may be quoted only within a subset of the lexical constructs. In particular, quoting is limited to use within:

     - quoted-string
     - domain-literal
     - comment

Within these constructs, quoting is REQUIRED for CR and "\" and for the character(s) that delimit the token (e.g., "(" and ")" for a comment). However, quoting is PERMITTED for any character.

Note: In particular, quoting is NOT permitted within atoms. For example, when the local-part of an addr-spec must contain a special character, a quoted string must be used. Therefore, a specification such as:

     Full\ Name@Domain

is not legal and must be specified as:

     "Full Name"@Domain

3.4.2. WHITE SPACE

Note: In structured field bodies, multiple linear space ASCII characters (namely HTABs and SPACEs) are treated as single spaces and may freely surround any symbol. In all header fields, the only place in which at least one LWSP-char is REQUIRED is at the beginning of continuation lines in a folded field.

When passing text to processes that do not interpret text according to this standard (e.g., mail protocol servers), NO linear-white-space characters should occur between a period (".") or at-sign ("@") and a <word>. Exactly ONE SPACE should be used in place of arbitrary linear-white-space and comment sequences.

Note: Within systems conforming to this standard, wherever a member of the list of delimiters is allowed, LWSP-chars may also occur before and/or after it.
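As an illustration only (it is not part of this standard), the lexical rules and quoting restrictions above can be sketched as a small Python tokenizer. The function tokenize(), its helpers and the (kind, value) token pairs are hypothetical names chosen for this sketch. It assumes the input is one structured field-body that has already been unfolded, so a CR anywhere is rejected rather than treated as folding. LWSP-chars are skipped because they only separate tokens, and, following section 3.4.4, the delimiters and quoting backslashes of quoted-strings, domain-literals and comments are removed from the returned values. A nested comment is returned as one token whose body keeps the inner parentheses.

```python
from typing import List, Tuple

# Hypothetical names throughout; this is an illustration, not part of RFC 822.
# specials: outside quoted constructs each one ends an atom and is its own token.
SPECIALS = set('()<>@,;:\\".[]')
LWSP_CHARS = " \t"  # LWSP-char = SPACE / HTAB

Token = Tuple[str, str]  # (kind, value)


def _is_atom_char(c: str) -> bool:
    # atom = 1*<any CHAR except specials, SPACE and CTLs>; CHAR is 7-bit
    # ASCII, so this excludes SPACE (32), CTLs (0-31, 127) and non-ASCII.
    return 32 < ord(c) < 127 and c not in SPECIALS


def _scan_quoted(s: str, i: int, close: str) -> Tuple[str, int]:
    """Read qtext/dtext and quoted-pairs from s[i:] up to `close`.

    Returns the data without delimiters or quoting backslashes, and the
    index just past the closing delimiter.
    """
    out: List[str] = []
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 == len(s):
                break  # a trailing backslash cannot form a quoted-pair
            out.append(s[i + 1])  # quoted-pair = "\" CHAR
            i += 2
        elif c == close:
            return "".join(out), i + 1
        elif c == "\r" or (close == "]" and c == "["):
            raise ValueError(f"{c!r} must be quoted (offset {i})")
        else:
            out.append(c)
            i += 1
    raise ValueError(f"missing closing {close!r}")


def _scan_comment(s: str, i: int) -> Tuple[str, int]:
    """Read a comment body starting just past its "("; comments nest."""
    out: List[str] = []
    depth = 1
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 == len(s):
                break  # a trailing backslash cannot form a quoted-pair
            out.append(s[i + 1])  # quoted-pair
            i += 2
            continue
        if c == "\r":
            raise ValueError(f"bare CR in comment (offset {i})")
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(c)  # ctext, including parentheses of nested comments
        i += 1
    raise ValueError("missing closing ')'")


def tokenize(body: str) -> List[Token]:
    """Split an unfolded structured field-body into lexical tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(body):
        c = body[i]
        if c in LWSP_CHARS:
            i += 1  # white space only separates tokens
        elif c == '"':
            value, i = _scan_quoted(body, i + 1, '"')
            tokens.append(("quoted-string", value))
        elif c == "[":
            value, i = _scan_quoted(body, i + 1, "]")
            tokens.append(("domain-literal", value))
        elif c == "(":
            value, i = _scan_comment(body, i + 1)
            tokens.append(("comment", value))
        elif c in SPECIALS:
            tokens.append(("special", c))
            i += 1
        elif _is_atom_char(c):
            start = i
            while i < len(body) and _is_atom_char(body[i]):
                i += 1
            tokens.append(("atom", body[start:i]))
        else:
            raise ValueError(f"unexpected character {c!r} (offset {i})")
    return tokens
```

With this sketch, "Full Name"@Domain yields a single quoted-string token (Full Name) followed by the special "@" and the atom "Domain", whereas Full\ Name@Domain splits into the atom "Full", the special "\" and the atom "Name" before "@" and "Domain": the backslash is itself a special and cannot quote a character inside an atom.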
Writers of mail-sending (i.e., header-generating) programs should realize that there is no network-wide definition of the effect of ASCII HT (horizontal-tab) characters on the appearance of text at another network host; therefore, the use of tabs in message headers, though permitted, is discouraged.

3.4.3. COMMENTS

A comment is a set of ASCII characters, which is enclosed in matching parentheses and which is not within a quoted-string. The comment construct permits message originators to add text which will be useful for human readers, but which will be ignored by the formal semantics. Comments should be retained while the message is subject to interpretation according to this standard. However, comments must NOT be included in other cases, such as during protocol exchanges with mail servers.

Comments nest, so that if an unquoted left parenthesis occurs in a comment string, there must also be a matching right parenthesis. When a comment acts as the delimiter between a sequence of two lexical symbols, such as two atoms, it is lexically equivalent with a single SPACE, for the purposes of regenerating the sequence, such as when passing the sequence onto a mail protocol server. Comments are detected as such only within field-bodies of structured fields.

If a comment is to be "folded" onto multiple lines, then the syntax for folding must be adhered to. (See the "Lexical Analysis of Messages" section on "Folding Long Header Fields" above, and the section on "Case Independence" below.) Note that the official semantics therefore do not "see" any unquoted CRLFs that are in comments, although particular parsing programs may wish to note their presence. For these programs, it would be reasonable to interpret a "CRLF LWSP-char" as being a CRLF that is part of the comment; i.e., the CRLF is kept and the LWSP-char is discarded.
Quoted CRLFs (i.e., a backslash followed by a CR followed by a LF) still must be followed by at least one LWSP-char.

3.4.4. DELIMITING AND QUOTING CHARACTERS

The quote character (backslash) and characters that delimit syntactic units are not, generally, to be taken as data that are part of the delimited or quoted unit(s). In particular, the quotation-marks that define a quoted-string, the parentheses that define a comment and the backslash that quotes a following character are NOT part of the quoted-string, comment or quoted character. A quotation-mark that is to be part of a quoted-string, a parenthesis that is to be part of a comment and a backslash that is to be part of either must each be preceded by the quote-character backslash ("\"). Note that the syntax allows any character to be quoted within a quoted-string or comment; however only certain characters MUST be quoted to be included as data. These characters are the ones that are not part of the alternate text group (i.e., ctext or qtext).

The one exception to this rule is that a single SPACE is assumed to exist between contiguous words in a phrase, and this interpretation is independent of the actual number of LWSP-chars that the creator places between the words. To include more than one SPACE, the creator must make the LWSP-chars be part of a quoted-string.

Quotation marks that delimit a quoted string and backslashes that quote the following character should NOT accompany the quoted-string when the string is passed to processes that do not interpret data according to this specification (e.g., mail protocol servers).

3.4.5. QUOTED-STRINGS

Where permitted (i.e., in words in structured fields) quoted-strings are treated as a single symbol. That is, a quoted-string is equivalent to an atom, syntactically. If a quoted-string is to be "folded" onto multiple lines, then the syntax for folding must be adhered to.
(See the "Lexical Analysis of Messages" section on "Folding Long Header Fields" above, and the section on "Case Independence" below.) Therefore, the official semantics do not "see" any bare CRLFs that are in quoted-strings; however, particular parsing programs may wish to note their presence. For such programs, it would be reasonable to interpret a "CRLF LWSP-char" as being a CRLF which is part of the quoted-string; i.e., the CRLF is kept and the LWSP-char is discarded. Quoted CRLFs (i.e., a backslash followed by a CR followed by a LF) are also subject to rules of folding, but the presence of the quoting character (backslash) explicitly indicates that the CRLF is data to the quoted string. Stripping off the first following LWSP-char is also appropriate when parsing quoted CRLFs.

3.4.6. BRACKETING CHARACTERS

There is one type of bracket which must occur in matched pairs and may have pairs nested within each other:

     o   Parentheses ("(" and ")") are used to indicate comments.

There are three types of brackets which must occur in matched pairs, and which may NOT be nested:

     o   Colon/semi-colon (":" and ";") are used in address specifications to indicate that the included list of addresses is to be treated as a group.

     o   Angle brackets ("<" and ">") are generally used to indicate the presence of one machine-usable reference (e.g., delimiting mailboxes), possibly including source-routing to the machine.

     o   Square brackets ("[" and "]") are used to indicate the presence of a domain-literal, which the appropriate name-domain is to use directly, bypassing normal name-resolution mechanisms.

3.4.7. CASE INDEPENDENCE

Except as noted, alphabetic strings may be represented in any combination of upper and lower case.
The only syntactic units which require preservation of case information are:

     - text
     - qtext
     - dtext
     - ctext
     - quoted-pair
     - local-part, except "Postmaster"

When matching any other syntactic unit, case is to be ignored. For example, the field-names "From", "FROM", "from", and even "FroM" are semantically equal and should all be treated identically.

When generating these units, any mix of upper and lower case alphabetic characters may be used. The case shown in this specification is suggested for message-creating processes.

Note: The reserved local-part address unit, "Postmaster", is an exception. When the value "Postmaster" is being interpreted, it must be accepted in any mixture of case, including "POSTMASTER", and "postmaster".

3.4.8. FOLDING LONG HEADER FIELDS

Each header field may be represented on exactly one line consisting of the name of the field and its body, and terminated by a CRLF; this is what the parser sees. For readability, the field-body portion of long header fields may be "folded" onto multiple lines of the actual field. "Long" is commonly interpreted to mean greater than 65 or 72 characters. The former length serves as a limit, when the message is to be viewed on most simple terminals which use simple display software; however, the limit is not imposed by this standard.

Note: Some display software can selectively fold lines, to suit the display terminal. In such cases, sender-provided folding can interfere with the display software.

3.4.9. BACKSPACE CHARACTERS

ASCII BS characters (Backspace, decimal 8) may be included in texts and quoted-strings to effect overstriking. However, any use of backspaces which effects an overstrike to the left of the beginning of the text or quoted-string is prohibited.

3.4.10. NETWORK-SPECIFIC TRANSFORMATIONS

During transmission through heterogeneous networks, it may be necessary to force data to conform to a network's local conventions. For example, it may be required that a CR be followed either by LF, making a CRLF, or by <null>, if the CR is