cooperating user agents.
If a Content-Transfer-Encoding header field appears as part
of a message header, it applies to the entire body of that
message. If a Content-Transfer-Encoding header field
appears as part of a body part's headers, it applies only to
the body of that body part. If an entity is of type
"multipart" or "message", the Content-Transfer-Encoding is
not permitted to have any value other than a bit width
(e.g., "7bit", "8bit", etc.) or "binary".
Note that email is character-oriented, so the mechanisms
described here encode arbitrary byte streams, not bit
streams. If a bit stream is
to be encoded via one of these mechanisms, it must first be
converted to an 8-bit byte stream using the network standard
bit order ("big-endian"), in which the earlier bits in a
stream become the higher-order bits in a byte. A bit stream
not ending at an 8-bit boundary must be padded with zeroes.
This document provides a mechanism for noting the addition
of such padding in the case of the application Content-Type,
which has a "padding" parameter.
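The bit-to-byte conversion described above can be sketched as follows. This is an illustrative sketch only; the function name pack_bits and its return shape are assumptions made for the example and are not defined by this document.

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, earliest bit first.

    Returns (data, padding), where padding is the number of zero bits
    appended to reach an 8-bit boundary. The name and return shape are
    hypothetical, chosen only for this illustration.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8          # zero bits needed to fill the last byte
    bits.extend([0] * padding)
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            # Earlier bits in the stream become higher-order bits.
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out), padding


data, padding = pack_bits([1, 0, 1])
print(data.hex(), padding)
```

The returned padding count is the kind of value a sender might record in the "padding" parameter mentioned above.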
The encoding mechanisms defined here explicitly encode all
data in ASCII. Thus, for example, suppose an entity has
header fields such as:
Content-Type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: base64
This should be interpreted to mean that the body is a base64
ASCII encoding of data that was originally in ISO-8859-1,
and will be in that character set again after decoding.
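As a rough illustration of this interpretation, the sketch below uses Python's standard base64 module. The helper names encode_body and decode_body are hypothetical, and the 76-character line wrapping that a real message body needs is omitted for brevity.

```python
import base64


def encode_body(text, charset="iso-8859-1"):
    """Convert text to octets in the given charset, then to base64 ASCII."""
    return base64.b64encode(text.encode(charset)).decode("ascii")


def decode_body(body, charset="iso-8859-1"):
    """Undo the base64 layer; the result is again in the original charset."""
    return base64.b64decode(body).decode(charset)


body = encode_body("café")
print(body)               # pure ASCII on the wire
print(decode_body(body))  # the ISO-8859-1 text again
```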
The following sections define the two standard encoding
mechanisms. Defining new content-transfer-encodings is
explicitly discouraged and should occur only when absolutely
necessary. The entire content-transfer-encoding namespace,
except for names beginning with "X-", is explicitly reserved
to the IANA for future use. Private agreements about
content-transfer-encodings are also explicitly discouraged.
Certain Content-Transfer-Encoding values may only be used on
certain Content-Types. In particular, it is expressly
forbidden to use any encodings other than "7bit", "8bit", or
"binary" with any Content-Type that recursively includes
other Content-Type fields, notably the "multipart" and
Borenstein & Freed [Page 12]
RFC 1341    MIME: Multipurpose Internet Mail Extensions    June 1992
"message" Content-Types. All encodings that are desired for
bodies of type multipart or message must be done at the
innermost level, by encoding the actual body that needs to
be encoded.
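A user agent or gateway might check this restriction along the following lines. This is a minimal sketch; the function encoding_allowed and the two constant names are assumptions made for the example.

```python
# Encodings that leave the data untouched; the only values permitted
# on "multipart" and "message" entities.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def encoding_allowed(content_type, transfer_encoding):
    """Return False when a composite type carries a real encoding."""
    # Keep only the primary type, e.g. "multipart" from
    # "multipart/mixed; boundary=x".
    main_type = content_type.split("/", 1)[0].strip().lower()
    encoding = transfer_encoding.strip().lower()
    if main_type in COMPOSITE_TYPES:
        return encoding in IDENTITY_ENCODINGS
    return True


print(encoding_allowed("multipart/mixed; boundary=x", "base64"))
print(encoding_allowed("text/plain", "quoted-printable"))
```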
NOTE ON ENCODING RESTRICTIONS: Though the prohibition
against using content-transfer-encodings on data of type
multipart or message may seem overly restrictive, it is
necessary to prevent nested encodings, in which data are
passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly
viewed. Nested encodings add considerable complexity to
user agents: aside from the obvious efficiency problems
with such multiple encodings, they can obscure the basic
structure of a message. In particular, they can imply that
several decoding operations are necessary simply to find out
what types of objects a message contains. Banning nested
encodings may complicate the job of certain mail gateways,
but this seems less of a problem than the effect of nested
encodings on user agents.
NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-
TRANSFER-ENCODING: It may seem that the Content-Transfer-
Encoding could be inferred from the characteristics of the
Content-Type that is to be encoded, or, at the very least,
that certain Content-Transfer-Encodings could be mandated
for use with specific Content-Types. There are several
reasons why this is not the case. First, given the varying
types of transports used for mail, some encodings may be
appropriate for some Content-Type/transport combinations and
not for others. (For example, in an 8-bit transport, no
encoding would be required for text in certain character
sets, while such encodings are clearly required for 7-bit
SMTP.) Second, certain Content-Types may require different
types of transfer encoding under different circumstances.
For example, many PostScript bodies might consist entirely
of short lines of 7-bit data and hence require little or no
encoding. Other PostScript bodies (especially those using
Level 2 PostScript's binary encoding mechanism) may only be
reasonably represented using a binary transport encoding.
Finally, since Content-Type is intended to be an open-ended
specification mechanism, strict specification of an
association between Content-Types and encodings effectively
couples the specification of an application protocol with a
specific lower-level transport. This is not desirable since
the developers of a Content-Type should not have to be aware
of all the transports in use and what their limitations are.
NOTE ON TRANSLATING ENCODINGS: The quoted-printable and
base64 encodings are designed so that conversion between
them is possible. The only issue that arises in such a
conversion is the handling of line breaks. When converting
from quoted-printable to base64 a line break must be
converted into a CRLF sequence. Similarly, a CRLF sequence
in base64 data should be converted to a quoted-printable
line break, but ONLY when converting text data.
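One direction of such a conversion, from quoted-printable to base64, might look like the sketch below. It relies on Python's standard quopri, base64, and re modules; the function name qp_to_base64 is hypothetical, and the sketch assumes the body is text, so every decoded line break is rewritten as CRLF.

```python
import base64
import quopri
import re


def qp_to_base64(qp_body: bytes) -> bytes:
    """Re-encode a quoted-printable text body as base64."""
    # Undo the quoted-printable layer (soft line breaks disappear here).
    decoded = quopri.decodestring(qp_body)
    # Text data only: make every remaining line break a CRLF sequence.
    canonical = re.sub(rb"\r?\n", b"\r\n", decoded)
    # encodebytes wraps its output in lines of at most 76 characters.
    return base64.encodebytes(canonical)


print(qp_to_base64(b"caf=E9=\r\n au lait\r\nmerci\r\n"))
```

Binary data would need a different treatment, since a bare LF in binary content is not a line break and must not be rewritten.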
NOTE ON CANONICAL ENCODING MODEL: There was some
confusion, in earlier drafts of this memo, regarding the
model for when email data was to be converted to canonical
form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation
of newlines varies greatly from system to system. For this
reason, a canonical model for encoding is presented as
Appendix H.
5.1 Quoted-Printable Content-Transfer-Encoding
The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating and/or line-wrapping gateway.
In this encoding, octets are to be represented as determined
by the following rules:
Rule #1: (General 8-bit representation) Any octet,
except those indicating a line break according to the
newline convention of the canonical form of the data
being encoded, may be represented by an "=" followed by
a two digit hexadecimal representation of the octet's
value. The digits of the hexadecimal alphabet, for this
purpose, are "0123456789ABCDEF". Uppercase letters must be
used when sending hexadecimal data, though a robust
implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12
(ASCII form feed) can be represented by "=0C", and the
value 61 (ASCII EQUAL SIGN) can be represented by
"=3D". Except when the following rules allow an
alternative encoding, this rule is mandatory.
Rule #2: (Literal representation) Octets with decimal
values of 33 through 60 inclusive, and 62 through 126,
inclusive, MAY be represented as the ASCII characters
which correspond to those octets (EXCLAMATION POINT
through LESS THAN, and GREATER THAN through TILDE,
respectively).
Rule #3: (White Space): Octets with values of 9 and 32
MAY be represented as ASCII TAB (HT) and SPACE
characters, respectively, but MUST NOT be so
represented at the end of an encoded line. Any TAB (HT)
or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In
particular, an "=" at the end of an encoded line,
indicating a soft line break (see rule #5), may follow
one or more TAB (HT) or SPACE characters. It follows
that an octet with value 9 or 32 appearing at the end
of an encoded line must be represented according to
Rule #1. This rule is necessary because some MTAs
(Message Transport Agents, programs which transport
messages from one user to another, or perform a part of
such transfers) are known to pad lines of text with
SPACEs, and others are known to remove "white space"
characters from the end of a line. Therefore, when
decoding a Quoted-Printable body, any trailing white
space on a line must be deleted, as it will necessarily
have been added by intermediate transport agents.
Rule #4 (Line Breaks): A line break in a text body
part, independent of what its representation is
following the canonical representation of the data
being encoded, must be represented by a (RFC 822) line
break, which is a CRLF sequence, in the Quoted-
Printable encoding. If isolated CRs and LFs, or LF CR
and CR LF sequences are allowed to appear in binary
data according to the canonical form, they must be
represented using the "=0D", "=0A", "=0A=0D" and
"=0D=0A" notations respectively.
Note that many implementations may elect to encode the
local representation of various content types directly.
In particular, this may apply to plain text material on
systems that use newline conventions other than CRLF
delimiters. Such an implementation is permissible, but
the generation of line breaks must be generalized to
account for the case where alternate representations of
newline sequences are used.
Rule #5 (Soft Line Breaks): The Quoted-Printable
encoding REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded with
the Quoted-Printable encoding, 'soft' line breaks must
be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft')
line break in the encoded text. Thus if the "raw" form
of the line is a single unencoded line that says:
Now's the time for all folk to come to the aid of
their country.
This can be represented, in the Quoted-Printable
encoding, as
Now's the time =
for all folk to come=
 to the aid of their country.
This provides a mechanism with which long lines are
encoded in such a way as to be restored by the user
agent. The 76 character limit does not count the
trailing CRLF, but counts all other characters,
including any equal signs.
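Taken together, the five rules above can be sketched as a small encoder and decoder. This is an illustrative sketch, not a reference implementation: the names qp_encode and qp_decode are assumptions, the input is assumed to be text already in canonical form with CRLF line breaks, and malformed input is not handled.

```python
HEX = "0123456789ABCDEF"
MAX_LINE = 76  # Rule 5 limit, not counting the trailing CRLF


def _encode_octet(octet, at_line_end):
    # Rule 2: printable octets other than "=" may stand for themselves.
    if 33 <= octet <= 60 or 62 <= octet <= 126:
        return chr(octet)
    # Rule 3: TAB and SPACE are literal except at the end of a line.
    if octet in (9, 32) and not at_line_end:
        return chr(octet)
    # Rule 1: everything else becomes "=" plus two uppercase hex digits.
    return "=" + HEX[octet >> 4] + HEX[octet & 0x0F]


def qp_encode(data: bytes) -> str:
    out_lines = []
    for line in data.split(b"\r\n"):  # Rule 4: CRLF is the hard line break
        current = ""
        for i, octet in enumerate(line):
            token = _encode_octet(octet, i == len(line) - 1)
            # Rule 5: keep one column free for the "=" of a soft break.
            if len(current) + len(token) > MAX_LINE - 1:
                out_lines.append(current + "=")
                current = ""
            current += token
        out_lines.append(current)
    return "\r\n".join(out_lines)


def qp_decode(text: str) -> bytes:
    out = bytearray()
    lines = text.split("\r\n")
    for idx, line in enumerate(lines):
        line = line.rstrip(" \t")  # Rule 3: trailing white space was added in transit
        soft = line.endswith("=")  # Rule 5: soft line break
        if soft:
            line = line[:-1]
        i = 0
        while i < len(line):
            if line[i] == "=":
                # Accepts lowercase hex too, as a robust receiver may.
                out.append(int(line[i + 1:i + 3], 16))
                i += 3
            else:
                out.append(ord(line[i]))
                i += 1
        if not soft and idx < len(lines) - 1:
            out += b"\r\n"
    return bytes(out)


print(qp_encode(b"a=b \r\n\tc\x0c"))
```

The encoder is deliberately conservative: it breaks a line once 75 characters are reached, so that the "=" of a soft line break never pushes an encoded line past the 76 character limit.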