These MTAs, speaking the SMTP protocol, alter messages on the fly to
take advantage of the internal data structure of the hosts they are
implemented on, or are just plain broken.
The following guidelines may be useful to anyone devising a data
format (media type) that is supposed to survive the widest range of
networking technologies and known broken MTAs unscathed. Note that
anything encoded in the base64 encoding will satisfy these rules, but
that some well-known mechanisms, notably the UNIX uuencode facility,
will not. Note also that anything encoded in the Quoted-Printable
encoding will survive most gateways intact, but possibly not some
gateways to systems that use the EBCDIC character set.
(1) Under some circumstances the encoding used for data may
change as part of normal gateway or user agent
operation. In particular, conversion from base64 to
quoted-printable and vice versa may be necessary. This
may result in the confusion of CRLF sequences with line
breaks in text bodies. As such, the persistence of
CRLF as something other than a line break must not be
relied on.
(2) Many systems may elect to represent and store text data
using local newline conventions. Local newline
conventions may not match the RFC822 CRLF convention --
systems are known that use plain CR, plain LF, CRLF, or
counted records. The result is that isolated CR and LF
characters are not well tolerated in general; they may
be lost or converted to delimiters on some systems, and
hence must not be relied on.
(3) The transmission of NULs (US-ASCII value 0) is
problematic in Internet mail. (This is largely the
result of NULs being used as a termination character by
many of the standard runtime library routines in the C
programming language.) The practice of using NULs as
termination characters is so entrenched now that
messages should not rely on them being preserved.
(4) TAB (HT) characters may be misinterpreted or may be
automatically converted to variable numbers of spaces.
This is unavoidable in some environments, notably those
not based on the US-ASCII character set. Such
conversion is STRONGLY DISCOURAGED, but it may occur,
and mail formats must not rely on the persistence of
TAB (HT) characters.
(5) Lines longer than 76 characters may be wrapped or
truncated in some environments. Line wrapping or line
truncation imposed by mail transports is STRONGLY
DISCOURAGED, but unavoidable in some cases.
Applications which require long lines must somehow
differentiate between soft and hard line breaks. (A
simple way to do this is to use the quoted-printable
encoding.)
(6) Trailing "white space" characters (SPACE, TAB (HT)) on
a line may be discarded by some transport agents, while
other transport agents may pad lines with these
characters so that all lines in a mail file are of
equal length. The persistence of trailing white space,
therefore, must not be relied on.
(7) Many mail domains use variations on the US-ASCII
character set, or use character sets such as EBCDIC
which contain most but not all of the US-ASCII
characters. The correct translation of characters not
in the "invariant" set cannot be depended on across
character converting gateways. For example, this
situation is a problem when sending uuencoded
information across BITNET, an EBCDIC system. Similar
problems can occur without crossing a gateway, since
many Internet hosts use character sets other than US-
ASCII internally. The definition of Printable Strings
in X.400 adds further restrictions in certain special
cases. In particular, the only characters that are
known to be consistent across all gateways are the 73
characters that correspond to the upper and lower case
letters A-Z and a-z, the 10 digits 0-9, and the
following eleven special characters:
"'" (US-ASCII decimal value 39)
"(" (US-ASCII decimal value 40)
")" (US-ASCII decimal value 41)
"+" (US-ASCII decimal value 43)
"," (US-ASCII decimal value 44)
"-" (US-ASCII decimal value 45)
"." (US-ASCII decimal value 46)
"/" (US-ASCII decimal value 47)
":" (US-ASCII decimal value 58)
"=" (US-ASCII decimal value 61)
"?" (US-ASCII decimal value 63)
A maximally portable mail representation will confine
itself to relatively short lines of text in which the
only meaningful characters are taken from this set of
73 characters. The base64 encoding follows this rule.
(8) Some mail transport agents will corrupt data that
includes certain literal strings. In particular, a
period (".") alone on a line is known to be corrupted
by some (incorrect) SMTP implementations, and a line
that starts with the five characters "From " (the fifth
character is a SPACE) is commonly corrupted as well.
A careful composition agent can prevent these
corruptions by encoding the data (e.g., in the quoted-
printable encoding using "=46rom " in place of "From "
at the start of a line, and "=2E" in place of "." alone
on a line).
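
Guideline (7) can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the names INVARIANT, non_portable_chars and is_portable_line are hypothetical, and the 76-character limit is taken from guideline (5). It shows that base64 output stays inside the 73-character set while a uuencoded line does not.

```python
import base64
import binascii
import string

# The 73 characters of guideline (7): 52 letters, 10 digits, 11 specials.
INVARIANT = set(string.ascii_letters + string.digits + "'()+,-./:=?")

def non_portable_chars(line: str) -> set:
    """Return the characters of `line` that fall outside the invariant set."""
    return set(line) - INVARIANT

def is_portable_line(line: str, max_len: int = 76) -> bool:
    """True if the line is short enough and uses only invariant characters."""
    return len(line) <= max_len and not non_portable_chars(line)

# Every octet value, including NUL, TAB, CR, LF and 8-bit data.
payload = bytes(range(256))
encoded = base64.encodebytes(payload).decode("ascii")
lines = encoded.splitlines()

# One uuencoded line for comparison; its alphabet includes punctuation
# such as "#" that is not in the invariant set.
uu_line = binascii.b2a_uu(b"Cat").decode("ascii").rstrip("\n")

print(all(is_portable_line(line) for line in lines))
print(sorted(non_portable_chars(uu_line)))
```

The check is deliberately stricter than most transports require; it models only the "maximally portable" case described in guideline (7).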
Please note that the above list is NOT a list of recommended
practices for MTAs. RFC 821 MTAs are prohibited from altering the
character of white space or wrapping long lines. These BAD and
invalid practices are known to occur on established networks, and
implementations should be robust in dealing with the bad effects they
can cause.
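
The two substitutions suggested in guideline (8) can be sketched as follows. This is an illustrative Python fragment, not a complete quoted-printable encoder: the helper name protect_line is hypothetical, and it assumes each input line is already valid quoted-printable text and that the entity is labeled with a quoted-printable Content-Transfer-Encoding.

```python
import quopri

def protect_line(qp_line: str) -> str:
    """Rewrite a quoted-printable line so that the literal strings named in
    guideline (8) never appear on the wire."""
    if qp_line.startswith("From "):
        # "F" is US-ASCII 0x46, so "=46" decodes back to "F".
        return "=46rom " + qp_line[len("From "):]
    if qp_line == ".":
        # "." is US-ASCII 0x2E.
        return "=2E"
    return qp_line

body_lines = ["Dear list,", "From now on the build is green.", ".", "Regards"]
protected = [protect_line(line) for line in body_lines]

# A quoted-printable decoder recovers the original lines unchanged.
decoded = [
    quopri.decodestring(line.encode("ascii")).decode("ascii")
    for line in protected
]
```

Only the first octet of the problem string needs to be encoded; the rest of the line is left readable.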

4. Canonical Encoding Model

There was some confusion, in earlier versions of these documents,
regarding the model for when email data was to be converted to
canonical form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation of
newlines varies greatly from system to system. For this reason, a
canonical model for encoding is presented below.
The process of composing a MIME entity can be modeled as being done
in a number of steps. Note that these steps are roughly similar to
those steps used in PEM [RFC-1421] and are performed for each
"innermost level" body:
(1) Creation of local form.
The body to be transmitted is created in the system's
native format. The native character set is used and,
where appropriate, local end of line conventions are
used as well. The body may be a UNIX-style text file,
or a Sun raster image, or a VMS indexed file, or audio
data in a system-dependent format stored only in
memory, or anything else that corresponds to the local
model for the representation of some form of
information. Fundamentally, the data is created in the
"native" form that corresponds to the type specified by
the media type.
(2) Conversion to canonical form.
The entire body, including "out-of-band" information
such as record lengths and possibly file attribute
information, is converted to a universal canonical
form. The specific media type of the body as well as
its associated attributes dictate the nature of the
canonical form that is used. Conversion to the proper
canonical form may involve character set conversion,
transformation of audio data, compression, or various
other operations specific to the various media types.
If character set conversion is involved, however, care
must be taken to understand the semantics of the media
type, which may have strong implications for any
character set conversion, e.g. with regard to
syntactically meaningful characters in a text subtype
other than "plain".
For example, in the case of text/plain data, the text
must be converted to a supported character set and
lines must be delimited with CRLF delimiters in
accordance with RFC 822. Note that the restriction on
line lengths implied by RFC 822 is eliminated if the
next step employs either quoted-printable or base64
encoding.
(3) Apply transfer encoding.
A Content-Transfer-Encoding appropriate for this body
is applied. Note that there is no fixed relationship
between the media type and the transfer encoding. In
particular, it may be appropriate to base the choice of
base64 or quoted-printable on character frequency
counts which are specific to a given instance of a
body.
(4) Insertion into entity.
The encoded body is inserted into a MIME entity with
appropriate headers. The entity is then inserted into
the body of a higher-level entity (message or
multipart) as needed.
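
The four steps above can be sketched in Python for a text/plain body. This is a model under stated assumptions, not a blueprint: all function names are hypothetical, the local form is assumed to be a string using LF as its newline, and the one-in-six threshold in choose_encoding is an assumption derived from the approximate size break-even between quoted-printable (three octets per escaped octet) and base64 (four octets per three).

```python
import base64
import quopri

def to_canonical_text(local_text: str, charset: str = "iso-8859-1") -> bytes:
    """Step (2): CRLF line delimiters plus conversion to the chosen charset."""
    return local_text.replace("\n", "\r\n").encode(charset)

def needs_qp_escape(octet: int) -> bool:
    """True for octets that quoted-printable must write as =XX."""
    if octet in (0x09, 0x0A, 0x0D):          # TAB, LF, CR
        return False
    return octet == 0x3D or octet < 0x20 or octet > 0x7E

def choose_encoding(canonical: bytes) -> str:
    """Step (3), first half: pick an encoding from character frequency counts."""
    unsafe = sum(1 for octet in canonical if needs_qp_escape(octet))
    return "base64" if unsafe * 6 > len(canonical) else "quoted-printable"

def apply_transfer_encoding(canonical: bytes, cte: str) -> bytes:
    """Step (3), second half: apply the chosen Content-Transfer-Encoding."""
    if cte == "base64":
        return base64.encodebytes(canonical).replace(b"\n", b"\r\n")
    return quopri.encodestring(canonical)

def build_entity(encoded: bytes, charset: str, cte: str) -> bytes:
    """Step (4): prepend the headers that describe the body."""
    headers = (
        f"Content-Type: text/plain; charset={charset}\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + encoded

def compose(local_text: str, charset: str = "iso-8859-1") -> bytes:
    """Run steps (2) to (4) on a body that step (1) created locally."""
    canonical = to_canonical_text(local_text, charset)
    cte = choose_encoding(canonical)
    encoded = apply_transfer_encoding(canonical, cte)
    return build_entity(encoded, charset, cte)

mostly_ascii = compose("Caf\u00e9 au lait\nsecond line\n")
mostly_8bit = compose("\u00e9\u00e8\u00ea\u00eb\n")
```

The same media type ends up with different transfer encodings here, which illustrates that there is no fixed relationship between the two.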
Conversion from entity form to local form is accomplished by
reversing these steps. Note that reversal of these steps may produce
differing results since there is no guarantee that the original and
final local forms are the same.
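
A small Python sketch of why reversal may produce a differing local form. It assumes a hypothetical sender whose local newline is a bare CR and a hypothetical receiver whose local newline is a bare LF; the function names compose and decompose are illustrative only.

```python
import base64

def compose(local: bytes, local_newline: bytes) -> bytes:
    """Local form -> canonical form (CRLF) -> base64."""
    canonical = local.replace(local_newline, b"\r\n")
    return base64.b64encode(canonical)

def decompose(encoded: bytes, local_newline: bytes) -> bytes:
    """The same steps in reverse, using the receiver's newline convention."""
    canonical = base64.b64decode(encoded)
    return canonical.replace(b"\r\n", local_newline)

sender_local = b"line one\rline two\r"       # CR-only local form
wire = compose(sender_local, b"\r")
receiver_local = decompose(wire, b"\n")      # LF-only local form

# The octets differ, but both local forms hold the same lines of text.
same_octets = receiver_local == sender_local
same_lines = receiver_local.splitlines() == sender_local.splitlines()
```

Only the canonical form is guaranteed to be the same on both sides.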
It is vital to note that these steps are only a model; they are
specifically NOT a blueprint for how an actual system would be built.
In particular, the model fails to account for two common designs:
(1) In many cases the conversion to a canonical form prior
to encoding will be subsumed into the encoder itself,
which understands local formats directly. For example,
the local newline convention for text bodies might be
carried through to the encoder itself along with
knowledge of what that format is.
(2) The output of the encoders may have to pass through one
or more additional steps prior to being transmitted as
a message. As such, the output of the encoder may not
be conformant with the formats specified by RFC 822.
In particular, once again it may be appropriate for the
converter's output to be expressed using local newline
conventions rather than using the standard RFC 822 CRLF
delimiters.
Other implementation variations are conceivable as well. The vital
aspect of this discussion is that, in spite of any optimizations,
collapsings of required steps, or insertion of additional processing,
the resulting messages must be consistent with those produced by the
model described here. For example, a message with the following
header fields:
Content-type: text/foo; charset=bar
Content-Transfer-Encoding: base64
must be first represented in the text/foo form, then (if necessary)
represented in the "bar" character set, and finally transformed via
the base64 algorithm into a mail-safe form.
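
The ordering in this example can be sketched with Python's standard email package. Since "text/foo" and "bar" are placeholders, the sketch substitutes text/plain and iso-8859-1 as stand-ins; these substitutions and the variable names are assumptions made for illustration.

```python
import base64
from email import message_from_bytes

# 1. The data in the form required by the media type (plain text here).
text = "na\u00efve r\u00e9sum\u00e9\n"

# 2. Represented in the character set, with canonical CRLF line delimiters.
canonical = text.replace("\n", "\r\n").encode("iso-8859-1")

# 3. Transformed into a mail-safe form.
body = base64.encodebytes(canonical).replace(b"\n", b"\r\n")

raw = (
    b"Content-type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + body
)

# A receiver undoes the steps in the opposite order.
msg = message_from_bytes(raw)
decoded = msg.get_payload(decode=True)                  # undo base64
recovered = decoded.decode(msg.get_content_charset())   # undo the charset step
```

Swapping steps 2 and 3 would produce a different octet stream, which is why the model fixes the order.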
NOTE: Some confusion has been caused by systems that represent
messages in a format which uses local newline conventions which
differ from the RFC822 CRLF convention. It is important to note that
these formats are not canonical RFC822/MIME. These formats are
instead *encodings* of RFC822, where CRLF sequences in the canonical
representation of the message are encoded as the local newline
convention. Note that formats which encode CRLF sequences as, for
example, LF are not capable of representing MIME messages containing
binary data which contains LF octets not part of CRLF line separation
sequences.
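
The limitation described in this note can be demonstrated with a short Python sketch. The functions store_with_lf and restore_from_lf are hypothetical stand-ins for a message store that encodes CRLF as LF.

```python
def store_with_lf(canonical: bytes) -> bytes:
    """Encode a canonical message using LF as the local newline."""
    return canonical.replace(b"\r\n", b"\n")

def restore_from_lf(stored: bytes) -> bytes:
    """Decode the local encoding back to canonical CRLF form."""
    return stored.replace(b"\n", b"\r\n")

# Text-only message: every LF is part of a CRLF, so the round trip is exact.
text_msg = b"Subject: hi\r\n\r\nhello\r\n"
text_ok = restore_from_lf(store_with_lf(text_msg)) == text_msg

# Binary body containing an LF octet that is not part of a CRLF sequence.
binary_msg = b"Content-Transfer-Encoding: binary\r\n\r\n\x00\x01\n\x02\r\n"
restored = restore_from_lf(store_with_lf(binary_msg))
binary_ok = restored == binary_msg
```

After storage the bare LF is indistinguishable from an encoded CRLF, so the restored body gains an extra CR.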

5. Summary

This document defines what is meant by MIME Conformance. It also
details various problems known to exist in the Internet email system
and how to use MIME to overcome them. Finally, it describes MIME's
canonical encoding model.

6. Security Considerations

Security issues are discussed in the second document in this set, RFC
2046.

7. Authors' Addresses

For more information, the authors of this document are best contacted
via Internet mail:
Ned Freed
Innosoft International, Inc.
1050 East Garvey Avenue South
West Covina, CA 91790
USA
Phone: +1 818 919 3600
Fax: +1 818 919 3614
EMail: ned@innosoft.com

Nathaniel S. Borenstein
First Virtual Holdings
25 Washington Avenue
Morristown, NJ 07960
USA
Phone: +1 201 540 8967
Fax: +1 201 993 3032
EMail: nsb@nsb.fv.com