RFC 2045 (rfc2045) - Multipurpose Internet Mail Extensions (MIME) Part One

6.1.  Content-Transfer-Encoding Syntax

   The Content-Transfer-Encoding field's value is a single token
   specifying the type of encoding, as enumerated below.  Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

   These values are not case sensitive -- Base64 and BASE64 and bAsE64
   are all equivalent.  An encoding type of 7BIT requires that the body
   is already in a 7bit mail-ready representation.  This is the default
   value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
   Content-Transfer-Encoding header field is not present.
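
   As an illustration only, the case-insensitive matching and the
   "7bit" default might be sketched as follows.  The function name
   and the use of a plain dict for header fields are assumptions made
   for this example; they are not part of the specification.

```python
def mechanism_of(headers):
    """Return the lower-cased mechanism token for a set of header fields.

    `headers` is a hypothetical mapping of field names to field values.
    """
    for name, value in headers.items():
        # Field names and mechanism values are both matched without
        # regard to case.
        if name.lower() == "content-transfer-encoding":
            return value.strip().lower()
    # The field is absent, so "7bit" is assumed.
    return "7bit"
```

   With this sketch, "bAsE64" and "BASE64" both normalize to "base64",
   and an empty header set yields "7bit".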

6.2.  Content-Transfer-Encodings Semantics

   This single Content-Transfer-Encoding token actually provides two
   pieces of information.  It specifies what sort of encoding
   transformation the body was subjected to and hence what decoding
   operation must be used to restore it to its original form, and it
   specifies what the domain of the result is.

   The transformation part of any Content-Transfer-Encodings specifies,
   either explicitly or implicitly, a single, well-defined decoding
   algorithm, which for any sequence of encoded octets either transforms
   it to the original sequence of octets which was encoded, or shows
   that it is illegal as an encoded sequence.  Content-Transfer-
   Encodings transformations never depend on any additional external
   profile information for proper operation.  Note that while decoders
   must produce a single, well-defined output for a valid encoding, no
   such restrictions exist for encoders: encoding a given sequence of
   octets to different, equivalent encoded sequences is perfectly legal.
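
   The asymmetry between encoders and decoders can be seen in a small
   sketch using Python's standard quopri module.  The two encoded
   forms are hand-written examples chosen for this illustration.

```python
import quopri

# Two different quoted-printable encodings of the same three octets:
# one uses literal characters, the other escapes every octet.
literal_form = b"abc"
escaped_form = b"=61=62=63"

# A decoder must map both encoded sequences back to the same octets.
decoded_literal = quopri.decodestring(literal_form)
decoded_escaped = quopri.decodestring(escaped_form)
```

   Both decoded values are the octets of "abc", although the encoded
   sequences differ.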

   Three transformations are currently defined: identity, the "quoted-
   printable" encoding, and the "base64" encoding.  The domains are
   "binary", "8bit" and "7bit".

   The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
   mean that the identity (i.e. NO) encoding transformation has been
   performed.  As such, they serve simply as indicators of the domain of
   the body data, and provide useful information about the sort of
   encoding that might be needed for transmission in a given transport
   system.  The terms "7bit data", "8bit data", and "binary data" are
   all defined in Section 2.

   The quoted-printable and base64 encodings transform their input from
   an arbitrary domain into material in the "7bit" range, thus making it
   safe to carry over restricted transports.  The specific definitions of
   the transformations are given below.
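
   The two pieces of information carried by each token can be laid out
   as a lookup table.  This is a sketch for illustration; the names
   SEMANTICS and needs_decoding are assumptions of the example.

```python
# mechanism -> (transformation applied, domain of the body as labelled)
SEMANTICS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    # Both real transformations yield material in the "7bit" range.
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def needs_decoding(mechanism):
    """Return True when the body must be decoded to recover the original."""
    transformation, _domain = SEMANTICS[mechanism.lower()]
    return transformation != "identity"
```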

   The proper Content-Transfer-Encoding label must always be used.
   Labelling unencoded data containing 8bit characters as "7bit" is not
   allowed, nor is labelling unencoded non-line-oriented data as
   anything other than "binary" allowed.
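
   A sender could check which identity label a body is allowed to
   carry with something like the sketch below.  It assumes the Section
   2 definitions, which are not reproduced in this section: lines of
   at most 998 octets separated by CRLF, no bare CR or LF, and no NUL
   octets for both "7bit" and "8bit" data, with octets above 127
   permitted only in "8bit" data.  The function name is hypothetical.

```python
def narrowest_label(body: bytes) -> str:
    """Return the most restrictive identity label that `body` satisfies."""
    # After splitting on CRLF, any CR or LF left in a line is a bare one.
    lines = body.split(b"\r\n")
    line_oriented = all(
        len(line) <= 998          # assumed line-length limit (Section 2)
        and b"\r" not in line
        and b"\n" not in line
        and b"\x00" not in line   # NULs are assumed to force "binary"
        for line in lines
    )
    if not line_oriented:
        return "binary"
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"
```

   Labelling a body with anything narrower than the value returned
   here would violate the rule above.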

   Unlike media subtypes, a proliferation of Content-Transfer-Encoding
   values is both undesirable and unnecessary.  However, establishing
   only a single transformation into the "7bit" domain does not seem
   possible.  There is a tradeoff between the desire for a compact and
   efficient encoding of largely-binary data and the desire for a
   somewhat readable encoding of data that is mostly, but not entirely,
   7bit.  For this reason, at least two encoding mechanisms are
   necessary: a more or less readable encoding (quoted-printable) and a
   "dense" or "uniform" encoding (base64).

   Mail transport for unencoded 8bit data is defined in RFC 1652.  As of
   the initial publication of this document, there are no standardized
   Internet mail transports for which it is legitimate to include
   unencoded binary data in mail bodies.  Thus there are no
   circumstances in which the "binary" Content-Transfer-Encoding is
   actually valid in Internet mail.  However, in the event that binary
   mail transport becomes a reality in Internet mail, or when MIME is
   used in conjunction with any other binary-capable mail transport
   mechanism, binary bodies must be labelled as such using this
   mechanism.

   NOTE: The five values defined for the Content-Transfer-Encoding field
   imply nothing about the media type other than the algorithm by which
   it was encoded or the transport system requirements if unencoded.

6.3.  New Content-Transfer-Encodings

   Implementors may, if necessary, define private Content-Transfer-
   Encoding values, but must use an x-token, which is a name prefixed by
   "X-", to indicate its non-standard status, e.g., "Content-Transfer-
   Encoding: x-my-new-encoding".  Additional standardized Content-
   Transfer-Encoding values must be specified by a standards-track RFC.
   The requirements such specifications must meet are given in RFC 2048.
   As such, all content-transfer-encoding namespace except that
   beginning with "X-" is explicitly reserved to the IETF for future
   use.

   Unlike media types and subtypes, the creation of new Content-
   Transfer-Encoding values is STRONGLY discouraged, as it seems likely
   to hinder interoperability with little potential benefit.
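
   The split of the namespace into standard values, private "X-"
   values, and names reserved to the IETF could be expressed roughly
   as below.  The category names returned by this sketch are
   assumptions made for the example.

```python
STANDARD = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def classify(mechanism):
    """Classify a mechanism token as standard, private, or reserved."""
    value = mechanism.strip().lower()
    if value in STANDARD:
        return "standard"
    # An x-token is a name prefixed by "X-" (matched without regard to case).
    if value.startswith("x-") and len(value) > 2:
        return "private"
    # Everything else is reserved to the IETF for future use.
    return "reserved"
```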

6.4.  Interpretation and Use

   If a Content-Transfer-Encoding header field appears as part of a
   message header, it applies to the entire body of that message.  If a
   Content-Transfer-Encoding header field appears as part of an entity's
   headers, it applies only to the body of that entity.  If an entity is
   of type "multipart" the Content-Transfer-Encoding is not permitted to
   have any value other than "7bit", "8bit" or "binary".  Even more
   severe restrictions apply to some subtypes of the "message" type.

   It should be noted that most media types are defined in terms of
   octets rather than bits, so that the mechanisms described here are
   mechanisms for encoding arbitrary octet streams, not bit streams.  If
   a bit stream is to be encoded via one of these mechanisms, it must
   first be converted to an 8bit byte stream using the network standard
   bit order ("big-endian"), in which the earlier bits in a stream
   become the higher-order bits in an 8bit byte.  A bit stream not ending
   at an 8bit boundary must be padded with zeroes.  RFC 2046 provides a
   mechanism for noting the addition of such padding in the case of the
   application/octet-stream media type, which has a "padding" parameter.
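
   A minimal sketch of this conversion follows, assuming the bit
   stream is available as a sequence of 0 and 1 integers.  The
   function name and the returned padding count are illustrative
   choices, not taken from this document.

```python
def pack_bits(bits):
    """Pack 0/1 integers into octets, earliest bit in the highest position.

    Returns (octets, padding), where padding is the number of zero bits
    appended to reach an 8bit boundary.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits.extend([0] * padding)
    out = bytearray()
    for start in range(0, len(bits), 8):
        octet = 0
        for bit in bits[start:start + 8]:
            # Shifting left places earlier bits in higher-order positions.
            octet = (octet << 1) | bit
        out.append(octet)
    return bytes(out), padding
```

   For example, the three-bit stream 1, 0, 1 becomes the single octet
   0xA0 with five bits of padding, a count that could then be recorded
   in the "padding" parameter.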

   The encoding mechanisms defined here explicitly encode all data in
   US-ASCII.  Thus, for example, suppose an entity has header fields
   such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

   This must be interpreted to mean that the body is a base64 US-ASCII
   encoding of data that was originally in ISO-8859-1, and will be in
   that character set again after decoding.
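
   The two layers can be kept apart as in the following sketch, which
   uses Python's standard base64 module.  The sample text and variable
   names are assumptions chosen for the example.

```python
import base64

# Original data in ISO-8859-1: the last character is a single octet 0xE9.
original_octets = "caf\u00e9".encode("iso-8859-1")

# The transfer encoding yields a body made only of US-ASCII characters.
encoded_body = base64.b64encode(original_octets)

# The receiver first undoes the transfer encoding, then applies the charset.
decoded_octets = base64.b64decode(encoded_body)
text = decoded_octets.decode("iso-8859-1")
```

   The charset parameter describes the octets after base64 decoding,
   not the US-ASCII characters of the encoded body.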

   Certain Content-Transfer-Encoding values may only be used on certain
   media types.  In particular, it is EXPRESSLY FORBIDDEN to use any
   encodings other than "7bit", "8bit", or "binary" with any composite
   media type, i.e. one that recursively includes other Content-Type
   fields.  Currently the only composite media types are "multipart" and
   "message".  All encodings that are desired for bodies of type
   multipart or message must be done at the innermost level, by encoding
   the actual body that needs to be encoded.
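
   A validator for this restriction might look like the sketch below.
   The function name and the simple string handling of the
   Content-Type value are assumptions of the example.

```python
IDENTITY = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def encoding_allowed(content_type, mechanism):
    """Return False when a composite type carries a non-identity encoding."""
    # Take the top-level type, e.g. "multipart" from "multipart/mixed".
    top_level = content_type.split("/", 1)[0].strip().lower()
    if top_level in COMPOSITE_TYPES:
        return mechanism.strip().lower() in IDENTITY
    return True
```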

   It should also be noted that, by definition, if a composite entity
   has a transfer-encoding value such as "7bit", but one of the enclosed
   entities has a less restrictive value such as "8bit", then either the
   outer "7bit" labelling is in error, because 8bit data are included,
   or the inner "8bit" labelling placed an unnecessarily high demand on
   the transport system because the included data were actually
   7bit-safe.

   NOTE ON ENCODING RESTRICTIONS:  Though the prohibition against using
   content-transfer-encodings on composite body data may seem overly
   restrictive, it is necessary to prevent nested encodings, in which
   data are passed through an encoding algorithm multiple times, and
   must be decoded multiple times in order to be properly viewed.
   Nested encodings add considerable complexity to user agents:  Aside
   from the obvious efficiency problems with such multiple encodings,
   they can obscure the basic structure of a message.  In particular,
   they can imply that several decoding operations are necessary simply
   to find out what types of bodies a message contains.  Banning nested
   encodings may complicate the job of certain mail gateways, but this
   seems less of a problem than the effect of nested encodings on user
   agents.

   Any entity with an unrecognized Content-Transfer-Encoding must be
   treated as if it has a Content-Type of "application/octet-stream",
   regardless of what the Content-Type header field actually says.
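
   A receiving agent could apply this fallback as sketched here.  The
   function name and the set of mechanisms the agent supports are
   assumptions of the example.

```python
SUPPORTED = frozenset({"7bit", "8bit", "binary", "quoted-printable", "base64"})


def effective_content_type(content_type, mechanism, supported=SUPPORTED):
    """Return the media type an agent should act on for this entity."""
    if mechanism.strip().lower() not in supported:
        # The body cannot be decoded, so the declared type is ignored.
        return "application/octet-stream"
    return content_type
```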

   NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-
   ENCODING: It may seem that the Content-Transfer-Encoding could be
   inferred from the characteristics of the media that is to be encoded,
   or, at the very least, that certain Content-Transfer-Encodings could
   be mandated for use with specific media types.  There are several
   reasons why this is not the case. First, given the varying types of
   transports used for mail, some encodings may be appropriate for some
   combinations of media types and transports but not for others.  (For
   example, in an 8bit transport, no encoding would be required for text
   in certain character sets, while such encodings are clearly required
   for 7bit SMTP.)

   Second, certain media types may require different types of transfer
   encoding under different circumstances.  For example, many PostScript
   bodies might consist entirely of short lines of 7bit data and hence
   require no encoding at all.  Other PostScript bodies (especially
   those using Level 2 PostScript's binary encoding mechanism) may only
   be reasonably represented using a binary transport encoding.
   Finally, since the Content-Type field is intended to be an open-ended
   specification mechanism, strict specification of an association
   between media types and encodings effectively couples the
   specification of an application protocol with a specific lower-level
   transport.  This is not desirable since the developers of a media
   type should not have to be aware of all the transports in use and
   what their limitations are.

6.5.  Translating Encodings

   The quoted-printable and base64 encodings are designed so that
   conversion between them is possible.  The only issue that arises in
   such a conversion is the handling of hard line breaks in quoted-
   printable encoding output. When converting from quoted-printable to
   base64 a hard line break in the quoted-printable form represents a
   CRLF sequence in the canonical form of the data. It must therefore be
   converted to a corresponding encoded CRLF in the base64 form of the
   data.  Similarly, a CRLF sequence in the canonical form of the data
   obtained after base64 decoding must be converted to a quoted-
   printable hard line break, but ONLY when converting text data.
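
   The quoted-printable to base64 direction can be sketched with
   Python's standard binascii and base64 modules.  The sketch assumes
   the quoted-printable body already uses CRLF for hard line breaks,
   and it omits wrapping the base64 output into lines; the function
   name is hypothetical.

```python
import base64
import binascii


def qp_to_base64(qp_body: bytes) -> bytes:
    """Re-encode a quoted-printable body as unwrapped base64."""
    # Decoding removes soft line breaks and keeps each hard line break
    # as a CRLF in the canonical form of the data.
    canonical = binascii.a2b_qp(qp_body)
    # The CRLF octets are then encoded like any other octets.
    return base64.b64encode(canonical)


converted = qp_to_base64(b"caf=E9\r\nbar")
```

   Decoding the converted value gives back the canonical octets,
   including the CRLF that the hard line break represented.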

6.6.  Canonical Encoding Model

   There was some confusion, in the previous versions of this RFC,
   regarding the model for when email data was to be converted to
   canonical form and encoded, and in particular how this process would
   affect the treatment of CRLFs, given that the representation of
   newlines varies greatly from system to system, and the relationship
   between content-transfer-encodings and character sets.  A canonical
   model for encoding is presented in RFC 2049 for this reason.

6.7.  Quoted-Printable Content-Transfer-Encoding

   The Quoted-Printable encoding is intended to represent data that
   largely consists of octets that correspond to printable characters in
   the US-ASCII character set.  It encodes the data in such a way that
   the resulting octets are unlikely to be modified by mail transport.
   If the data being encoded are mostly US-ASCII text, the encoded form
   of the data remains largely recognizable by humans.  A body which is
   entirely US-ASCII may also be encoded in Quoted-Printable to ensure
   the integrity of the data should the message pass through a
   character-translating, and/or line-wrapping gateway.

   In this encoding, octets are to be represented as determined by the
   following rules:

    (1)   (General 8bit representation) Any octet, except a CR or
          LF that is part of a CRLF line break of the canonical
          (standard) form of the data being encoded, may be
          represented by an "=" followed by a two digit
          hexadecimal representation of the octet's value.  The
          digits of the hexadecimal alphabet, for this purpose,
          are "0123456789ABCDEF".  Uppercase letters must be
          used; lowercase letters are not allowed.  Thus, for
          example, the decimal value 12 (US-ASCII form feed) can
          be represented by "=0C", and the decimal value 61 (US-
          ASCII EQUAL SIGN) can be represented by "=3D".  This
          rule must be followed except when the following rules
          allow an alternative encoding.

    (2)   (Literal representation) Octets with decimal values of
          33 through 60 inclusive, and 62 through 126, inclusive,
          MAY be represented as the US-ASCII characters which
          correspond to those octets (EXCLAMATION POINT through
          LESS THAN, and GREATER THAN through TILDE,
          respectively).

    (3)   (White Space) Octets with values of 9 and 32 MAY be
          represented as US-ASCII TAB (HT) and SPACE characters,
          respectively, but MUST NOT be so represented at the end