⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 rfc2046.txt

📁 < VB高级网络编程技术>>随书源代码第2章,里面有很多有用的例程,希望对大家的开发工作有帮助!
💻 TXT
📖 第 1 页 / 共 5 页
字号:



Freed & Borenstein          Standards Track                     [Page 5]

RFC 2046                      Media Types                  November 1996


    (2)   message -- an encapsulated message.  A body of media
          type "message" is itself all or a portion of some kind
          of message object.  Such objects may or may not in turn
          contain other entities.  The "rfc822" subtype is used
          when the encapsulated content is itself an RFC 822
          message.  The "partial" subtype is defined for partial
          RFC 822 messages, to permit the fragmented transmission
          of bodies that are thought to be too large to be passed
          through transport facilities in one piece.  Another
          subtype, "external-body", is defined for specifying
          large bodies by reference to an external data source.

   It should be noted that the list of media type values given here may
   be augmented in time, via the mechanisms described above, and that
   the set of subtypes is expected to grow substantially.

4.  Discrete Media Type Values

   Five of the seven initial media type values refer to discrete bodies.
   The content of these types must be handled by non-MIME mechanisms;
   they are opaque to MIME processors.

4.1.  Text Media Type

   The "text" media type is intended for sending material which is
   principally textual in form.  A "charset" parameter may be used to
   indicate the character set of the body text for "text" subtypes,
   notably including the subtype "text/plain", which is a generic
   subtype for plain text.  Plain text does not provide for or allow
   formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup.  Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks.  Plain text may allow the
   stacking of several characters in the same position in the text.
   Plain text in scripts like Arabic and Hebrew may also include
   facilitites that allow the arbitrary mixing of text segments with
   opposite writing directions.

   Beyond plain text, there are many formats for representing what might
   be known as "rich text".  An interesting characteristic of many such
   representations is that they are to some extent readable even without
   the software that interprets them.  It is useful, then, to
   distinguish them, at the highest level, from such unreadable data as
   images, audio, or text represented in an unreadable form. In the
   absence of appropriate interpretation software, it is reasonable to
   show subtypes of "text" to the user, while it is not reasonable to do
   so with most nontextual data. Such formatted textual data should be
   represented using subtypes of "text".



Freed & Borenstein          Standards Track                     [Page 6]

RFC 2046                      Media Types                  November 1996


4.1.1.  Representation of Line Breaks

   The canonical form of any MIME "text" subtype MUST always represent a
   line break as a CRLF sequence.  Similarly, any occurrence of CRLF in
   MIME "text" MUST represent a line break.  Use of CR and LF outside of
   line break sequences is also forbidden.

   This rule applies regardless of format or character set or sets
   involved.

   NOTE: The proper interpretation of line breaks when a body is
   displayed depends on the media type. In particular, while it is
   appropriate to treat a line break as a transition to a new line when
   displaying a "text/plain" body, this treatment is actually incorrect
   for other subtypes of "text" like "text/enriched" [RFC-1896].
   Similarly, whether or not line breaks should be added during display
   operations is also a function of the media type. It should not be
   necessary to add any line breaks to display "text/plain" correctly,
   whereas proper display of "text/enriched" requires the appropriate
   addition of line breaks.

   NOTE: Some protocols defines a maximum line length.  E.g. SMTP [RFC-
   821] allows a maximum of 998 octets before the next CRLF sequence.
   To be transported by such protocols, data which includes too long
   segments without CRLF sequences must be encoded with a suitable
   content-transfer-encoding.

4.1.2.  Charset Parameter

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set.  This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive.  The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well.  For other subtypes of "text"
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.



Freed & Borenstein          Standards Track                     [Page 7]

RFC 2046                      Media Types                  November 1996


   The charset parameter for subtypes of "text" gives a name of a
   character set, as "character set" is defined in RFC 2045.  The rules
   regarding line breaks detailed in the previous section must also be
   observed -- a character set whose definition does not conform to
   these rules cannot be used in a MIME "text" subtype.

   An initial list of predefined character set names can be found at the
   end of this section.  Additional character sets may be registered
   with IANA.

   Other media types than subtypes of "text" might choose to employ the
   charset parameter as defined here, but with the CRLF/line break
   restriction removed.  Therefore, all character sets that conform to
   the general definition of "character set" in RFC 2045 can be
   registered for MIME use.

   Note that if the specified character set includes 8-bit characters
   and such characters are used in the body, a Content-Transfer-Encoding
   header field and a corresponding encoding on the data are required in
   order to transmit the body via some mail transfer protocols, such as
   SMTP [RFC-821].

   The default character set, US-ASCII, has been the subject of some
   confusion and ambiguity in the past.  Not only were there some
   ambiguities in the definition, there have been wide variations in
   practice.  In order to eliminate such ambiguity and variations in the
   future, it is strongly recommended that new user agents explicitly
   specify a character set as a media type parameter in the Content-Type
   header field. "US-ASCII" does not indicate an arbitrary 7-bit
   character set, but specifies that all octets in the body must be
   interpreted as characters according to the US-ASCII character set.
   National and application-oriented versions of ISO 646 [ISO-646] are
   usually NOT identical to US-ASCII, and in that case their use in
   Internet mail is explicitly discouraged.  The omission of the ISO 646
   character set from this document is deliberate in this regard.  The
   character set name of "US-ASCII" explicitly refers to the character
   set defined in ANSI X3.4-1986 [US- ASCII].  The new international
   reference version (IRV) of the 1991 edition of ISO 646 is identical
   to US-ASCII.  The character set name "ASCII" is reserved and must not
   be used for any purpose.

   NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
   version of the American Standard.  Insofar as one of the purposes of
   specifying a media type and character set is to permit the receiver
   to unambiguously determine how the sender intended the coded message
   to be interpreted, assuming anything other than "strict ASCII" as the
   default would risk unintentional and incompatible changes to the
   semantics of messages now being transmitted.  This also implies that



Freed & Borenstein          Standards Track                     [Page 8]

RFC 2046                      Media Types                  November 1996


   messages containing characters coded according to other versions of
   ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
   procedures (e.g., those of ISO 2022), as well as 8bit or multiple
   octet character encodings MUST use an appropriate character set
   specification to be consistent with MIME.

   The complete US-ASCII character set is listed in ANSI X3.4- 1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning in apart from the combination CRLF (US-ASCII values
   13 and 10) indicating a new line.  Two of the characters have de
   facto meanings in wide use: FF (12) often means "start subsequent
   text on the beginning of a new page"; and TAB or HT (9) often (though
   not always) means "move the cursor to the next available column after
   the current position where the column number is a multiple of 8
   (counting the first column as column 0)."  Aside from these
   conventions, any use of the control characters or DEL in a body must
   either occur

    (1)   because a subtype of text other than "plain"
          specifically assigns some additional meaning, or

    (2)   within the context of a private agreement between the
          sender and recipient. Such private agreements are
          discouraged and should be replaced by the other
          capabilities of this document.

   NOTE: An enormous proliferation of character sets exist beyond US-
   ASCII.  A large number of partially or totally overlapping character
   sets is NOT a good thing.  A SINGLE character set that can be used
   universally for representing all of the world's languages in Internet
   mail would be preferrable.  Unfortunately, existing practice in
   several communities seems to point to the continued use of multiple
   character sets in the near future.  A small number of standard
   character sets are, therefore, defined for Internet use in this
   document.

   The defined charset values are:

    (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

    (2)   ISO-8859-X -- where "X" is to be replaced, as
          necessary, for the parts of ISO-8859 [ISO-8859].  Note
          that the ISO 646 character sets have deliberately been
          omitted in favor of their 8859 replacements, which are
          the designated character sets for Internet mail.  As of
          the publication of this document, the legitimate values
          for "X" are the digits 1 through 10.




Freed & Borenstein          Standards Track                     [Page 9]

RFC 2046                      Media Types                  November 1996


   Characters in the range 128-159 has no assigned meaning in ISO-8859-
   X.  Characters with values below 128 in ISO-8859-X have the same
   assigned meaning as they do in US-ASCII.

   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) includes both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text.  The charset values "ISO-8859-6" and "ISO-8859-
   8", however, specify that the visual method is used [RFC-1556].

   All of these character sets are used as pure 7bit or 8bit sets
   without any shift or escape functions.  The meaning of shift and
   escape sequences in these character sets is not defined.

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME.  This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

   Note that the character set used, if anything other than US- ASCII,
   must always be explicitly specified in the Content-Type field.

   No character set name other than those defined above may be used in
   Internet mail without the publication of a formal specification and
   its registration with IANA, or by private agreement, in which case
   the character set name must begin with "X-".

   Implementors are discouraged from defining new character sets unless
   absolutely necessary.

   The "charset" parameter has been defined primarily for the purpose of
   textual data, and is described in this section for that reason.
   However, it is conceivable that non-textual data might also wish to
   specify a charset value for some purpose, in which case the same
   syntax and values should be used.

   In general, composition software should always use the "lowest common
   denominator" character set possible.  For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the US-
   ASCII character set, not ISO-8859-1, which, like all the ISO-8859
   family of character sets, is a superset of US-ASCII.  More generally,
   if a widely-used character set is a subset of another character set,
   and a body contains only characters in the widely-used subset, it
   should be labelled as being in that subset.  This will increase the

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -