   Layout includes the elements needed for displaying text to the user,
   such as font selection, word-wrapping, etc.  It is similar to the
   'presentation' layer in the 7-layer ISO telecommunications model
   [ISO-7498].

3.1.1.2:  Culture

   Culture includes information about cultural preferences, which affect
   spelling, word choice, and so forth.

3.1.1.3:  Locale

   The locale component includes the information necessary to make
   choices about text manipulation which will present the text to the
   user in an expected format.  This information may include the display
   of date, time and monetary symbol preferences.  Notice that locale
   modifications are typically applied to a text stream before it is
   presented to the user, although they also are used to specify input
   formats.

3.1.1.4:  Language

   This component specifies the language of the transmitted text.  At
   times and in specific cases, language information may be required to
   achieve a particular level of quality for the purpose of displaying a
   text stream.  For example, UTF-8 encoded Han may require transmission
   of a language tag to select the specific glyphs to be displayed at a
   particular level of quality.

   Note that information other than language may be used to achieve the
   required level of quality in a display process.  In particular, a
   font tag is sufficient to produce identical results.  However, the
   association of a language with a specific block of text has
   usefulness far beyond its use in display.  In particular, as the
   amount of information available in multiple languages on the World
   Wide Web grows, it becomes critical to specify which language is in
   use in particular documents, to assist automatic indexing and
   retrieval of relevant documents.

Weider, et. al.              Informational                      [Page 7]

RFC 2130             Character Set Workshop Report            April 1997

   The term 'language tag' should be reserved for the short identifier
   of RFC 1766 [RFC-1766] that only serves to identify the language.
   While there may be other text attributes intimately associated with
   the language of the document, such as desired font or text direction,
   these should be specified with other identifiers rather than
   overloading the language tag.

3.2:  On the wire

   There are three segments of the model which are required for
   completely specifying the content of a transmitted text stream (with
   the occasional exception of the Language component, mentioned above).
   These components are:

   1)  Coded Character Set,
   2)  Character Encoding Scheme, and
   3)  Transfer Encoding Syntax.

   Each of these abstract components must be explicitly specified by the
   transmitter when the data is sent.  There may be instances of an
   implicit specification due to the protocol/standard being used (e.g.
   ANSI/NISO Z39.50).  Also, in MIME, the Coded Character Set and
   Character Encoding Scheme are specified by the Charset parameter to
   the Content-Type header field, and Transfer Encoding Syntax is
   specified by the Content-Transfer-Encoding header field.

3.2.1:  Coded Character Set

   A Coded Character Set (CCS) is a mapping from a set of abstract
   characters to a set of integers.  Examples of coded character sets
   are ISO 10646 [ISO-10646], US-ASCII [ASCII], and the ISO-8859 series
   [ISO-8859].

3.2.2:  Character Encoding Scheme

   A Character Encoding Scheme (CES) is a mapping from a Coded Character
   Set or several coded character sets to a set of octets.  Examples of
   Character Encoding Schemes are ISO 2022 [ISO-2022] and UTF-8 [UTF-8].
   A given CES is typically associated with a single CCS; for example,
   UTF-8 applies only to ISO 10646.
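
   The distinction between a CCS (section 3.2.1) and a CES (section
   3.2.2) can be sketched in a few lines of Python.  This is only an
   illustration, not part of the workshop report: the helper names
   ccs_value and ces_octets are hypothetical, and the sketch assumes
   that Python's Unicode code points coincide with ISO 10646 values.

```python
def ccs_value(ch: str) -> int:
    """CCS step: abstract character -> integer (assumed ISO 10646 value)."""
    return ord(ch)


def ces_octets(ch: str, scheme: str = "utf-8") -> bytes:
    """CES step: coded character -> sequence of octets under `scheme`."""
    return ch.encode(scheme)


# One ASCII letter, one Latin-1 letter, one Han character.
for ch in ("A", "é", "漢"):
    print(f"{ch!r}: CCS value U+{ccs_value(ch):04X}, "
          f"UTF-8 octets {ces_octets(ch).hex(' ')}")
```

   The same integer can be serialized differently by another CES, which
   is why the report asks for both layers to be labelled.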

3.2.3:  Transfer Encoding Syntax

   It is frequently necessary to transform encoded text into a format
   which is transmissible by specific protocols.  The Transfer Encoding
   Syntax (TES) is a transformation applied to character data encoded
   using a CCS and possibly a CES to allow it to be transmitted.
   Examples of Transfer Encoding Syntaxes are Base64 Encoding [Base64],
   gzip encoding, and so forth.

3.3:  Determining which values of CCS, CES, and TES are used

   To completely specify which CCS, CES, and TES are used in a specific
   text transmission, there needs to be a consistent set of labels for
   specifying which CCS, CES, and TES are used.  Once the appropriate
   mechanisms have been selected, there are six techniques for attaching
   these labels to the data.

   The labels themselves are named and registered, either with IANA
   [IANA] or with some other registry.  Ideally, their definitions are
   retrievable from some registration authority.

   Labels may be determined in one of the following ways:

   -  Determined by guessing, where the receiver of the text has to
      guess the values of the CCS, CES, and TES. For example: "I got
      this from Sweden so it's probably ISO-8859-1."  This is
      obviously not a very foolproof way to decode text.

   -  Determined by the standard, where the protocol used to transmit
      the data has made documented choices of CCS, CES, and TES in the
      standard. Thus, the encodings used are known through the
      access protocol, for example HTTP [HTTP] uses (but is not
      limited to) ISO-8859-1, SMTP uses US-ASCII.

   -  Attached to the transfer envelope, where the descriptive labels are
      attached to the wrapper placed around the text for transport.
      MIME headers are a good example of this technique.

   -  Included in the data stream, where the data stream itself has
      been encoded in such a way as to signal the character set used.
      For example, ISO-2022 encodes the data with escape sequences to
      provide information on the character subset currently being used.

   -  Agreed by prior bilateral agreement, where some out-of-band
      negotiation has allowed the text transmitter and receiver to
      determine the CCS, CES, and TES for the transmitted text.

   -  Agreed to by negotiation during some phase, typically
      initialization of the protocol.

3.3.1:  Recommendations for value specification mechanisms

   While each of these techniques (with the exception of guessing) is
   useful in particular situations, interoperability requires a more
   consistent set of techniques.  Thus, we recommend that MIME
   registered values be used for all tagging of character sets and
   languages UNLESS there is an existing mechanism for determining the
   required information using one of the other techniques (except
   guessing).  This recommendation will require a fair bit of work on
   the part of protocol designers, implementors, the IETF, the IESG, and
   the IAB.

   However, it is important to point out that the MIME concept of
   'charset' in some cases cuts across several layers of components in
   our model.  While this can be accepted in existing registrations, we
   also recommend that the MIME registration procedure for character
   sets be modified to show how a proposed character set deals with the
   CCS and the CES.  Most 'charsets' have a well-defined CCS and CES;
   they merely need to be teased apart for the registration.
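
   The "attached to the transfer envelope" technique, with MIME
   registered values as recommended above, might look as follows on the
   receiving side.  This is a minimal sketch using Python's standard
   email package; the function name describe_labels and the sample
   message are hypothetical.  A missing charset label is treated as an
   error rather than guessed, which is a design choice of the sketch
   and not a rule stated in the report.

```python
import base64
from email import message_from_bytes


def describe_labels(raw_message: bytes):
    """Read the MIME labels, then undo the TES and the CES in that order."""
    msg = message_from_bytes(raw_message)
    charset = msg.get_content_charset()         # labels CCS + CES together
    tes = msg.get("Content-Transfer-Encoding")  # labels the TES
    if charset is None:
        raise ValueError("no charset label; refusing to guess")
    octets = msg.get_payload(decode=True)       # undo the TES
    text = octets.decode(charset)               # undo the CES / CCS
    return charset, tes, text


# Hypothetical sample message: ISO-8859-1 text carried as base64.
body = base64.b64encode("smörgåsbord".encode("iso-8859-1"))
raw = (b"MIME-Version: 1.0\r\n"
       b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
       b"Content-Transfer-Encoding: base64\r\n"
       b"\r\n" + body + b"\r\n")

print(describe_labels(raw))
```

   Note that the single 'charset' label covers both the CCS and the CES,
   which is the layering overlap the report points out.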

   There are a number of other recommendations, but these will be
   covered in the next sections.

3.4:  Recommended Defaults

   For a number of reasons, one cannot define a mandatory set of
   defaults for all Internet protocols.  There is a mass of current
   practice, future protocols are likely to have different purposes,
   which may determine their handling of text, and protocols may need
   specific variation support.  For example, in mail, text is a
   predominant data type and coded character sets then become a major
   issue for the protocol.  Also, since e-mail is ubiquitous and users
   expect to be able to send it to everyone, the mail protocols need to
   be quite adept at handling different character set encodings.  On the
   other hand, if strings are seldom used in a given protocol, there is
   no need to weigh the protocol down with a sophisticated apparatus for
   handling multiple character sets, assuming that the predicated
   character set can handle all the protocol's needs. This observation
   also applies to the specification techniques for character set
   parameters.  If only one character set encoding is needed, it can be
   made explicit in the protocol specification.  Protocols with a
   greater need for character set support will need a more elaborate
   specification technique.

3.4.1:  Clarity of specification

   We recommend that each protocol clearly specify what it is using for
   each of the layers of the transmission model.
   Users (or clients) should never have to guess what the parameter is
   for a given layer.

3.4.2:  Default Coded Character Set

   The default Coded Character Set is the repertoire of ISO-10646.

3.4.3:  Default Character Encoding Scheme

   For text-oriented protocols, new protocols should use UTF-8, and
   protocols that have a backwards compatibility requirement should use
   the default of the existing protocol, e.g. US-ASCII for mail, and
   ISO-8859-1 for HTTP.  The recommended specification scheme is the
   MIME "charset" specification, using the IANA "charset"
   specifications.  The MIME specifications will need to be clarified to
   meet this model in the future.

   For other protocols, the default should be UTF-8 as this initially
   allows US-ASCII to be entered as-is, and enables the full repertoire
   of ISO 10646.

   Some protocols, such as those descended from SGML [SGML], have other
   natural notations for characters outside their "natural" repertoire;
   for instance, HTML [HTML] allows the use of &#nnnn to refer to any
   ISO 10646 character.  Note that this, like all other encodings that
   depend on "escape characters", redefines at least one character from
   the base character set for use as an indicator of "foreign"
   characters.  Use of this approach must be weighed very carefully.

3.4.4:  Default Transport Encoding Scheme

   There is no recommended default for this level.  For plain text
   oriented protocols, the bytestream transport format should be 8-bit
   clean, possibly with normalization of end-of-line indicators.  Some
   special cases could be made for protocols that are not 8-bit clean,
   such as encoding it for transport over 7-bit connections.  For binary
   the same recommendation holds as above.  The specification technique
   should either be defined in the protocol, if only one way is
   permitted, or by use of MIME content-transfer-encoding (CTE)
   techniques, using IANA registered values.
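
   The special case in section 3.4.4, carrying 8-bit data over a 7-bit
   connection, can be sketched with two of the MIME
   content-transfer-encoding values.  This is an illustration under
   stated assumptions: the helper names to_7bit and from_7bit are
   hypothetical, and only the "base64" and "quoted-printable" labels
   are handled.

```python
import base64
import quopri


def to_7bit(octets: bytes, tes: str) -> bytes:
    """Apply a transfer encoding so every octet on the wire is below 0x80."""
    if tes == "base64":
        return base64.b64encode(octets)
    if tes == "quoted-printable":
        return quopri.encodestring(octets)
    raise ValueError(f"unsupported transfer encoding: {tes}")


def from_7bit(wire: bytes, tes: str) -> bytes:
    """Reverse to_7bit, recovering the original 8-bit octets."""
    if tes == "base64":
        return base64.b64decode(wire)
    if tes == "quoted-printable":
        return quopri.decodestring(wire)
    raise ValueError(f"unsupported transfer encoding: {tes}")


octets = "naïve café".encode("utf-8")  # 8-bit data after the CES step
for tes in ("base64", "quoted-printable"):
    wire = to_7bit(octets, tes)
    print(tes, wire)
```

   On an 8-bit clean path neither transformation is needed, which is
   why the report names no default at this level.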

3.4.5:  Default Language

   There is no recommended default for the language level.  For human
   readable text, there should always be a way to specify the natural
   language. The specification technique should be a MIME identifier
   with IANA registered values for languages.  If headers are used, the
   header should be 'Content-Language'.

3.4.6:  Default Locale

   The default should be the POSIX locale.  The specification technique
   should use the Cultural register of CEN ENV 12005 [CEN] for the
   values.  If headers are used, the header should be 'Content-Locale'.

3.4.7:  Default Culture

   There is no recommended default for the Culture level.  The
   specification technique should be a MIME or MIME-like identifier
   (e.g. Content-Culture) and should use the Cultural register of CEN
   ENV 12005 for its values.

3.4.8:  Default Presentation

   There is no recommended default for the Presentation level.  The
   specification technique should be a MIME or MIME-like identifier
   (e.g. Content-Layout) and use the glyph register of ISO 10036 and
   other registers for its values.

3.4.9:  Multiplexing

   In some cases, text transmission may require the use of a number of
   different values for a given parameter; for example, English
   annotation of Japanese text might well require shifting the
   Content-Language parameter.  The way to switch the value of
   parameters within a single body of text depends on the application.
   For instance, the HTML I18N [I18N] work defines a language attribute
   on most of its elements, including <SPAN>, <HTML>, and <BODY>, for
   the purpose of switching between different languages.  When only one
   value is needed, this value should be as general as possible, and
   specified in the protocol standard with reference to the IANA or
   other registry value.
   All levels should be specified explicitly.

3.4.10:  Storage

   Because stored text may very well be stored without any of the
   additional information necessary for decoding, stored text SHOULD be
   tagged in a MIME compliant fashion.  This alleviates the problem of
   being unable to interpret text which has been stored for a long time,
   or text whose provenance is not available.

3.5:  Guidelines for conversions between coded character sets

   This section covers various algorithms to convert a source text S,
   encoded in the coded character set CCS(S), to a target text T,
   encoded in the coded character set CCS(T).

   Rep(X) is the character repertoire of coded character set X, i.e. the
   set of characters which can be represented with X.

3.5.1:  Exact conversion

   When Rep(CCS(S)) and Rep(CCS(T)) are equal or Rep(CCS(S)) is a subset
   of Rep(CCS(T)), exact conversion is possible; i.e. T is equal to S.
   The octets just need to be remapped.  The algorithm for performing
   this remapping is simple, if the IANA-registered definition tables
   for CCS(S) and CCS(T) are available.
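
   The table-driven remapping described in section 3.5.1 might be
   sketched as follows.  The sketch makes several assumptions that are
   not in the report: Python's built-in codecs stand in for the
   IANA-registered definition tables, the source is a single-octet
   coded character set, and the names build_table, convert_exact and
   LATIN1_TO_UTF8 are hypothetical.

```python
def build_table(src: str, dst: str) -> dict:
    """Map each single-octet source value to its target octet sequence."""
    table = {}
    for value in range(256):
        try:
            ch = bytes([value]).decode(src)
        except UnicodeDecodeError:
            continue  # value is unassigned in the source set
        # encode() raises UnicodeEncodeError when the character is missing
        # from the target, i.e. when Rep(src) is not a subset of Rep(dst).
        table[value] = ch.encode(dst)
    return table


def convert_exact(source: bytes, table: dict) -> bytes:
    """Remap every source octet through the precomputed table."""
    return b"".join(table[octet] for octet in source)


# Rep(ISO-8859-1) is a subset of Rep(ISO 10646), so this is exact.
LATIN1_TO_UTF8 = build_table("iso-8859-1", "utf-8")

s = "Å är ö".encode("iso-8859-1")
t = convert_exact(s, LATIN1_TO_UTF8)
print(s, "->", t)
```

   Building the table in the opposite direction, from ISO 10646 to
   ISO-8859-1, would fail for most characters, since the subset
   condition no longer holds.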