3.5.2:  Approximate conversion

   In all other cases, any conversion creates a text T that differs
   from S.  There are different principles for handling this
   inevitable difference, and the choice between them should depend on
   the purpose and requirements of the conversion.  Where possible, the
   client application should be given mechanisms to determine what has
   been done to the text.

3.5.2.1:  Length-modifying conversion for human display

   When the length of the target text T is allowed to differ from the
   length of the source text S, one should use a conversion method in
   which each source character is converted to one or more target
   characters, using a best-resemblance criterion in the choice of
   those target characters.

   Examples:
      LATIN CAPITAL LETTER [*] ->  AE
      COPYRIGHT SIGN       [*] -> (c)

3.5.2.2:  Length-preserving conversion for human display

   Where the text T must be presented and the length of T cannot differ
   from the length of S, one should use a conversion method in which
   each source character is converted to exactly one target character,
   using some kind of best-resemblance criterion in the choice of
   target character.

Weider, et. al.              Informational                     [Page 13]

RFC 2130             Character Set Workshop Report            April 1997

   Examples:
      LATIN CAPITAL LETTER  [*] -> A
      COPYRIGHT SIGN        [*] -> C

3.5.2.3:  Conversion without data loss

   Where the conversion of the text S into T must be completely
   reversible, apply a Character Encoding Syntax or other reversible
   transformation method.  This case is most frequently met in data
   storage requirements.
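   The three strategies of 3.5.2.1 through 3.5.2.3 can be contrasted in
   a short Python sketch.  It assumes an ASCII target repertoire and
   takes the elided source letter in the examples to be LATIN CAPITAL
   LETTER AE (U+00C6), which the conversions to AE and A suggest.  The
   tables, the function names, and the two-character "&" mnemonics are
   hypothetical illustrations, not RFC 1345 data.  The escape character
   "&" plays the role of the member of T that has to be redefined, which
   is why a literal "&" is doubled.

```python
# Sketch of the conversion strategies of 3.5.2.1-3.5.2.3 (illustrative only).
# Assumptions: the target repertoire is ASCII, and the elided source letter is
# LATIN CAPITAL LETTER AE (U+00C6).  The tables cover only the two example
# characters; the "&" mnemonics are hypothetical, not taken from RFC 1345.

AE = "\u00c6"         # LATIN CAPITAL LETTER AE
COPYRIGHT = "\u00a9"  # COPYRIGHT SIGN

MULTI = {AE: "AE", COPYRIGHT: "(c)"}     # 3.5.2.1: one -> one or more
SINGLE = {AE: "A", COPYRIGHT: "C"}       # 3.5.2.2: exactly one -> one
MNEMONIC = {AE: "AE", COPYRIGHT: "(C"}   # 3.5.2.3: two-character codes
REVERSE = {code: ch for ch, code in MNEMONIC.items()}
ESCAPE = "&"  # the member of T redefined to introduce non-T characters


def in_target(ch: str) -> bool:
    """True if ch is in the (assumed) ASCII target repertoire."""
    return ord(ch) < 128


def convert_variable_length(s: str) -> str:
    """Lossy, length-modifying conversion; unknown characters become '?'."""
    return "".join(ch if in_target(ch) else MULTI.get(ch, "?") for ch in s)


def convert_fixed_length(s: str) -> str:
    """Lossy conversion that keeps len(T) == len(S)."""
    return "".join(ch if in_target(ch) else SINGLE.get(ch, "?") for ch in s)


def encode_reversible(s: str) -> str:
    """Reversible conversion: escape non-ASCII characters and literal '&'."""
    out = []
    for ch in s:
        if ch == ESCAPE:
            out.append(ESCAPE * 2)             # literal '&' is doubled
        elif in_target(ch):
            out.append(ch)
        elif ch in MNEMONIC:
            out.append(ESCAPE + MNEMONIC[ch])  # e.g. U+00C6 -> "&AE"
        else:
            raise ValueError(f"no mnemonic for U+{ord(ch):04X}")
    return "".join(out)


def decode_reversible(t: str) -> str:
    """Inverse of encode_reversible; rejects unknown escape sequences."""
    out, i = [], 0
    while i < len(t):
        if t[i] != ESCAPE:
            out.append(t[i])
            i += 1
        elif t[i + 1:i + 2] == ESCAPE:
            out.append(ESCAPE)
            i += 2
        else:
            code = t[i + 1:i + 3]
            if code not in REVERSE:
                raise ValueError(f"unknown mnemonic {code!r}")
            out.append(REVERSE[code])
            i += 3
    return "".join(out)


if __name__ == "__main__":
    sample = AE + "sop " + COPYRIGHT + " 1997 & later"
    print(convert_variable_length(sample))  # AEsop (c) 1997 & later
    print(convert_fixed_length(sample))     # Asop C 1997 & later
    encoded = encode_reversible(sample)
    print(encoded)                          # &AEsop &(C 1997 && later
    assert decode_reversible(encoded) == sample
```

   The lossy functions fall back to "?" for characters missing from
   their tables; the reversible encoder refuses such characters instead,
   because any silent substitution would break the round trip.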
   Examples:
      LATIN CAPITAL LETTER [*] -> &AE
      COPYRIGHT SIGN       [*] -> &(C

   An alternative method can be used if the size of Rep(CCS(T)) is
   greater than or equal to the size of Rep(CCS(S)): for each character
   in Rep(CCS(S)) that is not present in Rep(CCS(T)), define a mapping
   to a character in Rep(CCS(T)) that is not present in Rep(CCS(S)).

   Examples:
      LATIN CAPITAL LETTER  [*] -> CYRILLIC CAPITAL LETTER [*]
      COPYRIGHT SIGN        [*] -> PARTIAL DIFFERENTIAL SIGN [*]

   Note that conversion without data loss requires redefining some
   member of T to indicate "the introduction of character data outside
   T".  This effectively adds another level of CES on top of CES(T).

4:  Presentation issues

   There are a number of considerations in selecting the base character
   set.  One is the protocol's convenience to users with limited
   equipment (for example, only ISO 8859-1, or a keyboard that cannot
   enter all the characters in ISO 10646).  Alternative representations
   should be considered for these users, for both input and output.
   Possible options for representing characters that cannot be
   displayed include transliteration (in the manner of CEN/TC304 or ISO
   TC46/SC2), RFC 1345 [RFC-1345] representative icons, or the WG2 short
   name (U+xxxx).

5:  Open issues

   In addition to the issues declared out of scope and enumerated in
   section 2.1, the following issues are still open and will need to be
   addressed in other forums.  These issues (language tags, public
   identifiers such as URL names, and bi-directionality) are briefly
   discussed below because they repeatedly encroached on the
   discussion.

5.1:  Language tags

   Although the workshop decided not to explicitly address the
   so-called "CJK issue", a few members felt it was necessary to have
   some mechanism to address the problem of correct Han character
   display in ISO 10646 text, and that calling it a "font issue" would
   not suffice.

   The "CJK issue" refers to the extended discussion about "Han
   unification", the use of a single ISO 10646 codepoint to represent
   multiple national variants of a Chinese (Han) character.  ISO 10646
   can map uniquely to any single CJK national character set, but in
   the absence of additional information an application cannot display
   an ISO 10646 text using the proper national variants for that text.

   It was agreed that language tags would be sufficient to disambiguate
   unified characters.  There was not, in our opinion, a significant
   technical difference between the use of different coded character
   sets with overlapping codepoints and a single coded character set
   with language tags.  Either way, the application has sufficient
   information to display the text properly.

   It was observed that in contemporary usage of MIME charsets, the
   language is implied as well as the coded character set and the
   character encoding syntax.  We agreed that this is excessive
   overloading of MIME charsets.

   To specify the language used in a particular block of text, we
   recommend that the MIME tag "Content-Language" be used.  There are a
   number of questions about this approach that need to be worked out,
   however:

   -  Is Content-Language: actually suitable?

   -  Is there an overload between this function and the other
      intended functions of Content-Language: as described in RFC
      1766?

   -  What, precisely, does "Content-Language: zh-tw, ja, ko, zh-cn"
      mean in this context?
      We believe it means that, in drawing a Han character, the
      Taiwanese variant (presumably traditional Han) is preferred,
      followed by the Japanese, Korean, and mainland Chinese
      (presumably simplified Han) variants.  It does *NOT* mean "mixed
      text containing Taiwanese, Japanese, Korean, and mainland Chinese
      text with all the national variants in each of these".

   Mixed CJK text that simultaneously displays different variants
   occupying the same codepoint requires language tags embedded in the
   data.  Ohta and Handa propose in RFC 1554 [RFC-1554] a MIME charset
   using ISO-2022 shifts between multiple coded character sets; in
   effect, this is an encoding that uses coded character sets for
   displaying the appropriate glyphs.

   Some have speculated that mixed CJK text is relatively infrequent,
   and that it is therefore acceptable to require that such text be
   represented using a rich text format that can support language
   tags.  In other words, the claim is that a simplifying assumption can
   be made for TEXT/PLAIN in email using ISO 10646, one that does not
   require multiple display representations for the same codepoint.  A
   mechanism such as RFC 1554 could address this need if it were
   important, although arguably RFC 1554 should really be identified as
   TEXT/ISO-2022.

   Note again that we recommend that support for language tagging
   SHOULD be built into new protocols, as this will become a critical
   component of automated indexing and retrieval in the information
   applications of the future.

5.2:  Public identifiers

   There is considerable demand from the user community for the
   ability to use non-ASCII characters in URL names, IMAP mailbox
   names, file names, and other public identifiers.
   This is still an open problem.

5.3:  Bi-directionality

   It was recognized that a consistent framework for bi-directional
   text was needed, but there was no attempt to work on it in this
   workshop.

6:  Security Considerations

   There are no security considerations associated with character
   sets.

7:  Conclusions

   This paper provides a conceptual framework and a set of
   recommendations which, if adopted, should provide a solid foundation
   for interoperability on the Internet.  There are, however, a number
   of open issues which will need to be addressed to provide ever better
   use of text on the Internet.

8:  Recommendations

8.1:  To the IAB

   There were a number of recommendations to the IAB about making the
   standards process more aware of the need for character set
   interoperability, and about the framework itself.

   A: The IAB should trigger the examination of all RFCs to determine
   the way they handle character sets, and obsolete or annotate the
   RFCs where necessary.

   B: The IESG should trigger the recommendation of procedures to the
   RFC editor to encourage RFCs to specify character set handling if
   they specify the transmission of text.

   C: The IAB should trigger the production of a perspectives document
   on the character set work that has gone on in the past, relating it
   to the current framework.

   D: Full ISO 10646 has a sufficiently broad repertoire, and scope for
   further extension, that it is sufficient for use in Internet
   protocols (without excluding the use of existing alternatives).
   There is no need for specific development of character set standards
   for the Internet.

   E: The IAB should encourage the IRTF to create a research group to
   explore the open issues of character sets on the Internet.  This
   group should set its sights much higher than this workshop did.
   F: The IANA (perhaps with the help of an IETF or IRTF group) should
   develop procedures for the registration of new character sets for
   use in the Internet.

   G: Register UTF-8 as a Character Encoding Scheme for MIME.

   H: The current use of the "x-*" format for distinguishing
   experimental tags should be continued for private use among
   consenting parties.  All other namespaces should be allocated by
   IANA.

   I: Application protocol RFCs SHOULD include a section on
   "Multilingual Considerations".

   J: Application protocol RFCs SHOULD indicate how to transfer 'on the
   wire' all characters in the character sets they use.  They SHOULD
   also specify how to transfer other information that applications may
   need to know about the data.

   K: The IESG should trigger a set of extensions to RFC 1522 to allow
   language tagging of the free text parts of message headers.

8.2:  For new Internet protocols

   New protocols do not suffer from the need to be compatible with old
   7-bit pipes.  New protocol specifications SHOULD use ISO 10646 as the
   base charset unless there is an overriding need to use a different
   base character set.

   New protocols SHOULD use values from the IANA registries when
   referring to parameter values.  The way these values are carried in
   the protocols is protocol dependent; if the protocol uses
   RFC-822-like headers, the header names already in use SHOULD be
   used.

   For protocols with only a single choice for each component, the
   protocol should use the most general specification and should be
   specified with reference to the registered value in the protocol
   standard.
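   As a minimal sketch of reusing header names that are already in use,
   Python's standard email.message.EmailMessage can label a body with an
   IANA-registered charset name through the Content-Type charset
   parameter, and with a language tag through Content-Language (RFC
   1766).  The function name, the sample text, and the "ja" tag are
   illustrative assumptions.

```python
# Sketch: labelling a text body with existing MIME header names, as 8.2
# suggests for protocols that use RFC-822-like headers.  The function name
# tagged_text_message, the sample text and the "ja" tag are hypothetical.
from email.message import EmailMessage


def tagged_text_message(text: str, language: str) -> EmailMessage:
    msg = EmailMessage()
    # Content-Type: text/plain with the IANA-registered charset name UTF-8
    msg.set_content(text, charset="utf-8")
    # Content-Language carries the language tag separately from the charset
    msg["Content-Language"] = language
    return msg


if __name__ == "__main__":
    msg = tagged_text_message("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8", "ja")
    print(msg["Content-Type"])
    print(msg["Content-Language"])
```

   Keeping the language in its own header, rather than implying it from
   the charset, avoids the overloading of MIME charsets criticized in
   5.1.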
   Protocols SHOULD tag text streams with the language of the text.

8.3:  For the registration of new character sets

   Ned Freed will be releasing a new MIME registration document in
   conjunction with this paper.

8.3.1:  A definition table for a coded character set

   A definition table for a coded character set A must give, for each
   character C in the repertoire of A:

   a) if C is present in ISO 10646, the code value (in hexadecimal
      form) for that character;

   b) if C is not present in ISO 10646, but may be constructed using
      ISO 10646 combining characters, the series of code values (in
      hexadecimal form) used to construct that character;

   c) if C is not present in ISO 10646, a textual description of the
      character and a reference to its origin.

8.3.2:  A definition of a character encoding scheme

   A definition of a character encoding scheme consists of:

   -  A description of an algorithm which transforms every possible
      sequence of octets to either a sequence of pairs <CCS, code
      value> or to the error state "illegal octet sequence".

   -  Specifications, either by reference to CCSs registered by IANA or
      in text, of each CCS upon which this CES is based.
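   As an illustration of the first requirement in 8.3.2, the sketch
   below writes out UTF-8 (the CES recommended for registration in 8.1
   G) as an algorithm that maps every octet sequence either to a list of
   <CCS, code value> pairs or to the error state.  It is a sketch under
   stated assumptions: the CCS label "ISO-10646" is illustrative rather
   than a registered name, and it applies the later RFC 3629 rules (at
   most four octets, no surrogates, nothing above U+10FFFF) instead of
   the UTF-8 definition current in 1997, which allowed longer sequences.

```python
# Sketch of a CES definition in the sense of 8.3.2: UTF-8 octets are mapped
# either to a list of <CCS, code value> pairs or to the error state.
# Assumptions: the CCS label "ISO-10646" is illustrative, and the RFC 3629
# rules (at most four octets, no surrogates, nothing above U+10FFFF) apply.
from typing import List, Tuple, Union

CCS = "ISO-10646"
ILLEGAL = "illegal octet sequence"

Pair = Tuple[str, int]


def decode_utf8(octets: bytes) -> Union[List[Pair], str]:
    pairs: List[Pair] = []
    i, n = 0, len(octets)
    while i < n:
        lead = octets[i]
        if lead < 0x80:                 # 0xxxxxxx
            length, value, minimum = 1, lead, 0
        elif 0xC0 <= lead < 0xE0:       # 110xxxxx
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:       # 1110xxxx
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:       # 11110xxx
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:                           # stray continuation or invalid lead
            return ILLEGAL
        if i + length > n:              # truncated sequence
            return ILLEGAL
        for octet in octets[i + 1:i + length]:
            if (octet & 0xC0) != 0x80:  # continuation octets are 10xxxxxx
                return ILLEGAL
            value = (value << 6) | (octet & 0x3F)
        # reject overlong forms, surrogates and values beyond U+10FFFF
        if value < minimum or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
            return ILLEGAL
        pairs.append((CCS, value))
        i += length
    return pairs


if __name__ == "__main__":
    print(decode_utf8(b"A\xc3\x86"))  # [('ISO-10646', 65), ('ISO-10646', 198)]
    print(decode_utf8(b"\xc0\x80"))   # illegal octet sequence
```

   Python's own bytes.decode("utf-8") enforces the same rules; the
   explicit loop is there only to show the kind of algorithm a CES
   definition has to describe.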