RFC 2482         Language Tagging in Unicode Plain Text     January 1999


      Meaning:   The first character of the string matched by this
                 non-terminal must be '?'

   2. A number of predicate functions are employed in semantic
      constraint rules which are not otherwise defined; their name is
      sufficient for determining their predication.

      Example:   IsRFC1766LanguageIdentifier ( tag-argument )
      Meaning:   tag-argument is a valid RFC1766 language identifier

   3. A lexical expander function, TAG, is employed to denote the tag
      form of an ASCII character; the argument to this function is
      either a character or a character set specified by a range or
      enumeration expression.

      Example:   TAG('-')
      Meaning:   TAG HYPHEN-MINUS

      Example:   TAG([A-Z])
      Meaning:   TAG LATIN CAPITAL LETTER A ...
                 TAG LATIN CAPITAL LETTER Z

   4. A macro is employed to denote terminal symbols that are character
      literals which can't be directly represented in ASCII. The
      argument to the macro is the UNICODE (ISO/IEC 10646) character
      name.

      Example:   '${TAG CANCEL}'
      Meaning:   character literal whose code value is U-000E007F

   5. Occurrence indicators used are '+' (one or more) and '*' (zero or
      more); optional occurrence is indicated by enclosure in '[' and
      ']'.

4.6.1 Formal Tag Syntax

tag                     :   language-tag
                        |   cancel-all-tag
                        ;

language-tag            :   language-tag-introducer language-tag-argument
                        ;

Whistler & Adams             Informational                      [Page 8]

language-tag-argument   :   tag-argument
              {{ Assert ( IsRFC1766LanguageIdentifier ( $$ ) ); }}
                        |   tag-cancel
                        ;

cancel-all-tag          :   tag-cancel
                        ;

tag-argument            :   tag-character+
                        ;

tag-character           :   { c : c in
              TAG( { a : a in printable ASCII characters or SPACE } ) }
                        ;

language-tag-introducer :   '${TAG LANGUAGE}'
                        ;

tag-cancel              :   '${TAG CANCEL}'
                        ;

5.0 Tag Types

5.1 Language Tags

   Language tags are of general interest and should have a high degree
   of interoperability for protocol usage. To this end, a specific
   LANGUAGE TAG tag identification character is provided.  A Plane 14
   tag string prefixed by U-000E0001 LANGUAGE TAG is specified to
   constitute a language tag. Furthermore, the tag values for the
   language tag are to be spelled out as specified in RFC 1766, making
   use only of registered tag values or of user-defined language tags
   starting with the characters "x-".

   For example, to embed a language tag for Japanese, the Plane 14
   characters would be used as follows. The Japanese tag from RFC 1766
   is "ja" (composed of ISO 639 language id) or, alternatively, "ja-JP"
   (composed of ISO 639 language id plus ISO 3166 country id).
   Since RFC 1766 specifies that language tags are not case
   significant, it is recommended that for language tags, the entire
   tag be lowercased before conversion to Plane 14 tag characters.
   (This would not be required for Unicode conformance, but should be
   followed as general practice by protocols making use of RFC 1766
   language tags, to simplify and speed up the processing for
   operations which need to identify or ignore language tags embedded
   in text.) Lowercasing, rather than uppercasing, is recommended
   because it follows the majority practice of expressing language tag
   values in lowercase letters.

   Thus the entire language tag (in its longer form) would be converted
   to Plane 14 tag characters as follows:

   U-000E0001 U-000E006A U-000E0061 U-000E002D U-000E006A U-000E0070

   The language tag (in its shorter, "ja" form) could be expressed as
   follows:

   U-000E0001 U-000E006A U-000E0061

   The value of this string is then expressed in whichever encoding form
   (UCS-4, UTF-16, UTF-8) is required and embedded in text at the
   relevant point.

5.2 Additional Tags

   Additional tag identification characters might be defined in the
   future. An example would be a CHARACTER SET SOURCE TAG, or a GENERIC
   TAG for private definition of tags.

   In each case, when a specific tag identification character is
   encoded, a corresponding reference standard for the values of the
   tags associated with the identifier should be designated, so that
   interoperating parties which make use of the tags will know how to
   interpret the values the tags may take.

6.0 Display Issues

   All characters in the tag character block are considered to have no
   visible rendering in normal text.
   A process which interprets tags may choose to modify the rendering
   of text based on the tag values (as for example, changing font to
   preferred style for rendering Chinese versus Japanese). The tag
   characters themselves have no display; they may be considered
   similar to a U+200B ZERO WIDTH SPACE in that regard. The tag
   characters also do not affect breaking, joining, or any other format
   or layout properties, except insofar as the process interpreting the
   tag chooses to impose such behavior based on the tag value.

   For debugging or other operations which must render the tags
   themselves visible, it is advisable that the tag characters be
   rendered using the corresponding ASCII character glyphs (perhaps
   modified systematically to differentiate them from normal ASCII
   characters). But, as noted below, the tag character values are chosen
   so that even without display support, the tag characters will be
   interpretable in most debuggers.

7.0 Unicode Conformance Issues

   The basic rules for Unicode conformance for the tag characters are
   exactly the same as for any other Unicode characters. A conformant
   process is not required to interpret the tag characters. If it does
   not interpret tag characters, it should leave their values
   undisturbed and do whatever it does with any other uninterpreted
   characters. If it does interpret them, it should interpret them
   according to the standard, i.e. as spelled-out tags.

   So for a non-TagAware Unicode application, any language tag
   characters (or any other kind of tag expressed with Plane 14 tag
   characters) encountered would be handled exactly as for uninterpreted
   Tibetan from the BMP, uninterpreted Linear B from Plane 1, or
   uninterpreted Egyptian hieroglyphics from private use space in Plane
   15.
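
   As an informal illustration of the language tag encoding in Section
   5.1 and of the handling choices discussed in this section, the
   following is a minimal sketch in Python. It is not part of the
   specification. The function names are hypothetical, the tag block is
   assumed to span U-000E0000 through U-000E007F, and the encoder checks
   only the character repertoire of the tag argument, not whether it is
   a valid or registered RFC 1766 language identifier.

```python
# Illustrative sketch only: these names are hypothetical and are not
# defined by this document.

LANGUAGE_TAG = 0xE0001  # U-000E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag form of an ASCII character: its value + 0xE0000
TAG_BLOCK = range(0xE0000, 0xE0080)  # assumed extent of the tag character block


def encode_language_tag(identifier: str) -> str:
    """Build the Plane 14 tag string for a language identifier.

    The identifier is lowercased first, as Section 5.1 recommends. Only the
    character repertoire is checked, not RFC 1766 syntax or registration.
    """
    lowered = identifier.lower()
    if not lowered or any(not 0x20 <= ord(c) <= 0x7E for c in lowered):
        raise ValueError("expected printable ASCII characters or SPACE")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lowered)


def is_tag_character(ch: str) -> bool:
    """Return True if ch falls inside the (assumed) tag character block."""
    return ord(ch) in TAG_BLOCK


def strip_tags(text: str) -> str:
    """Remove every tag character, leaving plain text with no tags."""
    return "".join(ch for ch in text if not is_tag_character(ch))


def code_values(text: str) -> str:
    """Render each character in the U-XXXXXXXX notation used in this document."""
    return " ".join(f"U-{ord(ch):08X}" for ch in text)


tagged = encode_language_tag("ja-JP") + "text"
print(code_values(tagged[:6]))  # code values of the six tag characters
print(strip_tags(tagged))       # the text with all tag characters removed
```

   Under these assumptions, the first line printed should reproduce the
   six code values given for the longer Japanese tag in Section 5.1. A
   process that does not interpret tags would simply pass the tagged
   string through unchanged, so it needs none of these functions.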

   A TagAware but TagPhobic Unicode application can recognize the tag
   character range in Plane 14 and choose to deliberately strip them out
   completely to produce plain text with no tags.

   The presence of a correctly formed tag cannot be taken as a guarantee
   that the data so tagged is correctly tagged. For example, nothing
   prevents an application from erroneously labelling French data as
   Spanish, or from labelling JIS-derived data as Japanese, even if it
   contains Greek or Cyrillic characters.

7.1 Note on Encoding Language Tags

   The fact that this proposal for encoding tag characters in Unicode
   includes a mechanism for specifying language tag values does not mean
   that Unicode is departing from one of its basic encoding principles:

       Unicode encodes scripts, not languages.

   This is still true of the Unicode encoding (and ISO/IEC 10646), even
   in the presence of a mechanism for specifying language tags in plain
   text.  There is nothing obligatory about the use of Plane 14 tags,
   whether for language tags or any other kind of tags.

   Language tagging in no way impacts current encoded characters or the
   encoding of future scripts.

   It is fully anticipated that implementations of Unicode which already
   make use of out-of-band mechanisms for language tagging or "heavy-
   weight" in-band mechanisms such as HTML will continue to do exactly
   what they are doing and will ignore Plane 14 tag characters
   completely.

8.0 Security Considerations

   There are no known security issues raised by this document.

References

   [ISO10646] ISO/IEC 10646-1:1993 International Organization for
              Standardization.  "Information Technology -- Universal
              Multiple-Octet Coded Character Set (UCS) -- Part 1:
              Architecture and Basic Multilingual Plane", Geneva, 1993.

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2070]  Yergeau, F., Nicol, G., Adams, G. and M. Duerst,
              "Internationalization of the Hypertext Markup Language",
              RFC 2070, January 1997.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2130]  Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
              Atkinson, R., Crispin, M. and P. Svanberg, "The Report of
              the IAB Character Set Workshop held 29 February - 1 March,
              1996", RFC 2130, April 1997.

   [UNICODE]  The Unicode Standard, Version 2.0, The Unicode Consortium,
              Addison-Wesley, July 1996.

Acknowledgements

   The following people also contributed to this document, directly or
   indirectly: Chris Newman, Mark Crispin, Rick McGowan, Joe Becker,
   John Jenkins, and Asmus Freytag. This document also was reviewed by
   the Unicode Technical Committee, and the authors wish to thank all of
   the UTC representatives for their input. The authors are, of course,
   responsible for any errors or omissions which may remain in the text.

Authors' Addresses

   Ken Whistler
   Sybase, Inc.
   6475 Christie Ave.
   Emeryville, CA 94608-1050

   Phone: +1 510 922 3611
   EMail: kenw@sybase.com


   Glenn Adams
   Spyglass, Inc.
   One Cambridge Center
   Cambridge, MA 02142

   Phone: +1 617 679 4652
   EMail: glenn@spyglass.com

Full Copyright Statement

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.