Network Working Group                                      H. Alvestrand
Request for Comments: 3066                                 Cisco Systems
BCP: 47                                                     January 2001
Obsoletes: 1766
Category: Best Current Practice


                Tags for the Identification of Languages

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document describes a language tag for use in cases where it is
   desired to indicate the language used in an information object, how
   to register values for use in this language tag, and a construct for
   matching such language tags.

1. Introduction

   Human beings on our planet have, past and present, used a number of
   languages.  There are many reasons why one would want to identify the
   language used when presenting information.

   In some contexts, it is possible to have information available in
   more than one language, or it might be possible to provide tools
   (such as dictionaries) to assist in the understanding of a language.

   Also, many types of information processing require knowledge of the
   language in which information is expressed in order for that process
   to be performed on the information; for example spell-checking,
   computer-synthesized speech, Braille, or high-quality print
   renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier for the language that is used
   in this information content.






Alvestrand               Best Current Practice                  [Page 1]

RFC 3066          Tags for Identification of Languages      January 2001


   This document specifies an identifier mechanism, a registration
   function for values to be used with that identifier mechanism, and a
   construct for matching against those values.

   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].

2. The Language tag

2.1 Language tag syntax

   The language tag is composed of one or more parts: A primary language
   subtag and a (possibly empty) series of subsequent subtags.

   The syntax of this tag in ABNF [RFC 2234] is:

    Language-Tag = Primary-subtag *( "-" Subtag )

    Primary-subtag = 1*8ALPHA

    Subtag = 1*8(ALPHA / DIGIT)

   The productions ALPHA and DIGIT are imported from RFC 2234; they
   denote respectively the characters A to Z in upper or lower case and
   the digits from 0 to 9.  The character "-" is HYPHEN-MINUS (ABNF:
   %x2D).
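
   As an illustration only, the grammar above can be transcribed into a
   regular expression.  The following Python sketch checks syntax and
   nothing else; the names LANGUAGE_TAG_RE and is_well_formed are
   assumptions made for this example and are not defined by this
   document.

```python
import re

# Transcription of the ABNF: 1*8ALPHA followed by zero or more
# "-" 1*8(ALPHA / DIGIT).  ALPHA and DIGIT are ASCII-only, so explicit
# character classes are used rather than \w or str.isalpha().
LANGUAGE_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the Language-Tag production."""
    return LANGUAGE_TAG_RE.fullmatch(tag) is not None
```

   A tag that passes this check is only syntactically valid; whether its
   subtags carry an assigned meaning is a separate question.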

   All tags are to be treated as case insensitive; there exist
   conventions for capitalization of some of them, but these should not
   be taken to carry meaning.  For instance, [ISO 3166] recommends that
   country codes are capitalized (MN Mongolia), while [ISO 639]
   recommends that language codes are written in lower case (mn
   Mongolian).
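
   A minimal Python sketch of case-insensitive comparison follows.  The
   helper names same_tag and conventional_case are hypothetical, and the
   choice to leave third and later subtags untouched is an assumption of
   this example, since the text above only mentions conventions for
   language and country codes.

```python
def same_tag(a: str, b: str) -> bool:
    """Compare two tags without regard to case."""
    # Tags are restricted to ASCII letters, digits and "-", so lower()
    # is a sufficient case fold here.
    return a.lower() == b.lower()


def conventional_case(tag: str) -> str:
    """Apply the customary casing; this is cosmetic and carries no meaning."""
    subtags = tag.split("-")
    subtags[0] = subtags[0].lower()        # ISO 639 convention
    if len(subtags) > 1 and len(subtags[1]) == 2:
        subtags[1] = subtags[1].upper()    # ISO 3166 convention
    return "-".join(subtags)
```

   For instance, conventional_case("MN-mn") yields "mn-MN", and
   same_tag treats both spellings as the same tag.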

2.2 Language tag sources

   The namespace of language tags is administered by the Internet
   Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
   in section 3 of this document.

   The following rules apply to the primary subtag:

   - All 2-letter subtags are interpreted according to assignments found
     in ISO standard 639, "Code for the representation of names of
     languages" [ISO 639], or assignments subsequently made by the ISO
     639 part 1 maintenance agency or governing standardization bodies.
     (Note: A revision is underway, and is expected to be released as
     ISO 639-1:2000)

   - All 3-letter subtags are interpreted according to assignments found
     in ISO 639 part 2, "Codes for the representation of names of
     languages -- Part 2: Alpha-3 code [ISO 639-2]", or assignments
     subsequently made by the ISO 639 part 2 maintenance agency or
     governing standardization bodies.

   - The value "i" is reserved for IANA-defined registrations.

   - The value "x" is reserved for private use.  Subtags of "x" shall
     not be registered by the IANA.

   - Other values shall not be assigned except by revision of this
     standard.

   The reason for reserving all other tags is to be open towards new
   revisions of ISO 639; the use of "i" and "x" is the minimum we can do
   here to be able to extend the mechanism to meet our immediate
   requirements.
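
   The primary subtag rules can be sketched as a small classifier.  This
   Python example is an assumption-laden illustration: the function name
   and the returned labels are invented here, and the code looks only at
   the shape of the subtag, not at whether ISO 639 has actually assigned
   the code.

```python
def classify_primary_subtag(tag: str) -> str:
    """Say which rule governs the primary subtag of `tag`."""
    primary = tag.split("-", 1)[0].lower()
    if primary == "i":
        return "iana-registered"
    if primary == "x":
        return "private-use"
    letters_only = primary.isascii() and primary.isalpha()
    if letters_only and len(primary) == 2:
        return "iso-639-1"      # 2-letter codes: ISO 639
    if letters_only and len(primary) == 3:
        return "iso-639-2"      # 3-letter codes: ISO 639 part 2
    # Everything else is reserved pending a revision of the standard.
    return "reserved"
```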

   The following rules apply to the second subtag:

   - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
     codes from [ISO 3166], or subsequently assigned by the ISO 3166
     maintenance agency or governing standardization bodies, denoting
     the area to which this language variant relates.

   - Tags with second subtags of 3 to 8 letters may be registered with
     IANA, according to the rules in section 3 of this document.

   - Tags with 1-letter second subtags may not be assigned except after
     revision of this standard.

   There are no rules apart from the syntactic ones for the third and
   subsequent subtags.
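
   A corresponding sketch for the second subtag is given below.  The
   function name and labels are hypothetical.  The example reads the
   word "letters" literally, so a second subtag containing digits is
   reported as "unspecified"; it also reports tags under "x" as private
   use rather than applying the rules above.  Both choices are
   assumptions of this example.

```python
from typing import Optional


def classify_second_subtag(tag: str) -> Optional[str]:
    """Say which rule governs the second subtag, or None if there is none."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return None
    if subtags[0].lower() == "x":
        return "private-use"
    second = subtags[1]
    if not (second.isascii() and second.isalpha()):
        return "unspecified"          # the rules above speak only of letters
    if len(second) == 1:
        return "reserved"             # needs a revision of the standard
    if len(second) == 2:
        return "iso-3166-country"     # ISO 3166 alpha-2 country code
    if 3 <= len(second) <= 8:
        return "iana-registrable"     # may be registered with IANA
    return "unspecified"
```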

   Tags constructed wholly from the codes that are assigned
   interpretations by this chapter do not need to be registered with
   IANA before use.

   The information in a subtag may for instance be:

   - Country identification, such as en-US (this usage is described in
     ISO 639)

   - Dialect or variant information, such as en-scouse

   - Languages not listed in ISO 639 that are not variants of any listed
     language, which can be registered with the i-prefix, such as
     i-tsolyani

   - Region identification, such as sgn-US-MA (Martha's Vineyard Sign
     Language, which is found in the state of Massachusetts, US)

   This document leaves the decision on what tags are appropriate or not
   to the registration process described in section 3.

   ISO 639 defines a maintenance agency for additions to and changes in
   the list of languages in ISO 639.  This agency is:

        International Information Centre for Terminology (Infoterm)
        P.O. Box 130
        A-1021 Wien
        Austria

        Phone: +43 1 26 75 35 Ext. 312
        Fax:   +43 1 216 32 72

   ISO 639-2 defines a maintenance agency for additions to and changes
   in the list of languages in ISO 639-2.  This agency is:

        Library of Congress
        Network Development and MARC Standards Office
        Washington, D.C. 20540
        USA

        Phone: +1 202 707 6237
        Fax:   +1 202 707 0115
        URL: http://www.loc.gov/standards/iso639

   The maintenance agency for ISO 3166 (country codes) is:

        ISO 3166 Maintenance Agency Secretariat
        c/o DIN Deutsches Institut fuer Normung
        Burggrafenstrasse 6
        Postfach 1107
        D-10787 Berlin
        Germany

        Phone: +49 30 26 01 320
        Fax:   +49 30 26 01 231
        URL: http://www.din.de/gremien/nas/nabd/iso3166ma/

   ISO 3166 reserves the country codes AA, QM-QZ, XA-XZ and ZZ as user-
   assigned codes.  These MUST NOT be used to form language tags.
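
   The reserved ranges can be checked mechanically.  The following
   Python sketch uses hypothetical function names; it tests only for the
   user-assigned ranges listed above and does not verify that a code is
   otherwise assigned by ISO 3166.

```python
def is_user_assigned_country(code: str) -> bool:
    """True for AA, QM-QZ, XA-XZ and ZZ (case-insensitive)."""
    c = code.upper()
    if len(c) != 2:
        return False
    if c in ("AA", "ZZ"):
        return True
    first, second = c[0], c[1]
    if first == "Q":
        return "M" <= second <= "Z"
    if first == "X":
        return "A" <= second <= "Z"
    return False


def uses_user_assigned_country(tag: str) -> bool:
    """True if the second subtag of `tag` is a user-assigned country code."""
    subtags = tag.split("-")
    return len(subtags) >= 2 and is_user_assigned_country(subtags[1])
```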



2.3 Choice of language tag

   One may occasionally be faced with several possible tags for the same
   body of text.

   Interoperability is best served if all users send the same tag, and
   use the same tag for the same language for all documents.  If an
   application has requirements that make the rules here inapplicable,
   the application protocol specification MUST specify how the procedure
   varies from the one given here.

   The text below is based on the set of tags known to the tagging
   entity.

   1. Use the most precise tagging known to the sender that can be
      ascertained and is useful within the application context.

   2. When a language has both an ISO 639-1 2-character code and an ISO
      639-2 3-character code, you MUST use the tag derived from the ISO
      639-1 2-character code.

   3. When a language has no ISO 639-1 2-character code, and the ISO
      639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic)
      code differ, you MUST use the Terminology code.  NOTE: At present,
      all languages for which there is a difference have 2-character
      codes, and the displeasure of developers about the existence of 2
      code sets has been adequately communicated to ISO.  So this
      situation will hopefully not arise.

   4. When a language has both an IANA-registered tag (i-something) and
      a tag derived from an ISO registered code, you MUST use the ISO
      tag.  NOTE: When such a situation is discovered, the IANA-
      registered tag SHOULD be deprecated as soon as possible.

   5. You SHOULD NOT use the UND (Undetermined) code unless the protocol
      in use forces you to give a value for the language tag, even if
      the language is unknown.  Omitting the tag is preferred.

   6. You SHOULD NOT use the MUL (Multiple) tag if the protocol allows
      you to use multiple languages, as is the case for the
      Content-Language: header.
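
   Rules 2, 3 and 5 can be sketched in Python as follows.  The table
   ALPHA3_TO_ALPHA2 is a tiny illustrative excerpt, and the function
   name preferred_tag and its tag_required parameter are assumptions of
   this example; a real implementation would load the complete ISO 639
   code lists.

```python
from typing import Optional

# Illustrative excerpt only.  Both the Terminology and the Bibliographic
# 3-letter codes map to the 2-letter code that rule 2 requires.
ALPHA3_TO_ALPHA2 = {
    "fra": "fr", "fre": "fr",   # French
    "deu": "de", "ger": "de",   # German
    "eng": "en",                # English
}


def preferred_tag(tag: str, tag_required: bool = False) -> Optional[str]:
    """Return the tag to send, or None if the tag is better omitted."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    if primary == "und" and not tag_required:
        return None                               # rule 5: omit instead
    primary = ALPHA3_TO_ALPHA2.get(primary, primary)   # rules 2 and 3
    return primary + sep + rest
```

   With this table, preferred_tag("fre-CA") gives "fr-CA", while a code
   with no 2-letter equivalent in the table is passed through unchanged.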

   NOTE: In order to avoid versioning difficulties in applications such
   as that of RFC 1766, the ISO 639 Registration Authority Joint
   Advisory Committee (RA-JAC) has agreed on the following policy
   statement:

     "After the publication of ISO/DIS 639-1 as an International
     Standard, no new 2-letter code shall be added to ISO 639-1 unless a
     3-letter code is also added at the same time to ISO 639-2.  In
     addition, no language with a 3-letter code available at the time of
     publication of ISO 639-1 which at that time had no 2-letter code
     shall be subsequently given a 2-letter code."

   This will ensure that, for example, a user who implements "haw"
   (Hawaiian), which currently has no 2-letter code, will not find his
   or her data invalidated by the eventual addition of a 2-letter code
   for that language.

2.4 Meaning of the language tag

   The language tag always defines a language as spoken (or written,
   signed or otherwise signaled) by human beings for communication of
   information to other human beings.  Computer languages such as
   programming languages are explicitly excluded.  There is no
   guaranteed relationship between languages whose tags begin with the
   same series of subtags; specifically, they are NOT guaranteed to be
   mutually intelligible, although it will sometimes be the case that
   they are.

   The relationship between the tag and the information it relates to is
   defined by the standard describing the context in which it appears.
   Accordingly, this section can only give possible examples of its
   usage.

   - For a single information object, it could be taken as the set of
     languages that is required for a complete comprehension of the
     complete object.
     Example: Plain text documents.

   - For an aggregation of information objects, it should be taken as
     the set of languages used inside components of that aggregation.
     Examples: Document stores and libraries.

   - For information objects whose purpose is to provide alternatives,
     the set of tags associated with it should be regarded as a hint
     that the content is provided in several languages, and that one has
     to inspect each of the alternatives in order to find its language
     or languages.  In this case, a tag with multiple languages does not
     mean that one needs to be multi-lingual to get complete
     understanding of the document.
     Example: MIME multipart/alternative.

   - In markup languages, such as HTML and XML, language information can
     be added to each part of the document identified by the markup
     structure (including the whole document itself).  For example, one
     could write <span lang="FR">C'est la vie.</span> inside a Norwegian
     document; the Norwegian-speaking user could then access a French-
     Norwegian dictionary to find out what the marked section meant.  If
     the user were listening to that document through a speech synthesis
     interface, this information could be used to signal the synthesizer
     to appropriately apply French text-to-speech pronunciation rules to
     that span of text, instead of misapplying the Norwegian rules.
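
   As a rough illustration of the markup case, the Python sketch below
   uses the standard html.parser module to pair each run of text with
   the language in effect for it.  The class name LangSpans is
   hypothetical, and the example assumes well-nested markup without
   unclosed void elements such as <br>; it is not a complete treatment
   of language inheritance in HTML or XML.

```python
from html.parser import HTMLParser


class LangSpans(HTMLParser):
    """Collect (language, text) pairs; lang is inherited from parents."""

    def __init__(self, default_lang: str):
        super().__init__()
        self.stack = [default_lang]   # language in effect per open element
        self.spans = []

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        # Tags are case insensitive, so store them lower-cased.
        self.stack.append(lang.lower() if lang else self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.spans.append((self.stack[-1], text))


parser = LangSpans("no")
parser.feed('<p>Han sa <span lang="FR">C\'est la vie.</span> og gikk.</p>')
parser.close()
```

   After this runs, parser.spans holds the French phrase tagged "fr" and
   the surrounding Norwegian text tagged "no", which is the information
   a dictionary lookup or speech synthesizer would need.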

2.5 Language-range

   Since the publication of RFC 1766, it has become apparent that there
   is a need to define a term for a set of languages whose tags all
   begin with the same sequence of subtags.

   The following definition of language-range is derived from HTTP/1.1
   [RFC 2616].

             language-range  = language-tag / "*"

   That is, a language-range has the same syntax as a language-tag, or
   is the single character "*".
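
   A minimal Python sketch of this definition follows.  The function
   names are hypothetical.  The range_covers helper additionally assumes
   that a range covers every tag that equals it or begins with it
   followed by "-", and that "*" covers any tag; that reading is
   inferred from the phrase "begin with the same sequence of subtags"
   and is an assumption of this example.

```python
import re

# Same grammar as Language-Tag: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) ).
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_language_range(value: str) -> bool:
    """True if `value` is "*" or has the syntax of a language tag."""
    return value == "*" or _TAG_RE.fullmatch(value) is not None


def range_covers(language_range: str, tag: str) -> bool:
    """Assumed prefix rule: whole leading subtags must match."""
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")
```

   Requiring the "-" after the prefix keeps "en" from covering an
   unrelated tag such as "eng".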