Network Working Group                                       P. Faltstrom
Request for Comments: 3490                                         Cisco
Category: Standards Track                                     P. Hoffman
                                                              IMC & VPNC
                                                             A. Costello
                                                             UC Berkeley
                                                              March 2003


         Internationalizing Domain Names in Applications (IDNA)

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Until now, there has been no standard method for domain names to use characters outside the ASCII repertoire. This document defines internationalized domain names (IDNs) and a mechanism called Internationalizing Domain Names in Applications (IDNA) for handling them in a standard fashion. IDNs use characters drawn from a large repertoire (Unicode), but IDNA allows the non-ASCII characters to be represented using only the ASCII characters already allowed in so-called host names today. This backward-compatible representation is required in existing protocols like DNS, so that IDNs can be introduced with no changes to the existing infrastructure. IDNA is only meant for processing domain names, not free text.

Table of Contents

   1. Introduction
      1.1 Problem Statement
      1.2 Limitations of IDNA
      1.3 Brief overview for application developers
   2. Terminology
   3. Requirements and applicability
      3.1 Requirements
      3.2 Applicability
         3.2.1. DNS resource records

Faltstrom, et al.           Standards Track                     [Page 1]

RFC 3490                          IDNA                        March 2003

         3.2.2. Non-domain-name data types stored in domain names
   4. Conversion operations
      4.1 ToASCII
      4.2 ToUnicode
   5. ACE prefix
   6. Implications for typical applications using DNS
      6.1 Entry and display in applications
      6.2 Applications and resolver libraries
      6.3 DNS servers
      6.4 Avoiding exposing users to the raw ACE encoding
      6.5 DNSSEC authentication of IDN domain names
   7. Name server considerations
   8. Root server considerations
   9. References
      9.1 Normative References
      9.2 Informative References
   10. Security Considerations
   11. IANA Considerations
   12. Authors' Addresses
   13. Full Copyright Statement

1. Introduction

IDNA works by allowing applications to use certain ASCII name labels (beginning with a special prefix) to represent non-ASCII name labels. Lower-layer protocols need not be aware of this; therefore IDNA does not depend on changes to any infrastructure. In particular, IDNA does not depend on any changes to DNS servers, resolvers, or protocol elements, because the ASCII name service provided by the existing DNS is entirely sufficient for IDNA.
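
The prefix mechanism can be illustrated with a short sketch. It assumes CPython, whose built-in "idna" codec implements the ToASCII and ToUnicode operations of this RFC (IDNA2003, not the later IDNA2008 rules), and it relies on the ACE prefix "xn--" assigned in section 5. The helpers `to_ace`, `from_ace`, and `is_ldh_hostname` are hypothetical names chosen for this example. The point it demonstrates is that the encoded name consists solely of LDH code points and dots, which existing DNS software already carries unchanged.

```python
import string

# LDH code points (section 2): ASCII letters, digits, and hyphen-minus.
LDH = frozenset(string.ascii_letters + string.digits + "-")


def to_ace(domain: str) -> str:
    """Apply ToASCII to every label via CPython's built-in IDNA2003 "idna" codec."""
    return domain.encode("idna").decode("ascii")


def from_ace(ascii_domain: str) -> str:
    """Apply ToUnicode to every label, turning "xn--" labels back into Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


def is_ldh_hostname(ascii_domain: str) -> bool:
    """Check that each label is 1..63 characters long and uses only LDH code points."""
    labels = ascii_domain.rstrip(".").split(".")
    return all(0 < len(label) <= 63 and set(label) <= LDH for label in labels)


ace = to_ace("bücher.example")
print(ace)                   # xn--bcher-kva.example
print(is_ldh_hostname(ace))  # True
print(from_ace(ace))         # bücher.example
```

Only the application performs the conversion: a resolver or name server handling "xn--bcher-kva.example" needs no knowledge of IDNA.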
This document does not require any applications to conform to IDNA, but applications can elect to use IDNA in order to support IDN while maintaining interoperability with existing infrastructure. If an application wants to use non-ASCII characters in domain names, IDNA is the only currently defined option. Adding IDNA support to an existing application entails changes to the application only, and leaves room for flexibility in the user interface.

A great deal of the discussion of IDN solutions has focused on transition issues and how IDN will work in a world where not all of the components have been updated. Proposals that were not chosen by the IDN Working Group would depend on user applications, resolvers, and DNS servers being updated in order for a user to use an internationalized domain name. Rather than rely on widespread updating of all components, IDNA depends on updates to user applications only; no changes are needed to the DNS protocol, to any DNS servers, or to the resolvers on users' computers.

1.1 Problem Statement

The IDNA specification solves the problem of extending the repertoire of characters that can be used in domain names to include the Unicode repertoire (with some restrictions).

IDNA does not extend the service offered by DNS to the applications. Instead, the applications (and, by implication, the users) continue to see an exact-match lookup service: either there is a single exactly-matching name or there is no match. This model has served the existing applications well, but it requires, with or without internationalized domain names, that users know the exact spelling of the domain names that they type into applications such as web browsers and mail user agents.
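
Because the DNS offers only exact matching, two names are the same name only if their ASCII forms match (compared case-insensitively, as for any ASCII domain name). The sketch below again assumes CPython's built-in IDNA2003 "idna" codec, and `to_ace` and `same_dns_name` are hypothetical helpers. It shows that Nameprep folds case in a non-ASCII label, while a label that swaps the Latin "a" for U+0430 CYRILLIC SMALL LETTER A yields a completely different name, even though the two may look identical on screen.

```python
def to_ace(domain: str) -> str:
    """Apply ToASCII to every label via CPython's built-in IDNA2003 "idna" codec."""
    return domain.encode("idna").decode("ascii")


def same_dns_name(a: str, b: str) -> bool:
    """Exact-match model: compare the ASCII forms, ignoring ASCII case as the DNS does."""
    return to_ace(a).lower() == to_ace(b).lower()


latin = "example.test"
lookalike = "ex\u0430mple.test"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin "a"

print(same_dns_name("Bücher.test", "bücher.test"))  # True: Nameprep folds case
print(same_dns_name(latin, lookalike))             # False: a different name to the DNS
print(to_ace(lookalike).startswith("xn--"))        # True
```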
The introduction of the larger repertoire of characters potentially makes the set of misspellings larger, especially given that in some cases the same appearance, for example on a business card, might visually match several Unicode code points or several sequences of code points.

IDNA allows the graceful introduction of IDNs not only by avoiding upgrades to existing infrastructure (such as DNS servers and mail transport agents), but also by allowing some rudimentary use of IDNs in applications through the ASCII representation of the non-ASCII name labels. While such names are very user-unfriendly to read and type, and hence are not suitable for user input, they allow (for instance) replying to email and clicking on URLs even though the domain name displayed is incomprehensible to the user. In order to allow user-friendly input and output of IDNs, applications need to be modified to conform to this specification.

IDNA uses the Unicode character repertoire, which avoids the significant delays that would be inherent in waiting for a different and specific character set to be defined for IDN purposes by some other standards developing organization.

1.2 Limitations of IDNA

The IDNA protocol does not solve all linguistic issues with users inputting names in different scripts. Many important language-based and script-based mappings are not covered in IDNA and need to be handled outside the protocol. For example, names that are entered in a mix of traditional and simplified Chinese characters will not be mapped to a single canonical name. Similarly, Scandinavian names that are entered with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).

An example of an important issue that is not considered in detail in IDNA is how to provide a high probability that a user who is entering a domain name based on visual information (such as from a business card or billboard) or aural information (such as from a telephone or radio) would correctly enter the IDN. Similar issues exist for ASCII domain names, for example the possible visual confusion between the letter 'O' and the digit zero, but the introduction of the larger repertoire of characters creates more opportunities for similar-looking and similar-sounding names. Note that this is a complex issue relating to languages, input methods on computers, and so on. Furthermore, the kind of matching and searching necessary for a high probability of success would not fit the role of the DNS and its exact-matching function.

1.3 Brief overview for application developers

Applications can use IDNA to support internationalized domain names anywhere that ASCII domain names are already supported, including DNS master files and resolver interfaces. (Applications can also define protocols and interfaces that support IDNs directly using non-ASCII representations. IDNA does not prescribe any particular representation for new protocols, but it still defines which names are valid and how they are compared.)

The IDNA protocol is contained completely within applications. It is not a client-server or peer-to-peer protocol: everything is done inside the application itself. When used with a DNS resolver library, IDNA is inserted as a "shim" between the application and the resolver library. When used for writing names into a DNS zone, IDNA is used just before the name is committed to the zone.
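
A minimal sketch of the "shim" arrangement, assuming CPython: the hypothetical `idna_shim` wraps any resolver function that accepts ASCII names (by default `socket.gethostbyname`) and applies ToASCII, via the built-in IDNA2003 "idna" codec, before every query. The hypothetical `zone_owner_name` shows the same conversion applied just before a name is written into a DNS master file. A conversion failure is surfaced as a `ValueError`, which is only one possible way for an application to handle names that cannot be used as IDNs.

```python
import socket
from typing import Callable, List


def idna_shim(resolve: Callable[[str], str] = socket.gethostbyname) -> Callable[[str], str]:
    """Wrap an ASCII-only resolver so that callers may pass IDNs (hypothetical helper)."""

    def lookup(name: str) -> str:
        try:
            # ToASCII on every label, using CPython's built-in IDNA2003 "idna" codec.
            ascii_name = name.encode("idna").decode("ascii")
        except UnicodeError as exc:
            # ToASCII failed: this name cannot be used as an IDN.
            raise ValueError(f"not a valid IDN: {name!r}") from exc
        return resolve(ascii_name)

    return lookup


def zone_owner_name(name: str) -> str:
    """ASCII owner name for a DNS master file, written as an absolute name."""
    return name.rstrip(".").encode("idna").decode("ascii") + "."


# Usage with a stand-in resolver, so no network access is needed.
queried: List[str] = []


def fake_resolver(host: str) -> str:
    queried.append(host)
    return "192.0.2.1"  # address from a documentation-only range


lookup = idna_shim(fake_resolver)
print(lookup("bücher.example"))           # 192.0.2.1
print(queried)                            # ['xn--bcher-kva.example']
print(zone_owner_name("bücher.example"))  # xn--bcher-kva.example.
```

Passing the resolver in as a parameter keeps the sketch testable offline; the application code above the shim deals only in Unicode names.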
There are two operations described in section 4 of this document:

- The ToASCII operation is used before sending an IDN to something that expects ASCII names (such as a resolver) or writing an IDN into a place that expects ASCII names (such as a DNS master file).

- The ToUnicode operation is used when displaying names to users, for example names obtained from a DNS zone.

It is important to note that the ToASCII operation can fail. If it fails when processing a domain name, that domain name cannot be used as an internationalized domain name and the application has to have some method of dealing with this failure.

IDNA requires that implementations process input strings with Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP], and then with Punycode [PUNYCODE]. Implementations of IDNA MUST fully implement Nameprep and Punycode; neither Nameprep nor Punycode is optional.

2. Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

A code point is an integer value associated with a character in a coded character set.

Unicode [UNICODE] is a coded character set containing tens of thousands of characters. A single Unicode code point is denoted by "U+" followed by four to six hexadecimal digits, while a range of Unicode code points is denoted by two hexadecimal numbers separated by "..", with no prefixes.

ASCII means US-ASCII [USASCII], a coded character set containing 128 characters associated with code points in the range 0..7F. Unicode is an extension of ASCII: it includes all the ASCII characters and associates them with the same code points.

The term "LDH code points" is defined in this document to mean the code points associated with ASCII letters, digits, and the hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A.
"LDH" is an abbreviation for "letters, digits, hyphen".

[STD13] talks about "domain names" and "host names", but many people use the terms interchangeably. Further, because [STD13] was not terribly clear, many people who are sure they know the exact definitions of each of these terms disagree on the definitions. In this document the term "domain name" is used in general. This document explicitly cites [STD3] whenever referring to the host name syntax restrictions defined therein.

A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "www.example.com" is composed of three labels: "www", "example", and "com". (The zero-length root label described in [STD13], which can be explicit as in "www.example.com." or implicit as in "www.example.com", is not considered a label in this specification.) IDNA extends the set of usable characters in labels that are text. For the rest of this document, the term "label" is shorthand for "text label", and "every label" means "every text label".

An "internationalized label" is a label to which the ToASCII operation (see section 4) can be applied without failing (with the UseSTD3ASCIIRules flag unset). This implies that every ASCII label that satisfies the [STD13] length restriction is an internationalized label. Therefore the term "internationalized label" is a generalization, embracing both old ASCII labels and new non-ASCII labels. Although most Unicode characters can appear in internationalized labels, ToASCII will fail for some input strings, and such strings are not valid internationalized labels.

An "internationalized domain name" (IDN) is a domain name in which every label is an internationalized label. This implies that every ASCII domain name is an IDN (which implies that it is possible for a name to be an IDN without it containing any non-ASCII characters).
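
The two definitions above translate almost directly into predicates. This sketch assumes CPython's `encodings.idna` module, whose documented `ToASCII(label)` function performs ToASCII with UseSTD3ASCIIRules treated as false and raises an exception derived from `UnicodeError` when it fails. The functions `is_internationalized_label` and `is_idn` are hypothetical helpers, and for brevity `is_idn` splits only on U+002E FULL STOP.

```python
from encodings import idna  # the module behind CPython's IDNA2003 "idna" codec


def is_internationalized_label(label: str) -> bool:
    """True if ToASCII (UseSTD3ASCIIRules unset) succeeds on the label."""
    try:
        idna.ToASCII(label)
    except UnicodeError:
        return False
    return True


def is_idn(name: str) -> bool:
    """True if every label is an internationalized label.

    An explicit root (trailing dot) is dropped first, since the zero-length
    root label is not considered a label. Only U+002E separates labels here.
    """
    if name.endswith("."):
        name = name[:-1]
    return all(is_internationalized_label(label) for label in name.split("."))


print(is_idn("www.example.com"))  # True: ASCII labels of legal length qualify
print(is_idn("bücher.example."))  # True
print(is_idn("a" * 64 + ".com"))  # False: the label exceeds 63 octets
print(is_idn("\ue000.example"))   # False: Nameprep prohibits private-use code points
```

Note that "www.example.com" qualifies without containing a single non-ASCII character, matching the observation that every ASCII domain name is an IDN.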
This document does not attempt to define an "internationalized host name". Just as has been the case with ASCII names, some DNS zone administrators may impose restrictions, beyond those imposed by DNS or IDNA, on the characters or strings that may be registered as labels in their zones. Such restrictions have no impact on the syntax or semantics of DNS protocol messages; a query for a name that matches no records will yield the same response regardless of the reason why it is not in the zone. Clients issuing queries or interpreting responses cannot be assumed to have any knowledge of