into the DNS to calling a directory service and then the DNS (in many situations, both actions could be accomplished in a single API call).  A directory approach can be consistent both with "flat" models and multi-attribute ones.  The DNS requires strict hierarchies, limiting its ability to differentiate among names by their properties.  By contrast, modern directories can utilize independently-searched attributes and other structured schema to provide flexibilities not present in a strictly hierarchical system.

There is a strong historical argument for a single directory structure (implying a need for mechanisms for registration, delegation, etc.).  But a single structure is not a strict requirement, especially if in-depth case analysis and design work leads to the conclusion that reverse-mapping to directory names is not a requirement (see section 5).  If a single structure is not needed, then, unlike the DNS, there would be no requirement for a global organization to authorize or delegate operation of portions of the structure.

The "no single structure" concept could be taken further by moving away from simple "names" in favor of, e.g., multiattribute, multihierarchical, faceted systems in which most of the facets use restricted vocabularies.  (These terms are fairly standard in the information retrieval and classification system literature; see, e.g., [IS5127].)  Such systems could be designed to avoid the need for procedures to ensure uniqueness across, or even within, providers and databases of the faceted entities for which the search is to be performed.  (See [DNS-Search] for further discussion.)

While the discussion above includes very general comments about attributes, it appears that only a very small number of attributes would be needed.  The list would almost certainly include country and language for internationalization purposes.  It might require "charset" if we cannot agree on a character set and encoding, although there are strong arguments for simply using ISO 10646 (also known as Unicode or "UCS", for Universal Character Set) [UNICODE], [IS10646] coding in interchange.  Trademark issues might motivate "commercial" and "non-commercial" (or other) attributes if they would be helpful in bypassing trademark problems.  And applications to resource location, such as those contemplated for Uniform Resource Identifiers (URIs) [RFC2396, RFC3305] or the Service Location Protocol [RFC2608], might argue for a few other attributes (as outlined above).
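As a purely hypothetical illustration of the attribute model sketched above, the short Python fragment below shows a faceted lookup over a handful of records keyed by independently-searched attributes such as country and language.  None of the names, attributes, or addresses are real, and nothing here is part of any specified protocol; it only illustrates how matching on attributes avoids any requirement for global uniqueness of the "name" facet.

   # Hypothetical faceted registry: entries are described by a few
   # independently-searched attributes rather than by a position in a
   # single hierarchy.  The "name" facet need not be unique.
   records = [
       {"name": "example", "country": "de", "lang": "de",
        "commercial": True,  "target": "192.0.2.10"},
       {"name": "example", "country": "fr", "lang": "fr",
        "commercial": False, "target": "192.0.2.20"},
   ]

   def lookup(**facets):
       """Return every record whose attributes match all supplied facets."""
       return [r for r in records
               if all(r.get(k) == v for k, v in facets.items())]

   print(lookup(name="example", country="de"))   # narrows to one record
   print(lookup(name="example"))                 # two records; no uniqueness needed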
4. Internationalization

Much of the thinking underlying this document was driven by considerations of internationalizing the DNS or, more specifically, providing access to the functions of the DNS from languages and naming systems that cannot be accurately expressed in the traditional DNS subset of ASCII.  Much of the relevant work was done in the IETF's "Internationalized Domain Names" Working Group (IDN-WG), although this document also draws on extensive parallel discussions in other forums.  This section contains an evaluation of what was learned as an "internationalized DNS" or "multilingual DNS" was explored and suggests future steps based on that evaluation.

When the IDN-WG was initiated, it was obvious to several of the participants that its first important task was an undocumented one: to increase the understanding of the complexities of the problem sufficiently that naive solutions could be rejected and people could go to work on the harder problems.  The IDN-WG clearly accomplished that task.  The belief that the problems were simple, along with the corresponding simplistic approaches and their promises of quick and painless deployment, effectively disappeared as the WG's efforts matured.

Some of the lessons learned from increased understanding and the dissipation of naive beliefs should be taken as cautions by the wider community: the problems are not simple.  Specifically, extracting small elements for solution, rather than looking at whole systems, may obscure the problems without solving any problem that is worth the trouble.

4.1 ASCII Isn't Just Because of English

The hostname rules chosen in the mid-70s weren't just "ASCII because English uses ASCII", although that was a starting point.  We have discovered that almost every other script (and even ASCII itself, if we permit the rest of the characters specified in the ISO 646 International Reference Version) is more complex than hostname-restricted ASCII (the "LDH" form, see section 1.1).  And ASCII isn't sufficient to completely represent English -- there are several words in the language that are correctly spelled only with characters or diacritical marks that do not appear in ASCII.

With a broader selection of scripts, in some examples, case mapping works from one case to the other but is not reversible.  In others, there are conventions about alternate ways to represent characters (in the language, not [only] in character coding) that work most of the time, but not always.  And there are issues in coding, with Unicode/10646 providing different ways to represent the same character ("character", rather than "glyph", is used deliberately here).  And, in still others, there are questions as to whether two glyphs "match", which may be a distance-function question, not one with a binary answer.  The IETF approach to these problems is to require pre-matching canonicalization (see the "stringprep" discussion below).
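As a concrete illustration of two of the points above -- irreversible case mapping and multiple codings of the same character -- consider the following Python fragment, which uses only the standard library.  The German sharp s and the accented "e" are merely convenient examples; the same issues arise in many scripts.

   import unicodedata

   # Case mapping is not always reversible: the German sharp s ("ß")
   # uppercases to "SS", and lowercasing that yields "ss", not "ß".
   print("straße".upper())           # STRASSE
   print("straße".upper().lower())   # strasse -- the round trip loses "ß"

   # Unicode/10646 allows more than one coded representation of the same
   # character: precomposed U+00E9 versus "e" followed by U+0301
   # (combining acute accent).  They compare unequal until canonicalized.
   precomposed = "\u00e9"            # é as a single code point
   decomposed  = "e\u0301"           # e + combining acute accent
   print(precomposed == decomposed)                                # False
   print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True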
The IETF has resisted the temptations to either try to specify an entirely new coded character set, or to pick and choose Unicode/10646 characters on a per-character basis rather than by using well-defined blocks.  While it may appear that a character set designed to meet Internet-specific needs would be very attractive, the IETF has never had the expertise, resources, and representation from critically-important communities to actually take on that job.  Perhaps more important, a new effort might have chosen to make some of the many complex tradeoffs differently than the Unicode committee did, producing a code with somewhat different characteristics.  But there is no evidence that doing so would produce a code with fewer problems and side-effects.  It is much more likely that making tradeoffs differently would simply result in a different set of problems, which would be equally or more difficult.

4.2 The "ASCII Encoding" Approaches

While the DNS can handle arbitrary binary strings without known internal problems (see [RFC2181]), some restrictions are imposed by the requirement that text be interpreted in a case-independent way ([RFC1034], [RFC1035]).  More important, most Internet applications assume the hostname-restricted "LDH" syntax that is specified in the host table RFCs and described as "prudent" in RFC 1035.  If those assumptions are not met, many conforming implementations of those applications may exhibit behavior that would surprise implementors and users.

To avoid these potential problems, IETF internationalization work has focused on "ASCII-Compatible Encodings" (ACE).  These encodings preserve the LDH conventions in the DNS itself.  Implementations of applications that have not been upgraded utilize the encoded forms, while newer ones can be written to recognize the special codings and map them into non-ASCII characters.

These approaches are, however, not problem-free even if human interface issues are ignored.  Among other issues, they rely on what is ultimately a heuristic to determine whether a DNS label is to be considered as an internationalized name (i.e., encoded Unicode) or interpreted as an actual LDH name in its own right.  And, while all determinations of whether a particular query matches a stored object are traditionally made by DNS servers, the ACE systems, when combined with the complexities of international scripts and names, require that much of the matching work be moved into a separate, client-side canonicalization or "preparation" process before the DNS matching mechanisms are invoked [STRINGPREP].
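For illustration only, the fragment below uses Python's standard library, whose built-in "idna" codec implements the IDNA2003 ToASCII/ToUnicode operations (nameprep followed by the punycode-based ACE) that grew out of the work described here.  The domain name shown is purely illustrative, and the final line is only a rough approximation of the client-side preparation step, not the full nameprep profile.

   import unicodedata

   # Python's built-in "idna" codec implements the IDNA2003 ToASCII and
   # ToUnicode operations (nameprep, then the punycode-based ACE).
   # "bücher.example" is an illustrative name, not a registered one.
   name = "Bücher.example"
   ace  = name.encode("idna")
   print(ace)                  # b'xn--bcher-kva.example'
   print(ace.decode("idna"))   # bücher.example

   # The "xn--" ACE prefix is the heuristic an upgraded application uses
   # to decide whether an LDH label should be treated as encoded Unicode.
   print(ace.split(b".")[0].startswith(b"xn--"))   # True

   # A rough approximation of the client-side "preparation" step:
   # compatibility normalization plus case folding before encoding.
   print(unicodedata.normalize("NFKC", "Bücher").casefold())   # bücher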
4.3 "Stringprep" and Its Complexities

As outlined above, the model for avoiding problems associated with putting non-ASCII names in the DNS and elsewhere evolved into the principle that strings are to be placed into the DNS only after being passed through a string preparation function that eliminates or rejects spurious character codes, maps some characters onto others, performs some sequence canonicalization, and generally creates forms that can be accurately compared.  The impact of this process on hostname-restricted ASCII (i.e., "LDH") strings is trivial and essentially adds only overhead.  For other scripts, the impact is, of necessity, quite significant.

Although the general notion underlying stringprep is simple, the many details are quite subtle and the associated tradeoffs are complex.  A design team worked on it for months, with considerable effort placed into clarifying and fine-tuning the protocol and tables.  Despite general agreement that the IETF would avoid getting into the business of defining character sets, character codings, and the associated conventions, the group several times considered and rejected special treatment of code positions to bring the distinctions made by Unicode more nearly into line with user perceptions of similarities and differences between characters.  But there were intense temptations (and pressures) to incorporate language-specific or country-specific rules.  Those temptations, even when resisted, were indicative of parts of the ongoing controversy or of the basic unsuitability of the DNS for fully internationalized names that are visible, comprehensible, and predictable for end users.

There have also been controversies about how far one should go in these processes of preparation and transformation and, ultimately, about the validity of various analogies.  For example, each of the following operations has been claimed to be similar to case-mapping in ASCII:

   o  stripping of vowels in Arabic or Hebrew
   o  matching of "look-alike" characters such as upper-case Alpha in Greek and upper-case A in Roman-based alphabets
   o  matching of Traditional and Simplified Chinese characters that represent the same words
   o  matching of Serbo-Croatian words whether written in Roman-derived or Cyrillic characters

A decision to support any of these operations would have implications for other scripts or languages and would increase the overall complexity of the process.  For example, unless language-specific information is somehow available, performing matching between Traditional and Simplified Chinese has impacts on Japanese and Korean uses of the same "traditional" characters (e.g., it would not be appropriate to map Kanji into Simplified Chinese).

Even if the IDN-WG's other work had been abandoned completely, or if it fails in the marketplace, the stringprep and nameprep work will continue to be extremely useful, both in identifying issues and problem code points and in providing a reasonable set of basic rules.  Where problems remain, they are arguably not with nameprep, but with the DNS-imposed requirement that its results, as with all other parts of the matching and comparison process, yield a binary "match or no match" answer, rather than, e.g., a value on a similarity scale that can be evaluated by the user or by user-driven heuristic functions.

4.4 The Unicode Stability Problem