ISO 10646 basically defines only code points, and not rules for using or comparing the characters.  This is part of a long-standing tradition with the work of what is now ISO/IEC JTC1/SC2: they have performed code point assignments and have typically treated the ways in which characters are used as beyond their scope.  Consequently, they have not dealt effectively with the broader range of internationalization issues.  By contrast, the Unicode Technical Committee (UTC) has defined, in annexes and technical reports (see, e.g., [UTR15]), some additional rules for canonicalization and comparison.  Many of those rules and conventions have been factored into the "stringprep" and "nameprep" work, but it is not straightforward to make or define them in a fashion that is sufficiently precise and permanent to be relied on by the DNS.  Perhaps more important, the discussions leading to nameprep also identified several areas in which the UTC definitions are inadequate, at least without additional information, to make matching precise and unambiguous.  In some of these cases, the Unicode Standard permits several alternate approaches, none of which are an exact and obvious match to DNS needs.  That has left these sensitive choices up to IETF, which lacks sufficient in-depth expertise, much less any mechanism for deciding to optimize one language at the expense of another.
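The "several alternate approaches" mentioned above are, concretely, Unicode's normalization forms: two canonical forms (NFC, NFD) and two compatibility forms (NFKC, NFKD), any of which could plausibly anchor a matching rule (nameprep ultimately settled on form KC).  The following Python sketch, a non-normative illustration added here rather than anything from the RFC itself, shows how the choice of form changes what "the same string" means:

   import unicodedata

   s = "e\u0301"      # 'e' followed by combining acute accent (decomposed)
   t = "\u00E9"       # precomposed e-acute, U+00E9
   lig = "\uFB01ne"   # starts with the 'fi' ligature, U+FB01

   for form in ("NFC", "NFD", "NFKC", "NFKD"):
       # Canonical equivalence: s and t compare equal under every form.
       # Compatibility characters: U+FB01 survives NFC/NFD but is folded
       # to plain "fi" by NFKC/NFKD, so the two kinds of forms disagree.
       print(form,
             unicodedata.normalize(form, s) == unicodedata.normalize(form, t),
             unicodedata.normalize(form, lig))

Each form is internally consistent, but they partition strings into equivalence classes differently, and nothing in the Unicode Standard itself says which partition a name-matching service such as the DNS ought to use.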
It is tempting, for example, to define some rules on the basis of membership in particular scripts, or for punctuation characters, but there is no precise definition of what characters belong to which script or which ones are, or are not, punctuation.  The existence of these areas of vagueness raises two issues: whether trying to do precise matching at the character set level is actually possible (addressed below) and whether driving toward more precision could create issues that cause instability in the implementation and resolution models for the DNS.

The Unicode definition also evolves.  Version 3.2 appeared shortly after work on this document was initiated.  It added some characters and functionality and included a few minor incompatible code point changes.  The IETF has secured an agreement about constraints on future changes, but it remains to be seen how that agreement will work out in practice.  The prognosis actually appears poor at this stage, since the UTC chose to ballot a recent possible change that should have been prohibited by the agreement (the outcome of the ballot is not relevant, only that the ballot was issued rather than having the result be a foregone conclusion).  However, some members of the community consider some of the changes between Unicode 3.0 and 3.1, and between 3.1 and 3.2, as well as this recent ballot, to be evidence of instability, and believe that these instabilities are better handled in a system that can be more flexible about handling of characters, scripts, and ancillary information than the DNS.

In addition, because the systems implications of internationalization are considered out of scope in SC2, ISO/IEC JTC1 has assigned some of those issues to its SC22/WG20 (the internationalization working group within the subcommittee that deals with programming languages, systems, and environments).  WG20 has historically dealt with internationalization issues thoughtfully and in depth, but its status has several times been in doubt in recent years.  However, assignment of these matters to WG20 increases the risk of eventual ISO internationalization standards that specify different behavior than the UTC specifications.

4.5 Audiences, End Users, and the User Interface Problem

Part of what has "caused" the DNS internationalization problem, as well as the DNS trademark problem and several others, is that we have stopped thinking about "identifiers for objects" -- which normal people are not expected to see -- and started thinking about "names" -- strings that are expected not only to be readable, but to have linguistically-sensible and culturally-dependent meaning to non-specialist users.

Within the IETF, the IDN-WG (and sometimes other groups) avoided addressing the implications of that transition by taking "outside our scope -- someone else's problem" approaches or by suggesting that people will just become accustomed to whatever conventions are adopted.  The realities of user and vendor behavior suggest that these approaches will not serve the Internet community well in the long term:

o  If we want to make it a problem in a different part of the user interface structure, we need to figure out where it goes in order to have proof of concept of our solution.  Unlike vendors whose sole [business] model is the selling or registering of names, the IETF must produce solutions that actually work, in the applications context as seen by the end user.

o  The principle that "they will get used to our conventions and adapt" is fine if we are writing rules for programming languages or an API.  But the conventions under discussion are not part of a semi-mathematical system; they are deeply ingrained in culture.  No matter how often an English-speaking American is told that the Internet requires that the correct spelling of "colour" be used, he or she isn't going to be convinced.  Getting a French-speaker in Lyon to use exactly the same lexical conventions as a French-speaker in Quebec in order to accommodate the decisions of the IETF or of a registrar or registry is just not likely.  "Montreal" is either a misspelling or an anglicization of a similar word with an acute accent mark over the "e" (i.e., using the Unicode character U+00E9 or one of its equivalents; see the sketch just after this list).  But global agreement on a rule that will determine whether the two forms should match -- and that won't astonish end users and speakers of one language or the other -- is as unlikely as agreement on whether "misspelling" or "anglicization" is the greater travesty.
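The "Montreal" example in the second bullet above can be made concrete.  In this non-normative Python sketch, the accented name has two canonically equivalent spellings at the code point level, and normalization (here NFC; nameprep's form-KC step accomplishes the same thing for these characters) reconciles those two.  But no mechanical rule can decide whether the unaccented ASCII form should also match, because that is a question about language, not about code points:

   import unicodedata

   composed   = "Montr\u00E9al"    # precomposed U+00E9
   decomposed = "Montre\u0301al"   # 'e' + combining acute, U+0301
   unaccented = "Montreal"         # plain ASCII form

   # Different code point sequences, so a naive comparison fails...
   print(composed == decomposed)                        # False

   # ...but normalization recognizes them as the same abstract string.
   norm = lambda s: unicodedata.normalize("NFC", s)
   print(norm(composed) == norm(decomposed))            # True

   # No normalization form will ever equate these two; whether they
   # "should" match is exactly the cultural question raised above.
   print(norm(composed) == norm(unaccented))            # False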
More generally, it is not clear that the outcome of any conceivable nameprep-like process is going to be good enough for practical, user-level use.  In the use of human languages by humans, there are many cases in which things that do not match are nonetheless interpreted as matching.

The Norwegian/Danish character that appears at U+00F8 (visually, a lower case 'o' overstruck with a forward slash) and the "o-umlaut" German character that appears at U+00F6 (visually, a lower case 'o' with diaeresis (or umlaut)) are clearly different, and no matching program should yield an "equal" comparison.  But they are more similar to each other than either of them is to, e.g., "e".  Humans are able to mentally make the correction in context, and do so easily, and they can be surprised if computers cannot do so.  Worse, there is a Swedish character whose appearance is identical to the German o-umlaut, and which shares code point U+00F6, but that, if the languages are known and the sounds of the letters or meanings of words including the character are considered, actually should match the Norwegian/Danish use of U+00F8.
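At the code point level, the situation just described is only partly visible, as this non-normative Python sketch shows: the two distinct letters can never be conflated by normalization, while the Swedish/German distinction is not representable at all, since both languages use the same code point:

   import unicodedata

   o_slash  = "\u00F8"   # LATIN SMALL LETTER O WITH STROKE (Norwegian/Danish)
   o_umlaut = "\u00F6"   # LATIN SMALL LETTER O WITH DIAERESIS (German, Swedish)

   # Distinct code points; no normalization form ever makes them equal.
   for form in ("NFC", "NFD", "NFKC", "NFKD"):
       assert (unicodedata.normalize(form, o_slash)
               != unicodedata.normalize(form, o_umlaut))

   # A Swedish U+00F6 and a German U+00F6 are byte-for-byte identical, so
   # "match the Swedish one against U+00F8" cannot even be expressed here.
   print(unicodedata.name(o_slash))    # LATIN SMALL LETTER O WITH STROKE
   print(unicodedata.name(o_umlaut))   # LATIN SMALL LETTER O WITH DIAERESIS

Any rule that wanted to honor the Swedish reading would need language information from outside the string itself, which is exactly the kind of ancillary data the DNS does not carry.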
This text uses examples in Roman scripts because it is being written in English and those examples are relatively easy to render.  But one of the important lessons of the discussions about domain name internationalization in recent years is that problems similar to those described above exist in almost every language and script.  Each one has its idiosyncrasies, and each set of idiosyncrasies is tied to common usage and cultural issues that are very familiar in the relevant group, and often deeply held as cultural values.  As long as a schoolchild in the US can get a bad grade on a spelling test for using a perfectly valid British spelling, or one in France or Germany can get a poor grade for leaving off a diacritical mark, there are issues with the relevant language.  Similarly, if children in Egypt or Israel are taught that it is acceptable to write a word with or without vowels or stress marks, but that, if those marks are included, they must be the correct ones, or a user in Korea is potentially offended or astonished by out-of-order sequences of Jamo, systems based on character-at-a-time processing and simplistic matching, with no contextual information, are not going to satisfy user needs.

Users are demanding solutions that deal with language and culture.  Systems of identifier symbol-strings that serve specialists or computers are, at best, a solution to a rather different (and, at the time this document was written, somewhat ill-defined) problem.  The recent efforts have made it ever more clear that, if we ignore the distinction between the user requirements and narrowly-defined identifiers, we are solving an insufficient problem.  And, conversely, the approaches that have been proposed to approximate solutions to the user requirement may be far more complex than simple identifiers require.

4.6 Business Cards and Other Natural Uses of Natural Languages

Over the last few centuries, local conventions have been established in various parts of the world for dealing with multilingual situations.  It may be helpful to examine some of these.  For example, if one visits a country where the language is different from one's own, business cards are often printed on two sides, one side in each language.  The conventions are not completely consistent and the technique assumes that recipients will be tolerant.  Translations of names or places are attempted in some situations and transliterations in others.  Since it is widely understood that exact translations or transliterations are often not possible, people typically smile at errors, appreciate the effort, and move on.

The DNS situation differs from these practices in at least two ways.  Since a global solution is required, the business card would need a number of sides approximating the number of languages in the world, which is probably impossible without violating laws of physics.  More important, the opportunities for tolerance don't exist: the DNS requires an exact match or the lookup fails.

4.7 ASCII Encodings and the Roman Keyboard Assumption

Part of the argument for ACE-based solutions is that they provide an escape for multilingual environments when applications have not been upgraded.  When an older application encounters an ACE-based name, the assumption is that the (admittedly ugly) ASCII-coded string will be displayed and can be typed in.  This argument is reasonable from the standpoint of mixtures of Roman-based alphabets, but may not be relevant if user-level systems and devices are involved that do not support the entry of Roman-based characters or which cannot conveniently render such characters.  Such systems are few in the world today, but the number can reasonably be expected to rise as the Internet is increasingly used by populations whose primary concern is with local issues, local information, and local languages.  It is, for example, fairly easy to imagine populations who use Arabic or Thai scripts and who do not have routine access to scripts or input devices based on Roman-derived alphabets.

4.8 Intra-DNS Approaches for "Multilingual Names"

It appears, from the cases above and others, that none of the intra-DNS-based solutions for "multilingual names" are workable.  They rest on too many assumptions that do not appear to be feasible -- that people will adapt deeply-entrenched language habits to conventions laid down to make the lives of computers easy; that we can make "freeze it now, no need for changes in these areas" decisions about Unicode and nameprep; that ACE will smooth over applications problems, even in environments without the ability to key or render Roman-based glyphs (or where user experience is such that such glyphs cannot easily be distinguished from each other); that the Unicode Consortium will never decide to repair an error in a way that creates a risk of DNS incompatibility; that we can either deploy EDNS [RFC2671] or that long names are not really important; that Japanese and Chinese computer users (and others) will either give up their local or ISO 2022-based character coding solutions (for which addition of a large fraction of a million new code points to Unicode is almost certainly a necessary, but probably not sufficient, condition) or build leakproof and completely accurate boundary conversion mechanisms; that out-of-band or contextual information will always be sufficient for the "map glyph onto script" problem; and so on.  In each case, it is likely that about 80% or 90% of cases will work satisfactorily.
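One of the assumptions listed above -- that ACE will smooth over applications problems -- is easy to illustrate.  In this non-normative Python sketch, the built-in "idna" codec applies nameprep and the Punycode ACE of the RFC 3490/3491 generation to a non-ASCII label:

   # A German city name round-trips through its ACE form, which is what
   # an old, ASCII-only application would display and expect to be typed.
   name = "m\u00FCnchen.example"      # "münchen.example", u-umlaut U+00FC

   ace = name.encode("idna")          # nameprep + Punycode, label by label
   print(ace)                         # b'xn--mnchen-3ya.example'
   print(ace.decode("idna"))          # 'münchen.example'

The ACE string is indeed plain ASCII, so the DNS can carry it and a Roman keyboard can reproduce it.  But for a user whose devices render and accept only, say, Thai or Arabic script, "xn--mnchen-3ya.example" is no more readable or typable than the original label, which is precisely the limit of the escape hatch described in Section 4.7.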
