draft-ietf-idn-amc-ace-m-00.txt
              the length of the code.  If the length is one and wide
              style is being used, consume two more characters.
              Decode the base-32 characters into an integer, add the
              appropriate offset (which depends on the remembered code
              length), and output the Unicode character corresponding to
              the resulting code point.

              If the case-flexible or case-preserving model is being
              used (see section "Case sensitivity models"), the decoder
              must either perform the case conversion as it is decoding,
              or construct a separate record of the case information to
              accompany the output string.

     3) Before returning the output (be it a string or a string plus
        case information), the decoder must invoke the encoder on it
        and compare the result to the input string.  The comparison
        must be case-sensitive if the case-sensitive or case-flexible
        model is being used, and case-insensitive if the
        case-insensitive or case-preserving model is being used.  If
        the two strings do not match, it is an error.  This check is
        necessary to guarantee the uniqueness property (there cannot be
        two distinct encoded strings representing the same Unicode
        string).

    If the decoder at any time encounters an unexpected character or
    an unexpected end of input, then the input is invalid.


Signature

    The issue of how to distinguish ACE strings from unencoded strings
    is largely orthogonal to the encoding scheme itself, and is
    therefore not specified here.  In the context of domain name labels,
    a standard prefix and/or suffix (chosen to be unlikely to occur
    naturally) would presumably be attached to ACE labels.
    (In that case, it would probably be good to forbid the encoding of
    Unicode strings that appear to match the signature, to avoid
    confusing humans about whether they are looking at a Unicode string
    or an ACE string.)

    In order to use AMC-ACE-M in domain names, the choice of signature
    must be mindful of the requirement in [RFC952] that labels never
    begin or end with hyphen-minus.  The raw encoded string will never
    begin with a hyphen-minus, and will end with a hyphen-minus if and
    only if the Unicode string ends with a hyphen-minus.  The easiest
    solution is to use a suffix as the signature.  Alternatively, if
    the Unicode strings were forbidden from ending with a hyphen-minus,
    a prefix could be used.

    It appears that "---" is extremely rare in domain names; among the
    four-character prefixes of all the second-level domains under .com,
    .net, and .org, "---" never appears at all.  Therefore, perhaps the
    signature should be of the form ?--- (prefix) or ---? (suffix),
    where ? could be "u" for Unicode, or "i" for internationalized, or
    "a" for ACE, or maybe "q" or "z" because they are rare.


Case sensitivity models

    The higher layer must choose one of the following four models.

    Models suitable for domain names:

      * Case-insensitive:  Before a string is encoded, all its non-LDH
        characters must be case-folded so that any strings differing
        only in case become the same string (for example, strings could
        be forced to lowercase).  Folding LDH characters is optional.
        The case of base-32 characters and literal-mode characters is
        arbitrary and not significant.  Comparisons between encoded
        strings must be case-insensitive.  The original case of non-LDH
        characters cannot be recovered from the encoded string.

      * Case-preserving:  The case of the Unicode characters is not
        considered significant, but it can be preserved and recovered,
        just as in non-internationalized host names.  Before a string
        is encoded, all its non-LDH characters must be case-folded
        as in the previous model.  LDH characters are naturally able
        to retain their case attributes because they are encoded
        literally.  The case attribute of a non-LDH character is
        recorded in one of the base-32 characters that represent
        it (section "Encoding procedure" tells which one).  If the
        base-32 character is uppercase, it means the Unicode character
        is caseless or should be forced to uppercase after being
        decoded (which is a no-op if the case folding already forces
        to uppercase).  If the base-32 character is lowercase, it
        means the Unicode character is caseless or should be forced to
        lowercase after being decoded (which is a no-op if the case
        folding already forces to lowercase).  The case of the other
        base-32 characters in a multi-quintet encoding is arbitrary
        and not significant.  Only uppercase and lowercase attributes
        can be recorded, not titlecase.  Comparisons between encoded
        strings must be case-insensitive, and are equivalent to
        case-insensitive comparisons between the Unicode strings.  The
        intended mixed-case Unicode string can be recovered as long as
        the encoded characters are unaltered, but altering the case of
        the encoded characters is not harmful; it merely alters the
        case of the Unicode characters, and such a change is not
        considered significant.

        In this model, the input to the encoder and the output of the
        decoder can be the unfolded Unicode string (in which case the
        encoder and decoder are responsible for performing the case
        folding and recovery), or can be the folded Unicode string
        accompanied by separate case information (in which case the
        higher layer is responsible for performing the case folding and
        recovery).  Whichever layer performs the case recovery must
        first verify that the Unicode string is properly folded, to
        guarantee the uniqueness of the encoding.

        It is easy to extend the nameprep algorithm [NAMEPREP02] to
        remember case information.  It merely requires an additional
        bit to be associated with each output code point in the mapping
        table.

    The case-insensitive and case-preserving models are interoperable.
    If a domain name passes from a case-preserving entity to a
    case-insensitive entity, the case information will be lost, but
    the domain name will still be equivalent.  This phenomenon already
    occurs with non-internationalized domain names.

    Models unsuitable for domain names, but possibly useful in other
    contexts:

      * Case-sensitive:  Unicode strings may contain both uppercase and
        lowercase characters, which are not folded.  Base-32 characters
        must be lowercase.  Comparisons between encoded strings must be
        case-sensitive.

      * Case-flexible:  Like case-preserving, except that the choice
        of whether the case of the Unicode characters is considered
        significant is deferred.  Therefore, base-32 characters must
        be lowercase, except for those used to indicate uppercase
        Unicode characters.
        Comparisons between encoded strings may be case-sensitive or
        case-insensitive, and such comparisons are equivalent to the
        corresponding comparisons between the Unicode strings.


Comparison with RACE, BRACE, LACE, and DUDE

    In this section we compare AMC-ACE-M and four other ACEs: RACE
    [RACE03], BRACE [BRACE00], LACE [LACE01], and Extended DUDE
    [DUDE00].  We do not include SACE [SACE], UTF-5 [UTF5], or UTF-6
    [UTF6] in the comparison, because SACE appears obviously too
    complex, UTF-5 appears obviously too inefficient, and UTF-6 can
    never be more efficient than its similarly simple successor, DUDE.

    Case preservation support:

        DUDE, AMC-ACE-M:  all characters
                  BRACE:  only the letters A-Z, a-z
             RACE, LACE:  none

    RACE, BRACE, and LACE transform the Unicode string to an
    intermediate bit string, then into a base-32 string, so there is no
    particular alignment between the base-32 characters and the Unicode
    characters.  DUDE and AMC-ACE-M do not have this intermediate stage,
    and enforce alignment between the base-32 characters and the Unicode
    characters, which facilitates case preservation.

    Complexity is hard to measure.  This author would subjectively
    describe the complexity of the algorithms as:

        RACE, LACE, DUDE: fairly simple but not trivial
               AMC-ACE-M: moderate
                   BRACE: complex

    The complexity of AMC-ACE-M lies in the number of rules, but the
    individual rules are not very complex, and they are generally
    non-interacting.

    The relative efficiency of the various algorithms is suggested
    by the sizes of the encodings in section "Example strings".  For
    each ACE there is a graph below showing a horizontal bar for
    each example string, representing the ACE length divided by the
    minimum length among all the ACEs for that example string (so the
    ratio is at least 1).
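    As a minimal sketch of the ratio just defined (an illustration, not
    part of the AMC-ACE-M specification), the Python function below
    divides each ACE's encoded length by the length of the shortest
    encoding of the same string.  The helper name ace_length_ratios is
    hypothetical; the sample data are the example (A) encodings from
    section "Example strings", with signatures omitted as they are
    there.

```python
def ace_length_ratios(encodings: dict[str, str]) -> dict[str, float]:
    """Hypothetical helper: len(encoding) / shortest length, per ACE."""
    shortest = min(len(s) for s in encodings.values())
    return {ace: len(s) / shortest for ace, s in encodings.items()}


# Example (A) encodings, copied from section "Example strings" (no signatures).
EXAMPLE_A = {
    "AMC-ACE-M": "utk-3-8ze-B-hkenqtymwifi9",
    "BRACE": "u-3-ygj-b-ynb6gjc7pp4k5p5w",
    "DUDE": "j3le74G062nd44p1d1l16bk8n51f",
    "RACE": "3aadgxtuabrh2rer2fiwwukioupq",
    "LACE": "74adgxtuabrh2rer2fiwwukioupq",
}

if __name__ == "__main__":
    # Print the ratios from smallest (best) to largest.
    for ace, ratio in sorted(ace_length_ratios(EXAMPLE_A).items(),
                             key=lambda kv: kv[1]):
        print(f"{ace:<10} {ratio:.2f}")
```

    For example (A), the AMC-ACE-M encoding (25 characters) is the
    shortest, so its ratio is 1; BRACE (26 characters) gives 1.04, and
    DUDE, RACE, and LACE (28 characters each) give 1.12.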

    Example R is excluded because it violates nameprep [NAMEPREP02].
    The other example strings all use different languages, except that
    there are several Japanese examples.  To avoid skewing the results,
    each graph collapses all the Japanese ratios into a single bar
    representing the median ratio.  A ratio r is represented by a bar
    of length r/0.04 characters.  Since the bar will always be at least
    1/0.04 = 25 characters long, we show the first 25 characters as "O"
    and the rest as "@".  The bars are sorted so that the graph looks
    like a cumulative distribution.  Each bar is labeled with the
    language of the corresponding example string.  (The difference
    between the Chinese and Taiwanese strings is that the former uses
    simplified characters.)

        RACE:
          Hindi       OOOOOOOOOOOOOOOOOOOOOOOOO@@@
          Korean      OOOOOOOOOOOOOOOOOOOOOOOOO@@@
          Arabic      OOOOOOOOOOOOOOOOOOOOOOOOO@@@@
          Taiwanese   OOOOOOOOOOOOOOOOOOOOOOOOO@@@@
          Hebrew      OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@
          Russian     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@
          Japanese    OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Spanish     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@
          Chinese     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@
          Vietnamese  OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@@@@@@@
          Czech       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@@@@@@@@@@@@@@@@

        LACE:
          Korean      OOOOOOOOOOOOOOOOOOOOOOOOO@@@
          Hindi       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@
          Taiwanese   OOOOOOOOOOOOOOOOOOOOOOOOO@@@@
          Arabic      OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@
          Hebrew      OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@
          Chinese     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Japanese    OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Russian     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Spanish     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@
          Vietnamese  OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@@@@@
          Czech       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@@@@@@@@@@@

        DUDE:
          Russian     OOOOOOOOOOOOOOOOOOOOOOOOO
          Arabic      OOOOOOOOOOOOOOOOOOOOOOOOO
          Hebrew      OOOOOOOOOOOOOOOOOOOOOOOOO@@
          Vietnamese  OOOOOOOOOOOOOOOOOOOOOOOOO@@@@
          Chinese     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@
          Japanese    OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@
          Korean      OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@
          Spanish     OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@
          Czech       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Hindi       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@
          Taiwanese   OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@@@@

        AMC-ACE-M:
          Czech       OOOOOOOOOOOOOOOOOOOOOOOOO
          Hebrew      OOOOOOOOOOOOOOOOOOOOOOOOO
          Japanese    OOOOOOOOOOOOOOOOOOOOOOOOO
          Korean      OOOOOOOOOOOOOOOOOOOOOOOOO
          Russian     OOOOOOOOOOOOOOOOOOOOOOOOO
          Spanish     OOOOOOOOOOOOOOOOOOOOOOOOO
          Taiwanese   OOOOOOOOOOOOOOOOOOOOOOOOO
          Vietnamese  OOOOOOOOOOOOOOOOOOOOOOOOO
          Chinese     OOOOOOOOOOOOOOOOOOOOOOOOO@
          Arabic      OOOOOOOOOOOOOOOOOOOOOOOOO@@@
          Hindi       OOOOOOOOOOOOOOOOOOOOOOOOO@@@@@

        BRACE:
          Chinese     OOOOOOOOOOOOOOOOOOOOOOOOO
          Hindi       OOOOOOOOOOOOOOOOOOOOOOOOO
          Japanese    OOOOOOOOOOOOOOOOOOOOOOOOO
          Spanish     OOOOOOOOOOOOOOOOOOOOOOOOO
          Taiwanese   OOOOOOOOOOOOOOOOOOOOOOOOO
          Arabic      OOOOOOOOOOOOOOOOOOOOOOOOO@
          Czech       OOOOOOOOOOOOOOOOOOOOOOOOO@
          Vietnamese  OOOOOOOOOOOOOOOOOOOOOOOOO@
          Hebrew      OOOOOOOOOOOOOOOOOOOOOOOOO@@
          Korean      OOOOOOOOOOOOOOOOOOOOOOOOO@@
          Russian     OOOOOOOOOOOOOOOOOOOOOOOOO@@@

    These results suggest that DUDE is preferable to RACE and LACE,
    because it is similarly simple, supports case preservation better,
    and is somewhat more efficient.
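    The bar construction described above can be sketched in Python as
    follows.  This is an illustration only: the draft does not say how
    r/0.04 is rounded, so rounding to the nearest character is an
    assumption, as are the hypothetical helper names render_bar and
    build_graph.  Each language's ratios are collapsed to their median
    (which only matters for Japanese, the one language with several
    examples), and rows are sorted by ratio, with equal ratios ordered
    by language name (an arbitrary choice).

```python
from statistics import median


def render_bar(ratio: float, scale: float = 0.04, solid: int = 25) -> str:
    """Bar of length ratio/scale: the first `solid` characters as "O",
    the rest as "@".  Rounding to the nearest character is an assumption."""
    length = round(ratio / scale)
    return "O" * solid + "@" * max(0, length - solid)


def build_graph(ratios_by_language: dict[str, list[float]]) -> list[str]:
    """Collapse each language's ratios to their median, then sort the bars
    in ascending order so the graph reads like a cumulative distribution."""
    rows = sorted((median(rs), lang) for lang, rs in ratios_by_language.items())
    return [f"{lang:<12}{render_bar(r)}" for r, lang in rows]


if __name__ == "__main__":
    # Hypothetical ratios for illustration only; not measured data.
    sample = {"Czech": [1.0], "Japanese": [1.0, 1.12, 1.2], "Hindi": [1.2]}
    print("\n".join(build_graph(sample)))
```

    With these made-up inputs, the Japanese median is 1.12, which this
    sketch draws as 25 "O" characters followed by 3 "@" characters.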

    The results also suggest that AMC-ACE-M is preferable to BRACE,
    because it is similarly efficient, supports case preservation
    better, and is simpler.

    DUDE and AMC-ACE-M have equal support for case preservation, but
    AMC-ACE-M offers significantly better efficiency, at the cost of
    significantly greater complexity, so choosing between them entails a
    value judgement.


Example strings

    In the ACE encodings below, signatures (like "bq--" for RACE) are
    not shown.  Non-LDH characters in the Unicode string are forced to
    lowercase before being encoded using BRACE, RACE, and LACE.  For
    RACE and LACE, the letters A-Z are likewise forced to lowercase.
    UTF-8 and UTF-16 are included for length comparisons, with non-ASCII
    bytes shown as "?".  AMC-ACE-M is abbreviated AMC-M.  Backslashes
    show where line breaks have been inserted in ACE strings too long
    for one line.  The RACE and LACE encodings are courtesy of Mark
    Davis's online UTF converter [UTFCONV] (slightly modified to remove
    the length restrictions).

    The first several examples are all names of Japanese music artists,
    song titles, and TV programs, just because the author happens to
    have them handy (but Japanese is useful for providing examples
    of single-row text, two-row text, ideographic text, and various
    mixtures thereof).

    (A) 3<nen>B<gumi><kinpachi><sensei>  (Japanese TV program title)

        <nen>              = U+5E74                       (kanji)
        <gumi>             = U+7D44                       (kanji)
        <kinpachi><sensei> = U+91D1 U+516B U+5148 U+751F  (kanji)

        UTF-16: ????????????????
        UTF-8:  3???B???????????????
        AMC-M:  utk-3-8ze-B-hkenqtymwifi9
        BRACE:  u-3-ygj-b-ynb6gjc7pp4k5p5w
        DUDE:   j3le74G062nd44p1d1l16bk8n51f
        RACE:   3aadgxtuabrh2rer2fiwwukioupq
        LACE:   74adgxtuabrh2rer2fiwwukioupq

    (B) <amuro><namie>-with-SUPER-MONKEYS  (Japanese music group name)

        <amuro><namie> = U+5B89 U+5BA4 U+5948 U+7F8E U+6075  (kanji)

        UTF-8:  ??????????????????-with-SUPER-MONKEYS
        AMC-M:  u5m2j4etwif6q2zf---with--SUPER--MONKEYS
        BRACE:  uvj7fuaqcahy982xa---with--SUPER--MONKEYS
        DUDE:   lb89q4p48nf8em075-g077m9n4m8-N3LGM5N2-MdVURLN9J
        UTF-16: ????????????????????????????????????????????????
        LACE:   ajnytjablfeac74oafqhkeyafv3qm5difvzxk4dfoiww233onnsxs4y
        RACE:   3bnysw5elfeh7dtaouac2adxabuqa5aanaac2adtab2qa4aamuaheab\
                nabwqa3yanyagwadfab4qa4y

    (C) Hello-Another-Way-<sorezore><no><basho>  (Japanese song title)

        <sorezore><no> = U+305D U+308C U+305E U+308C U+306E  (hiragana)
        <basho>        = U+5834 U+6240                       (kanji)

        UTF-8:  Hello-Another-Way-?????????????????????
        BRACE:  ji7-Hello--Another--Way---v3jhaefvd2ufj62
        AMC-M:  bsk-Hello--Another--Way---p2nq2nyqx2veyuwa
        DUDE:   M8lssv-Huvn4m8ln2-Nm1n9-j05docleocmel834m240
        UTF-16: ??????????????????????????????????????????????????
        LACE:   ciagqzlmnrxs2ylon52gqzlsfv3wc6jnauyf3dc6rrxacwbuafrea
        RACE:   3aagqadfabwaa3aan4ac2adbabxaa3yaoqagqadfabzaaliao4agcad\
                zaawtaxjqrqyf4memgbxfqndcia

    (D) <hitotsu><yane><no><shita>2  (Japanese TV program title)

        <hitotsu> = U+3072 U+3068 U+3064  (hiragana)
        <yane>    = U+5C4B U+6839         (kanji)
        <no>      = U+306E                (hiragana)
        <shita>   = U+4E0B                (kanji)

        UTF-16: ????????????????
        UTF-8:  ?????????????????????2
        AMC-M:  bsnzciex6wmy2vjqw8sm-2
        BRACE:  ji96u56uwbhf2wqxnw4s-2
        DUDE:   j072m8klc4bm839j06eke0bg032
        RACE:   3ayhemdigbsfys3iheyg4tqlaaza
        LACE:   74yhemdigbsfys3iheyg4tqlaaza

    (E) Maji<de>Koi<suru>5<byou><mae>  (Japanese song title)