INTERNET-DRAFT                                               Mark Welter
draft-ietf-idn-dude-02.txt                            Brian W. Spolarich
Expires 2001-Dec-07                                     Adam M. Costello
                                                             2001-Jun-07

              Differential Unicode Domain Encoding (DUDE)

Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note
    that other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

    Distribution of this document is unlimited.  Please send comments to
    the authors or to the idn working group at idn@ops.ietf.org.

Abstract

    DUDE is a reversible transformation from a sequence of nonnegative
    integer values to a sequence of letters, digits, and hyphens (LDH
    characters).  DUDE provides a simple and efficient ASCII-Compatible
    Encoding (ACE) of Unicode strings [UNICODE] for use with
    Internationalized Domain Names [IDN] [IDNA].

Contents

    1. Introduction
    2. Terminology
    3. Overview
    4. Base-32 characters
    5. Encoding procedure
    6. Decoding procedure
    7. Example strings
    8. Security considerations
    9. References
    A. Acknowledgements
    B. Author contact information
    C. Mixed-case annotation
    D. Differences from draft-ietf-idn-dude-01
    E. Example implementation

1. Introduction

    The IDNA draft [IDNA] describes an architecture for supporting
    internationalized domain names.  Each label of a domain name may
    begin with a special prefix, in which case the remainder of the
    label is an ASCII-Compatible Encoding (ACE) of a Unicode string
    satisfying certain constraints.  For the details of the constraints,
    see [IDNA] and [NAMEPREP].  The prefix has not yet been specified,
    but see http://www.i-d-n.net/ for prefixes to be used for testing
    and experimentation.

    DUDE is intended to be used as an ACE within IDNA, and has been
    designed to have the following features:

      * Completeness:  Every sequence of nonnegative integers maps to an
        LDH string.  Restrictions on which integers are allowed, and on
        sequence length, may be imposed by higher layers.

      * Uniqueness:  Every sequence of nonnegative integers maps to at
        most one LDH string.

      * Reversibility:  Any Unicode string mapped to an LDH string can
        be recovered from that LDH string.

      * Efficient encoding:  The ratio of encoded size to original size
        is small.  This is important in the context of domain names
        because [RFC1034] restricts the length of a domain label to 63
        characters.

      * Simplicity:  The encoding and decoding algorithms are reasonably
        simple to implement.  The goals of efficiency and simplicity are
        at odds; DUDE places greater emphasis on simplicity.

    An optional feature is described in appendix C "Mixed-case
    annotation".

2. Terminology

    The key words "must", "shall", "required", "should", "recommended",
    and "may" in this document are to be interpreted as described in
    RFC 2119 [RFC2119].

    LDH characters are the letters A-Z and a-z, the digits 0-9, and
    hyphen-minus.

    A quartet is a sequence of four bits (also known as a nibble or
    nybble).

    A quintet is a sequence of five bits.

    Hexadecimal values are shown preceded by "0x".  For example, 0x60
    is decimal 96.

    As in the Unicode Standard [UNICODE], Unicode code points are
    denoted by "U+" followed by four to six hexadecimal digits, while a
    range of code points is denoted by two hexadecimal numbers separated
    by "..", with no prefixes.

    XOR means bitwise exclusive or.  Given two nonnegative integer
    values A and B, A XOR B is the nonnegative integer value whose
    binary representation is 1 in whichever places the binary
    representations of A and B disagree, and 0 wherever they agree.
    For the purpose of applying this rule, recall that an integer's
    representation begins with an infinite number of unwritten zeros.
    In some programming languages, care may need to be taken that A and
    B are stored in variables of the same type and size.

3. Overview

    DUDE encodes a sequence of nonnegative integral values as a sequence
    of LDH characters, although implementations will of course need to
    represent the output characters somehow, typically as ASCII octets.
    When DUDE is used to encode Unicode characters, the input values are
    Unicode code points (integral values in the range 0..10FFFF, but not
    D800..DFFF, which are reserved for use by UTF-16).

    Each value in the input sequence is represented by one or more LDH
    characters in the encoded string.  The value 0x2D is represented
    by hyphen-minus (U+002D).  Each non-hyphen-minus character in
    the encoded string represents a quintet.  A sequence of quintets
    represents the bitwise XOR between each non-0x2D integer and the
    previous one.

4. Base-32 characters

        "a" =  0 = 0x00 = 00000         "s" = 16 = 0x10 = 10000
        "b" =  1 = 0x01 = 00001         "t" = 17 = 0x11 = 10001
        "c" =  2 = 0x02 = 00010         "u" = 18 = 0x12 = 10010
        "d" =  3 = 0x03 = 00011         "v" = 19 = 0x13 = 10011
        "e" =  4 = 0x04 = 00100         "w" = 20 = 0x14 = 10100
        "f" =  5 = 0x05 = 00101         "x" = 21 = 0x15 = 10101
        "g" =  6 = 0x06 = 00110         "y" = 22 = 0x16 = 10110
        "h" =  7 = 0x07 = 00111         "z" = 23 = 0x17 = 10111
        "i" =  8 = 0x08 = 01000         "2" = 24 = 0x18 = 11000
        "j" =  9 = 0x09 = 01001         "3" = 25 = 0x19 = 11001
        "k" = 10 = 0x0A = 01010         "4" = 26 = 0x1A = 11010
        "m" = 11 = 0x0B = 01011         "5" = 27 = 0x1B = 11011
        "n" = 12 = 0x0C = 01100         "6" = 28 = 0x1C = 11100
        "p" = 13 = 0x0D = 01101         "7" = 29 = 0x1D = 11101
        "q" = 14 = 0x0E = 01110         "8" = 30 = 0x1E = 11110
        "r" = 15 = 0x0F = 01111         "9" = 31 = 0x1F = 11111

    The digits "0" and "1" and the letters "o" and "l" are not used, to
    avoid transcription errors.

    A decoder must accept both the uppercase and lowercase forms of
    the base-32 characters (including mixtures of both forms).  An
    encoder should output only lowercase forms or only uppercase forms
    (unless it uses the feature described in appendix C "Mixed-case
    annotation").

5. Encoding procedure

    All ordering of bits, quartets, and quintets is big-endian (most
    significant first).

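    As an illustration only, the base-32 table of section 4 and the
    pseudocode given in this section and in section 6 might be sketched
    in Python as follows.  The names BASE32, encode, and decode are
    assumptions made for this sketch; they do not come from the draft,
    and this is not the example implementation of appendix E.  The
    sketch always emits lowercase and does not implement the mixed-case
    annotation of appendix C.

```python
# Base-32 alphabet from section 4: index == quintet value.
# "l", "o", "0" and "1" are deliberately absent.
BASE32 = "abcdefghijkmnpqrstuvwxyz23456789"


def encode(values):
    """Encode a sequence of nonnegative integers as a DUDE string."""
    out = []
    prev = 0x60
    for n in values:
        if n < 0:
            raise ValueError("negative input value")
        if n == 0x2D:
            out.append("-")  # hyphen-minus stands for itself; prev is unchanged
            continue
        diff = prev ^ n
        # Split diff into big-endian quartets, as few as possible (at least one).
        quartets = []
        while True:
            quartets.append(diff & 0xF)
            diff >>= 4
            if diff == 0:
                break
        quartets.reverse()
        # Prepend 1 to every quartet except the last, which gets 0.
        last = len(quartets) - 1
        for i, q in enumerate(quartets):
            out.append(BASE32[q if i == last else 0x10 | q])
        prev = n
    return "".join(out)


def decode(text):
    """Decode a DUDE string; raise ValueError on malformed input."""
    lower = text.lower()  # uppercase and lowercase forms are both accepted
    out = []
    prev = 0x60
    i = 0
    while i < len(lower):
        if lower[i] == "-":
            out.append(0x2D)
            i += 1
            continue
        diff = 0
        while True:
            if i >= len(lower):
                raise ValueError("input ends inside a quintet sequence")
            quintet = BASE32.find(lower[i])
            if quintet < 0:
                raise ValueError("non-base-32 character")
            i += 1
            diff = (diff << 4) | (quintet & 0xF)  # strip the first bit
            if quintet < 0x10:  # first bit 0 marks the last quintet
                break
        prev ^= diff
        out.append(prev)
    # Re-encode and compare, which enforces the uniqueness property.
    if encode(out) != lower:
        raise ValueError("not the canonical encoding")
    return out
```

    Under these assumptions, encode([0x61]) yields "b", matching example
    (A) in section 7, and decode("sb") fails because "sb" carries a
    superfluous leading zero quartet and so does not re-encode to itself.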
    let prev = 0x60
    for each input integer n (in order) do begin
      if n == 0x2D then output hyphen-minus
      else begin
        let diff = prev XOR n
        represent diff in base 16 as a sequence of quartets,
          as few as are sufficient (but at least one)
        prepend 0 to the last quartet and 1 to each of the others
        output a base-32 character corresponding to each quintet
        let prev = n
      end
    end

    If an encoder encounters an input value larger than expected (for
    example, the largest Unicode code point is U+10FFFF, and nameprep
    [NAMEPREP03] can never output a code point larger than U+EFFFD),
    the encoder may either encode the value correctly, or may fail, but
    it must not produce incorrect output.  The encoder must fail if it
    encounters a negative input value.

6. Decoding procedure

    let prev = 0x60
    while the input string is not exhausted do begin
      if the next character is hyphen-minus
      then consume it and output 0x2D
      else begin
        consume characters and convert them to quintets until
          encountering a quintet whose first bit is 0
        fail upon encountering a non-base-32 character or end-of-input
        strip the first bit of each quintet
        concatenate the resulting quartets to form diff
        let prev = prev XOR diff
        output prev
      end
    end
    encode the output sequence and compare it to the input string
    fail if they do not match (case-insensitively)

    The comparison at the end is necessary to guarantee the uniqueness
    property (there cannot be two distinct encoded strings representing
    the same sequence of integers).  This check also frees the decoder
    from having to check for overflow while decoding the base-32
    characters.  (If the decoder is one step of a larger decoding
    process, it may be possible to defer the re-encoding and comparison
    to the end of that larger decoding process.)

7. Example strings

    The first several examples are nonsense strings of mostly unassigned
    code points intended to exercise the corner cases of the algorithm.

    (A) u+0061
        DUDE: b

    (B) u+2C7EF u+2C7EF
        DUDE: u6z2ra

    (C) u+1752B u+1752A
        DUDE: tzxwmb

    (D) u+63AB1 u+63ABA
        DUDE: yv47bm

    (E) u+261AF u+261BF
        DUDE: uyt6rta

    (F) u+C3A31 u+C3A8C
        DUDE: 6v4xb5p

    (G) u+09F44 u+0954C
        DUDE: 39ue4si

    (H) u+8D1A3 u+8C8A3
        DUDE: 27t6dt3sa

    (I) u+6C2B6 u+CC266
        DUDE: y6u7g4ss7a

    (J) u+002D u+002D u+002D u+E848F
        DUDE: ---82w8r

    (K) u+BD08E u+002D u+002D u+002D
        DUDE: 57s8q---

    (L) u+A9A24 u+002D u+002D u+002D u+C05B7
        DUDE: 434we---y393d

    (M) u+7FFFFFFF
        DUDE: z999993r or explicit failure

    The next several examples are realistic Unicode strings that could
    be used in domain names.  They exhibit single-row text, two-row
    text, ideographic text, and mixtures thereof.  These examples are
    names of Japanese television programs, music artists, and songs,
    merely because one of the authors happened to have them handy.

    (N) 3<nen>b<gumi><kinpachi><sensei>  (Latin, kanji)
        u+0033 u+5E74 u+0062 u+7D44 u+91D1 u+516B u+5148 u+751F
        DUDE: xdx8whx8tgz7ug863f6s5kuduwxh

    (O) <amuro><namie>-with-super-monkeys  (Latin, kanji, hyphens)
        u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
        u+0068 u+002D u+0073 u+0075 u+0070 u+0065 u+0072 u+002D u+006D
        u+006F u+006E u+006B u+0065 u+0079 u+0073
        DUDE: x58jupu8nuy6gt99m-yssctqtptn-tmgftfth-trcbfqtnk

    (P) maji<de>koi<suru>5<byou><mae>  (Latin, hiragana, kanji)
        u+006D u+0061 u+006A u+0069 u+3067 u+006B u+006F u+0069 u+3059
        u+308B u+0035 u+79D2 u+524D
        DUDE: pnmdvssqvssnegvsva7cvs5qz38hu53r

    (Q) <pafii>de<runba>  (Latin, katakana)
        u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
        DUDE: vs5bezgxrvs3ibvs2qtiud

    (R) <sono><supiido><de>  (hiragana, katakana)
