draft-ietf-idn-lace-01.txt
2.4.1 Compressing a string

The input string is in the UTF-16 encoding (big-endian UTF-16 with no
byte order mark).

Design note: No checking is done on the input to this algorithm. It is
assumed that all checking for valid ISO/IEC 10646 characters has already
been done by a previous step in the conversion process.

1) If the length (measured in octets) of the input is not even, or is
less than 2, stop with an error.

2) Set the input pointer, called IP, to the first octet of the input
string.

3) Set the variable called HIGH to the octet at IP.

4) Determine the number of contiguous octet pairs, starting at IP, that
have HIGH as the first octet; call this COUNT.

5) Put into an output buffer the single octet for COUNT followed by the
single octet for HIGH, followed by all those low octets. Move IP to the
end of those pairs; that is, set IP to IP+(2*COUNT).

6) If IP is not at the end of the input string, go to step 3.

7) If the length of the output buffer is less than or equal to the
length of the input buffer (in octets, not in characters), emit the
output buffer. Otherwise, output the octet 0xFF followed by the input
buffer. Note that there can only be one possible representation for a
name part, so that outputting the wrong name part is a serious security
error. Decompression schemes MUST accept only the valid form and MUST
NOT accept invalid forms.

2.4.2 Decompressing a string

1. Set the input pointer, called IP, to the first octet of the input
string. If there is no first octet, stop with an error.

2. If the octet at IP is 0xFF, set IP to IP+1, copy the rest of the
input buffer to the output buffer, and go to step 9.

3. Get the octet at IP, call it COUNT. If COUNT equals zero or is
greater than 36, stop with an error. Set IP to IP+1. If IP is now at the
end of the input string, stop with an error.

4. Get the octet at IP, call it HIGH. Set IP to IP+1.

5. If IP is now at the end of the input string, stop with an error. Get
the octet at IP, call it LOW. Set IP to IP+1.

6. Output HIGH, then LOW, to the output buffer.

7. Decrement COUNT. If COUNT is greater than 0, go to step 5.

8. If IP is not at the end of the input buffer, go to step 3.

9. If the length of the output buffer is odd, stop with an error.
Compress the output buffer into a separate comparison buffer following
the steps for compression above. If the contents of the comparison
buffer do not equal the original input to this decompression procedure,
stop with an error. Otherwise, send out the output buffer and stop.

2.4.3 Compression examples

The five input characters <U+30E6 U+30CB U+30B3 U+30FC U+30C9> are
represented in big-endian UTF-16 as the ten octets <30 E6 30 CB 30 B3 30
FC 30 C9>. All the code units are in the same row (30). The output
buffer has seven octets <05 30 E6 CB B3 FC C9>, which is shorter than
the input string. Thus the output is <05 30 E6 CB B3 FC C9>.

The four input characters <U+012F U+0111 U+0149 U+00E5> are represented
in big-endian UTF-16 as the eight octets <01 2F 01 11 01 49 00 E5>. The
output buffer has eight octets <03 01 2F 11 49 01 00 E5>, which is the
same length as the input string. Thus, the output is <03 01 2F 11 49 01
00 E5>.

The three input characters <U+012F U+00E0 U+014B> are represented in
big-endian UTF-16 as the six octets <01 2F 00 E0 01 4B>. The output
buffer is nine octets <01 01 2F 01 00 E0 01 01 4B>, which is longer than
the input buffer. Thus, the output is <FF 01 2F 00 E0 01 4B>.

2.5 Base32

In order to encode non-ASCII characters in DNS-compatible host name
parts, they must be converted into legal characters. This is done with
Base32 encoding, described here.

Table 1 shows the mapping between input bits and output characters in
Base32. Design note: the digits used in Base32 are "2" through "7"
instead of "0" through "5" in order to avoid digits "0" and "1". This
helps reduce errors for users who are entering a Base32 stream and may
mistake a "0" for an "O" or a "1" for an "l".
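As an illustration only, the mapping that Table 1 defines can be read as
an index into a 32-character alphabet ("a" through "z", then "2" through
"7"). The sketch below follows the bit-grouping steps of section 2.5.1.
The names BASE32_ALPHABET and base32Encode are hypothetical and do not
come from this draft, and the zero-bit padding of the final group is an
assumption taken from the example in section 2.5.3.

```javascript
// Hypothetical names; not part of the draft's sample code.
// Alphabet order follows Table 1: 00000 -> "a" ... 11111 -> "7".
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Encode an array of octets (integers 0..255) as a Base32 string.
function base32Encode(octets) {
  let out = "";
  let acc = 0;   // holds bits that have been read but not yet emitted
  let bits = 0;  // how many unread bits are currently in acc
  for (const octet of octets) {
    acc = (acc << 8) | (octet & 0xff);
    bits += 8;
    // Emit one character for every complete group of five bits.
    while (bits >= 5) {
      bits -= 5;
      out += BASE32_ALPHABET[(acc >> bits) & 0x1f];
    }
    acc &= (1 << bits) - 1; // keep only the leftover (at most 4) bits
  }
  if (bits > 0) {
    // Assumption: the last partial group is padded on the right with
    // zero bits, as in the 0x3a270f93 example.
    out += BASE32_ALPHABET[(acc << (5 - bits)) & 0x1f];
  }
  return out;
}

console.log(base32Encode([0x3a, 0x27, 0x0f, 0x93])); // prints "hitq7ey"
```

Because the input is whole octets, the encoder never needs an explicit
length marker; a decoder can recover the octet count from the number of
characters.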
Table 1: Base32 conversion

bits   char  hex     bits   char  hex
00000   a    0x61    10000   q    0x71
00001   b    0x62    10001   r    0x72
00010   c    0x63    10010   s    0x73
00011   d    0x64    10011   t    0x74
00100   e    0x65    10100   u    0x75
00101   f    0x66    10101   v    0x76
00110   g    0x67    10110   w    0x77
00111   h    0x68    10111   x    0x78
01000   i    0x69    11000   y    0x79
01001   j    0x6a    11001   z    0x7a
01010   k    0x6b    11010   2    0x32
01011   l    0x6c    11011   3    0x33
01100   m    0x6d    11100   4    0x34
01101   n    0x6e    11101   5    0x35
01110   o    0x6f    11110   6    0x36
01111   p    0x70    11111   7    0x37

2.5.1 Encoding octets as Base32

The input is a stream of octets. However, the octets are then treated
as a stream of bits.

Design note: The assumption that the input is a stream of octets
(instead of a stream of bits) was made so that no padding was needed.
If you are reusing this algorithm for a stream of bits, you must add a
padding mechanism in order to differentiate different lengths of input.

1) Set the read pointer to the beginning of the input bit stream.

2) Look at the five bits after the read pointer. If there are not five
bits, go to step 5.

3) Look up the value of the set of five bits in the bits column of
Table 1, and output the character from the char column (whose hex value
is in the hex column).

4) Move the read pointer five bits forward. If the read pointer is at
the end of the input bit stream (that is, there are no more bits in the
input), stop. Otherwise, go to step 2.

5) Pad the bits seen with zero bits on the right until there are five
bits.

6) Look up the value of the set of five bits in the bits column of
Table 1, and output the character from the char column (whose hex value
is in the hex column).

2.5.2 Decoding Base32 as octets

The input is octets in network byte order. The input octets MUST be
values from the char column of Table 1.

1) Count the number of octets in the input and divide it by 8; call the
remainder INPUTCHECK.
If INPUTCHECK is 1 or 3 or 6, stop with an error.

2) Set the read pointer to the beginning of the input octet stream.

3) Look up the character value of the octet in the char column (or hex
value in hex column) of Table 1, and add the five bits from the bits
column to the output buffer.

4) Move the read pointer one octet forward. If the read pointer is not
at the end of the input octet stream (that is, there are more octets in
the input), go to step 3.

5) Count the number of bits that are in the output buffer and divide it
by 8; call the remainder PADDING. If the PADDING number of bits at the
end of the output buffer are not all zero, stop with an error.
Otherwise, emit the output buffer and stop.

2.5.3 Base32 example

Assume you want to encode the value 0x3a270f93. The bit string is:

3   a    2   7    0   f    9   3
00111010 00100111 00001111 10010011

Broken into chunks of five bits, this is:

00111 01000 10011 10000 11111 00100 11

Padding is added to make the last chunk five bits:

00111 01000 10011 10000 11111 00100 11000

The output of encoding is:

00111 01000 10011 10000 11111 00100 11000
  h     i     t     q     7     e     y

or "hitq7ey".

3. Security Considerations

Much of the security of the Internet relies on the DNS. Thus, any
change to the characteristics of the DNS can change the security of
much of the Internet. For this reason, LACE makes no changes to the DNS
itself.

Host names are used by users to connect to Internet servers. The
security of the Internet would be compromised if a user entering a
single internationalized name could be connected to different servers
based on different interpretations of the internationalized host
name.

LACE is designed so that every internationalized host name part
can be represented as one and only one DNS-compatible string. If there
is any way to follow the steps in this document and get two or more
different results, it is a severe and fatal error in the protocol.

4. References

[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name
Proposals", draft-ietf-idn-compare.

[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
draft-ietf-idn-requirement.

[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) --
Part 1: Architecture and Basic Multilingual Plane. Five amendments and
a technical corrigendum have been published up to now. UTF-16 is
described in Annex Q, published as Amendment 1. 17 other amendments are
currently at various stages of standardization. [[[ THIS REFERENCE
NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[RFC2781] Paul Hoffman and Francois Yergeau, "UTF-16, an encoding of ISO
10646", February 2000, RFC 2781.

[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035).

[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.

A. Acknowledgements

Rick Wesson pointed out some error conditions that need to be
tested for. Scott Hollenbeck pointed out some errors in the
compression.

Base32 is quite obviously inspired by the tried-and-true Base64
Content-Transfer-Encoding from MIME.

B. Sample code

The following is sample Javascript code for the LACE algorithm.
This code is believed to be correct, but there may be errors in
it. The code is provided as-is and comes with no warranty of
fitness, correctness, blah blah blah.

/**
 * Converts to LACE compression format (without Base32) from
 * UTF-16BE array
 * @parameter iArray Array of bytes in UTF16-BE
 * @parameter iCount Number of elements. Must be even and in 2..62
 * @parameter oArray Array for output of LACE bytes.
 *   Must be at least 100 octets long to provide internal working space
 * @return Length of output array used
 * @parameter parseResult output error value if any
 * @author Mark Davis
 */
function toLACE(iArray, iCount, oArray, parseResult) {
    //debugger;
    if (iCount < 1 || iCount > 62) {
        parseResult.set("Lace: count out of range", iCount);
        return;
    }
    if ((iCount % 2) == 1) {
        parseResult.set("Lace: odd length, can't be UTF-16", iCount);
        return;
    }
    var op = 0;