draft-ietf-idn-brace-00.txt
Example strings

    All of these examples use Japanese text, merely because that is the
    only kind of non-English text that the author has lying around.

    Example of no-row style:

    An actual music group name coerced into the usual format for host
    name labels:

        AMURONAMIE-with-super-monkeys

    AMURONAMIE stands for five kanji whose Unicode values are (in
    order):

        U+5B89 U+5BA4 U+5948 U+7F8E U+6075

    The BRACE encoding is:

        UVJ7FUAQCAHY982XA---with--super--monkeys-8Q9

    (Note that the RACE encoding would have been 79 characters long,
    and hence unusable.)

    Example of mixed style:

    An actual song title coerced into the usual format for host name
    labels:

        hello-another-way-SOREZORENOBASHO

    SOREZORENOBASHO stands for five hiragana followed by two kanji,
    whose Unicode values are (in order):

        U+305D U+308C U+305E U+308C U+306E U+5834 U+6240

    The BRACE encoding is:

        JI7-hello--another--way---V3JHAEFVD2UFJ62-8Q9

    Example of full-row style:

    An actual song title, SONOSUPIIDODE, which stands for two hiragana
    followed by four katakana followed by one hiragana, whose Unicode
    values are:

        U+305D U+306E U+30B9 U+30D4 U+30FC U+30C9 U+3067

    The BRACE encoding is:

        BIDPRDMP9WT7MI-8Q9

    Example of half-row style:

    An actual song title:

        PAFIIdeRUNBA

    PAFII stands for four katakana whose Unicode values are:

        U+30D1 U+30D5 U+30A3 U+30FC

    RUNBA stands for three katakana whose Unicode values are:

        U+30EB U+30F3 U+30D0

    The BRACE encoding is:

        3IU8PAZT-de-PYGI-8Q9

    Example of an ASCII string that breaks all the rules of host name
    labels:

        -> $1.00 <-

    The BRACE encoding is:

        229--T2B4-1-W-00-I9I---8Q9

Security considerations

    Users expect each host name in DNS to be controlled by a single
    authority.  If a UTF-16 string could map to multiple labels, then
    a UTF-16 host name could map to multiple real host names, each
    controlled by a different authority, some of which could be spoofs
    that hijack service requests intended for another.  Therefore BRACE
    is designed so that each UTF-16 string maps to a unique label.
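    One place where this one-to-one property shows up in the example
    implementation is the pass-through test in brace_encode(): an
    all-LDH string is emitted unencoded only if it neither begins nor
    ends with a hyphen and does not end with the BRACE signature (-8Q9
    or -8q9), presumably so that an unencoded label can never be
    mistaken for an encoded one.  The following is a minimal sketch of
    that test, not the author's code.  The names is_ldh_code(),
    ends_with_signature(), and passes_through() are hypothetical, and
    rejecting the empty string is an assumption made here rather than
    something the draft states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the UTF-16 code is an ASCII letter, */
/* digit, or hyphen (LDH), 0 otherwise.                          */
static int is_ldh_code(unsigned short c)
{
  if (c == 45) return 1;                 /* hyphen  */
  if (c >= 48 && c <= 57) return 1;      /* 0..9    */
  if (c >= 65 && c <= 90) return 1;      /* A..Z    */
  if (c >= 97 && c <= 122) return 1;     /* a..z    */
  return 0;
}

/* Hypothetical helper: 1 if the string ends with "-8Q9" or "-8q9". */
/* Strings shorter than four codes cannot carry the signature.      */
static int ends_with_signature(const unsigned short *s, size_t n)
{
  if (n < 4) return 0;
  return s[n - 4] == 45 &&
         s[n - 3] == 56 &&
         (s[n - 2] == 81 || s[n - 2] == 113) &&
         s[n - 1] == 57;
}

/* Hypothetical helper: 1 if the string could be used as a label */
/* unchanged, 0 if it would have to be BRACE-encoded.            */
static int passes_through(const unsigned short *s, size_t n)
{
  size_t i;

  if (n == 0) return 0;  /* assumption: empty input is never passed through */

  for (i = 0;  i < n;  ++i) {
    if (!is_ldh_code(s[i])) return 0;
  }

  if (s[0] == 45 || s[n - 1] == 45) return 0;
  return !ends_with_signature(s, n);
}
```

    Under these assumptions, "hello" passes through unchanged, while
    "x-8Q9" does not, even though it contains only LDH characters.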
    However, there can still be multiple UTF-16 representations of the
    "same" text, for various definitions of "same".  This problem is
    addressed by the Unicode standard under the topic of
    canonicalization, but the issue needs to be studied further in the
    context of host names.

    Also, some text strings may be misleading or ambiguous to humans,
    such as strings containing dots, slashes, at-signs, etc.  Policies
    for allowable Unicode strings need to be developed.

References

    [IDN] Internationalized Domain Names (IETF working group),
    http://www.i-d-n.net/, idn@ops.ietf.org.

    [RACE01] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
    for IDN", 2000-Aug-31, draft-ietf-idn-race-01.

    [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
    Table Specification", 1985-Oct, RFC 952.

    [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
    1987-Nov, RFC 1034.

    [RFC1123] Internet Engineering Task Force, R. Braden (editor),
    "Requirements for Internet Hosts -- Application and Support",
    1989-Oct, RFC 1123.

    [SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding (SACE)",
    draft-ietf-idn-sace.

    [UNICODE] The Unicode Consortium, "The Unicode Standard",
    http://www.unicode.org/unicode/standard/standard.html.

    [UTF5] James Seng, Martin Duerst, Tin Wee Tan, "UTF-5, a
    Transformation Format of Unicode and ISO 10646", draft-jseng-utf5.

Author

    Adam M. Costello <amc@cs.berkeley.edu>
    http://www.cs.berkeley.edu/~amc/

Example implementation

/* brace.c 0.1.1 (2000-Sep-09-Sat)         */
/* Adam M. Costello <amc@cs.berkeley.edu>  */

/* This is ANSI C code implementing BRACE version 0.1.*. */

/* Public interface (would normally go in its own .h file): */

enum {
  brace_encoder_in_max = 63,
  brace_encoder_out_max = 4 + (6 + 16 * brace_encoder_in_max) / 5 + 1,
  brace_decoder_in_max = 63 + 1,
  brace_decoder_out_max = brace_decoder_in_max - 1
};

    /* The above constants are the maximum array sizes  */
    /* that the encoder/decoder will accept/produce     */
    /* (including null terminators for ASCII strings).  */

void brace_encode(
  unsigned int input_length,
  unsigned short *input,
  char output[brace_encoder_out_max] );

    /* brace_encode() converts UTF-16 input to null-terminated  */
    /* BRACE-encoded ASCII output.  The input_length must not   */
    /* exceed brace_encoder_in_max, and the output array must   */
    /* have at least the size indicated above.  Under those     */
    /* constraints, this function never fails.                  */

int brace_decode(
  char *input,
  unsigned int *output_length,
  unsigned short output[brace_decoder_out_max] );

    /* brace_decode() converts null-terminated BRACE-encoded ASCII   */
    /* input to UTF-16 output.  The input length (including the null */
    /* terminator) must not exceed brace_decoder_in_max, and the     */
    /* output array must have at least the size indicated above.     */
    /* Returns 1 on success, 0 if the input was malformed.  If 0 is  */
    /* returned, the output array may contain garbage, but           */
    /* *output_length will not have been affected.                   */

/* Implementation (would normally go in its own .c file): */

#include <assert.h>

/* The base-32 alphabet as ASCII codes: 2-9, A-K, M, N, P-Z. */

static const char base32[] = {
  50, 51, 52, 53, 54, 55, 56, 57, 65, 66, 67, 68, 69, 70, 71, 72,
  73, 74, 75, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90
};

/* We can't use string literals for ASCII characters because  */
/* an ANSI C compiler does not necessarily use ASCII.         */

enum encoding_style {
  half_row_style = 0,
  full_row_style = 1,
  mixed_style = 2,
  no_row_style = 3
};

/* is_ldh(code) returns 1 if the UTF-16 code represents an LDH  */
/* character (ASCII letter, digit, or hyphen), 0 otherwise.     */

static int is_ldh(unsigned short code)
{
  if (code == 45) return 1;
  if (code < 48) return 0;
  if (code <= 57) return 1;
  if (code < 65) return 0;
  if (code <= 90) return 1;
  if (code < 97) return 0;
  if (code <= 122) return 1;
  return 0;
}

void brace_encode(
  unsigned int input_length,
  unsigned short *input,
  char output[brace_encoder_out_max] )
{
  unsigned long queue;
  enum encoding_style style;
  unsigned short half_rows[brace_encoder_in_max],
                 half_row_counts[brace_encoder_in_max];
  unsigned int num_nonldh, num_half_rows, i, half_row, j, queue_length,
               best_half_row, next_literal_position, non_hyphen_flag,
               next_base32_position, code;

  assert(input_length <= brace_encoder_in_max);

  /* Count the non-LDH codes and half-rows: */

  num_nonldh = 0;
  num_half_rows = 0;

  for (i = 0;  i < input_length;  ++i) {
    if (is_ldh(input[i])) continue;
    ++num_nonldh;
    half_row = input[i] >> 7;

    for (j = 0;  j < num_half_rows;  ++j) {
      if (half_rows[j] == half_row) {
        ++half_row_counts[j];
        break;
      }
    }

    if (j == num_half_rows) {
      half_rows[num_half_rows] = half_row;
      half_row_counts[num_half_rows] = 1;
      ++num_half_rows;
    }
  }

  /* If the input is already a valid label and does not end  */
  /* with the BRACE signature, output it and we're done.     */
  /* The input_length >= 4 guard keeps the signature test    */
  /* from indexing before the start of a short input.        */

  if (num_nonldh == 0 &&                     /* all codes are LDH and */
      input[0] != 45 &&                      /* first not hyphen and  */
      input[input_length - 1] != 45 &&       /* last not hyphen and   */
      !( input_length >= 4 &&
         input[input_length - 1] == 57 &&    /* last four not -8Q9    */
         ( input[input_length - 2] == 81 ||
           input[input_length - 2] == 113 ) &&    /* (or -8q9)        */
         input[input_length - 3] == 56 &&
         input[input_length - 4] == 45 ) ) {
    for (i = 0;  i < input_length;  ++i) output[i] = input[i];
    output[input_length] = 0;  /* null terminator */
    return;
  }

  /* Choose an encoding style and initialize the bit queue: */

  if (num_half_rows == 1) {
    style = half_row_style;
    queue_length = 11;
    queue = half_rows[0];
  }
  else if ( num_half_rows == 2 &&
            (half_rows[0] >> 1) == (half_rows[1] >> 1) ) {
    style = full_row_style;
    queue_length = 10;
    queue = (1 << 8) | (half_rows[0] >> 1);
  }
  else {
    unsigned int M, H, C, Mprime, best_M = 230;  /* M is always < 230 */

    /* Find the best half-row for mixed style: */

    best_half_row = 512;  /* half_row is always < 512 */

    for (i = 0;  i < num_half_rows;  ++i) {
      half_row = half_rows[i];
      H = half_row_counts[i];
      C = 0;

      for (j = 0;  j < num_half_rows;  ++j) {
        if (j != i && (half_rows[j] >> 1) == (half_row >> 1)) {
          C = half_row_counts[j];
          break;
        }
      }

      M = 3 + (18 * num_nonldh - 10*H - 9*C) / 5;

      if (M < best_M || (M == best_M && half_row < best_half_row)) {
        best_M = M;
        best_half_row = half_row;
      }
    }

    /* Compare mixed style to no-row style: */

    Mprime = (6 + 16 * num_nonldh) / 5;

    if (Mprime <= best_M) {
      style = no_row_style;
      queue_length = 2;
      queue = 3;
    }
    else {
      style = mixed_style;
      queue_length = 11;
      queue = (1 << 10) | best_half_row;
    }
  }

  /* Flush the bit queue: */

  next_base32_position = 0;

  while (queue_length >= 5) {
    queue_length -= 5;
    output[next_base32_position++] = base32[(queue >> queue_length) & 0x1f];
  }

  /* To avoid unnecessary copies, we use the output        */
  /* array itself for the LDH buffer.  The following       */
  /* equalities should hold whenever the buffer is empty:  */

  next_literal_position = next_base32_position + (queue_length > 0);
  non_hyphen_flag = 0;  /* set whenever buffer contains a non-hyphen */

  /* Main encoding loop: */

  for (i = 0;  i < input_length;  ++i) {
    code = input[i];

    if (code == 45) {
      /* Encode a hyphen as two hyphens into the buffer: */
      output[next_literal_position++] = 45;
      output[next_literal_position++] = 45;
    }
    else if (is_ldh(code)) {
      if (!non_hyphen_flag) {
        /* Indicate a change to literal mode: */