draft-ietf-idn-brace-00.txt

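Every BRACE encoding in the examples that follow ends with the signature -8Q9. The example implementation (brace.c) refuses to copy an all-LDH input through unchanged when it already ends with that signature, with the Q in either case, presumably so that such a label cannot be mistaken for an encoding. The suffix test can be sketched as a stand-alone helper. This is a minimal sketch, not part of brace.c: the name ends_with_brace_signature is hypothetical, and the sketch assumes a null-terminated label in ASCII. Like brace.c, it compares numeric ASCII codes instead of character literals.

```c
#include <assert.h>

/* Hypothetical helper (not part of brace.c): returns 1 if the   */
/* null-terminated ASCII label ends with the BRACE signature     */
/* -8Q9, accepting the Q in either case, and 0 otherwise.  Like  */
/* brace.c, it compares numeric ASCII codes instead of character */
/* literals, since an ANSI C compiler need not use ASCII.        */
static int ends_with_brace_signature(const char *label)
{
  unsigned int n = 0;

  assert(label != 0);
  while (label[n] != 0) ++n;       /* length, excluding terminator */
  if (n < 4) return 0;             /* too short to hold -8Q9       */

  return label[n - 4] == 45 &&                           /* '-'     */
         label[n - 3] == 56 &&                           /* '8'     */
         (label[n - 2] == 81 || label[n - 2] == 113) &&  /* 'Q'/'q' */
         label[n - 1] == 57;                             /* '9'     */
}
```

For instance, on an ASCII system it returns 1 for the full-row example encoding BIDPRDMP9WT7MI-8Q9 and 0 for a plain label such as hello-another-way.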
Example strings

    All of these examples use Japanese text, merely because that is the
    only kind of non-English text that the author has lying around.

    Example of no-row style:

        An actual music group name coerced into the usual format for
        host name labels:

            AMURONAMIE-with-super-monkeys

        AMURONAMIE stands for five kanji whose Unicode values are (in
        order):

            U+5B89 U+5BA4 U+5948 U+7F8E U+6075

        The BRACE encoding is:

            UVJ7FUAQCAHY982XA---with--super--monkeys-8Q9

        (Note that the RACE encoding [RACE01] would have been 79
        characters long, exceeding the 63-character limit on labels
        [RFC1034], and hence unusable.)

    Example of mixed style:

        An actual song title coerced into the usual format for host name
        labels:

            hello-another-way-SOREZORENOBASHO

        SOREZORENOBASHO stands for five hiragana followed by two kanji,
        whose Unicode values are (in order):

            U+305D U+308C U+305E U+308C U+306E U+5834 U+6240

        The BRACE encoding is:

            JI7-hello--another--way---V3JHAEFVD2UFJ62-8Q9

    Example of full-row style:

        An actual song title, SONOSUPIIDODE, which stands for two
        hiragana followed by four katakana followed by one hiragana,
        whose Unicode values are:

            U+305D U+306E U+30B9 U+30D4 U+30FC U+30C9 U+3067

        The BRACE encoding is:

            BIDPRDMP9WT7MI-8Q9

    Example of half-row style:

        An actual song title:

            PAFIIdeRUNBA

        PAFII stands for four katakana whose Unicode values are:

            U+30D1 U+30D5 U+30A3 U+30FC

        RUNBA stands for three katakana whose Unicode values are:

            U+30EB U+30F3 U+30D0

        The BRACE encoding is:

            3IU8PAZT-de-PYGI-8Q9

    Example of an ASCII string that breaks all the rules of host name
    labels:

        -> $1.00 <-

    The BRACE encoding is:

        229--T2B4-1-W-00-I9I---8Q9

Security considerations

    Users expect each host name in DNS to be controlled by a single
    authority.
    If a UTF-16 string could map to multiple labels, then a UTF-16 host
    name could map to multiple real host names, each controlled by a
    different authority, some of which could be spoofs that hijack
    service requests intended for another.  Therefore BRACE is designed
    so that each UTF-16 string maps to a unique label.

    However, there can still be multiple UTF-16 representations of the
    "same" text, for various definitions of "same".  For example, U+00E9
    (e with acute accent) and the two-code sequence U+0065 U+0301 (e
    followed by a combining acute accent) display identically.  This
    problem is addressed by the Unicode standard [UNICODE] under the
    topic of canonicalization, but the issue needs to be studied further
    in the context of host names.

    Also, some text strings may be misleading or ambiguous to humans,
    such as strings containing dots, slashes, at-signs, etc.  Policies
    for allowable Unicode strings need to be developed.

References

    [IDN] Internationalized Domain Names (IETF working group),
    http://www.i-d-n.net/, idn@ops.ietf.org.

    [RACE01] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
    for IDN", 2000-Aug-31, draft-ietf-idn-race-01.

    [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
    Table Specification", 1985-Oct, RFC 952.

    [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
    1987-Nov, RFC 1034.

    [RFC1123] Internet Engineering Task Force, R. Braden (editor),
    "Requirements for Internet Hosts -- Application and Support",
    1989-Oct, RFC 1123.

    [SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding (SACE)",
    draft-ietf-idn-sace.

    [UNICODE] The Unicode Consortium, "The Unicode Standard",
    http://www.unicode.org/unicode/standard/standard.html.

    [UTF5] James Seng, Martin Duerst, Tin Wee Tan, "UTF-5, a
    Transformation Format of Unicode and ISO 10646", draft-jseng-utf5.

Author

    Adam M. Costello <amc@cs.berkeley.edu>
    http://www.cs.berkeley.edu/~amc/

Example implementation

/* brace.c 0.1.1 (2000-Sep-09-Sat)        */
/* Adam M. Costello <amc@cs.berkeley.edu> */

/* This is ANSI C code implementing BRACE version 0.1.*. */


/* Public interface (would normally go in its own .h file): */

enum {
  brace_encoder_in_max = 63,
  brace_encoder_out_max = 4 + (6 + 16 * brace_encoder_in_max) / 5 + 1,
  brace_decoder_in_max = 63 + 1,
  brace_decoder_out_max = brace_decoder_in_max - 1
};

    /* The above constants are the maximum array sizes */
    /* that the encoder/decoder will accept/produce    */
    /* (including null terminators for ASCII strings). */

void brace_encode(
  unsigned int input_length,
  unsigned short *input,
  char output[brace_encoder_out_max] );

    /* brace_encode() converts UTF-16 input to null-terminated */
    /* BRACE-encoded ASCII output.  The input_length must not  */
    /* exceed brace_encoder_in_max, and the output array must  */
    /* have at least the size indicated above.  Under those    */
    /* constraints, this function never fails.                 */

int brace_decode(
  char *input,
  unsigned int *output_length,
  unsigned short output[brace_decoder_out_max] );

    /* brace_decode() converts null-terminated BRACE-encoded ASCII   */
    /* input to UTF-16 output.  The input length (including the null */
    /* terminator) must not exceed brace_decoder_in_max, and the     */
    /* output array must have at least the size indicated above.     */
    /* Returns 1 on success, 0 if the input was malformed.  If 0 is  */
    /* returned the output array may contain garbage, but            */
    /* *output_length will not have been affected.                   */


/* Implementation (would normally go in its own .c file): */

#include <assert.h>

static const char base32[] = {
  50, 51, 52, 53, 54, 55, 56, 57, 65, 66, 67, 68, 69, 70, 71, 72,
  73, 74, 75, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90
};

/* We can't use string literals for ASCII characters because */
/* an ANSI C compiler does not necessarily use ASCII.        */

enum encoding_style {
  half_row_style = 0,
  full_row_style = 1,
  mixed_style    = 2,
  no_row_style   = 3
};

/* is_ldh(code) returns 1 if the UTF-16 code represents an LDH */
/* character (ASCII letter, digit, or hyphen), 0 otherwise.    */

static int is_ldh(unsigned short code)
{
  if (code ==  45) return 1;
  if (code <   48) return 0;
  if (code <=  57) return 1;
  if (code <   65) return 0;
  if (code <=  90) return 1;
  if (code <   97) return 0;
  if (code <= 122) return 1;
  return 0;
}

void brace_encode(
  unsigned int input_length,
  unsigned short *input,
  char output[brace_encoder_out_max] )
{
  unsigned long queue;
  enum encoding_style style;
  unsigned short half_rows[brace_encoder_in_max],
                 half_row_counts[brace_encoder_in_max];
  unsigned int num_nonldh, num_half_rows, i, half_row, j, queue_length,
               best_half_row, next_literal_position, non_hyphen_flag,
               next_base32_position, code;

  assert(input_length <= brace_encoder_in_max);

  /* Count the non-LDH codes and half-rows: */

  num_nonldh = 0;
  num_half_rows = 0;

  for (i = 0;  i < input_length;  ++i) {
    if (is_ldh(input[i])) continue;
    ++num_nonldh;
    half_row = input[i] >> 7;

    for (j = 0;  j < num_half_rows;  ++j) {
      if (half_rows[j] == half_row) {
        ++half_row_counts[j];
        break;
      }
    }

    if (j == num_half_rows) {
      half_rows[num_half_rows] = half_row;
      half_row_counts[num_half_rows] = 1;
      ++num_half_rows;
    }
  }

  /* If the input is already a valid label and does not end */
  /* with the BRACE signature, output it and we're done.    */
  /* The two length tests keep every index below in bounds. */

  if (num_nonldh == 0 &&                     /* all codes are LDH and */
      input_length > 0 &&                    /* input not empty and   */
      input[0] != 45 &&                      /* first not hyphen and  */
      input[input_length - 1] != 45 &&       /* last not hyphen and   */
      !( input_length >= 4 &&                /* last four not -8Q9    */
         input[input_length - 1] == 57 &&
         ( input[input_length - 2] == 81 ||
           input[input_length - 2] == 113  ) &&   /* (or -8q9) */
         input[input_length - 3] == 56 &&
         input[input_length - 4] == 45 ) ) {
    for (i = 0;  i < input_length;  ++i) output[i] = input[i];
    output[input_length] = 0;  /* null terminator */
    return;
  }

  /* Choose an encoding style and initialize the bit queue: */

  if (num_half_rows == 1) {
    style = half_row_style;
    queue_length = 11;
    queue = half_rows[0];
  }
  else if ( num_half_rows == 2 &&
            (half_rows[0] >> 1) == (half_rows[1] >> 1) ) {
    style = full_row_style;
    queue_length = 10;
    queue = (1 << 8) | (half_rows[0] >> 1);
  }
  else {
    unsigned int M, H, C, Mprime, best_M = 230;  /* M is always < 230 */

    /* Find the best half-row for mixed style: */

    best_half_row = 512;  /* half_row is always < 512 */

    for (i = 0;  i < num_half_rows;  ++i) {
      half_row = half_rows[i];
      H = half_row_counts[i];
      C = 0;

      for (j = 0;  j < num_half_rows;  ++j) {
        if (j != i && (half_rows[j] >> 1) == (half_row >> 1)) {
          C = half_row_counts[j];
          break;
        }
      }

      M = 3 + (18 * num_nonldh - 10*H - 9*C) / 5;

      if (M < best_M || (M == best_M && half_row < best_half_row)) {
        best_M = M;
        best_half_row = half_row;
      }
    }

    /* Compare mixed style to no-row style: */

    Mprime = (6 + 16 * num_nonldh) / 5;

    if (Mprime <= best_M) {
      style = no_row_style;
      queue_length = 2;
      queue = 3;
    }
    else {
      style = mixed_style;
      queue_length = 11;
      queue = (1 << 10) | best_half_row;
    }
  }

  /* Flush the bit queue: */

  next_base32_position = 0;

  while (queue_length >= 5) {
    queue_length -= 5;
    output[next_base32_position++] =
      base32[(queue >> queue_length) & 0x1f];
  }

  /* To avoid unnecessary copies, we use the output       */
  /* array itself for the LDH buffer.  The following      */
  /* equalities should hold whenever the buffer is empty: */

  next_literal_position = next_base32_position + (queue_length > 0);
  non_hyphen_flag = 0;  /* set whenever buffer contains a non-hyphen */

  /* Main encoding loop: */

  for (i = 0;  i < input_length;  ++i) {
    code = input[i];

    if (code == 45) {
      /* Encode a hyphen as two hyphens into the buffer: */
      output[next_literal_position++] = 45;
      output[next_literal_position++] = 45;
    }
    else if (is_ldh(code)) {
      if (!non_hyphen_flag) {
        /* Indicate a change to literal mode: */