draft-ietf-idn-amc-ace-m-00.txt
        <de> = U+3067 (hiragana)
        <suru> = U+3059 U+308B (hiragana)
        <byou><mae> = U+79D2 U+524D (kanji)

        UTF-8:  Maji???Koi??????5??????
        UTF-16: ??????????????????????????
        AMC-M:  bsm-Maji-r-Koi-b2m-5-z37cxuwp
        BRACE:  ji8-Maji-g-Koi-qe7x-5-wx7p6ma
        DUDE:   Mdhqpj067G06bvpj059obg035n9d2l24d
        RACE:   3aag2adbabvaa2jqm4agwadpabutawjqrmadk6oskjgq
        LACE:   74ag2adbabvaa2jqm4agwadpabutawjqrmadk6oskjgq

    (F) <pafii>de<runba> (Japanese song title)

        <pafii> = U+30D1 U+30D5 U+30A3 U+30FC (katakana)
        <runba> = U+30EB U+30F3 U+30D0 (katakana)

        UTF-16: ??????????????
        BRACE:  3iu8pazt-de-pygi
        AMC-M:  bs3jp4d9n-de-8m9di
        RACE:   gdi5li7475sp6zpl6pia
        DUDE:   j0d1lq3vcg064lj0ebv3t0
        UTF-8:  ????????????de?????????
        LACE:   aqyndvnd7qbaazdfamyox46q

    (G) <sono><supiido><de> (Japanese song title)

        <sono> = U+305D U+306E (hiragana)
        <supiido> = U+30B9 U+30D4 U+30FC U+30C9 (katakana)
        <de> = U+3067 (hiragana)

        RACE:   gbow5oou7tewo
        UTF-16: ??????????????
        BRACE:  bidprdmp9wt7mi
        LACE:   a4yf23vz2t6mszy
        AMC-M:  bsmfyq5j7e9n6jr
        DUDE:   j05dmer9t4vcs9m7
        UTF-8:  ?????????????????????

    The next several examples are all translations of the sentence "Why
    can't they just speak in <language>?" (courtesy of Michael Kaplan's
    "provincial" page [PROVINCIAL]).  Word breaks and punctuation have
    been removed, as is often done in domain names.

    (H) Arabic (Egyptian):

        U+0644 U+064A U+0647 U+0645 U+0627 U+0628 U+062A U+0643 U+0644
        U+0645 U+0648 U+0634 U+0639 U+0631 U+0628 U+064A U+061F

        DUDE:   m44qnli7oqk3kloj4phi8kahf
        BRACE:  28akcjwcmp3ciwb4t3ngd4nbaz
        AMC-M:  agiekhfuhuiukdefivevjvbuiktr
        RACE:   azceur2fe4ucuq2eivediojrfbfb6
        LACE:   cedeisshiutsqksdircuqnbzgeueuhy
        UTF-16: ??????????????????????????????????
        UTF-8:  ??????????????????????????????????

    (I) Chinese (simplified):

        U+4ED6 U+4EEC U+4E3A U+4EC0 U+4E48 U+4E0D U+8BF4 U+4E2D U+6587

        UTF-16: ??????????????????
        BRACE:  kgcqqsgp26i5h4zn7req5i
        AMC-M:  uqj7g8nvk6awispn9wupdnh
        DUDE:   ked6ucjas0k8gdobf4ke2dm587
        UTF-8:  ???????????????????????????
        LACE:   azhnn3b2ybea2aml6qau4libmwdq
        RACE:   3bhnmtxmjy5e5qcojbha3c7ujywwlby

    (J) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky

        <ccaron> = U+010D
        <ecaron> = U+011B
        <iacute> = U+00ED

        UTF-8:  Pro??prost??nemluv????esky
        AMC-M:  g26-Pro-p-prost-9m-nemluv-6pp-esky
        BRACE:  i32-Pro-u-prost-8y-nemluv-29f3n-esky
        DUDE:   N0imfh0dg70imfn3kh1bg6eltsn5mudh0dg65n3mbn9
        UTF-16: ????????????????????????????????????????????
        LACE:   amaha4tpaeaq2biaobzg643uaearwbyanzsw23dvo3wqcainaqagk43\
                lpe
        RACE:   ah7xb73s75xq373q75zp6377op7xig77n37wl73n75wp65p7o3762dp\
                7mx7xh73l754q

    (K) Hebrew:

        U+05DC U+05DE U+05D4 U+05D4 U+05DD U+05E4 U+05E9 U+05D5 U+05D8
        U+05DC U+05D0 U+05DE U+05D3 U+05D1 U+05E8 U+05D9 U+05DD U+05E2
        U+05D1 U+05E8 U+05D9 U+05EA

        AMC-M:  af4nqeep8e8jfinaqdb8ijp8cb8ij8k
        DUDE:   ldcukktu4pt5osgujhu8t9tu2t1u8t9ua
        BRACE:  27vkyp7bgwmbpfjgc4ynx5nd8xsp5nd9c
        RACE:   axon5vgu3xsotvoy3tin5u6r5dm53ywr5dm6u
        LACE:   cyc5zxwu2to6j2ov3donbxwt2huntxpc2hunt2q
        UTF-8:  ????????????????????????????????????????????
        UTF-16: ????????????????????????????????????????????

    (L) Hindi:

        U+092F U+0939 U+0932 U+094B U+0917 U+0939 U+093F U+0928 U+094D
        U+0926 U+0940 U+0915 U+094D U+092F U+094B U+0902 U+0928 U+0939
        U+0940 U+0902 U+092C U+094B U+0932 U+0938 U+0915 U+0924 U+0947
        U+0939 U+0948 U+0902 (Devanagari)

        BRACE:  2b7xtenqdr7zc6uma2pmcz7ibage237kdemicnk9gei32
        RACE:   bextsmslc44t6kcnezabktjpjmbcqokaaiwewmrycuseookiai
        LACE:   dyes6ojsjmltspzijuteafknf5fqekbziabcyszshaksirzzjaba
        AMC-M:  ajhurbvcwmthbhuiwpugitfwpurwmscuibiscunwmvcatfuerbwisc
        DUDE:   p2fj9ikbh7j9vi8kdi6k0h5kdifkbg2i8j9k0g2ickbj2oh5i4k7j9k\
                8g2
        UTF-16: ???????????????????????????????????????????????????????\
                ?????
        UTF-8:  ???????????????????????????????????????????????????????\
                ???????????????????????????????????

    (M) Korean:

        U+C138 U+ACC4 U+C758 U+BAA8 U+B4E0 U+C0AC U+B78C U+B4E4 U+C774
        U+D55C U+AD6D U+C5B4 U+B97C U+C774 U+D574 U+D55C U+B2E4 U+BA74
        U+C5BC U+B9C8 U+B098 U+C88B U+C744 U+AE4C (Hangul syllables)

        UTF-16: ????????????????????????????????????????????????
        UTF-8:  ???????????????????????????????????????????????????????\
                ?????????????????
        AMC-M:  yhxcj2w6exiaxi68acfn92n68ezehk6xypdpwam6zehmwhk648eavwd\
                p6aqi23ieemweywn
        BRACE:  y394qebjusrcndbs82pkvstf96sxufcr7ffr4vbgdwsxufcx8pdktgb\
                gmnsqydmk7im56arju6pt82
        LACE:   77atrlgey5mlvkfu4dakzn4mwtsmo5gvlsww3rnuxf6mo5gvotkvzmx\
                exj2mlpfzzcyjrsely5ck4ta
        RACE:   3datrlgey5mlvkfu4dakzn4mwtsmo5gvlsww3rnuxf6mo5gvotkvzmx\
                exj2mlpfzzcyjrsely5ck4ta
        DUDE:   s138qcc4s758raa8ke0s0acr78cke4s774t55cqd6ds5b4r97cs774t\
                574lcr2e4q74s5bcr9c8g98s88bn44qe4c

    (N) Russian:

        U+041F U+043E U+0447 U+0435 U+043C U+0443 U+0436 U+0435 U+043E
        U+043D U+0438 U+043D U+0435 U+0433 U+043E U+0432 U+043E U+0440
        U+044F U+0442 U+043F U+043E U+0440 U+0443 U+0441 U+0441 U+043A
        U+0438 (Cyrillic)

        DUDE:   K3fuk7j5sk3j6lutotljuiuk0vijfuk0jhhjao
        AMC-M:  aehHgrvfemvgvfgfafvfvdgvcgiwrkhgimjjca
        BRACE:  269xyjvcyafqfdwyr3xfd8z8byi6z39xyi692s7ug2
        RACE:   aq7t4rzvhrbtmnj6hu4d2njthyzd4qcpii7t4qcdifatuoa
        LACE:   dqcd6pshgu6egnrvhy6tqpjvgm7depsaj5bd6psainaucory
        UTF-16: ???????????????????????????????????????????????????????\
                ???
        UTF-8:  ??????????????????????????????????????????????????????? ???

    (O) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol

        <eacute> = U+00E9
        <ntilde> = U+00F1

        UTF-8:  Porqu??nopuedensimplementehablarenEspa??ol
        AMC-M:  aa7-Porqu-b-nopuedensimplementehablarenEspa-j-ol
        BRACE:  22x-Porqu-9-nopuedensimplementehablarenEspa-j-ol
        DUDE:   N0mfn2hlu9mevn0lm5klun3m9tn0mcltlun4m5ohishn2m5uLn3gm1v\
                1mfs
        RACE:   abyg64troxuw433qovswizloonuw24dmmvwwk3tumvugcytmmfzgk3t\
                fonygd4lpnq
        LACE:   faaha33sof26s3tpob2wkzdfnzzws3lqnrsw2zloorswqylcnrqxezl\
                omvzxayprn5wa
        UTF-16: ???????????????????????????????????????????????????????\
                ?????????????????????????

    (P) Taiwanese:

        U+4ED6 U+5011 U+7232 U+4EC0 U+9EBD U+4E0D U+8AAA U+4E2D U+6587

        UTF-16: ??????????????????
        UTF-8:  ???????????????????????????
        AMC-M:  uqj7g2tbgtu6a385pspnxkupdnh
        BRACE:  kgcqui49gatc2wyrn8y7cndgte9
        RACE:   3bhnmuaroize5qe6xvha3cvkjywwlby
        LACE:   75hnmuaroize5qe6xvha3cvkjywwlby
        DUDE:   ked6l011n232kec0pebdke0doaaake2dm587

    (Q) Vietnamese:
        Ta<dotbelow>isaoho<dotbelow>kh<ocirc>ngth<ecirc><hookabove>chi\
        <hookabove>no<acute>iti<ecirc><acute>ngVi<ecirc><dotbelow>t

        <dotbelow> = U+0323
        <ocirc> = U+00F4
        <ecirc> = U+00EA
        <hookabove> = U+0309
        <acute> = U+0301

        UTF-8:  Ta??isaoho??kh??ngth????chi??no??iti????ngVi????t
        AMC-M:  ada-Ta-ud-isaoho-ud-kh-s9e-ngth-s8kj-chi-j-no-b-iti-s8k\
                b-ngVi-s8kud-t
        BRACE:  i54-Ta-8-isaoho-ay-kh-29n-ngth-s2xa6i-chi-k-no-2g-iti-2\
                9c29-ngVi-25p48-t
        UTF-16: ???????????????????????????????????????????????????????\
                ?????????????????????
        DUDE:   N4m1j23g69n3m1vovj23g6bov4menn4m8uaj09g63opj09g6evj01g6\
                9n4m9uaj01g6enN6m9uaj23g74
        LACE:   aiahiyibamrqmadjonqw62dpaebsgcaannupi3thoruouaidbebqay3\
                ineaqgcicabxg6aidaecaa2lunhvacaybauag4z3wnhvacazdaeahi
        RACE:   ap7xj73bep7wt73t75q76377nd7w6i77np7wr77u75xp6z77ot7wr77\
                kbh7wh73i75uqt73o75xqd73j752p62p75ia763x7m77xn73j77vch7\
                3u

    The last example is an ASCII string that breaks not only the
    existing rules for host name labels but also the rules proposed in
    [NAMEPREP02] for internationalized domain names.

    (R) -> $1.00 <-

        UTF-8:  -> $1.00 <-
        DUDE:   -jei0kj1iej0gi0jc-
        RACE:   aawt4ibegexdambahqwq
        LACE:   bmac2praeqys4mbqea6c2
        UTF-16: ??????????????????????
        AMC-M:  aae--vqae-1-q-00-avn--
        BRACE:  229--t2b4-1-w-00-i9i--

Security considerations

    Users expect each domain name in DNS to be controlled by a single
    authority.  If a Unicode string intended for use as a domain label
    could map to multiple ACE labels, then an internationalized domain
    name could map to multiple ACE domain names, each controlled by
    a different authority, some of which could be spoofs that hijack
    service requests intended for another.  Therefore AMC-ACE-M is
    designed so that each Unicode string has a unique encoding.
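    The uniqueness property is stated per Unicode string, that is,
    per sequence of code points.  The following is a minimal sketch
    of that notion of string identity, not part of the example
    implementation: same_code_points() and the two sample arrays are
    hypothetical names introduced here for illustration, and the
    u_code_point typedef is copied from the example implementation's
    public interface.

```c
#include <limits.h>
#include <stddef.h>

/* Same typedef as in the example implementation's public interface. */
#if UINT_MAX >= 0x10FFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif

/* Hypothetical helper (not part of amc-ace-m.c): returns 1 if the  */
/* two code point sequences are identical element by element, and   */
/* 0 otherwise.  No normalization of any kind is performed.         */
static int same_code_points(size_t a_length, const u_code_point *a,
                            size_t b_length, const u_code_point *b)
{
  size_t i;

  if (a_length != b_length) return 0;

  for (i = 0; i < a_length; ++i) {
    if (a[i] != b[i]) return 0;
  }

  return 1;
}

/* Sample data (illustrative): two ways to write e-with-acute.      */
/* <eacute> as one precomposed code point:                          */
static const u_code_point eacute_precomposed[] = { 0x00E9 };
/* Letter e followed by the combining <acute> (U+0301):             */
static const u_code_point eacute_decomposed[] = { 0x0065, 0x0301 };
```

    With these definitions, same_code_points(1, eacute_precomposed,
    2, eacute_decomposed) returns 0: the two arrays are different
    Unicode strings, so the guarantee of a unique encoding applies to
    each of them separately.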
    However, there can still be multiple Unicode representations of the
    "same" text, for various definitions of "same".  This problem is
    addressed to some extent by the Unicode standard under the topic
    of canonicalization, but some text strings may be misleading or
    ambiguous to humans when used as domain names, such as strings
    containing dots, slashes, at-signs, etc.  These issues are being
    further studied under the topic of "nameprep" [NAMEPREP02].

References

    [ACEID01] Yoshiro Yoneya, Naomasa Maruyama, "Proposal for a
    determining process of ACE identifier", 2000-Dec-19,
    draft-ietf-idn-aceid-01.

    [BRACE00] Adam Costello, "BRACE: Bi-mode Row-based
    ASCII-Compatible Encoding for IDN version 0.1.2", 2000-Sep-19,
    draft-ietf-idn-brace-00.

    [DUDE00] Brian Spolarich, Mark Welter, "DUDE: Differential Unicode
    Domain Encoding", 2000-Nov-21, draft-ietf-idn-dude-00.

    [IDN] Internationalized Domain Names (IETF working group),
    http://www.i-d-n.net/, idn@ops.ietf.org.

    [LACE01] Paul Hoffman, Mark Davis, "LACE: Length-based ASCII
    Compatible Encoding for IDN", 2001-Jan-05, draft-ietf-idn-lace-01.

    [NAMEPREP02] Paul Hoffman, Marc Blanchet, "Preparation
    of Internationalized Host Names", 2001-Jan-17,
    draft-ietf-idn-nameprep-02.

    [PROVINCIAL] Michael Kaplan, "The 'anyone can be provincial!' page",
    http://www.trigeminal.com/samples/provincial.html.

    [RACE03] Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
    for IDN", 2000-Nov-28, draft-ietf-idn-race-03.

    [RFC952] K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
    Table Specification", 1985-Oct, RFC 952.

    [RFC1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
    1987-Nov, RFC 1034.

    [RFC1123] Internet Engineering Task Force, R. Braden (editor),
    "Requirements for Internet Hosts -- Application and Support",
    1989-Oct, RFC 1123.

    [SACE] Dan Oscarsson, "Simple ASCII Compatible Encoding (SACE)",
    draft-ietf-idn-sace-*.

    [UNICODE] The Unicode Consortium, "The Unicode Standard",
    http://www.unicode.org/unicode/standard/standard.html.

    [UTF5] James Seng, Martin Duerst, Tin Wee Tan, "UTF-5, a
    Transformation Format of Unicode and ISO 10646",
    draft-jseng-utf5-*.

    [UTF6] Mark Welter, Brian W. Spolarich, "UTF-6 - Yet Another
    ASCII-Compatible Encoding for IDN", draft-ietf-idn-utf6-*.

    [UTFCONV] Mark Davis, "UTF Converter",
    http://www.macchiato.com/unicode/convert.html.

Author

    Adam M. Costello <amc@cs.berkeley.edu>
    http://www.cs.berkeley.edu/~amc/

Example implementation

/******************************************/
/* amc-ace-m.c 0.1.0 (2001-Feb-12-Mon)    */
/* Adam M. Costello <amc@cs.berkeley.edu> */
/******************************************/

/* This is ANSI C code implementing AMC-ACE-M version 0.1.*. */


/************************************************************/
/* Public interface (would normally go in its own .h file): */

#include <limits.h>

enum amc_ace_status {
  amc_ace_success,
  amc_ace_invalid_input,
  amc_ace_output_too_big
};

enum case_sensitivity { case_sensitive, case_insensitive };

#if UINT_MAX >= 0x10FFFF
typedef unsigned int u_code_point;
#else
typedef unsigned long u_code_point;
#endif

int amc_ace_m_encode(
  unsigned int input_length,
  const u_code_point *input,
  const unsigned char *uppercase_flags,
  unsigned int *output_size,
  unsigned char *output );

    /* amc_ace_m_encode() converts Unicode to AMC-ACE-M. The input  */
    /* must be represented as an array of Unicode code points       */
    /* (not code units; surrogate pairs are not allowed), and the   */
    /* output will be represented as null-terminated ASCII. The     */
    /* input_length is the number of code points in the input. The  */
    /* output_size is an in/out argument: the caller must pass      */
    /* in the maximum number of characters that may be output       */
    /* (including the terminating null), and on successful return   */
    /* it will contain the number of characters actually output     */
    /* (including the terminating null, so it will be one more than */