Q: Why does libiconv support encoding XXX?
   Why does libiconv not support encoding ZZZ?

A: libiconv, as an internationalization library, supports those character
   sets and encodings which are in wide-spread use in at least one territory
   of the world.

   Hint1: On http://www.w3c.org/International/O-charset-lang.html you find a
   page "Languages, countries, and the charsets typically used for them".
   From this table, we can conclude that the following are in active use:

     ISO-8859-1, CP1252   Afrikaans, Albanian, Basque, Catalan, Danish, Dutch,
                          English, Faroese, Finnish, French, Galician, German,
                          Icelandic, Irish, Italian, Norwegian, Portuguese,
                          Scottish, Spanish, Swedish
     ISO-8859-2           Croatian, Czech, Hungarian, Polish, Romanian, Slovak,
                          Slovenian
     ISO-8859-3           Esperanto, Maltese
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9, CP1254   Turkish
     ISO-8859-10          Inuit, Lapp
     ISO-8859-13          Latvian, Lithuanian
     ISO-8859-15          Estonian
     KOI8-R               Russian
     SHIFT_JIS            Japanese
     ISO-2022-JP          Japanese
     EUC-JP               Japanese

   Ordered by frequency on the web (1997):

     ISO-8859-1, CP1252   96%
     SHIFT_JIS             1.6%
     ISO-2022-JP           1.2%
     EUC-JP                0.4%
     CP1250                0.3%
     CP1251                0.2%
     CP850                 0.1%
     MACINTOSH             0.1%
     ISO-8859-5            0.1%
     ISO-8859-2            0.0%

   Hint2: The character sets mentioned in the XFree86 4.0 locale.alias file.

     ISO-8859-1           Afrikaans, Basque, Breton, Catalan, Danish, Dutch,
                          English, Estonian, Faroese, Finnish, French,
                          Galician, German, Greenlandic, Icelandic,
                          Indonesian, Irish, Italian, Lithuanian, Norwegian,
                          Occitan, Portuguese, Scottish, Spanish, Swedish,
                          Walloon, Welsh
     ISO-8859-2           Albanian, Croatian, Czech, Hungarian, Polish,
                          Romanian, Serbian, Slovak, Slovenian
     ISO-8859-3           Esperanto
     ISO-8859-4           Estonian, Latvian, Lithuanian
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9           Turkish
     ISO-8859-14          Breton, Irish, Scottish, Welsh
     ISO-8859-15          Basque, Breton, Catalan, Danish, Dutch, Estonian,
                          Faroese, Finnish, French, Galician, German,
                          Greenlandic, Icelandic, Irish, Italian, Lithuanian,
                          Norwegian, Occitan, Portuguese, Scottish, Spanish,
                          Swedish, Walloon, Welsh
     KOI8-R               Russian
     KOI8-U               Russian, Ukrainian
     EUC-JP (alias eucJP)       Japanese
     ISO-2022-JP (alias JIS7)   Japanese
     SHIFT_JIS (alias SJIS)     Japanese
     U90                  Japanese
     S90                  Japanese
     EUC-CN (alias eucCN) Chinese
     EUC-TW (alias eucTW) Chinese
     BIG5                 Chinese
     EUC-KR (alias eucKR) Korean
     ARMSCII-8            Armenian
     GEORGIAN-ACADEMY     Georgian
     GEORGIAN-PS          Georgian
     TIS-620 (alias TACTIS)     Thai
     MULELAO-1            Laothian
     IBM-CP1133           Laothian
     VISCII               Vietnamese
     TCVN                 Vietnamese
     NUNACOM-8            Inuktitut

   Hint3: The character sets supported by Netscape Communicator 4.

   Where is this documented? For the complete picture, I had to use
   "strings netscape" and then a lot of guesswork. For a quick take,
   look at the "View - Character set" menu of Netscape Communicator 4.6:

     ISO-8859-{1,2,5,7,9,15}
     WINDOWS-{1250,1251,1253}
     KOI8-R               Cyrillic
     CP866                Cyrillic
     Autodetect           Japanese  (EUC-JP, ISO-2022-JP, ISO-2022-JP-2, SJIS)
     EUC-JP               Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     BIG5                 Chinese
     EUC-TW               Chinese
     Autodetect           Korean    (EUC-KR, ISO-2022-KR, but not JOHAB)
     UTF-8
     UTF-7

   Hint4: The character sets supported by Microsoft Internet Explorer 4.

     ISO-8859-{1,2,3,4,5,6,7,8,9}
     WINDOWS-{1250,1251,1252,1253,1254,1255,1256,1257}
     KOI8-R               Cyrillic
     KOI8-RU              Ukrainian
     ASMO-708             Arabic
     EUC-JP               Japanese
     ISO-2022-JP          Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     HZ-GB-2312           Chinese
     BIG5                 Chinese
     EUC-KR               Korean
     ISO-2022-KR          Korean
     WINDOWS-874          Thai
     WINDOWS-1258         Vietnamese
     UTF-8
     UTF-7
     UNICODE              actually UNICODE-LITTLE
     UNICODEFEFF          actually UNICODE-BIG

   and various DOS character sets: DOS-720, DOS-862, IBM852, CP866.

   We take the union of all these four sets. The result is:

   European and Semitic languages
     * ASCII.
       We implement this because it is occasionally useful to know or to
       check whether some text is entirely ASCII (i.e. whether the conversion
       ISO-8859-x -> UTF-8 is trivial).
     * ISO-8859-{1,2,3,4,5,6,7,8,9,10}
       We implement these because they are widely used, except ISO-8859-4,
       which appears to have been superseded by ISO-8859-13 in the Baltic
       countries. But it's an ISO standard anyway.
     * ISO-8859-13
       We implement this because it's a standard in Lithuania and Latvia.
     * ISO-8859-14
       We implement this because it's an ISO standard.
     * ISO-8859-15
       We implement this because it's increasingly used in Europe, because
       of the Euro symbol.
     * ISO-8859-16
       We implement this because it's an ISO standard.
     * KOI8-R, KOI8-U
       We implement these because they appear to be the predominant
       encodings on Unix in Russia and Ukraine, respectively.
     * KOI8-RU
       We implement this because MSIE4 supports it.
     * KOI8-T
       We implement this because it is the locale encoding in glibc's Tajik
       locale.
     * CP{1250,1251,1252,1253,1254,1255,1256,1257}
       We implement these because they are the predominant Windows encodings
       in Europe.
     * CP850
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.
     * CP862
       We implement this because Ron Aaron says it is sometimes used in web
       pages and emails.
     * CP866
       We implement this because Netscape Communicator does.
     * Mac{Roman,CentralEurope,Croatian,Romania,Cyrillic,Greek,Turkish} and
       Mac{Hebrew,Arabic}
       We implement these because the Sun JDK does, and because Mac users
       don't deserve to be punished.
     * Macintosh
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.

   Japanese
     * EUC-JP, SHIFT_JIS, ISO-2022-JP
       We implement these because they are widely used. EUC-JP and SHIFT_JIS
       are used more for files, whereas ISO-2022-JP is recommended for email.
     * CP932
       We implement this because it is the Microsoft variant of SHIFT_JIS,
       used on Windows.
     * ISO-2022-JP-2
       We implement this because it's the common way to represent mail
       messages which make use of JIS X 0212 characters.
     * ISO-2022-JP-1
       We implement this because it's in the RFCs, but I don't think it is
       really used.
     * U90, S90
       We DON'T implement these because I have no information about what
       they are or who uses them.
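
   Two of the rationales above can be illustrated with a small sketch: the
   ASCII check (is the conversion ISO-8859-x -> UTF-8 trivial?) and the
   split between file encodings and the email encoding for Japanese. The
   sketch uses Python's built-in codecs as a stand-in, not libiconv itself;
   the helper name is_ascii and the sample strings are assumptions made for
   this example. ISO-2022-JP is a 7-bit encoding, which is one plausible
   reason it suits mail transport, whereas EUC-JP and SHIFT_JIS use bytes
   with the high bit set.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit range 0x00..0x7F."""
    return all(b < 0x80 for b in data)


# Entirely ASCII input: converting ISO-8859-1 -> UTF-8 changes nothing.
plain = b"libiconv"
plain_utf8 = plain.decode("iso-8859-1").encode("utf-8")
print(is_ascii(plain), plain_utf8 == plain)

# Non-ASCII input: the same conversion now has to rewrite bytes.
latin1 = "caf\u00e9".encode("iso-8859-1")
latin1_utf8 = latin1.decode("iso-8859-1").encode("utf-8")
print(is_ascii(latin1), latin1_utf8 == latin1)

# The same Japanese text in the three encodings named above.
# Codec names are Python's spellings, not libiconv's.
text = "\u65e5\u672c\u8a9e"
encoded = {
    name: text.encode(name)
    for name in ("euc_jp", "shift_jis", "iso2022_jp")
}
for name, data in encoded.items():
    # Report whether the encoded form is 7-bit clean.
    print(name, len(data), "7-bit" if is_ascii(data) else "8-bit")
```

   With libiconv itself, the comparable check would be attempting a
   conversion to ASCII and seeing whether it fails; that variant is not
   shown here.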