notes

Q: Why does libiconv support encoding XXX? Why does libiconv not support
   encoding ZZZ?

A: libiconv, as an internationalization library, supports those character
   sets and encodings which are in widespread use in at least one territory
   of the world.

   Hint1: On http://www.w3c.org/International/O-charset-lang.html you find a
   page "Languages, countries, and the charsets typically used for them".
   From this table, we can conclude that the following are in active use:

     ISO-8859-1, CP1252   Afrikaans, Albanian, Basque, Catalan, Danish, Dutch,
                          English, Faroese, Finnish, French, Galician, German,
                          Icelandic, Irish, Italian, Norwegian, Portuguese,
                          Scottish, Spanish, Swedish
     ISO-8859-2           Croatian, Czech, Hungarian, Polish, Romanian, Slovak,
                          Slovenian
     ISO-8859-3           Esperanto, Maltese
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9, CP1254   Turkish
     ISO-8859-10          Inuit, Lapp
     ISO-8859-13          Latvian, Lithuanian
     ISO-8859-15          Estonian
     KOI8-R               Russian
     SHIFT_JIS            Japanese
     ISO-2022-JP          Japanese
     EUC-JP               Japanese

   Ordered by frequency on the web (1997):

     ISO-8859-1, CP1252   96%
     SHIFT_JIS             1.6%
     ISO-2022-JP           1.2%
     EUC-JP                0.4%
     CP1250                0.3%
     CP1251                0.2%
     CP850                 0.1%
     MACINTOSH             0.1%
     ISO-8859-5            0.1%
     ISO-8859-2            0.0%

   Hint2: The character sets mentioned in the XFree86 4.0 locale.alias file.

     ISO-8859-1           Afrikaans, Basque, Breton, Catalan, Danish, Dutch,
                          English, Estonian, Faroese, Finnish, French,
                          Galician, German, Greenlandic, Icelandic,
                          Indonesian, Irish, Italian, Lithuanian, Norwegian,
                          Occitan, Portuguese, Scottish, Spanish, Swedish,
                          Walloon, Welsh
     ISO-8859-2           Albanian, Croatian, Czech, Hungarian, Polish,
                          Romanian, Serbian, Slovak, Slovenian
     ISO-8859-3           Esperanto
     ISO-8859-4           Estonian, Latvian, Lithuanian
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9           Turkish
     ISO-8859-14          Breton, Irish, Scottish, Welsh
     ISO-8859-15          Basque, Breton, Catalan, Danish, Dutch, Estonian,
                          Faroese, Finnish, French, Galician, German,
                          Greenlandic, Icelandic, Irish, Italian, Lithuanian,
                          Norwegian, Occitan, Portuguese, Scottish, Spanish,
                          Swedish, Walloon, Welsh
     KOI8-R               Russian
     KOI8-U               Russian, Ukrainian
     EUC-JP (alias eucJP)      Japanese
     ISO-2022-JP (alias JIS7)  Japanese
     SHIFT_JIS (alias SJIS)    Japanese
     U90                       Japanese
     S90                       Japanese
     EUC-CN (alias eucCN)      Chinese
     EUC-TW (alias eucTW)      Chinese
     BIG5                      Chinese
     EUC-KR (alias eucKR)      Korean
     ARMSCII-8                 Armenian
     GEORGIAN-ACADEMY          Georgian
     GEORGIAN-PS               Georgian
     TIS-620 (alias TACTIS)    Thai
     MULELAO-1                 Laotian
     IBM-CP1133                Laotian
     VISCII                    Vietnamese
     TCVN                      Vietnamese
     NUNACOM-8                 Inuktitut

   Hint3: The character sets supported by Netscape Communicator 4.

     Where is this documented? For the complete picture, I had to use
     "strings netscape" and then a lot of guesswork. For a quick take,
     look at the "View - Character set" menu of Netscape Communicator 4.6:

     ISO-8859-{1,2,5,7,9,15}
     WINDOWS-{1250,1251,1253}
     KOI8-R               Cyrillic
     CP866                Cyrillic
     Autodetect           Japanese  (EUC-JP, ISO-2022-JP, ISO-2022-JP-2, SJIS)
     EUC-JP               Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     BIG5                 Chinese
     EUC-TW               Chinese
     Autodetect           Korean    (EUC-KR, ISO-2022-KR, but not JOHAB)
     UTF-8
     UTF-7

   Hint4: The character sets supported by Microsoft Internet Explorer 4.

     ISO-8859-{1,2,3,4,5,6,7,8,9}
     WINDOWS-{1250,1251,1252,1253,1254,1255,1256,1257}
     KOI8-R               Cyrillic
     KOI8-RU              Ukrainian
     ASMO-708             Arabic
     EUC-JP               Japanese
     ISO-2022-JP          Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     HZ-GB-2312           Chinese
     BIG5                 Chinese
     EUC-KR               Korean
     ISO-2022-KR          Korean
     WINDOWS-874          Thai
     WINDOWS-1258         Vietnamese
     UTF-8
     UTF-7
     UNICODE              actually UNICODE-LITTLE
     UNICODEFEFF          actually UNICODE-BIG

     and various DOS character sets: DOS-720, DOS-862, IBM852, CP866.

   We take the union of all these four sets. The result is:

   European and Semitic languages
     * ASCII.
       We implement this because it is occasionally useful to know or to
       check whether some text is entirely ASCII (i.e. if the conversion
       ISO-8859-x -> UTF-8 is trivial).
     * ISO-8859-{1,2,3,4,5,6,7,8,9,10}
       We implement these because they are widely used. Except ISO-8859-4,
       which appears to have been superseded by ISO-8859-13 in the Baltic
       countries. But it's an ISO standard anyway.
     * ISO-8859-13
       We implement this because it's a standard in Lithuania and Latvia.
     * ISO-8859-14
       We implement this because it's an ISO standard.
     * ISO-8859-15
       We implement this because it's increasingly used in Europe, because
       of the Euro symbol.
     * ISO-8859-16
       We implement this because it's an ISO standard.
     * KOI8-R, KOI8-U
       We implement these because they appear to be the predominant encodings
       on Unix in Russia and Ukraine, respectively.
     * KOI8-RU
       We implement this because MSIE4 supports it.
     * KOI8-T
       We implement this because it is the locale encoding in glibc's Tajik
       locale.
     * CP{1250,1251,1252,1253,1254,1255,1256,1257}
       We implement these because they are the predominant Windows encodings
       in Europe.
     * CP850
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.
     * CP862
       We implement this because Ron Aaron says it is sometimes used in web
       pages and emails.
     * CP866
       We implement this because Netscape Communicator does.
     * Mac{Roman,CentralEurope,Croatian,Romania,Cyrillic,Greek,Turkish} and
       Mac{Hebrew,Arabic}
       We implement these because the Sun JDK does, and because Mac users
       don't deserve to be punished.
     * Macintosh
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.

   Japanese
     * EUC-JP, SHIFT_JIS, ISO-2022-JP
       We implement these because they are widely used. EUC-JP and SHIFT_JIS
       are more used for files, whereas ISO-2022-JP is recommended for email.
     * CP932
       We implement this because it is the Microsoft variant of SHIFT_JIS,
       used on Windows.
     * ISO-2022-JP-2
       We implement this because it's the common way to represent mails which
       make use of JIS X 0212 characters.
     * ISO-2022-JP-1
       We implement this because it's in the RFCs, but I don't think it is
       really used.
     * U90, S90
       We DON'T implement these because I have no information about what they
       are or who uses them.
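
   The ASCII entry above mentions checking whether some text is entirely
   ASCII. A minimal sketch of such a check through the iconv interface
   might look like the following. The function name is_all_ascii is
   hypothetical (it is not part of libiconv), and the sketch assumes the
   POSIX-style iconv() prototype that takes "char **" for the input
   pointer; some older installations declare it as "const char **".

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of libiconv.
   Returns 1 if buf[0..len) is entirely ASCII, 0 if it is not,
   and -1 if the converter could not be opened. */
int is_all_ascii(const char *buf, size_t len)
{
    /* Convert from ASCII to UTF-8; any byte >= 0x80 is an invalid
       ASCII input and makes iconv() fail with EILSEQ. */
    iconv_t cd = iconv_open("UTF-8", "ASCII");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)buf;   /* iconv() does not modify the input bytes */
    size_t inleft = len;
    char out[256];            /* scratch space; the output is discarded */
    int result = 1;

    while (inleft > 0) {
        char *outp = out;
        size_t outleft = sizeof out;
        if (iconv(cd, &in, &inleft, &outp, &outleft) == (size_t)-1) {
            if (errno == E2BIG)
                continue;     /* scratch buffer full: reuse it and go on */
            result = 0;       /* invalid input: not pure ASCII */
            break;
        }
    }

    iconv_close(cd);
    return result;
}
```

   When the function returns 1, the bytes are already valid UTF-8 and can
   be passed through unchanged. With GNU libiconv installed as a separate
   library, linking typically needs -liconv; with glibc the functions are
   in libc itself.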