@node Iconv
@chapter Encoding conversions (@file{iconv.h})

This chapter describes the Newlib iconv library.
The iconv function declarations are in
@file{iconv.h}.

@menu
* iconv::                           Encoding conversion routines
* Introduction::                    Introduction to iconv and encodings
* Supported encodings::             The list of currently supported encodings
* iconv design decisions::          General iconv library design issues
* iconv configuration::             iconv-related configure script options
* Encoding names::                  How encodings are named.
* CCS tables::                      CCS tables format and 'mktbl.pl' Perl script
* CES converters::                  CES converters description
* The encodings description file::  The 'encoding.deps' file and 'mkdeps.pl'
* How to add new encoding::         The steps to add new encoding support
* The locale support interfaces::   Locale-related iconv interfaces
* Contact::                         The author contact
@end menu

@page
@include iconv/iconv.def

@page
@node Introduction
@section Introduction
@findex encoding
@findex character set
@findex charset
@findex CES
@findex CCS
@*
The iconv library is intended to convert characters from one encoding to
another. It implements the iconv(), iconv_open() and iconv_close()
calls, which are defined by the Single Unix Specification.

@*
In addition to these user-level interfaces, the iconv library also has
several interfaces which are needed to support the coding
capabilities of the Newlib Locale infrastructure.  Since Locale support also needs to
convert various character sets to and from the @emph{wide character
set}, the iconv library shares its capabilities with the Newlib Locale
subsystem. Moreover, the iconv library supports several features which are
only needed for the Locale infrastructure (for example, the MB_CUR_MAX value).

@*
The Newlib iconv library was created using concepts from another iconv
library implemented by Konstantin Chuguev (ver 2.0).
The Newlib iconv library
was rewritten from scratch and contains a lot of improvements with respect to
the original iconv library.

@*
Terms like @dfn{encoding} or @dfn{character set} aren't well defined and
are often used with various meanings. The following are the definitions of terms
which are used in this documentation as well as in the iconv library
implementation:

@itemize @bullet
@item
@dfn{encoding} - a machine representation of characters by means of bits;

@item
@dfn{Character Set} or @dfn{Charset} - just a collection of
characters, i.e. the encoding is the machine representation of the character set;

@item
@dfn{CCS} (@dfn{Coded Character Set}) - a mapping from a character set to a
set of integers called @dfn{character codes};

@item
@dfn{CES} (@dfn{Character Encoding Scheme}) - a mapping from a set of character
codes to a sequence of bytes;
@end itemize

@*
Users usually deal with encodings, for example, KOI8-R, Unicode, UTF-8,
ASCII, etc. Encodings are formed by the following chain of steps:

@enumerate
@item
The user has a set of characters which are specific to his or her language (character set).

@item
Each character from this set is uniquely numbered, resulting in a CCS.

@item
Each number from the CCS is converted to a sequence of bits or bytes by means
of a CES, which forms some encoding. Thus, a CES may be considered as a
function of a CCS which produces some encoding. Note that a CES may be
applied to more than one CCS.
@end enumerate

@*
Thus, an encoding may be considered as one or more CCS + CES.

@*
Sometimes there is no CES, and in such cases the encoding is equivalent
to the CCS, e.g.
KOI8-R or ASCII.

@*
An example of a more complicated encoding is UTF-8, which is the UCS
(or Unicode) CCS plus the UTF-8 CES.

@*
The following is a brief list of iconv library features:
@itemize
@item
Generic architecture;
@item
Locale infrastructure support;
@item
Automatic generation of the program code which handles
CES/CCS/Encoding/Names/Aliases dependencies;
@item
The ability to choose a size- or speed-optimized
configuration;
@item
The ability to exclude a lot of unneeded code and data from the linking step.
@end itemize

@page
@node Supported encodings
@section Supported encodings
@findex big5
@findex cp775
@findex cp850
@findex cp852
@findex cp855
@findex cp866
@findex euc_jp
@findex euc_kr
@findex euc_tw
@findex iso_8859_1
@findex iso_8859_10
@findex iso_8859_11
@findex iso_8859_13
@findex iso_8859_14
@findex iso_8859_15
@findex iso_8859_2
@findex iso_8859_3
@findex iso_8859_4
@findex iso_8859_5
@findex iso_8859_6
@findex iso_8859_7
@findex iso_8859_8
@findex iso_8859_9
@findex iso_ir_111
@findex koi8_r
@findex koi8_ru
@findex koi8_u
@findex koi8_uni
@findex ucs_2
@findex ucs_2_internal
@findex ucs_2be
@findex ucs_2le
@findex ucs_4
@findex ucs_4_internal
@findex ucs_4be
@findex ucs_4le
@findex us_ascii
@findex utf_16
@findex utf_16be
@findex utf_16le
@findex utf_8
@findex win_1250
@findex win_1251
@findex win_1252
@findex win_1253
@findex win_1254
@findex win_1255
@findex win_1256
@findex win_1257
@findex win_1258
@*
The following is the list of currently supported encodings.
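
@*
The names and aliases in this list are the strings handed to iconv_open().
The following is only a minimal sketch of such a call, not code taken from the
library. The helper name utf8_to_utf16be is hypothetical, and the spellings
"UTF-8" and "UTF-16BE" are an assumption, chosen because common hosted iconv
implementations accept them. With Newlib, the canonical names from the list
(utf_8, utf_16be) are the safer choice, assuming the corresponding converters
were enabled when the library was configured.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts inlen bytes of UTF-8 text from `in`
   into big-endian UTF-16 stored in `out` (capacity outcap bytes).
   Returns the number of bytes written, or (size_t)-1 on any error. */
size_t utf8_to_utf16be(const char *in, size_t inlen, char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    /* iconv_open() takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* this encoding pair is not available */

    /* iconv() advances the pointers and decrements the byte counters. */
    char *inptr = (char *)in;   /* the input is only read, never written */
    char *outptr = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1 || inleft != 0)
        return (size_t)-1;      /* invalid input or output buffer too small */
    return outcap - outleft;
}
```

On an implementation that supports both encodings, passing the three UTF-8
bytes 0x41 0xC3 0xA9 yields the four bytes 0x00 0x41 0x00 0xE9. A too-small
output buffer makes iconv() fail, which the helper reports as (size_t)-1.
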
The first column
corresponds to the encoding name, the second column is the list of aliases,
the third column gives the names of its CES and CCS components, and the fourth column
is a short description.

@multitable @columnfractions .20 .26 .24 .30
@item Name
@tab Aliases
@tab CES/CCS
@tab Short description

@item
@tab
@tab
@tab

@item big5
@tab csbig5, big_five, bigfive, cn_big5, cp950
@tab table_pcs / big5, us_ascii
@tab The encoding for Traditional Chinese.

@item cp775
@tab ibm775, cspc775baltic
@tab table / cp775
@tab The updated version of CP 437 that supports the Baltic languages.

@item cp850
@tab ibm850, 850, cspc850multilingual
@tab table / cp850
@tab IBM 850 - the updated version of CP 437 where several Latin 1 characters have been
added instead of some less-often used characters like the line-drawing
and the Greek ones.

@item cp852
@tab ibm852, 852, cspcp852
@tab
@tab IBM 852 - the updated version of CP 437 where several Latin 2 characters have been added
instead of some less-often used characters like the line-drawing and the Greek ones.

@item cp855
@tab ibm855, 855, csibm855
@tab table / cp855
@tab IBM 855 - the updated version of CP 437 that supports Cyrillic.

@item cp866
@tab 866, IBM866, CSIBM866
@tab table / cp866
@tab IBM 866 - the updated version of CP 855 which more closely follows the logical Russian
alphabet ordering of the alternative variant that is preferred by many Russian users.

@item euc_jp
@tab eucjp
@tab euc / jis_x0208_1990, jis_x0201_1976, jis_x0212_1990
@tab EUC-JP - The EUC for Japanese.

@item euc_kr
@tab euckr
@tab euc / ksx1001
@tab EUC-KR - The EUC for Korean.

@item euc_tw
@tab euctw
@tab euc / cns11643_plane1, cns11643_plane2, cns11643_plane14
@tab EUC-TW - The EUC for Traditional Chinese.

@item iso_8859_1
@tab iso8859_1, iso88591, iso_8859_1:1987, iso_ir_100, latin1, l1, ibm819, cp819, csisolatin1
@tab table / iso_8859_1
@tab ISO 8859-1:1987 - Latin 1, West European.

@item iso_8859_10
@tab iso_8859_10:1992, iso_ir_157, iso885910, latin6, l6, csisolatin6, iso8859_10
@tab table / iso_8859_10
@tab ISO 8859-10:1992 - Latin 6,
Nordic.

@item iso_8859_11
@tab iso8859_11, iso885911
@tab table / iso_8859_11
@tab ISO 8859-11 - Thai.

@item iso_8859_13
@tab iso_8859_13:1998, iso8859_13, iso885913
@tab table / iso_8859_13
@tab ISO 8859-13:1998 - Latin 7, Baltic Rim.

@item iso_8859_14
@tab iso_8859_14:1998, iso885914, iso8859_14
@tab table / iso_8859_14
@tab ISO 8859-14:1998 - Latin 8, Celtic.

@item iso_8859_15
@tab iso885915, iso_8859_15:1998, iso8859_15
@tab table / iso_8859_15
@tab ISO 8859-15:1998 - Latin 9, West European, successor of Latin 1.

@item iso_8859_2
@tab iso8859_2, iso88592, iso_8859_2:1987, iso_ir_101, latin2, l2, csisolatin2
@tab table / iso_8859_2
@tab ISO 8859-2:1987 - Latin 2, East European.

@item iso_8859_3
@tab iso_8859_3:1988, iso_ir_109, iso8859_3, latin3, l3, csisolatin3, iso88593
@tab table / iso_8859_3
@tab ISO 8859-3:1988 - Latin 3, South European.

@item iso_8859_4
@tab iso8859_4, iso88594, iso_8859_4:1988, iso_ir_110, latin4, l4, csisolatin4
@tab table / iso_8859_4
@tab ISO 8859-4:1988 - Latin 4, North European.

@item iso_8859_5
@tab iso8859_5, iso88595, iso_8859_5:1988, iso_ir_144, cyrillic, csisolatincyrillic
@tab table / iso_8859_5
@tab ISO 8859-5:1988 - Cyrillic.

@item iso_8859_6
@tab iso_8859_6:1987, iso_ir_127, iso8859_6, ecma_114, asmo_708, arabic, csisolatinarabic, iso88596
@tab table / iso_8859_6
@tab ISO 8859-6:1987 - Arabic.

@item iso_8859_7
@tab iso_8859_7:1987, iso_ir_126, iso8859_7, elot_928, ecma_118, greek, greek8, csisolatingreek, iso88597
@tab table / iso_8859_7
@tab ISO 8859-7:1987 - Greek.

@item iso_8859_8
@tab iso_8859_8:1988, iso_ir_138, iso8859_8, hebrew, csisolatinhebrew, iso88598
@tab table / iso_8859_8
@tab ISO 8859-8:1988 - Hebrew.

@item iso_8859_9
@tab iso_8859_9:1989, iso_ir_148, iso8859_9, latin5, l5, csisolatin5, iso88599
@tab table / iso_8859_9
@tab ISO 8859-9:1989 - Latin 5, Turkish.

@item iso_ir_111
@tab ecma_cyrillic, koi8_e, koi8e, csiso111ecmacyrillic
@tab table / iso_ir_111
@tab ISO IR 111/ECMA Cyrillic.

@item koi8_r
@tab cskoi8r, koi8r, koi8
@tab table / koi8_r
@tab RFC 1489 Cyrillic.

@item koi8_ru
@tab koi8ru
@tab table / koi8_ru
@tab The
obsolete Ukrainian.

@item koi8_u
@tab koi8u
@tab table / koi8_u
@tab RFC 2319 Ukrainian.

@item koi8_uni
@tab koi8uni
@tab table / koi8_uni
@tab KOI8 Unified.

@item ucs_2
@tab ucs2, iso_10646_ucs_2, iso10646_ucs_2, iso_10646_ucs2, iso10646_ucs2, iso10646ucs2, csUnicode
@tab ucs_2 / (UCS)
@tab ISO-10646-UCS-2. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_2_internal
@tab ucs2_internal, ucs_2internal, ucs2internal
@tab ucs_2_internal / (UCS)
@tab ISO-10646-UCS-2 in system byte order.
NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_2be
@tab ucs2be
@tab ucs_2 / (UCS)
@tab Big Endian version of ISO-10646-UCS-2 (in fact, equivalent to ucs_2).
Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_2le
@tab ucs2le
@tab ucs_2 / (UCS)
@tab Little Endian version of ISO-10646-UCS-2.
Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_4
@tab ucs4, iso_10646_ucs_4, iso10646_ucs_4, iso_10646_ucs4, iso10646_ucs4, iso10646ucs4
@tab ucs_4 / (UCS)
@tab ISO-10646-UCS-4. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_4_internal
@tab ucs4_internal, ucs_4internal, ucs4internal
@tab ucs_4_internal / (UCS)
@tab ISO-10646-UCS-4 in system byte order.
NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_4be
@tab ucs4be
@tab ucs_4 / (UCS)
@tab Big Endian version of ISO-10646-UCS-4 (in fact, equivalent to ucs_4).
Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item ucs_4le
@tab ucs4le
@tab ucs_4 / (UCS)
@tab Little Endian version of ISO-10646-UCS-4.
Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported).

@item us_ascii