@node Iconv
@chapter Character-set conversions (@file{iconv.h})

This chapter describes the Newlib iconv library.
The iconv function declarations are in
@file{iconv.h}.

@menu
* iconv::                   Character set conversion routines
* iconv architecture::      Architecture of Newlib iconv library
* iconv configuration::     Newlib iconv-specific configure options
* Generating CCS tables::   How to generate CCS tables
* Adding new converter::    Steps on adding a new converter
@end menu

@page
@include iconv/iconv.def

@page
@node iconv architecture
@section iconv architecture
@findex iconv architecture
@findex encoding
@findex CCS
@findex CES
@findex iconv converter

@*
@itemize @bullet
@item
Encoding - a rule to represent computer text by means of bits and bytes.

@item
CCS (Coded Character Set) - a mapping from an abstract character set
to a set of non-negative integers (character codes).

@item
CES (Character Encoding Scheme) - a mapping from a set of character code
units to a sequence of bytes.
@end itemize

@*
Examples of CCS: ASCII, ISO-8859-x, KOI8-R, KSX-1001, GB-2312.
@*
Examples of CES: UTF-8, UTF-16, EUC-JP, ISO-2022-JP.

@*
The iconv library is used to convert an array of characters in one encoding
to an array in another encoding.

@*
From a user's point of view, the iconv library is a set of converters. Each converter
corresponds to one encoding (e.g., KOI8-R converter, UTF-8 converter).
Internally, the notion of a converter is different, as described below.

@*
The iconv library always performs conversions through UCS-32: i.e., to convert
from A to B, the iconv library first converts A to UCS-32, and then UCS-32 to B.

@*
Each encoding consists of a CES and a CCS. A CCS may be represented as data tables,
but a CES always implies some code (algorithm). Iconv uses CCS tables to map
from some encoding to UCS-32. CCS tables are placed into
the iconv/ccs subdirectory of newlib. The iconv code also uses CES modules
which can convert some CCS to and from UCS-32. CES modules are placed in the
iconv/ces subdirectory.

@*
Some encodings have CES = CCS (e.g., KOI8-R). For such encodings iconv uses
special subroutines which perform simple table conversions (ccs_table.c).

@*
Among specialized CES modules, the iconv library has
generic support for EUC and ISO-2022-family encodings (ces_euc.c and
ces_iso2022.c).

@*
To enable iconv to work with CCS or CES-based encodings, the corresponding
CCS table or CES module should be linked with Newlib. The iconv support
can also load CCS tables dynamically from external files (.cct files from
the iconv/ccs/binary subdirectory). CES modules, on the other hand, can't be
dynamically loaded.

@*
Each iconv converter has one name and a set of aliases. The list of
aliases for each converter's name is in the iconv/charset.aliases file.
Note: iconv always normalizes converter names and aliases before using them.

@page
@node iconv configuration
@section iconv configuration
@findex iconv configuration
@findex iconv converter

@*
To enable iconv, the @option{--enable-newlib-iconv} configuration option should be
used when configuring newlib.

@*
To link a specific converter (CCS table or CES module) into Newlib, the
@option{--enable-newlib-builtin-converters} option should be used. A comma-separated
list of converters can be passed with this option
(e.g., @option{--enable-newlib-builtin-converters=koi8-r,euc-jp} to link the KOI8-R
and EUC-JP converters). Either converter names or aliases may be used.

@*
If the target system has a file system accessible by Newlib, table-based
converters may be loaded dynamically from external files. The iconv code
tries to load files from the iconv_data subdirectory of the directory
specified by the NLSPATH environment variable.

@*
Since Newlib has no generic dynamic module load support, CES-based converters
can't be dynamically loaded and must be linked in.

@page
@node Generating CCS tables
@section Generating CCS tables

@*
CCS tables are placed in the ccs subdirectory of the iconv directory.
This subdirectory contains .cct and .c files. The .cct files are for
dynamic loading whereas the .c files are for static linking with Newlib.
Both .c and .cct files are generated by the 'iconv_mktbl' Perl script from
special source files (call them .txt files). The 'iconv_mktbl' script can
be found in the iconv/ccs subdirectory. Input .txt files can be found at the
Unicode.org site or other locations on the web.

@*
The .c files are linked with Newlib if the corresponding 'configure' script
option was given. This is needed to use iconv on targets without file system
support. If a CCS table isn't configured to be linked, the iconv library
tries to load it dynamically from the corresponding .cct file.

@*
The following are commands to build .c and .cct CCS table files from .txt
files for several supported encodings.
@*

@itemize
@item
cp775:
@*iconv_mktbl -Co cp775.c cp775.txt
@*iconv_mktbl -o cp775.cct cp775.txt
@end itemize

@itemize
@item
cp850:
@*iconv_mktbl -Co cp850.c cp850.txt
@*iconv_mktbl -o cp850.cct cp850.txt
@end itemize

@itemize
@item
cp852:
@*iconv_mktbl -Co cp852.c cp852.txt
@*iconv_mktbl -o cp852.cct cp852.txt
@end itemize

@itemize
@item
cp855:
@*iconv_mktbl -Co cp855.c cp855.txt
@*iconv_mktbl -o cp855.cct cp855.txt
@end itemize

@itemize
@item
cp866
@*iconv_mktbl -Co cp866.c cp866.txt
@*iconv_mktbl -o cp866.cct cp866.txt
@end itemize

@itemize
@item
iso-8859-1
@*iconv_mktbl -Co iso-8859-1.c iso-8859-1.txt
@*iconv_mktbl -o iso-8859-1.cct iso-8859-1.txt
@end itemize

@itemize
@item
iso-8859-4
@*iconv_mktbl -Co iso-8859-4.c iso-8859-4.txt
@*iconv_mktbl -o iso-8859-4.cct iso-8859-4.txt
@end itemize

@itemize
@item
iso-8859-5
@*iconv_mktbl -Co iso-8859-5.c iso-8859-5.txt
@*iconv_mktbl -o iso-8859-5.cct iso-8859-5.txt
@end itemize

@itemize
@item
iso-8859-2
@*iconv_mktbl -Co iso-8859-2.c iso-8859-2.txt
@*iconv_mktbl -o iso-8859-2.cct iso-8859-2.txt
@end itemize

@itemize
@item
iso-8859-15
@*iconv_mktbl -Co iso-8859-15.c iso-8859-15.txt
@*iconv_mktbl -o iso-8859-15.cct iso-8859-15.txt
@end itemize

@itemize
@item
big5
@*iconv_mktbl -Co big5.c big5.txt
@*iconv_mktbl -o big5.cct big5.txt
@end itemize

@itemize
@item
ksx1001
@*iconv_mktbl -Co ksx1001.c ksx1001.txt
@*iconv_mktbl -o ksx1001.cct ksx1001.txt
@end itemize

@itemize
@item
gb_2312
@*iconv_mktbl -Co gb_2312-80.c gb_2312-80.txt
@*iconv_mktbl -o gb_2312-80.cct gb_2312-80.txt
@end itemize

@itemize
@item
jis_x0201
@*iconv_mktbl -Co jis_x0201.c jis_x0201.txt
@*iconv_mktbl -o jis_x0201.cct jis_x0201.txt
@end itemize

@itemize
@item
shift_jis
@*iconv_mktbl -Co shift_jis.c shift_jis.txt
@*iconv_mktbl -o shift_jis.cct shift_jis.txt
@end itemize

@itemize
@item
jis_x0208
@*iconv_mktbl -C -c 1 -u 2 -o jis_x0208-1983.c jis_x0208-1983.txt
@*iconv_mktbl -c 1 -u 2 -o jis_x0208-1983.cct jis_x0208-1983.txt
@end itemize

@itemize
@item
jis_x0212
@*iconv_mktbl -Co jis_x0212-1990.c jis_x0212-1990.txt
@*iconv_mktbl -o jis_x0212-1990.cct jis_x0212-1990.txt
@end itemize

@itemize
@item
cns11643-plane1
@*iconv_mktbl -C -p 0x1 -o cns11643-plane1.c cns11643.txt
@*iconv_mktbl -p 0x1 -o cns11643-plane1.cct cns11643.txt
@end itemize

@itemize
@item
cns11643-plane2
@*iconv_mktbl -C -p 0x2 -o cns11643-plane2.c cns11643.txt
@*iconv_mktbl -p 0x2 -o cns11643-plane2.cct cns11643.txt
@end itemize

@itemize
@item
cns11643-plane14
@*iconv_mktbl -C -p 0xE -o cns11643-plane14.c cns11643.txt
@*iconv_mktbl -p 0xE -o cns11643-plane14.cct cns11643.txt
@end itemize

@itemize
@item
koi8-r
@*iconv_mktbl -Co koi8-r.c koi8-r.txt
@*iconv_mktbl -o koi8-r.cct koi8-r.txt
@end itemize

@itemize
@item
koi8-u
@*iconv_mktbl -Co koi8-u.c koi8-u.txt
@*iconv_mktbl -o koi8-u.cct koi8-u.txt
@end itemize

@itemize
@item
us-ascii
@*iconv_mktbl -Cao us-ascii.c iso-8859-1.txt
@*iconv_mktbl -ao us-ascii.cct iso-8859-1.txt
@end itemize

@*
Source files for CCS tables can be taken from at least two places:
@*
@enumerate
@item
http://www.unicode.org/Public/MAPPINGS/ contains a lot of encoding map files.
@item
http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains original iconv
sources and encoding map files.
@end enumerate

@*
The following are URLs where source files for some of the CCS tables are found:

@itemize
@item
big5:
@*http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
@end itemize

@itemize
@item
cns11643_plane14, cns11643_plane1 and cns11643_plane2:
@*http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
@end itemize

@itemize
@item
cp775, cp850, cp852, cp855, cp866:
@*http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
@end itemize

@itemize
@item
gb_2312_80:
@*http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
@end itemize

@itemize
@item
iso_8859_15, iso_8859_1, iso_8859_2, iso_8859_4, iso_8859_5:
@*http://www.unicode.org/Public/MAPPINGS/ISO8859/
@end itemize

@itemize
@item
jis_x0201, jis_x0208_1983, jis_x0212_1990, shift_jis
@*http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
@end itemize

@itemize
@item
koi8_r
@*http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
@end itemize

@itemize
@item
ksx1001
@*http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
@end itemize

@itemize
@item
koi8-u can be obtained from the original FreeBSD iconv library distribution:
@*http://www.dante.net/staff/konstantin/FreeBSD/iconv/
@end itemize

@*
Moreover, http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains a lot
of additional CCS tables that you can use with Newlib (iso-2022 and
RFC1345 encodings).

@page
@node Adding new converter
@section Adding a new iconv converter

@*
The following steps should be taken to add a new iconv converter:
@*

@enumerate
@item
The converter's name and list of aliases should be added to the
iconv/charset.aliases file.

@item
All iconv converters are protected by a _ICONV_CONVERTER_XXX
macro, where XXX is the converter name. This protection macro should be added to
the newlib/newlib.hin file.

@item
The converter's name and aliases should also be registered in the
_iconv_builtin_aliases table in iconv/lib/bialiasesi.c. The list should be
protected by the corresponding macro mentioned above.

@item
If a new converter is just a CCS table, the corresponding .cct and .c files
should be added to the iconv/ccs/ subdirectory. The name of the files
should be equivalent to the normalized encoding name.
The 'iconv_mktbl' Perl script (found in iconv/ccs) may
be used to generate such files. The file names should be added to the
iconv/ccs/Makefile.am and iconv/ccs/binary/Makefile.am files and then
automake should be used to regenerate the Makefile.in files.

@item
If a new converter has a CES algorithm, the appropriate file should be added to the
iconv/ces/ subdirectory. The name of the file again should be equivalent to the normalized
encoding name.

@item
If a converter is an EUC or ISO-2022-family CES, then the converter
is just an array with a list of the CCS tables it uses (See ccs/euc-jp.c for example). This
is because iconv already has EUC and ISO-2022 support. The CCS tables used should
be provided in iconv/ccs/.

@item
If a converter isn't an EUC or ISO-2022-based CES, the following functions
should be provided (see utf-8.c for example):
@itemize @minus
@item A function to convert from the new CES to UCS-32;
@item A function to convert from UCS-32 to the new CES;
@item An 'init' function;
@item A 'close' function;
@item A 'reset' function to reset shift state for stateful CES.
@end itemize

@*
All these functions are registered into a 'struct iconv_ces_desc' object.
The name of the object should be _iconv_ces_module_XXX, where XXX is the
name of the converter.

@item
For CES converters, the corresponding 'struct iconv_ces_desc' reference should
be added into the iconv/lib/bices.c file.
@*
For CCS converters, the corresponding table reference should be added into
the iconv/lib/biccs.c file.
@end enumerate
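
@*
Once a converter has been linked in (or its .cct table can be loaded), one way
to exercise it is through the standard @code{iconv_open}, @code{iconv} and
@code{iconv_close} calls declared in @file{iconv.h}. The following is only a
minimal sketch and is not part of Newlib: the helper name @code{convert_buffer}
is hypothetical, and the encoding names passed to it are assumed to refer to
converters that were enabled at configure time.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes of `in` from encoding `from`
   to encoding `to`, writing at most `outcap` bytes into `out`.
   Returns the number of bytes written, or (size_t) -1 on any failure.  */
size_t
convert_buffer (const char *from, const char *to,
                char *in, size_t inlen, char *out, size_t outcap)
{
  /* Note the argument order: target encoding first, then source.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;         /* converter not available */

  char *inp = in;
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  /* Convert the whole input in one call; iconv advances the pointers
     and decrements the remaining-byte counters as it goes.  */
  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);

  /* For stateful encodings, a call with a NULL input emits any bytes
     needed to return the output to its initial shift state.  */
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &outp, &outleft);

  iconv_close (cd);

  if (rc == (size_t) -1)
    return (size_t) -1;         /* invalid input or output buffer too small */
  return outcap - outleft;
}
```

For instance, provided both converters are available, calling
@code{convert_buffer} with "ISO-8859-1" as the source, "UTF-8" as the target
and the four input bytes 'c', 'a', 'f', 0xE9 should produce five output bytes,
because 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9.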