
@c iconv.tex
@c KPIT GNU Tools is a set of GNU development tools for Renesas microcontrollers.
@*-------------------------------------
@*Ranges
@*-------------------------------------
@*Unranged codes array
@*-------------------------------------

@*
The @dfn{Unranged codes array index} @emph{size_arr} section helps to find
the offset of the needed range in the @emph{size_arr} and has
the following format (triads):
@*the first code in range, the last code in range, range offset.

@*
The array of these triads is sorted by the first element, therefore it is
possible to quickly find the needed range index.

@*
Each range has a corresponding sub-array containing the "to" codes. These
sub-arrays are stored in the place marked as "Ranges" in the layout
diagram.

@*
The "Unranged codes array" contains pairs ("from" code, "to" code) for
each unranged code. The array of these pairs is sorted by "from" code
values, therefore it is possible to find the needed pair quickly.

@*
Note that each range requires 6 bytes for its index. If, for
example, there are two ranges (1 - 5 and 9 - 10), and one unranged code
(7), 12 bytes are needed for two range indexes and 4 bytes for the unranged
code (total 16). But it is better to join both ranges as 1 - 10 and
mark codes 6 and 8 as absent. In this case, only 6 additional bytes for the
range index and 4 bytes to mark codes 6 and 8 as absent are needed
(total 10 bytes). This optimization is done in the size-optimized tables.
Thus, ranges may contain small gaps. The absent codes in ranges are marked
as 0xFFFF.

@*
Note that a pair of "from" codes is stored by means of unranged codes since
the number of bytes which are needed to form the range is greater than
the number of bytes to store two unranged codes (5 against 4).

@*
The algorithm for finding the CCS code
@emph{X} which corresponds to the UCS-2 code @emph{Y} (input) in the "UCS-2 ->
CCS" size-optimized table is as follows.
@*
@enumerate
@item Try to find the corresponding triad in the "Unranged codes array
index".
Since we are searching in a sorted array, this can be done quickly
with a binary search (divide by 2, compare, etc).
@item If the triad is found, fetch the @emph{X} code from the corresponding
range array. If it is 0xFFFF, return an error.
@item If there is no corresponding triad, search for the @emph{X} code among the
sorted unranged codes. Return an error if nothing was found.
@end enumerate

@subsection .cct and .c CCS Table files
@*
The .c source files for 8-bit CCS tables have "to_ucs" and "from_ucs"
speed-optimized tables. The .c source files for 16-bit CCS tables have
"to_ucs_speed", "to_ucs_size", "from_ucs_speed" and "from_ucs_size"
tables.

@*
When .c files are compiled and used, all the 16-bit and 32-bit values
have the native endian format (Big Endian for the BE systems and Little
Endian for the LE systems) since they are compiled for the system before
they are used.

@*
In the case of .cct files, which are intended for dynamic CCS table
loading, the CCS tables are stored either in LE or BE format. Since the
.cct files are generated by the 'mktbl.pl' Perl script, it is possible
to choose the endianness of the tables. It is also possible to store two
copies (both LE and BE) of the CCS tables in one .cct file. The default
.cct files (which come with the Newlib sources) have both LE and BE CCS
tables. The Newlib iconv library automatically chooses the needed CCS tables
(with the appropriate endianness).

@*
Note that the .cct files are only used when the
@option{--enable-newlib-iconv-external-ccs} option is used.

@subsection The 'mktbl.pl' Perl script
@*
The 'mktbl.pl' script is intended to generate .cct and .c CCS table
files from the @dfn{CCS source files}.

@*
The CCS source files are just text files which have one or more columns
with the CCS <-> UCS-2 code mapping.
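@*
As a rough illustration of this column layout, the following C sketch
parses one line of a two-column CCS source file. It assumes the layout
seen in the Unicode.org mapping files (a hexadecimal CCS code, whitespace,
a hexadecimal UCS-2 code, then an optional comment starting with #). The
@code{parse_ccs_line} helper is hypothetical: it is not part of Newlib,
and 'mktbl.pl' itself is a Perl script whose actual parsing may differ.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration, not Newlib code).
 * Parses one line of a two-column CCS source file.
 * Returns 1 and stores both codes when the line holds two numbers.
 * Returns 0 for blank lines, comment lines, and lines that lack a
 * second code; in that case *ccs and *ucs are left untouched. */
int parse_ccs_line(const char *line, unsigned long *ccs, unsigned long *ucs)
{
    char *end;
    unsigned long from, to;

    /* Skip leading whitespace, then reject blank and comment lines. */
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;

    /* First column: the CCS code. Base 0 accepts the "0x" prefix. */
    from = strtoul(line, &end, 0);
    if (end == line)
        return 0;
    line = end;

    /* Second column: the UCS-2 code. strtoul skips the separator. */
    to = strtoul(line, &end, 0);
    if (end == line)
        return 0;

    *ccs = from;
    *ucs = to;
    return 1;
}
```

Under these assumptions, a line holding 0xA1 and 0x0104 separated by a tab
yields the pair (0xA1, 0x0104), while a line whose second column is empty
or contains only a comment is skipped.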
To see an example of a CCS table source file, follow one of the URLs
given below.

@*
The following table describes where the source files for the CCS table files
provided by the Newlib distribution are located.

@multitable @columnfractions .25 .75
@item
Name
@tab
URL

@item
@tab

@item
big5
@tab
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT

@item
cns11643_plane1
cns11643_plane14
cns11643_plane2
@tab
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT

@item
cp775
cp850
cp852
cp855
cp866
@tab
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/

@item
iso_8859_1
iso_8859_2
iso_8859_3
iso_8859_4
iso_8859_5
iso_8859_6
iso_8859_7
iso_8859_8
iso_8859_9
iso_8859_10
iso_8859_11
iso_8859_13
iso_8859_14
iso_8859_15
@tab
http://www.unicode.org/Public/MAPPINGS/ISO8859/

@item
iso_ir_111
@tab
http://crl.nmsu.edu/~mleisher/csets/ISOIR111.TXT

@item
jis_x0201_1976
jis_x0208_1990
jis_x0212_1990
@tab
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT

@item
koi8_r
@tab
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT

@item
koi8_ru
@tab
http://crl.nmsu.edu/~mleisher/csets/KOI8RU.TXT

@item
koi8_u
@tab
http://crl.nmsu.edu/~mleisher/csets/KOI8U.TXT

@item
koi8_uni
@tab
http://crl.nmsu.edu/~mleisher/csets/KOI8UNI.TXT

@item
ksx1001
@tab
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT

@item
win_1250
win_1251
win_1252
win_1253
win_1254
win_1255
win_1256
win_1257
win_1258
@tab
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/

@end multitable

The CCS source files aren't distributed with Newlib because of license
restrictions on most of the Unicode.org files.

The following 'mktbl.pl' options were used to generate the .cct
files.
Note that to generate the CCS table source files, the @option{-s} option
should be added.
@enumerate
@item For the iso_8859_10.cct, iso_8859_13.cct, iso_8859_14.cct, iso_8859_15.cct,
iso_8859_1.cct, iso_8859_2.cct, iso_8859_3.cct, iso_8859_4.cct,
iso_8859_5.cct, iso_8859_6.cct, iso_8859_7.cct, iso_8859_8.cct,
iso_8859_9.cct, iso_8859_11.cct, win_1250.cct, win_1252.cct, win_1254.cct,
win_1256.cct, win_1258.cct, win_1251.cct,
win_1253.cct, win_1255.cct, win_1257.cct,
koi8_r.cct, koi8_ru.cct, koi8_u.cct, koi8_uni.cct, iso_ir_111.cct,
big5.cct, cp775.cct, cp850.cct, cp852.cct, cp855.cct, cp866.cct, cns11643.cct
files, only the @option{-i <SRC_FILE_NAME>} option was used.
@item To generate the jis_x0208_1990.cct file, the
@option{-i jis_x0208_1990.txt -x 2 -y 3} options were used.
@item To generate the cns11643_plane1.cct file, the
@option{-i cns11643.txt -p1 -N cns11643_plane1  -o cns11643_plane1.cct}
options were used.
@item To generate the cns11643_plane2.cct file, the
@option{-i cns11643.txt -p2 -N cns11643_plane2  -o cns11643_plane2.cct}
options were used.
@item To generate the cns11643_plane14.cct file, the
@option{-i cns11643.txt -p0xE -N cns11643_plane14  -o cns11643_plane14.cct}
options were used.
@end enumerate

@*
For more information about the 'mktbl.pl' options, see the 'mktbl.pl -h' output.

@*
It is assumed that CCS codes are 16 bits wide or less. If there are wider CCS codes
in the CCS source file, the bits above the 16th define the plane (see the
cns11643.txt CCS source file).

@*
Sometimes it is impossible to map some CCS codes to the 16-bit UCS if, for example,
several different CCS codes are mapped to one UCS-2 code or one CCS code is mapped to
a pair of UCS-2 codes. In these cases, such CCS codes (@dfn{lost
codes}) aren't simply rejected; instead, they are mapped to the default
UCS-2 code (which is currently the @kbd{?} character's code).

@page
@node CES converters
@section CES converters
@findex PCS
@*
Similar to the CCS tables, CES converters are also split into "from UCS"
and "to UCS" parts.
Depending on the iconv library configuration, these
parts are enabled or disabled.

@*
The following is the list of CES converters which are currently present
in the Newlib iconv library.

@itemize @bullet
@item
@emph{euc} - supports the @emph{euc_jp}, @emph{euc_kr} and @emph{euc_tw}
encodings. The @emph{euc} CES converter uses the @emph{table} and the
@emph{us_ascii} CES converters.

@item
@emph{table} - this CES converter corresponds to "null" and just performs
table-based conversion using 8- and 16-bit CCS tables. This converter
is also used by any other CES converter which needs CCS table-based
conversions. The @emph{table} converter is also responsible for loading
.cct files.

@item
@emph{table_pcs} - this is a wrapper over the @emph{table} converter
which is intended for 16-bit encodings which also use the @dfn{Portable
Character Set} (@dfn{PCS}), which is the same as @emph{US-ASCII}.
This means that if the first byte of the CCS code is in the range [0x00-0x7f],
it is a 7-bit PCS code. Otherwise, it is a 16-bit CCS code. Of course,
the 16-bit codes must not contain bytes in the range [0x00-0x7f].
The @emph{big5} encoding uses the @emph{table_pcs} CES converter and the
@emph{table_pcs} CES converter depends on the @emph{table} CES converter.

@item
@emph{ucs_2} - intended for the @emph{ucs_2}, @emph{ucs_2be} and
@emph{ucs_2le} encodings support.

@item
@emph{ucs_4} - intended for the @emph{ucs_4}, @emph{ucs_4be} and
@emph{ucs_4le} encodings support.

@item
@emph{ucs_2_internal} - intended for the @emph{ucs_2_internal} encoding support.

@item
@emph{ucs_4_internal} - intended for the @emph{ucs_4_internal} encoding support.

@item
@emph{us_ascii} - intended for the @emph{us_ascii} encoding support. In
principle, the most natural way to support the @emph{us_ascii} encoding
is to define the @emph{us_ascii} CCS and use the @emph{table} CES
converter.
But for optimization purposes, the specialized
@emph{us_ascii} CES converter was created.

@item
@emph{utf_16} - intended for the @emph{utf_16}, @emph{utf_16be} and
@emph{utf_16le} encodings support.

@item
@emph{utf_8} - intended for the @emph{utf_8} encoding support.
@end itemize

@page
@node The encodings description file
@section The encodings description file
@findex encoding.deps description file
@findex mkdeps.pl Perl script
@*
The encodings description file simplifies the process of adding support
for new encodings by allowing a lot of "glue" files to be generated
automatically.

@*
There is the 'encoding.deps' file in the @emph{lib/} subdirectory which
is used to describe the properties of encodings. The 'mkdeps.pl' Perl script
uses 'encoding.deps' to generate the "glue" files.

@*
The 'encoding.deps' file is composed of sections, each section consists
of entries, and each entry contains some encoding/CES/CCS description.

@*
The syntax of the 'encoding.deps' file is very simple. Currently only two
sections are defined: @emph{ENCODINGS} and @emph{CES_DEPENDENCIES}.

@*
Each entry of the @emph{ENCODINGS} section describes one encoding and
contains the following information.

@itemize @bullet
@item
Encoding name (the @emph{ENCODING} field). The name should
be unique and only one name is possible.
@item
The encoding's CES converter name (the @emph{CES} field). Only one CES
converter is allowed.
@item
The whitespace-separated list of CCS table names which are used by the
encoding (the @emph{CCS} field).
@item
The whitespace-separated list of alias names (the @emph{ALIASES}
field).
@end itemize

@*
Note that all names in the 'encoding.deps' file must be in the normalized
form.

@*
Each entry of the @emph{CES_DEPENDENCIES} section describes the dependencies of
one CES converter. For example, the @emph{euc} CES converter depends on
the @emph{table} and the @emph{us_ascii} CES converters since the
@emph{euc} CES converter uses them.
This means that both the @emph{table}
and @emph{us_ascii} CES converters should be linked if the @emph{euc}
CES converter is enabled.

@*
The @emph{CES_DEPENDENCIES} section defines the following:

@itemize @bullet
@item
the CES converter name for which the dependencies are defined in this
entry (the @emph{CES} field);
@item
the whitespace-separated list of CES converters which are needed for
this CES converter (the @emph{USED_CES} field).
@end itemize

@*
The 'mkdeps.pl' Perl script automatically solves the following tasks.

@itemize @bullet
@item
The user works with the iconv library in terms of encodings and doesn't know
anything about CES converters and CCS tables. The script automatically
generates code which enables all the CES converters and CCS tables needed
for all the encodings which were enabled by the user.
@item
The CES converters may have dependencies and the script automatically
generates the code which handles these dependencies.
@item
The list of each encoding's aliases is also automatically generated.
@item
The script uses a lot of macros in order to enable only the minimum set
of code/data which is needed to support the requested encodings in the
requested directions.
@end itemize

@*
The 'mkdeps.pl' Perl script is intended to interpret the 'encoding.deps'
file and generates the following files.

@itemize @bullet
@item
@emph{lib/encnames.h} - this header file contains macro definitions for all
encoding names.
@item
@emph{lib/aliasesbi.c} - the array of encoding names and aliases.
The array
is used to find the name of the requested encoding by its alias.
@item
@emph{ces/cesbi.c} - this file defines two arrays
(@code{_iconv_from_ucs_ces} and @code{_iconv_to_ucs_ces}) which contain
descriptions of the enabled "from UCS" and "to UCS" CES converters and the
names of the encodings which are supported by these CES converters.
@item
@emph{ces/cesbi.h} - this file contains the set of macros which defines
the set of CES converters which should be enabled if only the set of
enabled encodings is given (through macros defined in the
@emph{newlib.h} file). Note that one CES converter may handle several
encodings.
@item
@emph{ces/cesdeps.h} - the CES converter dependencies are handled in
this file.
@item
@emph{ccs/ccsdeps.h} - the array of linked-in CCS tables is defined
here.
@item
@emph{ccs/ccsnames.h} - this header file contains macro definitions for all
CCS names.
@item
@emph{encoding.aliases} - the list of supported encodings and their
aliases, which is intended for the Newlib configure scripts in order to
handle the iconv-related configure script options.
@end itemize

@page
@node How to add new encoding
@section How to add new encoding
@*
First, the new encoding should be broken down into CCS and CES. Then,
the process of adding the new encoding is split into the following activities.

@enumerate
@item Generate the .cct CCS file and the .c source file for the new
encoding's CCS (if it isn't already present). To do this, the CCS source
file must be available and the 'mktbl.pl' script should be used.
@item Write the corresponding CES converter (if it isn't already
present).
Use the existing CES converters as an example.
@item
Add the corresponding entries to the 'encoding.deps' file and regenerate
the autogenerated "glue" files using the 'mkdeps.pl' script.
@item
Don't forget to add entries to the newlib/newlib.hin file.
@item
Of course, the 'Makefile.am'-s should also be updated (if new files were
added) and the 'Makefile.in'-s should be regenerated using the correct
version of 'automake'.
@item
Don't forget to update the documentation (the list of
supported encodings and CES converters).
@end enumerate

If a new encoding doesn't fit the CES/CCS decomposition model, or
if specialized (non UCS-based) conversion support is desired,
the Newlib iconv library code should be upgraded.

@page
@node The locale support interfaces
@section The locale support interfaces
@*
The newlib iconv library also has some interface functions (besides the
@code{iconv}, @code{iconv_open} and @code{iconv_close} interfaces) which
are intended for the Locale subsystem. All the locale-related code is
placed in the @emph{lib/iconvnls.c} file.

@*
The following is the description of the locale-related interfaces:

@itemize @bullet
@item
@code{_iconv_nls_open} - opens two iconv descriptors for "CCS ->
wchar_t" and "wchar_t -> CCS" conversions. The normalized CCS name is
passed in the function parameters.
The @emph{wchar_t} character encoding is
either ucs_2_internal or ucs_4_internal depending on the size of
@emph{wchar_t}.

@item
@code{_iconv_nls_conv} - the function is similar to the @code{iconv}
function, but if there is no character in the output encoding which
corresponds to the character in the input encoding, the default
conversion isn't performed (the @code{iconv} function sets such output
characters to the @kbd{?} symbol and this is the behavior which is
specified in SUSv3).

@item
@code{_iconv_nls_get_state} - returns the current encoding's shift state
(the @code{mbstate_t} object).

@item
@code{_iconv_nls_set_state} - sets the current encoding's shift state (the
@code{mbstate_t} object).

@item
@code{_iconv_nls_is_stateful} - checks whether the encoding is stateful
or stateless.

@item
@code{_iconv_nls_get_mb_cur_max} - returns the maximum length (the
maximum number of bytes) of the encoding's characters.
@end itemize

@page
@node Contact
@section Contact
@*
The author of the original BSD iconv library (Alexander Chuguev) no longer
supports that code.

@*
Any questions regarding the iconv library may be forwarded to
Artem B. Bityuckiy (dedekind@@oktetlabs.ru or dedekind@@mail.ru) as
well as to the public Newlib mailing list.
