<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"><html><head><meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org"><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><meta name="Generator" content="Microsoft Word 97"><title>db.* User's Guide Chapter 9</title></head><body><h1><a name="Unicode"></a>Chapter 9<br><i>db.*</i> Unicode Support</h1><h2><a name="Compiler"></a>9.1 Unicode Support at the Compiler and OS Level</h2><p><font size="2">Support for Unicode varies widely for different operating systems; consequently, these differences affect the levels of support available in <b><i>db.*</i></b>. Currently, Unicode support in Linux and UNIX is limited.</font></p><p>On Linux and UNIX, Unicode string manipulation functions are provided, but I/O functions are not. Some display functions, such as <b>puts</b> and <b>gets</b>, have Unicode equivalents, but not all. This level of support makes it unlikely that as a <b><i>db.*</i></b> user, you will write applications that use only Unicode strings, or that you will require a <b><i>db.*</i></b> library with Unicode string arguments.</p><p>The most common standard for storing international character sets on UNIX and Linux is UTF-8 (see note below). This encoding standard provides a method for mapping Unicode characters to multi-byte sequences, so that they can be stored as conventional 8-bit character strings. <b><i>db.*</i></b> supports UTF-8 on platforms where the standard is available.</p><blockquote><b><i>Note:</i></b> The acronym UCS stands for "Universal Character Set". The set was developed jointly by the Unicode Consortium and the International Organization for Standardization (ISO). 
UTF is an acronym for "UCS Transformation Format".</blockquote><h3><a name="String"></a>String Fields in the Database</h3><p><font size="2">Regardless of whether the <b><i>db.*</i></b> library accepts Unicode or ANSI string arguments, or does internal string manipulation in Unicode, a need may exist for Unicode string fields in a database on any platform that supports Unicode at any level.</font></p><p>Although Unicode strings are really just arrays of unsigned <b>short</b>s, <b>int</b>s, or <b>long</b>s (according to platform), they cannot be handled exactly the same way as these underlying data types. If Unicode string fields are defined as keys, <b><i>db.*</i></b> must sort them using the string collation functions provided in the OS for doing locale-specific collation. This differs from the binary sorting used for fields defined as <b>short</b>, <b>int</b>, or <b>long</b>. Thus, <b><i>db.*</i></b> must support Unicode string fields as a separate type from their underlying data types. Apart from their size, Unicode string fields are treated in much the same way as regular <b>char</b> string fields.</p><h3>Definitions</h3><p><font size="2">The data type used for Unicode characters is called <b>wchar_t</b> on all the systems supported by <b><i>db.*</i></b>. (Most of these systems also define <b>_WCHAR_T</b>, which is the equivalent of <b>wchar_t</b>.) Unfortunately, the definition of <b>wchar_t</b> is not the same across all platforms:</font></p><p align="center"><b>Table 9-1. 
OS Definitions for wchar_t</b></p><table cellspacing="0" border="0" cellpadding="7" width="542"><tr><td width="36%" valign="top"><p><b><font size="2">Platform</font></b></p></td><td width="64%" valign="top"><p><b><font size="2">wchar_t Definition</font></b></p></td></tr><tr><td width="36%" valign="top"><p><font size="2">Solaris</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr><tr><td width="36%" valign="top"><p><font size="2">Unixware</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr><tr><td width="36%" valign="top"><p><font size="2">QNX</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr><tr><td width="36%" valign="top"><p><font size="2">HPUX</font></p></td><td width="64%" valign="top"><p><font size="2">unsigned int</font></p></td></tr><tr><td width="36%" valign="top"><p><font size="2">Linux</font></p></td><td width="64%" valign="top"><p><font size="2">unsigned long</font></p></td></tr></table><p><font size="2">All of the platforms in the table above provide Unicode equivalents of string handling functions such as <b>strcpy</b>, <b>strcmp</b>, etc. Typically, these functions are named <b>wcscpy</b>, <b>wcscmp</b>, etc.</font></p><h2><a name="Implementation"></a>9.2 <i>db.*</i> Implementation</h2><p><font size="2">Support in <b><i>db.*</i></b> for Unicode is implemented by Unicode data fields (data type <b>wchar_t</b>), which are supported on all platforms.</font></p><p>To enable support, the preprocessor symbol UNICODE_DATA must be defined when <b><i>db.*</i></b> is compiled. This symbol is defined automatically in <b>db.star.h</b>, on all platforms.</p><p>The <b><i>db.*</i></b> product also allows Unicode strings to be stored in <b>char</b> fields, using UTF-8 format. 
Provided the runtime locale is set correctly, and the <b><i>db.*</i></b> runtime option <b>MBSSORT</b> is enabled, key fields containing UTF-8 strings will be sorted correctly.</p><h3><a name="Fields"></a>Unicode Data Fields</h3><p><font size="2">In <b><i>db.*</i></b> (on all platforms), the <b>ddlp</b> utility recognizes the data type <b>wchar_t</b> for database fields. In the database dictionary (DBD file), these fields are represented by the character "C", similar to the lowercase "c" used for <b>char</b> fields. The size of these fields is system-dependent. On Linux and UNIX, each character occupies 4 bytes.</font></p><p>The Country Table is not used in sorting Unicode data. Since these characters are sorted correctly by the Unicode string collation functions, there is no need for any mechanism such as the Country Table. The "ignorecase" option (specified in file <b>db.star.ctb</b>, or through function <b>d_on_opt</b>) is, however, recognized when Unicode database fields are sorted.</p><p>Note that the Country Table is still usable with <b>char</b> fields in <b><i>db.*</i></b>.</p><h2><a name="Prototypes"></a>9.3 Unicode Prototypes for Specific Functions</h2><p><font size="2">The prototypes for approximately one-third of the <b><i>db.*</i></b> library functions are different in the Unicode version than in the standard version. These are all functions whose prototypes contain <b>char*</b> data types. In the majority of these, <b>char*</b> becomes <b>wchar_t*</b>.</font></p><p><a href="UG_Ch10.htm">Next Page</a></p></body></html>