<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html><head><meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org"><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><meta name="Generator" content="Microsoft Word 97"><title>db.* User's Guide Chapter 9</title></head><body>
<h1><a name="Unicode"></a>Chapter 9<br><i>db.*</i> Unicode Support</h1>
<h2><a name="Compiler"></a>9.1 Unicode Support at the Compiler and OS Level</h2>
<p><font size="2">Support for Unicode varies widely across operating systems, and these differences affect the level of support available in <b><i>db.*</i></b>. Currently, Unicode support in Linux and UNIX is limited.</font></p>
<p>On Linux and UNIX, Unicode string manipulation functions are provided, but I/O functions are not. Some display functions, such as <b>puts</b> and <b>gets</b>, have Unicode equivalents, but not all do. This level of support makes it unlikely that, as a <b><i>db.*</i></b> user, you will write applications that use only Unicode strings, or that you will need a <b><i>db.*</i></b> library with Unicode string arguments.</p>
<p>The most common standard for storing international character sets on UNIX and Linux is UTF-8 (see the note below). This encoding maps each Unicode character to a sequence of one to four bytes, so that Unicode text can be stored as conventional 8-bit character strings. <b><i>db.*</i></b> supports UTF-8 on platforms where the standard is available.</p>
<blockquote><b><i>Note:</i></b> The acronym UCS stands for "Universal Character Set". The set was developed jointly by the Unicode Consortium and the International Organization for Standardization (ISO).
UTF is an acronym for "UCS Transformation Format".</blockquote>
<h3><a name="String"></a>String Fields in the Database</h3>
<p><font size="2">Regardless of whether the <b><i>db.*</i></b> library accepts Unicode or ANSI string arguments, or performs internal string manipulation in Unicode, applications may need Unicode string fields in a database on any platform that supports Unicode at any level.</font></p>
<p>Although Unicode strings are really just arrays of integers (<b>short</b>s, <b>int</b>s, or <b>long</b>s, depending on the platform), they cannot be handled in exactly the same way as those underlying data types. If Unicode string fields are defined as keys, <b><i>db.*</i></b> must sort them using the string collation functions the OS provides for locale-specific collation. This differs from the binary sorting used for fields defined as <b>short</b>, <b>int</b>, or <b>long</b>. Thus, <b><i>db.*</i></b> must support Unicode string fields as a type separate from their underlying data types. Apart from their size, Unicode string fields are treated in much the same way as regular <b>char</b> string fields.</p>
<h3>Definitions</h3>
<p><font size="2">The data type used for Unicode characters is called <b>wchar_t</b> on all the systems supported by <b><i>db.*</i></b>. (Most of these systems also define <b>_WCHAR_T</b>, which is the equivalent of <b>wchar_t</b>.) Unfortunately, <b>wchar_t</b> is not defined the same way on all platforms:</font></p>
<p align="center"><b>Table 9-1.
OS Definitions for wchar_t</b></p>
<table cellspacing="0" border="0" cellpadding="7" width="542">
<tr><td width="36%" valign="top"><p><b><font size="2">Platform</font></b></p></td><td width="64%" valign="top"><p><b><font size="2">wchar_t Definition</font></b></p></td></tr>
<tr><td width="36%" valign="top"><p><font size="2">Solaris</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr>
<tr><td width="36%" valign="top"><p><font size="2">UnixWare</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr>
<tr><td width="36%" valign="top"><p><font size="2">QNX</font></p></td><td width="64%" valign="top"><p><font size="2">long</font></p></td></tr>
<tr><td width="36%" valign="top"><p><font size="2">HP-UX</font></p></td><td width="64%" valign="top"><p><font size="2">unsigned int</font></p></td></tr>
<tr><td width="36%" valign="top"><p><font size="2">Linux</font></p></td><td width="64%" valign="top"><p><font size="2">unsigned long</font></p></td></tr>
</table>
<p><font size="2">All of the platforms in the table above provide Unicode equivalents of string-handling functions such as <b>strcpy</b> and <b>strcmp</b>. These functions are typically named <b>wcscpy</b>, <b>wcscmp</b>, and so on.</font></p>
<h2><a name="Implementation"></a>9.2 <i>db.*</i> Implementation</h2>
<p><font size="2"><b><i>db.*</i></b> implements Unicode support through Unicode data fields (data type <b>wchar_t</b>), which are supported on all platforms.</font></p>
<p>To enable this support, the preprocessor symbol UNICODE_DATA must be defined when <b><i>db.*</i></b> is compiled. This symbol is defined automatically in <b>db.star.h</b> on all platforms.</p>
<p><b><i>db.*</i></b> also allows Unicode strings to be stored in <b>char</b> fields in UTF-8 format.
Provided the runtime locale is set correctly and the <b><i>db.*</i></b> runtime option <b>MBSSORT</b> is enabled, key fields containing UTF-8 strings are sorted correctly.</p>
<h3><a name="Fields"></a>Unicode Data Fields</h3>
<p><font size="2">On all platforms, the <b><i>db.*</i></b> <b>ddlp</b> utility recognizes the data type <b>wchar_t</b> for database fields. In the database dictionary (DBD file), these fields are represented by the character "C", analogous to the lowercase "c" used for <b>char</b> fields. The size of these fields is system-dependent; on Linux and UNIX, it is 4 bytes.</font></p>
<p>The Country Table is not used to sort Unicode data. Because the Unicode string collation functions already sort these characters correctly, no mechanism such as the Country Table is needed. However, the "ignorecase" option (set in the file <b>db.star.ctb</b> or through the function <b>d_on_opt</b>) is honored when Unicode database fields are sorted.</p>
<p>Note that the Country Table can still be used with <b>char</b> fields in <b><i>db.*</i></b>.</p>
<h2><a name="Prototypes"></a>9.3 Unicode Prototypes for Specific Functions</h2>
<p><font size="2">The prototypes of approximately one-third of the <b><i>db.*</i></b> library functions differ between the Unicode version and the standard version. These are the functions whose prototypes contain <b>char*</b> types. In most of them, <b>char*</b> becomes <b>wchar_t*</b>.</font></p>
<p><a href="UG_Ch10.htm">Next Page</a></p>
</body></html>
