<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link type="text/css" rel="stylesheet" href="style.css">
<!-- Generated by The Open Group's rhtm tool v1.2.1 -->
<!-- Copyright (c) 2001-2003 The Open Group, All Rights Reserved -->
<title>Rationale</title>
</head>
<body>
<basefont size="3"> <center><font size="2">The Open Group Base Specifications Issue 6<br>IEEE Std 1003.1, 2003 Edition<br>Copyright © 2001-2003 The IEEE and The Open Group</font></center>
<hr size="2" noshade>
<h3><a name="tag_01_06"></a>Character Set</h3>
<h4><a name="tag_01_06_01"></a>Portable Character Set</h4>
<p>The portable character set is listed in full so there is no dependency on the ISO/IEC 646:1991 standard (or historically ASCII) encoded character set, although the set is identical to the characters defined in the International Reference version of the ISO/IEC 646:1991 standard.</p>
<p>IEEE Std 1003.1-2001 poses no requirement that multiple character sets or codesets be supported, leaving this as a marketing differentiation for implementors. Although multiple charmap files are supported, it is the responsibility of the implementation to provide the file(s); if only one is provided, only that one will be accessible using the <a href="../utilities/localedef.html"><i>localedef</i></a> <b>-f</b> option.</p>
<p>The statement about invariance in codesets for the portable character set is worded to avoid precluding implementations where multiple incompatible codesets are available (for instance, ASCII and EBCDIC). The standard utilities cannot be expected to produce predictable results if they access portable characters that vary on the same implementation.</p>
<p>Not all character sets need include the portable character set, but each locale must include it.
For example, a Japanese-based locale might be supported by a mixture of character sets: JIS X 0201 Roman (a Japanese version of the ISO/IEC 646:1991 standard), JIS X 0208, and JIS X 0201 Katakana. Not all of these character sets include the portable characters, but at least one does (JIS X 0201 Roman).</p>
<h4><a name="tag_01_06_02"></a>Character Encoding</h4>
<p>Encoding mechanisms based on single shifts, such as the EUC encoding used in some Asian and other countries, can be supported via the current charmap mechanism. With single-shift encoding, each character is preceded by a shift code (SS2 or SS3). A complete EUC code, consisting of the portable character set (G0) and up to three additional character sets (G1, G2, G3), can be described using the current charmap mechanism; the encoding for each character in additional character sets G2 and G3 must then include their single-shift code. Other mechanisms to support locales based on encoding mechanisms such as locking shift are not addressed by this volume of IEEE Std 1003.1-2001.</p>
<h4><a name="tag_01_06_03"></a>C Language Wide-Character Codes</h4>
<p>There is no additional rationale provided for this section.</p>
<h4><a name="tag_01_06_04"></a>Character Set Description File</h4>
<p>IEEE PASC Interpretation 1003.2 #196 is applied, removing three lines of text dealing with ranges of symbolic names using position constant values which had been erroneously included in the final IEEE P1003.2b draft standard.</p>
<h5><a name="tag_01_06_04_01"></a>State-Dependent Character Encodings</h5>
<p>A requirement was considered that would force utilities to eliminate any redundant locking shifts, but this was left as a quality of implementation issue.</p>
<p>This change satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex H.1:</p>
<blockquote><i>The support of state-dependent (shift encoding) character sets should be addressed fully.
See descriptions of these in the Base Definitions volume of IEEE Std 1003.1-2001, Section 6.2, Character Encoding. If such character encodings are supported, it is expected that this will impact the Base Definitions volume of IEEE Std 1003.1-2001, Section 6.2, Character Encoding, the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap07.html">Chapter 7, Locale</a>, the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html">Chapter 9, Regular Expressions</a>, and the <a href="../utilities/comm.html"><i>comm</i></a>, <a href="../utilities/cut.html"><i>cut</i></a>, <a href="../utilities/diff.html"><i>diff</i></a>, <a href="../utilities/grep.html"><i>grep</i></a>, <a href="../utilities/head.html"><i>head</i></a>, <a href="../utilities/join.html"><i>join</i></a>, <a href="../utilities/paste.html"><i>paste</i></a>, and <a href="../utilities/tail.html"><i>tail</i></a> utilities.</i></blockquote>
<p>The character set description file provides:</p>
<ul>
<li><p>The capability to describe character set attributes (such as collation order or character classes) independent of character set encoding, and using only the characters in the portable character set. This makes it possible to create generic <a href="../utilities/localedef.html"><i>localedef</i></a> source files for all codesets that share the portable character set (such as the ISO 8859 family or IBM Extended ASCII).</p></li>
<li><p>Standardized symbolic names for all characters in the portable character set, making it possible to refer to any such character regardless of encoding.</p></li>
</ul>
<p>Implementations are free to choose their own symbolic names, as long as the names identified by the Base Definitions volume of IEEE Std 1003.1-2001 are also defined; this provides support for already existing "character names".</p>
<p>The names selected for the members of the portable character set follow the ISO/IEC 8859-1:1998 standard and the ISO/IEC 10646-1:2000 standard.
However, several commonly used UNIX system names occur as synonyms in the list:</p>
<ul>
<li><p>The historical UNIX system names are used for control characters.</p></li>
<li><p>The word "slash" is given in addition to "solidus".</p></li>
<li><p>The word "backslash" is given in addition to "reverse-solidus".</p></li>
<li><p>The word "hyphen" is given in addition to "hyphen-minus".</p></li>
<li><p>The word "period" is given in addition to "full-stop".</p></li>
<li><p>For digits, the word "digit" is eliminated.</p></li>
<li><p>For letters, the words "Latin Capital Letter" and "Latin Small Letter" are eliminated.</p></li>
<li><p>The words "left brace" and "right brace" are given in addition to "left-curly-bracket" and "right-curly-bracket".</p></li>
<li><p>The names of the digits are preferred over the numbers to avoid possible confusion between <tt>'0'</tt> and <tt>'O'</tt>, and between <tt>'1'</tt> and <tt>'l'</tt> (one and the letter ell).</p></li>
</ul>
<p>The names for the control characters in the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html">Chapter 6, Character Set</a> were taken from the ISO/IEC 4873:1991 standard.</p>
<p>The charmap file was introduced to resolve problems with the portability of, especially, <a href="../utilities/localedef.html"><i>localedef</i></a> sources. IEEE Std 1003.1-2001 assumes that the portable character set is constant across all locales, but does not prohibit implementations from supporting two incompatible codings, such as both ASCII and EBCDIC. Such dual-support implementations should have all charmaps and <a href="../utilities/localedef.html"><i>localedef</i></a> sources encoded using one portable character set, in effect cross-compiling for the other environment. Naturally, charmaps (and <a href="../utilities/localedef.html"><i>localedef</i></a> sources) are only portable without transformation between systems using the same encodings for the portable character set.
They can, however, be transformed between two sets using only a subset of the actual characters (the portable character set). However, the particular coded character set used for an application or an implementation does not necessarily imply different characteristics or collation; on the contrary, these attributes should in many cases be identical, regardless of codeset. The charmap provides the capability to define a common locale definition for multiple codesets (the same <a href="../utilities/localedef.html"><i>localedef</i></a> source can be used for codesets with different extended characters; the ability in the charmap to define empty names allows for characters missing in certain codesets).</p>
<p>The <b>&lt;escape_char&gt;</b> declaration was added at the request of the international community to ease the creation of portable charmap files on terminals not implementing the default backslash escape. The <b>&lt;comment_char&gt;</b> declaration was added at the request of the international community to eliminate the potential confusion between the number sign and the pound sign.</p>
<p>The octal number notation with no leading zero required was selected to match those of <a href="../utilities/awk.html"><i>awk</i></a> and <a href="../utilities/tr.html"><i>tr</i></a> and is consistent with that used by <a href="../utilities/localedef.html"><i>localedef</i></a>. To avoid confusion between an octal constant and the back-references used in <a href="../utilities/localedef.html"><i>localedef</i></a> source, the octal, hexadecimal, and decimal constants must contain at least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Provision is made for more digits to account for systems in which the byte size is larger than 8 bits.
For example, a Unicode (ISO/IEC 10646-1:2000 standard) system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal digits.</p>
<p>The decimal notation is supported because some newer international standards define character values in decimal, rather than in the old column/row notation.</p>
<p>The charmap identifies the coded character sets supported by an implementation. At least one charmap must be provided, but no implementation is required to provide more than one. Likewise, implementations can allow users to generate new charmaps (for instance, for a new version of the ISO 8859 family of coded character sets), but do not have to do so. If users are allowed to create new charmaps, the system documentation describes the rules that apply (for instance, "only coded character sets that are supersets of the ISO/IEC 646:1991 standard IRV, no multi-byte characters").</p>
<p>This addition of the <b>WIDTH</b> specification satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex H.1:</p>
<blockquote><dl compact><dt>(9)</dt><dd><i>The definition of column position relies on the implementation's knowledge of the integral width of the characters. The charmap or</i> LC_CTYPE locale definitions should be enhanced to allow application specification of these widths.</dd></dl></blockquote>
<p>The character "width" information was first considered for inclusion under <i>LC_CTYPE</i> but was moved because it is more closely associated with the information in the charmap than information in the locale source (cultural conventions information). Concerns were raised that formalizing this type of information is moving the locale source definition from the codeset-independent entity that it was designed to be to a repository of codeset-specific information. A similar issue occurred with the
A similar issue occurred with the<b><code_set_name></b>, <b><mb_cur_max></b>, and <b><mb_cur_min></b> information, which was resolved to reside inthe charmap definition.</p><p>The width definition was added to the IEEE P1003.2b draft standard with the intent that the <a href="../functions/wcswidth.html"><i>wcswidth</i>()</a> and/or <a href="../functions/wcwidth.html"><i>wcwidth</i>()</a> functions(currently specified in the System Interfaces volume of IEEE Std 1003.1-2001) be the mechanism to retrieve the characterwidth information.</p><hr size="2" noshade><center><font size="2"><!--footer start-->UNIX ® is a registered Trademark of The Open Group.<br>POSIX ® is a registered Trademark of The IEEE.<br>[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href="../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>]</font></center><!--footer end--><hr size="2" noshade></body></html>