</tr>
<tr valign="top"><td align="left"><p class="tent">&lt;left-curly-bracket&gt;</p></td><td align="center"><p class="tent">{</p></td><td align="left"><p class="tent">&lt;U007B&gt;</p></td><td align="left"><p class="tent">LEFT CURLY BRACKET</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;vertical-line&gt;</p></td><td align="center"><p class="tent">|</p></td><td align="left"><p class="tent">&lt;U007C&gt;</p></td><td align="left"><p class="tent">VERTICAL LINE</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;right-brace&gt;</p></td><td align="center"><p class="tent">}</p></td><td align="left"><p class="tent">&lt;U007D&gt;</p></td><td align="left"><p class="tent">RIGHT CURLY BRACKET</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;right-curly-bracket&gt;</p></td><td align="center"><p class="tent">}</p></td><td align="left"><p class="tent">&lt;U007D&gt;</p></td><td align="left"><p class="tent">RIGHT CURLY BRACKET</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;tilde&gt;</p></td><td align="center"><p class="tent">~</p></td><td align="left"><p class="tent">&lt;U007E&gt;</p></td><td align="left"><p class="tent">TILDE</p></td></tr>
</table></center>
<p>IEEE Std 1003.1-2001 uses character names other than the above, but only in an informative way; for example, in examples to illustrate the use of characters beyond the portable character set with the facilities of IEEE Std 1003.1-2001.</p>
<p><a href="#tagtcjh_3">Portable Character Set</a> defines the characters in the portable character set and the corresponding symbolic character names used to identify each character in a character set description file. The table contains more than one symbolic character name for characters whose traditional name differs from the chosen name.
Characters defined in <a href="#tagtcjh_4">Control Character Set</a> may also be used in character set description files.</p>
<p>IEEE Std 1003.1-2001 places only the following requirements on the encoded values of the characters in the portable character set:</p>
<ul>
<li><p>If the encoded values associated with each member of the portable character set are not invariant across all locales supported by the implementation, if an application accesses any pair of locales where the character encodings differ, or accesses data from an application running in a locale which has different encodings from the application's current locale, the results are unspecified.</p></li>
<li><p>The encoded values associated with the digits 0 to 9 shall be such that the value of each character after 0 shall be one greater than the value of the previous character.</p></li>
<li><p>A null character, NUL, which has all bits set to zero, shall be in the set of characters.</p></li>
<li><p>The encoded values associated with the members of the portable character set are each represented in a single byte. Moreover, if the value is stored in an object of C-language type <b>char</b>, it is guaranteed to be positive (except the NUL, which is always zero).</p></li>
</ul>
<p>Conforming implementations shall support certain character and character set attributes, as defined in <a href="xbd_chap07.html#tag_07_02"><i>POSIX Locale</i></a>.</p>
<h3><a name="tag_06_02"></a>Character Encoding</h3>
<p>The POSIX locale contains the characters in <a href="#tagtcjh_3">Portable Character Set</a>, which have the properties listed in <a href="xbd_chap07.html#tag_07_03_01"><i>LC_CTYPE</i></a>. In other locales, the presence, meaning, and representation of any additional characters are locale-specific.</p>
<p>In locales other than the POSIX locale, a character may have a state-dependent encoding.
There are two types of these encodings:</p>
<ul>
<li><p>A single-shift encoding (where each character not in the initial shift state is preceded by a shift code) can be defined if each shift-code and character sequence is considered a multi-byte character. This is done using the concatenated-constant format in a character set description file, as described in <a href="#tag_06_04">Character Set Description File</a>. If the implementation supports a character encoding of this type, all of the standard utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 shall support it. Use of a single-shift encoding with any of the functions in the System Interfaces volume of IEEE Std 1003.1-2001 that do not specifically mention the effects of state-dependent encoding is implementation-defined.</p></li>
<li><p>A locking-shift encoding (where the state of the character is determined by a shift code that may affect more than the single character following it) cannot be defined with the current character set description file format. Use of a locking-shift encoding with any of the standard utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 or with any of the functions in the System Interfaces volume of IEEE Std 1003.1-2001 that do not specifically mention the effects of state-dependent encoding is implementation-defined.</p></li>
</ul>
<p>While in the initial shift state, all characters in the portable character set shall retain their usual interpretation and shall not alter the shift state. The interpretation for subsequent bytes in the sequence shall be a function of the current shift state. A byte with all bits zero shall be interpreted as the null character independent of shift state.
Thus a byte with all bits zero shall never occur in the second or subsequent bytes of a character.</p>
<p>The maximum allowable number of bytes in a character in the current locale shall be indicated by {MB_CUR_MAX}, defined in the <a href="stdlib.h.html"><i>&lt;stdlib.h&gt;</i></a> header and by the <b>&lt;mb_cur_max&gt;</b> value in a character set description file; see <a href="#tag_06_04">Character Set Description File</a>. The implementation's maximum number of bytes in a character shall be defined by the C-language macro {MB_LEN_MAX}.</p>
<h3><a name="tag_06_03"></a>C Language Wide-Character Codes</h3>
<p>In the shell, the standard utilities are written so that the encodings of characters are described by the locale's <i>LC_CTYPE</i> definition (see <a href="xbd_chap07.html#tag_07_03_01"><i>LC_CTYPE</i></a>) and there is no differentiation between characters consisting of single octets (8-bit bytes) or multiple bytes. However, in the C language, a differentiation is made. To ease the handling of variable length characters, the C language has introduced the concept of wide-character codes.</p>
<p>All wide-character codes in a given process consist of an equal number of bits. This is in contrast to characters, which can consist of a variable number of bytes. The byte or byte sequence that represents a character can also be represented as a wide-character code. Wide-character codes thus provide a uniform size for manipulating text data. A wide-character code having all bits zero is the null wide-character code (see <a href="xbd_chap03.html#tag_03_246"><i>Null Wide-Character Code</i></a>), and terminates wide-character strings (see <a href="xbd_chap03.html#tag_03_432"><i>Wide-Character Code (C Language)</i></a>). The wide-character value for each member of the portable character set shall equal its value when used as the lone character in an integer character constant. Wide-character codes for other characters are locale and implementation-defined.
State shift bytes shall not have a wide-character code representation.</p>
<h3><a name="tag_06_04"></a>Character Set Description File</h3>
<p>Implementations shall provide a character set description file for at least one coded character set supported by the implementation. These files are referred to elsewhere in IEEE Std 1003.1-2001 as <i>charmap</i> files. It is implementation-defined whether or not users or applications can provide additional character set description files.</p>
<p>IEEE Std 1003.1-2001 does not require that multiple character sets or codesets be supported. Although multiple charmap files are supported, it is the responsibility of the implementation to provide the file or files; if only one is provided, only that one is accessible using the <a href="../utilities/localedef.html"><i>localedef</i></a> utility's <b>-f</b> option.</p>
<p>Each character set description file, except those that use the ISO/IEC 10646-1:2000 standard position values as the encoding values, shall define characteristics for the coded character set and the encoding for the characters specified in <a href="#tagtcjh_3">Portable Character Set</a>, and may define encoding for additional characters supported by the implementation. Other information about the coded character set may also be in the file. Coded character set character values shall be defined using symbolic character names followed by character encoding values.</p>
<p>Each symbolic name specified in <a href="#tagtcjh_3">Portable Character Set</a> shall be included in the file and shall be mapped to a unique coding value, except as noted below. The glyphs <tt>'{'</tt>, <tt>'}'</tt>, <tt>'_'</tt>, <tt>'-'</tt>, <tt>'/'</tt>, <tt>'\'</tt>, <tt>'.'</tt>, and <tt>'^'</tt> have more than one symbolic name; all symbolic names for each such glyph shall be included, each with identical encoding.
If some or all of the control characters identified in <a href="#tagtcjh_4">Control Character Set</a> are supported by the implementation, the symbolic names and their corresponding encoding values shall be included in the file. Some of the encodings associated with the symbolic names in <a href="#tagtcjh_4">Control Character Set</a> may be the same as characters found in <a href="#tagtcjh_3">Portable Character Set</a>; both names shall be provided for each encoding.<br></p>
<center><b><a name="tagtcjh_4"></a> Table: Control Character Set</b></center>
<center><table border="1" cellpadding="3" align="center">
<tr valign="top"><td align="left"><p class="tent">&lt;ACK&gt;</p></td><td align="left"><p class="tent">&lt;DC2&gt;</p></td><td align="left"><p class="tent">&lt;ENQ&gt;</p></td><td align="left"><p class="tent">&lt;FS&gt;</p></td><td align="left"><p class="tent">&lt;IS4&gt;</p></td><td align="left"><p class="tent">&lt;SOH&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;BEL&gt;</p></td><td align="left"><p class="tent">&lt;DC3&gt;</p></td><td align="left"><p class="tent">&lt;EOT&gt;</p></td><td align="left"><p class="tent">&lt;GS&gt;</p></td><td align="left"><p class="tent">&lt;LF&gt;</p></td><td align="left"><p class="tent">&lt;STX&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;BS&gt;</p></td><td align="left"><p class="tent">&lt;DC4&gt;</p></td><td align="left"><p class="tent">&lt;ESC&gt;</p></td><td align="left"><p class="tent">&lt;HT&gt;</p></td><td align="left"><p class="tent">&lt;NAK&gt;</p></td><td align="left"><p class="tent">&lt;SUB&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;CAN&gt;</p></td><td align="left"><p class="tent">&lt;DEL&gt;</p></td><td align="left"><p class="tent">&lt;ETB&gt;</p></td><td align="left"><p class="tent">&lt;IS1&gt;</p></td><td align="left"><p class="tent">&lt;RS&gt;</p></td><td align="left"><p class="tent">&lt;SYN&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;CR&gt;</p></td><td align="left"><p class="tent">&lt;DLE&gt;</p></td><td align="left"><p class="tent">&lt;ETX&gt;</p></td><td align="left"><p class="tent">&lt;IS2&gt;</p></td><td align="left"><p class="tent">&lt;SI&gt;</p></td><td align="left"><p class="tent">&lt;US&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">&lt;DC1&gt;</p></td><td align="left"><p class="tent">&lt;EM&gt;</p></td><td align="left"><p class="tent">&lt;FF&gt;</p></td><td align="left"><p class="tent">&lt;IS3&gt;</p></td><td align="left"><p class="tent">&lt;SO&gt;</p></td><td align="left"><p class="tent">&lt;VT&gt;</p></td></tr>
</table></center>
<p>The following declarations can precede the character definitions. Each shall consist of the symbol shown in the following list, starting in column 1, including the surrounding brackets, followed by one or more &lt;blank&gt;s, followed by the value to be assigned to the symbol.</p>
<dl compact>
<dt><b>&lt;code_set_name&gt;</b></dt>
<dd>The name of the coded character set for which the character set description file is defined. The characters of the name shall be taken from the set of characters with visible glyphs defined in <a href="#tagtcjh_3">Portable Character Set</a>.</dd>
<dt><b>&lt;mb_cur_max&gt;</b></dt>
<dd>The maximum number of bytes in a multi-byte character. This shall default to 1.</dd>
<dt><b>&lt;mb_cur_min&gt;</b></dt>
<dd>An unsigned positive integer value that defines the minimum number of bytes in a character for the encoded character set. <sup>[<a href="javascript:open_code('XSI')">XSI</a>]</sup> <img src="../images/opt-start.gif" alt="[Option Start]" border="0"> On XSI-conformant systems, <b>&lt;mb_cur_min&gt;</b> shall always be 1. <img src="../images/opt-end.gif" alt="[Option End]" border="0"></dd>
<dt><b>&lt;escape_char&gt;</b></dt>
<dd>The character used to indicate that the characters following shall be interpreted in a special way, as defined later in this section. This shall default to backslash (<tt>'\'</tt>), which is the character used in all the following text and examples, unless otherwise noted.</dd>
<dt><b>&lt;comment_char&gt;</b></dt>
<dd>The character that, when placed in column 1 of a charmap line, is used to indicate that the line shall be ignored.
The default character shall be the number sign (<tt>'#'</tt>).</dd>
</dl>
<p>The character set mapping definitions shall be all the lines immediately following an identifier line containing the string <tt>"CHARMAP"</tt> starting in column 1, and preceding a trailer line containing the string <tt>"END CHARMAP"</tt> starting in column 1. Empty lines and lines containing a <b>&lt;comment_char&gt;</b> in the first column shall be ignored. Each non-comment line of the character set mapping definition (that is, between the <tt>"CHARMAP"</tt> and <tt>"END CHARMAP"</tt> lines of the file) shall be in either of two forms:</p>
<blockquote><pre><tt>"%s %s %s\n", &lt;</tt><i>symbolic-name</i><tt>&gt;, &lt;</tt><i>encoding</i><tt>&gt;, &lt;</tt><i>comments</i><tt>&gt;</tt></pre></blockquote>
<p>or:</p>
<blockquote><pre><tt>"%s...%s %s %s\n", &lt;</tt><i>symbolic-name</i><tt>&gt;, &lt;</tt><i>symbolic-name</i><tt>&gt;, &lt;</tt><i>encoding</i><tt>&gt;, &lt;</tt><i>comments</i><tt>&gt;</tt></pre></blockquote>
<p>In the first format, the line in the character set mapping definition shall define a single symbolic name and a corresponding encoding. A symbolic name is one or more characters from the set shown with visible glyphs in <a href="#tagtcjh_3">Portable Character Set</a>, enclosed between angle brackets. A character following an escape character is interpreted as itself; for example, the sequence <tt>"&lt;\\\&gt;&gt;"</tt> represents the symbolic name <tt>"\&gt;"</tt> enclosed between angle brackets.</p>
<p>In the second format, the line in the character set mapping definition shall define a range of one or more symbolic names. In this form, the symbolic names shall consist of zero or more non-numeric characters from the set shown with visible glyphs in <a href="#tagtcjh_3">Portable Character Set</a>, followed by an integer formed by one or more decimal digits. Both integers shall contain the same number of digits.
The characters preceding the integer shall be identical in the two symbolic names, and the integer formed by the digits in the second symbolic name shall be equal to or greater than the integer formed by the digits in the first name. This shall be interpreted as a series of symbolic names formed from the common part and each of the integers between the first and the second integer, inclusive. As an example, &lt;j0101&gt;...&lt;j0104&gt; is interpreted as the symbolic names &lt;j0101&gt;, &lt;j0102&gt;, &lt;j0103&gt;, and &lt;j0104&gt;, in that order.</p>
<p>A character set mapping definition line shall exist for all symbolic names specified in <a href="#tagtcjh_3">Portable Character Set</a>, and shall define the coded character value that corresponds to the character indicated in the table, or the coded character value that corresponds to the control character symbolic name. If the control characters commonly associated with the symbolic names in <a href="#tagtcjh_4">Control Character Set</a> are supported by the implementation, the symbolic name and the corresponding encoding value shall be included in the file. Additional unique symbolic names may be included. A coded character value can be represented by more than one symbolic name.</p>
<p>The encoding part is expressed as one (for single-byte character values) or more concatenated decimal, octal, or hexadecimal constants in the following formats:</p>
<blockquote><pre><tt>"%cd%u", &lt;</tt><i>escape_char</i><tt>&gt;, &lt;</tt><i>decimal byte value</i><tt>&gt;
"%cx%x", &lt;</tt><i>escape_char</i><tt>&gt;, &lt;</tt><i>hexadecimal byte value</i><tt>&gt;
"%c%o", &lt;</tt><i>escape_char</i><tt>&gt;, &lt;</tt><i>octal byte value</i><tt>&gt;</tt></pre></blockquote>
<p>Decimal constants shall be represented by two or three decimal digits, preceded by the escape character and the lowercase letter <tt>'d'</tt>; for example, <tt>"\d05"</tt>, <tt>"\d97"</tt>, or <tt>"\d143"</tt>.
Hexadecimal constants shall be represented by two hexadecimal digits, preceded by the escape character and the lowercase letter <tt>'x'</tt>; for example, <tt>"\x05"</tt>, <tt>"\x61"</tt>, or <tt>"\x8f"</tt>. Octal constants shall be represented by two or three octal digits, preceded by the escape character; for example, <tt>"\05"</tt>, <tt>"\141"</tt>, or <tt>"\217"</tt>. In a portable charmap file, each constant represents an 8-bit byte. When constants are concatenated for multi-byte character values, they shall be of the same type, and interpreted in byte order from first to last with the least significant byte of the multi-byte character specified by the last constant. The manner in which these constants are represented in the character stored in the system is implementation-defined. (This notation was chosen for reasons of portability. There is no requirement that the internal representation in the computer memory be in this same order.) Omitting bytes from a multi-byte character definition produces undefined results.</p>
<p>In lines defining ranges of symbolic names, the encoded value shall be the value for the first symbolic name in the range (the symbolic name preceding the ellipsis). Subsequent symbolic names defined by the range shall have encoding values in increasing order. Bytes shall be treated as unsigned octets, and carry shall be propagated between the bytes as necessary to represent the range. For example, the line:</p>
<blockquote><pre><tt>&lt;j0101&gt;...&lt;j0104&gt; \d129\d254</tt></pre></blockquote>
<p>is interpreted as:</p>
<blockquote><pre><tt>&lt;j0101&gt; \d129\d254
&lt;j0102&gt; \d129\d255
&lt;j0103&gt; \d130\d0
&lt;j0104&gt; \d130\d1</tt></pre></blockquote>
<p>The comment is optional.</p>
<p>The following declarations can follow the character set mapping definitions (after the <tt>"END CHARMAP"</tt> statement).
Each shall consist of the keyword shown in the following list, starting in column 1, followed by the value(s) to be associated to the keyword, as defined below.</p>
<dl compact>
<dt><b>WIDTH</b></dt>
<dd>An unsigned positive integer value defining the column width (see <a href="xbd_chap03.html#tag_03_103"><i>Column Position</i></a>) for the printable characters in the coded character set specified in <a href="#tagtcjh_3">Portable Character Set</a> and <a href="#tagtcjh_4">Control Character Set</a>. Coded character set character values shall be defined using symbolic character names followed by column width values. Defining a character with more than one <b>WIDTH</b> produces undefined results. The <b>END WIDTH</b> keyword shall be used to terminate the <b>WIDTH</b> definitions. Specifying the width of a non-printable character in a <b>WIDTH</b> declaration produces undefined results.</dd>
<dt><b>WIDTH_DEFAULT</b></dt>
<dd><br>An unsigned positive integer value defining the default column width for any printable character not listed by one of the <b>WIDTH</b> keywords. If no <b>WIDTH_DEFAULT</b> keyword is included in the charmap, the default character width shall be 1.</dd>
</dl>
<hr>
<div class="box"><em>The following sections are informative.</em></div>
<h5><a name="tag_06_04_00_01"></a>Example</h5>
<p>After the <tt>"END CHARMAP"</tt> statement, a syntax for a width definition would be:</p>
<pre><tt>WIDTH
&lt;A&gt; 1
&lt;B&gt; 1
&lt;C&gt;...&lt;Z&gt; 1
...
&lt;foo1&gt;...&lt;foon&gt; 2
...
END WIDTH</tt></pre>
<p>In this example, the numerical code point values represented by the symbols <b>&lt;A&gt;</b> and <b>&lt;B&gt;</b> are assigned a width of 1. The code point values <b>&lt;C&gt;</b> to <b>&lt;Z&gt;</b> inclusive (<b>&lt;C&gt;</b>, <b>&lt;D&gt;</b>, <b>&lt;E&gt;</b>, and so on) are also assigned a width of 1. Using <b>&lt;A&gt;</b>...<b>&lt;Z&gt;</b> would have required fewer lines, but the alternative was shown to demonstrate flexibility.
The keyword <b>WIDTH_DEFAULT</b> could have been added as appropriate.</p>
<div class="box"><em>End of informative text.</em></div>
<hr>
<h4><a name="tag_06_04_01"></a>State-Dependent Character Encodings</h4>
<p>This section addresses the use of state-dependent character encodings (that is, those in which the encoding of a character is dependent on one or more shift codes that may precede it).</p>
<p>A single-shift encoding (where each character not in the initial shift state is preceded by a shift code) can be defined in the charmap format if each shift-code/character sequence is considered a multi-byte character, defined using the concatenated-constant format described in <a href="#tag_06_04">Character Set Description File</a>. If the implementation supports a character encoding of this type, all of the standard utilities shall support it. A locking-shift encoding (where the state of the character is determined by a shift code that may affect more than the single character following it) could be defined with an extension to the charmap format described in <a href="#tag_06_04">Character Set Description File</a>. If the implementation supports a character encoding of this type, any of the standard utilities that describe character (<i>versus</i> byte) or text-file manipulation shall have the following characteristics:</p>
<ol>
<li><p>The utility shall process the statefully encoded data as a concatenation of state-independent characters.
The presence of redundant locking shifts shall not affect the comparison of two statefully encoded strings.</p></li>
<li><p>A utility that divides, truncates, or extracts substrings from statefully encoded data shall produce output that contains locking shifts at the beginning or end of the resulting data, if appropriate, to retain correct state information.</p></li>
</ol>
<hr size="2" noshade>
<center><font size="2"><!--footer start-->UNIX ® is a registered Trademark of The Open Group.<br>POSIX ® is a registered Trademark of The IEEE.<br>[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href="../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a> ]</font></center><!--footer end-->
<hr size="2" noshade></body></html>