C Language Wide-character Codes</a></xref>). The wide-character value for each member of the Portable Character Set will equal its value when used as the lone character in an integer character constant. Wide-character codes for other characters are locale- and implementation-dependent. State shift bytes do not have a wide-character code representation.
<h3><a name = "tag_001_004"> </a>Character Set Description File</h3><xref type="2" name="charmap"></xref>
Implementations provide a character set description file for at least one coded character set supported by the implementation. These files are referred to elsewhere in this specification set as <i>charmap</i> files. It is implementation-dependent whether or not users or applications can provide additional character set description files.
<p>This specification set does not require that multiple character sets or codesets be supported. Although multiple charmap files are supported, it is the responsibility of the implementation to provide the file or files; if only one is provided, only that one will be accessible using the <i><a href="../xcu/localedef.html">localedef</a></i> utility's <b>-f</b> option (although in the case of just one file on the system, <b>-f</b> is not useful).
<p>Each character set description file defines characteristics for the coded character set and the encoding for the characters specified in <xref href=portchar><a href="#tagt_1">Portable Character Set</a></xref>, and may define encoding for additional characters supported by the implementation. Other information about the coded character set may also be in the file. Coded character set character values are defined using symbolic character names followed by character encoding values.
<p>The character set description file provides:
<ul><p><li>The capability to describe character set attributes (such as collation order or character classes) independent of character set encoding, and using only the characters in the portable character set. This makes it possible to create generic <i><a 
href="../xcu/localedef.html">localedef</a></i> source files for all codesets that share the portable character set (such as the ISO 8859 family or IBM Extended ASCII).
<p><li>Standardised symbolic names for all characters in the portable character set, making it possible to refer to any such character regardless of encoding.
<p></ul>
<p>The charmap file was introduced to resolve portability problems, especially those of <i><a href="../xcu/localedef.html">localedef</a></i> sources. This specification set assumes that the portable character set is constant across all locales, but does not prohibit implementations from supporting two incompatible codings, such as both ASCII and EBCDIC. Such dual-support implementations should have all charmaps and <i><a href="../xcu/localedef.html">localedef</a></i> sources encoded using one portable character set, in effect cross-compiling for the other environment. Naturally, charmaps (and <i><a href="../xcu/localedef.html">localedef</a></i> sources) are only portable without transformation between systems that use the same encodings for the portable character set. They can, however, be transformed between two such codesets, since they use only a subset of the actual characters (the portable set). The particular coded character set used for an application or an implementation does not necessarily imply different characteristics or collation; on the contrary, these attributes should in many cases be identical, regardless of codeset. The charmap provides the capability to define a common locale definition for multiple codesets (the same <i><a href="../xcu/localedef.html">localedef</a></i> source can be used for codesets with different extended characters; the ability in the charmap to define empty names allows for characters missing in certain codesets).
<p>Each symbolic name specified in <xref href=portchar><a href="#tagt_1">Portable Character Set</a></xref> is included in the file and is mapped to a unique encoding value (except for those symbolic names that are shown with identical 
glyphs). If the control characters commonly associated with the symbolic names in the following table are supported by the implementation, the symbolic names and their corresponding encoding values are included in the file. Some of the encodings associated with the symbolic names in this table may be the same as characters in the portable character set table.
<pre><center><table bordercolor=#000000 border=1 align=center><tr valign=top><td align=center><ACK><td align=center><DC2><td align=center><ENQ><td align=center><FS><td align=center><IS4><td align=center><SOH><tr valign=top><td align=center><BEL><td align=center><DC3><td align=center><EOT><td align=center><GS><td align=center><LF><td align=center><STX><tr valign=top><td align=center><BS><td align=center><DC4><td align=center><ESC><td align=center><HT><td align=center><NAK><td align=center><SUB><tr valign=top><td align=center><CAN><td align=center><DEL><td align=center><ETB><td align=center><IS1><td align=center><RS><td align=center><SYN><tr valign=top><td align=center><CR><td align=center><DLE><td align=center><ETX><td align=center><IS2><td align=center><SI><td align=center><US><tr valign=top><td align=center><DC1><td align=center><EM><td align=center><FF><td align=center><IS3><td align=center><SO><td align=center><VT></table><h6 align=center><xref table="Control Character Set"><a name="tagt_2"> </a></xref>Table: Control Character Set</h6><xref type="7" name="cntlchar"></xref></center></pre>
<p>The following declarations can precede the character definitions. Each must consist of the symbol shown in the following list, starting in column 1, including the surrounding brackets, followed by one or more blank characters, followed by the value to be assigned to the symbol.
<dl compact>
<dt><b><code_set_name></b><dd>The name of the coded character set for which the character set description file is defined. The characters of the name must be taken from the set of characters with visible glyphs defined in <xref href=portchar><a 
href="#tagt_1">Portable Character Set</a></xref>.
<dt><b><mb_cur_max></b><dd>The maximum number of bytes in a multi-byte character. This defaults to 1.
<dt><b><mb_cur_min></b><dd>An unsigned positive integer value that defines the minimum number of bytes in a character for the encoded character set. On XSI-conformant systems, <b><mb_cur_min></b> is always 1.
<dt><b><escape_char></b><dd>The escape character used to indicate that the characters following it will be interpreted in a special way, as defined later in this section. This defaults to backslash (\), which is the character glyph used in all the following text and examples, unless otherwise noted.
<dt><b><comment_char></b><dd>The character that, when placed in column 1 of a charmap line, indicates that the line is to be ignored. The default character is the number sign (#).
</dl>
<p>The character set mapping definitions will be all the lines immediately following an identifier line containing the string <b>CHARMAP</b> starting in column 1, and preceding a trailer line containing the string <b>END CHARMAP</b> starting in column 1. Empty lines and lines containing a <b><comment_char></b> in the first column will be ignored. Each non-comment line of the character set mapping definition (that is, between the <b>CHARMAP</b> and <b>END CHARMAP</b> lines of the file) must be in one of two forms:
<p><dl compact><dt> <dd><p><tt>"%s %s %s\n"</tt>, <<i>symbolic-name</i>>, <<i>encoding</i>>, <<i>comments</i>></p></dl></p>
or:
<p><dl compact><dt> <dd><p><tt>"%s...%s %s %s\n"</tt>, <<i>symbolic-name</i>>, <<i>symbolic-name</i>>, <<i>encoding</i>>, <<i>comments</i>></p></dl></p>
<p>In the first format, the line in the character set mapping definition defines a single symbolic name and a corresponding encoding. A symbolic name is one or more characters from the set shown with visible glyphs in <xref href=portchar><a href="#tagt_1">Portable Character Set</a></xref>, enclosed between angle brackets. A character following an escape character is interpreted as itself; for 
example, the sequence <\\\>> represents the symbolic name \> enclosed between angle brackets.
<p>In the second format, the line in the character set mapping definition defines a range of one or more symbolic names. In this form, the symbolic names must consist of zero or more non-numeric characters from the set shown with visible glyphs in <xref href=portchar><a href="#tagt_1">Portable Character Set</a></xref>, followed by an integer formed by one or more decimal digits. The characters preceding the integer must be identical in the two symbolic names, and the integer formed by the digits in the second symbolic name must be equal to or greater than the integer formed by the digits in the first name. This is interpreted as a series of symbolic names formed from the common part and each of the integers between the first and the second integer, inclusive. As an example, <j0101>...<j0104> is interpreted as the symbolic names <j0101>, <j0102>, <j0103> and <j0104>, in that order.
<p>A character set mapping definition line must exist for all symbolic names specified in <xref href=portchar><a href="#tagt_1">Portable Character Set</a></xref>, and must define the coded character value that corresponds to the character glyph indicated in the table, or the coded character value that corresponds to the control character symbolic name. If the control characters commonly associated with the symbolic names in <xref href=cntlchar><a href="#tagt_2">Control Character Set</a></xref> are supported by the implementation, the symbolic name and the corresponding encoding value must be included in the file. Additional unique symbolic names may be included. A coded character value can be represented by more than one symbolic name.
<p>The encoding part is expressed as one (for single-byte character values) or more concatenated decimal, octal or hexadecimal constants in the following formats:
<p><dl compact><dt> <dd><tt>"%cd%d"</tt>, <<i>escape_char</i>>, <<i>decimal byte value</i>></dl></p>
<p><dl compact><dt> <dd><tt>"%cx%x"</tt>, 
<<i>escape_char</i>>, <<i>hexadecimal byte value</i>></dl></p>
<p><dl compact><dt> <dd><tt>"%c%o"</tt>, <<i>escape_char</i>>, <<i>octal byte value</i>></dl></p>
<p>Decimal constants must be represented by two or three decimal digits, preceded by the escape character and the lower-case letter d; for example, \d05, \d97 or \d143. Hexadecimal constants must be represented by two hexadecimal digits, preceded by the escape character and the lower-case letter x; for example, \x05, \x61 or \x8f. Octal constants must be represented by two or three octal digits, preceded by the escape character; for example, \05, \141 or \217. In a portable charmap file, each constant must represent an 8-bit byte. Implementations supporting other byte sizes may allow constants to represent values larger than those that can be represented in 8-bit bytes, and may allow additional digits in constants. When constants are concatenated for multi-byte character values, they must be of the same type, and they are interpreted in byte order from first to last, with the least significant byte of the multi-byte character specified by the last constant. The manner in which these constants are represented in the character stored in the system is implementation-dependent. (This big-endian notation was chosen for reasons of portability. There is no requirement that the internal representation in the computer memory be in this same order.) Omitting bytes from a multi-byte character definition produces undefined results.
<p>In lines defining ranges of symbolic names, the encoded value is the value for the first symbolic name in the range (the symbolic name preceding the ellipsis). Subsequent symbolic names defined by the range will have encoding values in increasing order. For example, the line:
<pre><code><j0101>...<j0104> \d129\d254</code></pre>
<p>will be interpreted as:
<pre><code><j0101> \d129\d254
<j0102> \d129\d255
<j0103> \d130\d0
<j0104> \d130\d1</code></pre>
<p>Note that this line will be interpreted as in the example even on systems with bytes larger than 8 bits; that is, incrementing past \d255 carries into the preceding byte.
<p>The comment in a mapping definition line is optional.
<p>For the interpretation of the dollar sign and the number sign, see <xref href=dollarsign><a href="glossary.html#tag_004_000_079">dollar sign</a></xref> and <xref href=numbersign><a href="glossary.html#tag_004_000_180">number sign</a></xref>.
<hr size=2 noshade><center><font size=2>UNIX ® is a registered Trademark of The Open Group.<br>Copyright © 1997 The Open Group<br> [ <a href="../index.html">Main Index</a> | <a href="../xshix.html">XSH</a> | <a href="../xcuix.html">XCU</a> | <a href="../xbdix.html">XBD</a> | <a href="../cursesix.html">XCURSES</a> | <a href="../xnsix.html">XNS</a> ]</font></center><hr size=2 noshade></body></html>