<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta name="generator" content="HTML Tidy, see www.w3.org"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 --><!-- Copyright (c) 2001-2003 The Open Group, All Rights Reserved --><title>Locale</title></head><body bgcolor="white"><script type="text/javascript" language="JavaScript" src="../jscript/codes.js"></script><basefont size="3"> <!--header start--><center><font size="2">The Open Group Base Specifications Issue 6<br>IEEE Std 1003.1, 2003 Edition<br>Copyright © 2001-2003 The IEEE and The Open Group, All Rights reserved.</font></center><!--header end--><hr size="2" noshade><h2><a name="tag_07"></a>Locale</h2><h3><a name="tag_07_01"></a>General</h3><p>A locale is the definition of the subset of a user's environment that depends on language and cultural conventions. It is made up from one or more categories. Each category is identified by its name and controls specific aspects of the behavior of components of the system. Category names correspond to the following environment variable names:</p><dl compact><dt><i>LC_CTYPE</i></dt><dd>Character classification and case conversion.</dd><dt><i>LC_COLLATE</i></dt><dd>Collation order.</dd><dt><i>LC_MONETARY</i></dt><dd>Monetary formatting.</dd><dt><i>LC_NUMERIC</i></dt><dd>Numeric, non-monetary formatting.</dd><dt><i>LC_TIME</i></dt><dd>Date and time formats.</dd><dt><i>LC_MESSAGES</i></dt><dd>Formats of informative and diagnostic messages and interactive responses.</dd></dl><p>The standard utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 shall base their behavior on the current locale, as defined in the ENVIRONMENT VARIABLES section for each utility. 
The behavior of some of the C-language functions defined in the System Interfaces volume of IEEE Std 1003.1-2001 shall also be modified based on the current locale, as defined by the last call to <a href="../functions/setlocale.html"><i>setlocale</i>()</a>.</p><p>Locales other than those supplied by the implementation can be created via the <a href="../utilities/localedef.html"><i>localedef</i></a> utility, provided that the _POSIX2_LOCALEDEF symbol is defined on the system. Even if <a href="../utilities/localedef.html"><i>localedef</i></a> is not provided, all implementations conforming to the System Interfaces volume of IEEE Std 1003.1-2001 shall provide one or more locales that behave as described in this chapter. The input to the utility is described in <a href="#tag_07_03">Locale Definition</a> . The value that is used to specify a locale when using environment variables shall be the string specified as the <i>name</i> operand to the <a href="../utilities/localedef.html"><i>localedef</i></a> utility when the locale was created. The strings <tt>"C"</tt> and <tt>"POSIX"</tt> are reserved as identifiers for the POSIX locale (see <a href="#tag_07_02">POSIX Locale</a> ). When the value of a locale environment variable begins with a slash ( <tt>'/'</tt> ), it shall be interpreted as the pathname of the locale definition; the type of file (regular, directory, and so on) used to store the locale definition is implementation-defined. If the value does not begin with a slash, the mechanism used to locate the locale is implementation-defined.</p><p>If different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. 
Likewise, if different codesets are used for the data being processed by interfaces whose behavior is dependent on the current locale, or the codeset is different from the codeset assumed when the locale was created, the result is also undefined.</p><p>Applications can select the desired locale by invoking the <a href="../functions/setlocale.html"><i>setlocale</i>()</a> function (or equivalent) with the appropriate value. If the function is invoked with an empty string, such as:</p><blockquote><pre><tt>setlocale(LC_ALL, "");</tt></pre></blockquote><p>the value of the corresponding environment variable is used. If the environment variable is unset or is set to the empty string, the implementation shall set the appropriate environment as defined in <a href="xbd_chap08.html#tag_08"><i>Environment Variables</i></a> .</p><h3><a name="tag_07_02"></a>POSIX Locale</h3><p>Conforming systems shall provide a POSIX locale, also known as the C locale. The behavior of standard utilities and functions in the POSIX locale shall be as if the locale was defined via the <a href="../utilities/localedef.html"><i>localedef</i></a> utility with input data from the POSIX locale tables in <a href="#tag_07_03">Locale Definition</a> .</p><p>The tables in <a href="#tag_07_03">Locale Definition</a> describe the characteristics and behavior of the POSIX locale for data consisting entirely of characters from the portable character set and the control character set. For other characters, the behavior is unspecified. For C-language programs, the POSIX locale shall be the default locale when the <a href="../functions/setlocale.html"><i>setlocale</i>()</a> function is not called.</p><p>The POSIX locale can be specified by assigning to the appropriate environment variables the values <tt>"C"</tt> or <tt>"POSIX"</tt> .</p><p>All implementations shall define a locale as the default locale, to be invoked when no environment variables are set, or set to the empty string. 
This default locale can be the POSIX locale or any other implementation-defined locale. Some implementations may provide facilities for local installation administrators to set the default locale, customizing it for each location. IEEE Std 1003.1-2001 does not require such a facility.</p><h3><a name="tag_07_03"></a>Locale Definition</h3><p>The capability to specify additional locales to those provided by an implementation is optional, denoted by the _POSIX2_LOCALEDEF symbol. If the option is not supported, only implementation-supplied locales are available. Such locales shall be documented using the format specified in this section.</p><p>Locales can be described with the file format presented in this section. The file format is that accepted by the <a href="../utilities/localedef.html"><i>localedef</i></a> utility. For the purposes of this section, the file is referred to as the "locale definition file", but no locales shall be affected by this file unless it is processed by <a href="../utilities/localedef.html"><i>localedef</i></a> or some similar mechanism. Any requirements in this section imposed upon the utility shall apply to <a href="../utilities/localedef.html"><i>localedef</i></a> or to any other similar utility used to install locale information using the locale definition file format described here.</p><p>The locale definition file shall contain one or more locale category source definitions, and shall not contain more than one definition for the same locale category. If the file contains source definitions for more than one category, implementation-defined categories, if present, shall appear after the categories defined by <a href="#tag_07_01">General</a> . A category source definition contains either the definition of a category or a <b>copy</b> directive. For a description of the <b>copy</b> directive, see <a href="../utilities/localedef.html"><i>localedef</i></a>. 
In the event that some of the information for a locale category, as specified in this volume of IEEE Std 1003.1-2001, is missing from the locale source definition, the behavior of that category, if it is referenced, is unspecified.</p><p>A category source definition shall consist of a category header, a category body, and a category trailer. A category header shall consist of the character string naming of the category, beginning with the characters <i>LC_ .</i> The category trailer shall consist of the string <tt>"END"</tt> , followed by one or more &lt;blank&gt;s and the string used in the corresponding category header.</p><p>The category body shall consist of one or more lines of text. Each line shall contain an identifier, optionally followed by one or more operands. Identifiers shall be either keywords, identifying a particular locale element, or collating elements. In addition to the keywords defined in this volume of IEEE Std 1003.1-2001, the source can contain implementation-defined keywords. Each keyword within a locale shall have a unique name (that is, two categories cannot have a commonly-named keyword); no keyword shall start with the characters <i>LC_ .</i> Identifiers shall be separated from the operands by one or more &lt;blank&gt;s.</p><p>Operands shall be characters, collating elements, or strings of characters. Strings shall be enclosed in double-quotes. Literal double-quotes within strings shall be preceded by the &lt;<i>escape character</i>&gt;, described below. When a keyword is followed by more than one operand, the operands shall be separated by semicolons; &lt;blank&gt;s shall be allowed both before and after a semicolon.</p><p>The first category header in the file can be preceded by a line modifying the comment character. It shall have the following format, starting in column 1:</p><blockquote><pre><tt>"comment_char %c\n", &lt;</tt><i>comment character</i><tt>&gt;</tt></pre></blockquote><p>The comment character shall default to the number sign ( <tt>'#'</tt> ). 
Blank lines and lines containing the &lt;<i>comment character</i>&gt; in the first position shall be ignored.</p><p>The first category header in the file can be preceded by a line modifying the escape character to be used in the file. It shall have the following format, starting in column 1:</p><blockquote><pre><tt>"escape_char %c\n", &lt;</tt><i>escape character</i><tt>&gt;</tt></pre></blockquote><p>The escape character shall default to backslash, which is the character used in all examples shown in this volume of IEEE Std 1003.1-2001.</p><p>A line can be continued by placing an escape character as the last character on the line; this continuation character shall be discarded from the input. Although the implementation need not accept any one portion of a continued line with a length exceeding {LINE_MAX} bytes, it shall place no limits on the accumulated length of the continued line. Comment lines shall not be continued on a subsequent line using an escaped &lt;newline&gt;.</p><p>Individual characters, characters in strings, and collating elements shall be represented using symbolic names, as defined below. In addition, characters can be represented using the characters themselves or as octal, hexadecimal, or decimal constants. When non-symbolic notation is used, the resultant locale definitions are in many cases not portable between systems. The left angle bracket ( <tt>'&lt;'</tt> ) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character. The following rules apply to character representation:</p><ol><li><p>A character can be represented via a symbolic name, enclosed within angle brackets <tt>'&lt;'</tt> and <tt>'&gt;'</tt> . 
The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in the charmap file specified via the <a href="../utilities/localedef.html"><i>localedef</i></a> <b>-f</b> option, and it shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file. The use of a symbolic name not found in the charmap file shall constitute an error, unless the category is <i>LC_CTYPE</i> or <i>LC_COLLATE ,</i> in which case it shall constitute a warning condition (see <a href="../utilities/localedef.html"><i>localedef</i></a> for a description of actions resulting from errors and warnings). The specification of a symbolic name in a <b>collating-element</b> or <b>collating-symbol</b> section that duplicates a symbolic name in the charmap file (if present) shall be an error. Use of the escape character or a right angle bracket within a symbolic name is invalid unless the character is preceded by the escape character.</p><p>For example:</p><blockquote><pre><tt>&lt;c&gt;;&lt;c-cedilla&gt; "&lt;M&gt;&lt;a&gt;&lt;y&gt;"</tt></pre></blockquote></li><li><p>A character in the portable character set can be represented by the character itself, in which case the value of the character is implementation-defined. (Implementations may allow other characters to be represented as themselves, but such locale definitions are not portable.) Within a string, the double-quote character, the escape character, and the right angle bracket character shall be escaped (preceded by the escape character) to be interpreted as the character itself. Outside strings, the characters:</p><blockquote><pre><tt>, ; &lt; &gt; </tt> <i>escape_char</i></pre></blockquote><p>shall be escaped to be interpreted as the character itself.</p><p>For example:</p><blockquote><pre><tt>c "May"</tt></pre></blockquote></li><li><p>A character can be represented as an octal constant. An octal constant shall be specified as the escape character followed by two or three octal digits. 
Each constant shall represent a byte value. Multi-byte values can be represented by concatenated constants specified in byte order with the last constant specifying the least significant byte of the character.</p><p>For example:</p><blockquote><pre><tt>\143;\347;\143\150 "\115\141\171"</tt></pre></blockquote></li><li><p>A character can be represented as a hexadecimal constant. A hexadecimal constant shall be specified as the escape character followed by an <tt>'x'</tt> followed by two hexadecimal digits. Each constant shall represent a byte value. Multi-byte values can be represented by concatenated constants specified in byte order with the last constant specifying the least significant byte of the character.</p><p>For example:</p><blockquote><pre><tt>\x63;\xe7;\x63\x68 "\x4d\x61\x79"</tt></pre></blockquote></li><li><p>A character can be represented as a decimal constant. A decimal constant shall be specified as the escape character followed by a <tt>'d'</tt> followed by two or three decimal digits. Each constant represents a byte value. Multi-byte values can be represented by concatenated constants specified in byte order with the last constant specifying the least significant byte of the character.</p><p>For example:</p><blockquote><pre><tt>\d99;\d231;\d99\d104 "\d77\d97\d121"</tt></pre></blockquote></li></ol><p>Implementations may accept single-digit octal, decimal, or hexadecimal constants following the escape character. Only characters existing in the character set for which the locale definition is created shall be specified, whether using symbolic names, the characters themselves, or octal, decimal, or hexadecimal constants. If a charmap file is present, only characters defined in the charmap can be specified using octal, decimal, or hexadecimal constants. 
Symbolic names not present in the charmap file can be specified and shall be ignored, as specified under item 1 above.</p><h4><a name="tag_07_03_01"></a>LC_CTYPE</h4><p>The <i>LC_CTYPE</i> category shall define character classification, case conversion, and other character attributes. In addition, a series of characters can be represented by three adjacent periods representing an ellipsis symbol ( <tt>"..."</tt> ). The ellipsis specification shall be interpreted as meaning that all values between the values preceding and following it represent valid characters. The ellipsis specification shall be valid only within a single encoded character set; that is, within a group of characters of the same size. An ellipsis shall be interpreted as including in the list all characters with an encoded value higher than the encoded value of the character preceding the ellipsis and lower than the encoded value of the character following the ellipsis.</p><p>For example:</p><blockquote><pre><tt>\x30;...;\x39;</tt></pre></blockquote><p>includes in the character class all characters with encoded values between the endpoints.</p><p>The following keywords shall be recognized. In the descriptions, the term "automatically included" means that it shall not be an error either to include or omit any of the referenced characters; the implementation provides them if missing (even if the entire keyword is missing) and accepts them silently if present. When the implementation automatically includes a missing character, it shall have an encoded value dependent on the charmap file in effect (see the description of the <a href="../utilities/localedef.html"><i>localedef</i></a> <b>-f</b> option); otherwise, it shall have a value derived from an implementation-defined character mapping.</p><p>The character classes <b>digit</b>, <b>xdigit</b>, <b>lower</b>, <b>upper</b>, and <b>space</b> have a set of automatically included characters. 
These only need to be specified if the character values (that is, encoding) differ from the implementation default values. It is not possible to define a locale without these automatically included characters unless some implementation extension is used to prevent their inclusion. Such a definition would not be a proper superset of the C or POSIX locale and, thus, it might not be possible for conforming applications to work properly.</p><dl compact><dt><b>copy</b></dt><dd>Specify the name of an existing locale which shall be used as the definition of this category. If this keyword is specified, no other keyword shall be specified.</dd><dt><b>upper</b></dt><dd>Define characters to be classified as uppercase letters. <p>In the POSIX locale, the 26 uppercase letters shall be included:</p><blockquote><pre><tt>A B C D E F G H I J K L M N O P Q R S T U V W X Y Z</tt></pre></blockquote><p>In a locale definition file, no character specified for the keywords <b>cntrl</b>, <b>digit</b>, <b>punct</b>, or <b>space</b> shall be specified. The uppercase letters &lt;A&gt; to &lt;Z&gt;, as defined in <a href="xbd_chap06.html#tag_06_04"><i>Character Set Description File</i></a> (the portable character set), are automatically included in this class.</p></dd><dt><b>lower</b></dt><dd>Define characters to be classified as lowercase letters. <p>In the POSIX locale, the 26 lowercase letters shall be included:</p><blockquote><pre><tt>a b c d e f g h i j k l m n o p q r s t u v w x y z</tt></pre></blockquote><p>In a locale definition file, no character specified for the keywords <b>cntrl</b>, <b>digit</b>, <b>punct</b>, or <b>space</b> shall be specified. The lowercase letters &lt;a&gt; to &lt;z&gt; of the portable character set are automatically included in this class.</p></dd><dt><b>alpha</b></dt><dd>Define characters to be classified as letters. 
<p>In the POSIX locale, all characters in the classes <b>upper</b> and <b>lower</b> shall be included.</p><p>In a locale definition file, no character specified for the keywords <b>cntrl</b>, <b>digit</b>, <b>punct</b>, or <b>space</b> shall be specified. Characters classified as either <b>upper</b> or <b>lower</b> are automatically included in this class.</p></dd><dt><b>digit</b></dt><dd>Define the characters to be classified as numeric digits. <p>In the POSIX locale, only:</p><blockquote><pre><tt>0 1 2 3 4 5 6 7 8 9</tt></pre></blockquote><p>shall be included.</p><p>In a locale definition file, only the digits &lt;zero&gt;, &lt;one&gt;, &lt;two&gt;, &lt;three&gt;, &lt;four&gt;, &lt;five&gt;, &lt;six&gt;, &lt;seven&gt;, &lt;eight&gt;, and &lt;nine&gt; shall be specified, and in contiguous ascending sequence by numerical value. The digits &lt;zero&gt; to &lt;nine&gt; of the portable character set are automatically included in this class.</p></dd><dt><b>alnum</b></dt><dd>Define characters to be classified as letters and numeric digits. Only the characters specified for the <b>alpha</b> and <b>digit</b> keywords shall be specified. Characters specified for the keywords <b>alpha</b> and <b>digit</b> are automatically included in this class.</dd><dt><b>space</b></dt><dd>Define characters to be classified as white-space characters. <p>In the POSIX locale, at a minimum, the &lt;space&gt;, &lt;form-feed&gt;, &lt;newline&gt;, &lt;carriage-return&gt;, &lt;tab&gt;, and &lt;vertical-tab&gt; shall be included.</p><p>In a locale definition file, no character specified for the keywords <b>upper</b>, <b>lower</b>, <b>alpha</b>, <b>digit</b>, <b>graph</b>, or <b>xdigit</b> shall be specified. The &lt;space&gt;, &lt;form-feed&gt;, &lt;newline&gt;, &lt;carriage-return&gt;, &lt;tab&gt;, and &lt;vertical-tab&gt; of the portable character set, and any characters included in the class <b>blank</b> are automatically included in this class.</p></dd><dt><b>cntrl</b></dt><dd>Define characters to be classified as control characters. 
<p>In the POSIX locale, no characters in classes <b>alpha</b> or <b>print</b> shall be included.</p><p>In a locale definition file, no character specified for the keywords <b>upper</b>, <b>lower</b>, <b>alpha</b>, <b>digit</b>, <b>punct</b>, <b>graph</b>, <b>print</b>, or <b>xdigit</b> shall be specified.</p></dd><dt><b>punct</b></dt><dd>Define characters to be classified as punctuation characters. <p>In the POSIX locale, neither the &lt;space&gt; nor any characters in classes <b>alpha</b>, <b>digit</b>, or <b>cntrl</b> shall be included.</p><p>In a locale definition file, no character specified for the keywords <b>upper</b>, <b>lower</b>, <b>alpha</b>, <b>digit</b>, <b>cntrl</b>, <b>xdigit</b>, or as the &lt;space&gt; shall be specified.</p></dd><dt><b>graph</b></dt><dd>Define characters to be classified as printable characters, not including the &lt;space&gt;. <p>In the POSIX locale, all characters in classes <b>alpha</b>, <b>digit</b>, and <b>punct</b> shall be included; no characters in class <b>cntrl</b> shall be included.</p><p>In a locale definition file, characters specified for the keywords <b>upper</b>, <b>lower</b>, <b>alpha</b>, <b>digit</b>, <b>xdigit</b>, and <b>punct</b> are automatically included in this class. No character specified for the keyword <b>cntrl</b> shall be specified.</p></dd><dt><b>print</b></dt><dd>Define characters to be classified as printable characters, including the &lt;space&gt;. <p>In the POSIX locale, all characters in class <b>graph</b> shall be included; no characters in class <b>cntrl</b> shall be included.</p><p>In a locale definition file, characters specified for the keywords <b>upper</b>, <b>lower</b>, <b>alpha</b>, <b>digit</b>, <b>xdigit</b>, <b>punct</b>, <b>graph</b>, and the &lt;space&gt; are automatically included in this class. No character specified for the keyword <b>cntrl</b> shall be specified.</p></dd><dt><b>xdigit</b></dt><dd>Define the characters to be classified as hexadecimal digits. 
<p>In the POSIX locale, only:</p><blockquote><pre><tt>0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f</tt></pre></blockquote><p>shall be included.</p><p>In a locale definition file, only the characters defined for the class <b>digit</b> shall be specified, in contiguous ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 to 15 inclusive, with each set in ascending order (for example, &lt;A&gt;, &lt;B&gt;, &lt;C&gt;, &lt;D&gt;, &lt;E&gt;, &lt;F&gt;, &lt;a&gt;, &lt;b&gt;, &lt;c&gt;, &lt;d&gt;, &lt;e&gt;, &lt;f&gt;). The digits &lt;zero&gt; to &lt;nine&gt;, the uppercase letters &lt;A&gt; to &lt;F&gt;, and the lowercase letters &lt;a&gt; to &lt;f&gt; of the portable character set are automatically included in this class.</p></dd><dt><b>blank</b></dt><dd>Define characters to be classified as &lt;blank&gt;s. <p>In the POSIX locale, only the &lt;space&gt; and &lt;tab&gt; shall be included.</p><p>In a locale definition file, the &lt;space&gt; and &lt;tab&gt; are automatically included in this class.</p></dd><dt><b>charclass</b></dt><dd>Define one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the <i>LC_CTYPE</i> definition. A character class name shall consist of at least one and at most {CHARCLASS_NAME_MAX} bytes of alphanumeric characters from the portable filename character set. The first character of a character class name shall not be a digit. The name shall not match any of the <i>LC_CTYPE</i> keywords defined in this volume of IEEE Std 1003.1-2001. Future revisions of IEEE Std 1003.1-2001 will not specify any <i>LC_CTYPE</i> keywords containing uppercase letters.</dd><dt><i>charclass-name</i></dt><dd>Define characters to be classified as belonging to the named locale-specific character class. In the POSIX locale, locale-specific named character classes need not exist. 
<p>If a class name is defined by a <b>charclass</b> keyword, but no characters are subsequently assigned to it, this is not an error; it represents a class without any characters belonging to it.</p><p>The <i>charclass-name</i> can be used as the <i>property</i> argument to the <a href="../functions/wctype.html"><i>wctype</i>()</a> function, in regular expression and shell pattern-matching bracket expressions, and by the <a href="../utilities/tr.html"><i>tr</i></a> command.</p></dd><dt><b>toupper</b></dt><dd>Define the mapping of lowercase letters to uppercase letters. <p>In the POSIX locale, at a minimum, the 26 lowercase characters:</p><blockquote><pre><tt>a b c d e f g h i j k l m n o p q r s t u v w x y z</tt>