=head1 NAME

perllocale - Perl locale handling (internationalization and localization)

=head1 DESCRIPTION

Perl supports language-specific notions of data such as "is this
a letter", "what is the uppercase equivalent of this letter", and
"which of these letters comes first".  These are important issues,
especially for languages other than English--but also for English: it
would be naE<iuml>ve to imagine that C<A-Za-z> defines all the "letters"
needed to write in English. Perl is also aware that some character other
than '.' may be preferred as a decimal point, and that output date
representations may be language-specific.  The process of making an
application take account of its users' preferences in such matters is
called B<internationalization> (often abbreviated as B<i18n>); telling
such an application about a particular set of preferences is known as
B<localization> (B<l10n>).

Perl can understand language-specific data via the standardized (ISO C,
XPG4, POSIX 1.c) method called "the locale system". The locale system is
controlled per application using one pragma, one function call, and
several environment variables.

B<NOTE>: This feature is new in Perl 5.004, and does not apply unless an
application specifically requests it--see L<Backward compatibility>.
The one exception is that write() now B<always> uses the current
locale--see L<"NOTES">.

=head1 PREPARING TO USE LOCALES

If Perl applications are to understand and present your data
correctly according to a locale of your choice, B<all> of the following
must be true:

=over 4

=item *

B<Your operating system must support the locale system>.  If it does,
you should find that the setlocale() function is a documented part of
its C library.

=item *

B<Definitions for locales that you use must be installed>.  You, or
your system administrator, must make sure that this is the case. The
available locales, the location in which they are kept, and the manner
in which they are installed all vary from system to system.  Some systems
provide only a few, hard-wired locales and do not allow more to be
added.  Others allow you to add "canned" locales provided by the system
supplier.  Still others allow you or the system administrator to define
and add arbitrary locales.  (You may have to ask your supplier to
provide canned locales that are not delivered with your operating
system.)  Read your system documentation for further illumination.

=item *

B<Perl must believe that the locale system is supported>.  If it does,
C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
C<define>.

=back
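
From inside a program, the same information is available through the
core Config module, whose C<%Config> hash holds the values that
C<perl -V:name> prints.  A minimal sketch of that check:

        use strict;
        use warnings;
        use Config;

        # $Config{d_setlocale} is 'define' when Perl was built with
        # setlocale() support; Config returns undef for 'undef' values.
        my $has_setlocale = defined $Config{d_setlocale}
                         && $Config{d_setlocale} eq 'define';

        print $has_setlocale
            ? "Perl believes the locale system is supported\n"
            : "Perl was built without setlocale() support\n";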

If you want a Perl application to process and present your data
according to a particular locale, the application code should include
the S<C<use locale>> pragma (see L<The use locale pragma>) where
appropriate, and B<at least one> of the following must be true:

=over 4

=item *

B<The locale-determining environment variables (see L<"ENVIRONMENT">)
must be correctly set up> at the time the application is started, either
by yourself or by whoever set up your system account.

=item *

B<The application must set its own locale> using the method described in
L<The setlocale function>.

=back

=head1 USING LOCALES

=head2 The use locale pragma

By default, Perl ignores the current locale.  The S<C<use locale>>
pragma tells Perl to use the current locale for some operations:

=over 4

=item *

B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) and
the POSIX string collation functions strcoll() and strxfrm() use
C<LC_COLLATE>.  sort() is also affected if used without an
explicit comparison function, because it uses C<cmp> by default.

B<Note:> C<eq> and C<ne> are unaffected by locale: they always
perform a byte-by-byte comparison of their scalar operands.  What's
more, if C<cmp> finds that its operands are equal according to the
collation sequence specified by the current locale, it goes on to
perform a byte-by-byte comparison, and only returns I<0> (equal) if the
operands are bit-for-bit identical.  If you really want to know whether
two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
L<Category LC_COLLATE: Collation>.

=item *

B<Regular expressions and case-modification functions> (uc(), lc(),
ucfirst(), and lcfirst()) use C<LC_CTYPE>.

=item *

B<The formatting functions> (printf(), sprintf() and write()) use
C<LC_NUMERIC>.

=item *

B<The POSIX date formatting function> (strftime()) uses C<LC_TIME>.

=back
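
The following sketch contrasts a plain sort with one inside a
S<C<use locale>> block, and shows that C<eq> stays byte-by-byte.  The
word list is arbitrary, and the collated order depends on whatever
locale the environment selects, so only the byte-wise order can be
predicted:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        # Adopt the locale named by the environment (LC_ALL, LC_COLLATE, LANG).
        setlocale(LC_ALL, "");

        my @words = qw(banana Apple cherry apple);

        # Outside "use locale", sort compares byte by byte, so every
        # uppercase ASCII letter sorts before every lowercase one.
        my @bytewise = sort @words;

        my (@collated, $same);
        {
            use locale;
            # Here sort (through its default cmp) follows LC_COLLATE.
            @collated = sort @words;

            # eq remains a byte-by-byte test even under "use locale".
            $same = ("apple" eq "Apple") ? 1 : 0;
        }

        print "bytewise: @bytewise\n";
        print "collated: @collated\n";
        print "apple eq Apple: $same\n";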

C<LC_COLLATE>, C<LC_CTYPE>, and so on, are discussed further in L<LOCALE
CATEGORIES>.

The default behavior is restored with the S<C<no locale>> pragma, or
upon reaching the end of the block enclosing C<use locale>.

The string result of any operation that uses locale
information is tainted, as it is possible for a locale to be
untrustworthy.  See L<"SECURITY">.

=head2 The setlocale function

You can switch locales as often as you wish at run time with the
POSIX::setlocale() function:

        # This functionality is not usable prior to Perl 5.004
        require 5.004;

        # Import locale-handling tool set from POSIX module.
        # This example uses: setlocale -- the function call
        #                    LC_CTYPE -- explained below
        use POSIX qw(locale_h);

        # query and save the old locale
        my $old_locale = setlocale(LC_CTYPE);

        # setlocale() returns undef if the locale is not available
        setlocale(LC_CTYPE, "fr_CA.ISO8859-1")
            or warn "locale fr_CA.ISO8859-1 is not available\n";
        # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
        # (provided the call above succeeded)

        setlocale(LC_CTYPE, "");
        # LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
        # environment variables.  See below for documentation.

        # restore the old locale
        setlocale(LC_CTYPE, $old_locale);

The first argument of setlocale() gives the B<category>, the second the
B<locale>.  The category tells in what aspect of data processing you
want to apply locale-specific rules.  Category names are discussed in
L<LOCALE CATEGORIES> and L<"ENVIRONMENT">.  The locale is the name of a
collection of customization information corresponding to a particular
combination of language, country or territory, and codeset.  Read on for
hints on the naming of locales: not all systems name locales as in the
example.

If no second argument is provided and the category is anything other
than LC_ALL, the function returns a string naming the current locale
for the category.  You can use this value as the second argument in a
subsequent call to setlocale().

If no second argument is provided and the category is LC_ALL, the
result is implementation-dependent.  It may be a string of
concatenated locale names (the separator is also implementation-dependent)
or a single locale name.  Please consult your L<setlocale(3)> manual
page for details.
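
A short sketch that queries, without changing, the locale of a few
categories and of C<LC_ALL>.  The exact strings printed, especially for
C<LC_ALL>, vary between C libraries:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        # Map printable names to the POSIX category constants.
        my %category = (
            LC_COLLATE => LC_COLLATE,
            LC_CTYPE   => LC_CTYPE,
            LC_NUMERIC => LC_NUMERIC,
            LC_TIME    => LC_TIME,
        );

        # A one-argument call only reports the current setting.
        my %current;
        for my $name (sort keys %category) {
            $current{$name} = setlocale($category{$name}) // '(unknown)';
            print "$name: $current{$name}\n";
        }

        # For LC_ALL the format is implementation-dependent: a single
        # name, or several per-category names joined by a separator.
        my $all = setlocale(LC_ALL);
        print "LC_ALL: ", $all // '(unknown)', "\n";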

If a second argument is given and it corresponds to a valid locale,
the locale for the category is set to that value, and the function
returns the now-current locale value.  You can then use this in yet
another call to setlocale().  (In some implementations, the return
value may sometimes differ from the value you gave as the second
argument--think of it as an alias for the value you gave.)

As the example shows, if the second argument is an empty string, the
category's locale is returned to the default specified by the
corresponding environment variables.  Generally, this results in a
return to the default that was in force when Perl started up: changes
to the environment made by the application after startup may or may not
be noticed, depending on your system's C library.
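
The following sketch changes C<LC_ALL> in C<%ENV> after startup and then
passes an empty string to setlocale().  It assumes that changes to
C<%ENV> reach the C library's environment, which is typical of
Unix-like systems but, as noted above, not guaranteed:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        my $from_env;
        {
            # local restores the original LC_ALL when the block ends;
            # the locale selected by setlocale() stays in effect.
            local $ENV{LC_ALL} = "C";

            # An empty string asks setlocale() to consult the environment.
            $from_env = setlocale(LC_ALL, "");
        }

        print defined $from_env
            ? "locale taken from the environment: $from_env\n"
            : "the environment names no usable locale\n";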

If the second argument does not correspond to a valid locale, the locale
for the category is not changed, and the function returns I<undef>.
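
A sketch of checking for that I<undef>, assuming the C library rejects
unknown names as described above.  The locale name used is deliberately
bogus and purely hypothetical:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        my $before = setlocale(LC_CTYPE);

        # "xx_XX.NO-SUCH-CODESET" is a made-up name for illustration.
        my $result = setlocale(LC_CTYPE, "xx_XX.NO-SUCH-CODESET");

        if (!defined $result) {
            # On failure the category keeps its previous locale.
            warn "locale not available; LC_CTYPE is still ",
                 setlocale(LC_CTYPE), "\n";
        }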

For further information about the categories, consult L<setlocale(3)>.

=head2 Finding locales

To find which locales are available on your system, consult
L<setlocale(3)> and check whether it leads to the list of available
locales (look in the I<SEE ALSO> section).  If that fails, try the
following command lines:

        locale -a

        nlsinfo

        ls /usr/lib/nls/loc

        ls /usr/lib/locale

        ls /usr/lib/nls

and see whether they list something resembling these:

        en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
        en_US.iso88591      de_DE.iso88591      ru_RU.iso88595
        en_US               de_DE               ru_RU
        en                  de                  ru
        english             german              russian
        english.iso88591    german.iso88591     russian.iso88595
        english.roman8                          russian.koi8r
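
Where none of these commands works, a program can probe for locales
itself by trying candidate names with setlocale() and keeping those
that do not return I<undef>.  A minimal sketch; the candidate list is
just a sample of the spellings above plus "C" and "POSIX":

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        # Which of these exist depends entirely on the system.
        my @candidates = qw(
            C POSIX
            en_US.ISO8859-1 en_US.iso88591 en_US en english
            de_DE.ISO8859-1 de_DE de german
        );

        my $saved = setlocale(LC_CTYPE);
        my @available;
        for my $name (@candidates) {
            # setlocale() returns undef for a locale that is not installed.
            push @available, $name if defined setlocale(LC_CTYPE, $name);
        }
        setlocale(LC_CTYPE, $saved);   # undo the probing

        print "usable here: @available\n";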

Sadly, even though the calling interface for setlocale() has
been standardized, names of locales and the directories where the
configuration resides have not been.  The basic form of the name is
I<language_country/territory>B<.>I<codeset>, but the latter parts after
I<language> are not always present.  The I<language> part is usually a
two-letter code from B<ISO 639> (the languages of the world), and the
I<country> part a two-letter code from B<ISO 3166> (the countries of the
world).  The I<codeset> part often mentions some B<ISO 8859>
character set, the Latin codesets.  For example, C<ISO 8859-1> is the
so-called "Western codeset" that can be used to encode most Western
European languages.  Again, there are several ways to write even the
name of that one standard.  Lamentably.
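
To make the naming pattern concrete, here is a hypothetical helper,
C<split_locale_name>, that splits a name into language, country, and
codeset with a regular expression.  It is a sketch of the basic form
only and does not validate the codes against the ISO lists:

        use strict;
        use warnings;

        # Hypothetical helper: split "language_country.codeset" into its
        # parts.  Country and codeset are optional and may come back undef.
        sub split_locale_name {
            my ($name) = @_;
            my ($language, $country, $codeset) =
                $name =~ /^([^_.]+)(?:_([^.]+))?(?:\.(.+))?$/
                or return;
            return ($language, $country, $codeset);
        }

        for my $name (qw(en_US.ISO8859-1 de_DE russian.koi8r)) {
            my ($lang, $country, $codeset) = split_locale_name($name);
            printf "%-16s language=%s country=%s codeset=%s\n", $name,
                $lang, $country // '-', $codeset // '-';
        }

Running it prints each name split into its parts, with C<-> marking a
part that is absent.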

Two special locales are worth particular mention: "C" and "POSIX".
Currently these are effectively the same locale: the difference is
mainly that the first one is defined by the C standard, the second by
the POSIX standard.  They define the B<default locale> in which
every program starts in the absence of locale information in its
environment.  (The I<default> default locale, if you will.)  Its language
is (American) English and its character codeset ASCII.

B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
POSIX-conformant), so use "C" when you need explicitly to specify this
default locale.
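
A sketch that selects the "C" locale explicitly and confirms, through
POSIX::localeconv(), that it uses "." as the decimal point, as the C
standard requires for that locale:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        # "C" exists everywhere, unlike "POSIX", so use it when you
        # need the default locale explicitly.
        my $c = setlocale(LC_ALL, "C")
            or die "cannot select the C locale\n";

        # localeconv() returns a hash reference of numeric conventions.
        my $point = localeconv()->{decimal_point};
        print "locale $c uses '$point' as the decimal point\n";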

=head2 LOCALE PROBLEMS

You may encounter the following warning message at Perl startup:

	perl: warning: Setting locale failed.
	perl: warning: Please check that your locale settings:
	        LC_ALL = "En_US",
	        LANG = (unset)
	    are supported and installed on your system.
	perl: warning: Falling back to the standard locale ("C").

This means that your locale settings had LC_ALL set to "En_US" and
LANG unset.  Perl tried to believe you but could not.
Instead, Perl gave up and fell back to the "C" locale, the default locale
that is supposed to work no matter what.  This usually means your locale
settings were wrong, they mention locales your system has never heard
of, or the locale installation in your system has problems (for example,
some system files are broken or missing).  There are quick and temporary
fixes to these problems, as well as more thorough and lasting fixes.
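
When you meet this warning, it can help to look at the same variables
from a script.  The sketch below prints them in roughly the form the
warning uses and then asks setlocale() to apply them, approximating
what Perl does at startup.  The variable list covers the standard
categories only; it is an assumption, not Perl's exact list:

        use strict;
        use warnings;
        use POSIX qw(locale_h);

        # LC_ALL overrides everything; each LC_* variable governs its own
        # category; LANG is the fallback for anything not set otherwise.
        my @locale_vars =
            qw(LC_ALL LC_COLLATE LC_CTYPE LC_MONETARY LC_NUMERIC LC_TIME LANG);

        for my $var (@locale_vars) {
            my $shown = defined $ENV{$var} ? qq{"$ENV{$var}"} : '(unset)';
            print "$var = $shown\n";
        }

        # Ask the C library to apply the environment settings.
        my $ok = setlocale(LC_ALL, "");
        print defined $ok ? "settings accepted\n"
                          : "settings rejected; check the names above\n";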

=head2 Temporarily fixing locale problems

The two quickest fixes are either to render Perl silent about any
locale inconsistencies or to run Perl under the default locale "C".

Perl's moaning about locale problems can be silenced by setting the
environment variable PERL_BADLANG to a zero value, for example
"0".  This method really just sweeps the problem under the carpet: you
tell Perl to shut up even when Perl sees that something is wrong.  Do
not be surprised if later something locale-dependent misbehaves.

Perl can be run under the "C" locale by setting the environment
variable LC_ALL to "C".  This method is perhaps a bit more civilized
than the PERL_BADLANG approach, but setting LC_ALL (or
other locale variables) may affect other programs as well, not just
Perl.  In particular, external programs run from within Perl will see
these changes.  If you make the new settings permanent (read on), all
programs you run see the changes.  See L<ENVIRONMENT> for
the full list of relevant environment variables and L<USING LOCALES>
for their effects in Perl.  Effects in other programs are
easily deducible.  For example, the variable LC_COLLATE may well affect
your B<sort> program (or whatever program on your system arranges
"records" alphabetically).
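
The sketch below shows that a change to C<%ENV> is inherited by child
processes: it starts a second perl with C<LC_ALL> set to "C" and reads
back what the child sees.  It assumes a Unix-like system where the list
form of a piped open() is available:

        use strict;
        use warnings;

        my $child_sees;
        {
            # local restores the caller's LC_ALL when the block ends.
            local $ENV{LC_ALL} = "C";

            # Start another perl ($^X) that reports the LC_ALL it
            # inherited; any external program would see the same value.
            open(my $pipe, '-|', $^X, '-e', 'print $ENV{LC_ALL}')
                or die "cannot start child perl: $!";
            $child_sees = <$pipe>;
            close $pipe;
        }

        print "child process saw LC_ALL=$child_sees\n";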

You can test out changing these variables temporarily, and if the
new settings seem to help, put those settings into your shell startup
files.  Consult your local documentation for the exact details.  For example,
in Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):

	LC_ALL=en_US.ISO8859-1
	export LC_ALL

This assumes that we saw the locale "en_US.ISO8859-1" using the commands
discussed above, and decided to try it instead of the faulty locale
"En_US".  In C-shell-like shells (B<csh>, B<tcsh>), use:

	setenv LC_ALL en_US.ISO8859-1

If you do not know what shell you have, consult your local
helpdesk or the equivalent.
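
Before editing your startup files, you can also try a candidate from
Perl itself.  The hypothetical helper C<locale_starts_cleanly> below
runs a throwaway perl under a trial C<LC_ALL> and checks its standard
error for the "Setting locale failed" warning shown earlier.  It assumes
a Unix-like shell for the C<< 2>&1 >> redirection:

        use strict;
        use warnings;

        # Hypothetical helper: does a fresh perl start under this LC_ALL
        # without complaining about the locale?
        sub locale_starts_cleanly {
            my ($trial) = @_;
            local $ENV{LC_ALL} = $trial;
            local $ENV{PERL_BADLANG};      # ensure the warning is not
            delete $ENV{PERL_BADLANG};     # suppressed for this test
            my $output = qx{"$^X" -e 1 2>&1};
            return $output !~ /Setting locale failed/;
        }

        for my $trial ("C", "en_US.ISO8859-1") {
            printf "%-16s %s\n", $trial,
                locale_starts_cleanly($trial) ? "ok" : "fails";
        }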

=head2 Permanently fixing locale problems

The slower but superior fix is to correct the misconfiguration yourself,
which is possible when the fault lies in your own environment variables.
A missing or broken locale configuration for the whole system usually
requires the help of your friendly system administrator.

First, see L<Finding locales> earlier in this document.  It explains
how to find which locales are really supported--and more importantly,
installed--on your system.  In our example error message, environment
variables affecting the locale are listed in the order of decreasing
importance (and unset variables do not matter).  Therefore, having
LC_ALL set to "En_US" must have been the bad choice, as shown by the
