<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../displayToc.js"></script>
<script language="JavaScript" src="../tocParas.js"></script>
<script language="JavaScript" src="../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../scineplex.css">
<title>utf8 - Perl pragma to enable/disable UTF-8 in source code</title>
<link rel="stylesheet" href="../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>
<body>
<script>writelinks('__top__',1);</script>
<h1><a>utf8 - Perl pragma to enable/disable UTF-8 in source code</a></h1>
<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->
<ul>
<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<ul>
<li><a href="#utility_functions">Utility functions</a></li>
</ul>
<li><a href="#bugs">BUGS</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
</ul>
<!-- INDEX END -->
<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
<span class="keyword">use</span> <span class="variable">utf8</span><span class="operator">;</span>
<span class="keyword">no</span> <span class="variable">utf8</span><span class="operator">;</span>
</pre>
<pre>
<span class="comment"># Convert a Perl scalar to/from UTF-8.</span>
<span class="variable">$num_octets</span> <span class="operator">=</span> <span class="variable">utf8::upgrade</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span>
<span class="variable">$success</span> <span class="operator">=</span> <span class="variable">utf8::downgrade</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">[</span><span class="operator">,</span> <span class="variable">FAIL_OK</span><span class="operator">]</span><span class="operator">);</span>
</pre>
<pre>
<span class="comment"># Change the native bytes of a Perl scalar to/from UTF-8 bytes.</span>
<span class="variable">utf8::encode</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span>
<span class="variable">utf8::decode</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span>
</pre>
<pre>
<span class="variable">$flag</span> <span class="operator">=</span> <span class="variable">utf8::is_utf8</span><span class="operator">(</span><span class="variable">STRING</span><span class="operator">);</span> <span class="comment"># since Perl 5.8.1</span>
<span class="variable">$flag</span> <span class="operator">=</span> <span class="variable">utf8::valid</span><span class="operator">(</span><span class="variable">STRING</span><span class="operator">);</span>
</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>use utf8</code> pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope (allow UTF-EBCDIC on EBCDIC based
platforms). The <code>no utf8</code> pragma tells Perl to switch back to treating
the source text as literal bytes in the current lexical scope.</p>
<p>This pragma is primarily a compatibility device. Perl versions
earlier than 5.6 allowed arbitrary bytes in source code, whereas
in the future we would like to standardize on the UTF-8 encoding for
source text.</p>
<p><strong>Do not use this pragma for anything other than telling Perl that your
script is written in UTF-8.</strong> The utility functions described below are
useful for their own purposes, but they are not really part of the
"pragmatic" effect.</p>
<p>Until UTF-8 becomes the default format for source text, either this
pragma or the <a href="../lib/encoding.html">encoding</a> pragma should be used to recognize UTF-8
in the source. When UTF-8 becomes the standard source format, this
pragma will effectively become a no-op. For convenience, in what
follows the term <em>UTF-X</em> is used to refer to UTF-8 on ASCII and ISO
Latin based platforms and UTF-EBCDIC on EBCDIC based platforms.</p>
<p>See also the effects of the <code>-C</code> switch and its cousin, the
<code>PERL_UNICODE</code> environment variable (<code>$ENV{PERL_UNICODE}</code>), in <a href="../lib/Pod/perlrun.html">the perlrun manpage</a>.</p>
<p>Enabling the <code>utf8</code> pragma has the following effect:</p>
<ul>
<li>
<p>Bytes in the source text that have their high-bit set will be treated
as being part of a literal UTF-8 character. This includes most
literals such as identifier names, string constants, and constant
regular expression patterns.</p>
<p>On EBCDIC platforms characters in the Latin 1 character set are
treated as being part of a literal UTF-EBCDIC character.</p>
</li>
</ul>
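<p>The effect on string literals can be seen with a minimal sketch like the
following. It assumes the script file itself is saved as UTF-8 on an ASCII
based platform; the variable names are only illustrative.</p>
<pre>
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below is
# stored on disk as the two bytes 0xC3 0xA9.
my ($with, $without);
{
    use utf8;                  # literal is parsed as one UTF-8 character
    $with = length("é");
}
{
    no utf8;                   # literal is taken as two literal bytes
    $without = length("é");
}
print "with utf8: $with, without utf8: $without\n";
</pre>
<p>Because the pragma is lexically scoped, the two blocks can live in the same
file and still interpret the identical source bytes differently.</p>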
<p>Note that if you have bytes with the eighth bit on in your script
(for example embedded Latin-1 in your string literals), <code>use utf8</code>
will be unhappy since the bytes are most probably not well-formed
UTF-8. If you want to have such bytes and use utf8, you can disable
utf8 until the end of the block (or file, if at top level) with <code>no utf8;</code>.</p>
<p>If you want to automatically upgrade your 8-bit legacy bytes to UTF-8,
use the <a href="../lib/encoding.html">encoding</a> pragma instead of this pragma. For example, if
you want to implicitly upgrade your ISO 8859-1 (Latin-1) bytes to UTF-8
as used in e.g. <a href="../lib/Pod/perlfunc.html#item_chr"><code>chr()</code></a> and <code>\x{...}</code>, try this:</p>
<pre>
<span class="keyword">use</span> <span class="variable">encoding</span> <span class="string">"latin-1"</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="variable">$c</span> <span class="operator">=</span> <span class="keyword">chr</span><span class="operator">(</span><span class="number">0xc4</span><span class="operator">);</span>
<span class="keyword">my</span> <span class="variable">$x</span> <span class="operator">=</span> <span class="string">"\x{c5}"</span><span class="operator">;</span>
</pre>
<p>In case you are wondering: yes, <code>use encoding 'utf8';</code> works much
the same as <code>use utf8;</code>.</p>
<p>
</p>
<h2><a name="utility_functions">Utility functions</a></h2>
<p>The following functions are defined in the <code>utf8::</code> package by the
Perl core. You do not need to say <code>use utf8</code> to use these and in fact
you should not say that unless you really want to have UTF-8 source code.</p>
<ul>
<li><strong><a name="item_upgrade">$num_octets = utf8::upgrade($string)</a></strong>
<p>Converts in-place the octet sequence in the native encoding
(Latin-1 or EBCDIC) to the equivalent character sequence in <em>UTF-X</em>.
Passing a <em>$string</em> that is already encoded as characters does no harm.
Returns the number of octets necessary to represent the string as <em>UTF-X</em>.
Can be used to make sure that the UTF-8 flag is on,
so that <code>\w</code> or <a href="../lib/Pod/perlfunc.html#item_lc"><code>lc()</code></a> work as Unicode on strings
containing characters in the range 0x80-0xFF (on ASCII and
derivatives).</p>
<p><strong>Note that this function does not handle arbitrary encodings.</strong>
<em>Encode.pm</em> is therefore recommended for general-purpose conversions.</p>
<p>Affected by the encoding pragma.</p>
</li>
<li><strong><a name="item_downgrade">$success = utf8::downgrade($string[, FAIL_OK])</a></strong>
<p>Converts in-place the character sequence in <em>UTF-X</em>
to the equivalent octet sequence in the native encoding (Latin-1 or EBCDIC).
Passing a <em>$string</em> that is already encoded as octets does no harm.
Returns true on success. On failure dies or, if the value of
<code>FAIL_OK</code> is true, returns false.
Can be used to make sure that the UTF-8 flag is off,
e.g. when you want to make sure that the <a href="../lib/Pod/perlvar.html#item_substr"><code>substr()</code></a> or <a href="../lib/Pod/perlfunc.html#item_length"><code>length()</code></a> function
works with the usually faster byte algorithm.</p>
<p><strong>Note that this function does not handle arbitrary encodings.</strong>
<em>Encode.pm</em> is therefore recommended for general-purpose conversions.</p>
<p><strong>Not</strong> affected by the encoding pragma.</p>
<p><strong>NOTE:</strong> this function is experimental and may change
or be removed without notice.</p>
</li>
<li><strong><a name="item_encode">utf8::encode($string)</a></strong>
<p>Converts in-place the character sequence to the corresponding octet sequence
in <em>UTF-X</em>. The UTF-8 flag is turned off. Returns nothing.</p>
<p><strong>Note that this function does not handle arbitrary encodings.</strong>
<em>Encode.pm</em> is therefore recommended for general-purpose conversions.</p>
</li>
<li><strong><a name="item_decode">utf8::decode($string)</a></strong>
<p>Attempts to convert in-place the octet sequence in <em>UTF-X</em>
to the corresponding character sequence. The UTF-8 flag is turned on
only if the source string contains multiple-byte <em>UTF-X</em> characters.
If <em>$string</em> is invalid as <em>UTF-X</em>, returns false; otherwise returns true.</p>
<p><strong>Note that this function does not handle arbitrary encodings.</strong>
<em>Encode.pm</em> is therefore recommended for general-purpose conversions.</p>
<p><strong>NOTE:</strong> this function is experimental and may change
or be removed without notice.</p>
</li>
<li><strong><a name="item_is_utf8">$flag = utf8::is_utf8(STRING)</a></strong>
<p>(Since Perl 5.8.1) Test whether STRING is marked internally as encoded
in UTF-8, that is, whether its UTF-8 flag is on. Functionally
the same as Encode::is_utf8().</p>
</li>
<li><strong><a name="item_valid">$flag = utf8::valid(STRING)</a></strong>
<p>[INTERNAL] Test whether STRING is in a consistent state regarding
UTF-8. Returns true if STRING is well-formed UTF-8 and has the UTF-8 flag
on <strong>or</strong> if STRING is held as bytes (both these states are 'consistent').
The main reason for this routine is to allow Perl's test suite to check
that operations have left strings in a consistent state. You most
probably want to use utf8::is_utf8() instead.</p>
</li>
</ul>
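<p>As a rough illustration of <code>utf8::upgrade</code>, <code>utf8::downgrade</code> and
<code>utf8::is_utf8</code>, the sketch below follows one Latin-1 string through both
conversions. It assumes an ASCII based platform; the variable names are
hypothetical and the snippet needs no <code>use utf8</code>.</p>
<pre>
use strict;
use warnings;

my $s = "caf\xE9";                  # four Latin-1 octets, UTF-8 flag off
my $flag_before = utf8::is_utf8($s) ? 1 : 0;

my $octets = utf8::upgrade($s);     # same four characters, stored as UTF-8
my $flag_up = utf8::is_utf8($s) ? 1 : 0;
my $chars   = length($s);           # character count is unchanged

my $ok = utf8::downgrade($s, 1) ? 1 : 0;   # back to native octets
my $flag_down = utf8::is_utf8($s) ? 1 : 0;

my $wide = "\x{263A}";              # has no Latin-1 equivalent
my $wide_ok = utf8::downgrade($wide, 1) ? 1 : 0;   # FAIL_OK: false, no die

print "flag: $flag_before, $flag_up, $flag_down\n";
print "octets: $octets, chars: $chars\n";
print "downgrade ok: $ok, wide ok: $wide_ok\n";
</pre>
<p>Neither call changes the characters the string contains; only the internal
representation, and with it the UTF-8 flag, differs.</p>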
<p><code>utf8::encode</code> is like <code>utf8::upgrade</code>, but the UTF-8 flag is
cleared. See <a href="../lib/Pod/perlunicode.html">the perlunicode manpage</a> for more on the UTF-8 flag and the C API
functions <code>sv_utf8_upgrade</code>, <code>sv_utf8_downgrade</code>, <code>sv_utf8_encode</code>,
and <code>sv_utf8_decode</code>, which are wrapped by the Perl functions
<code>utf8::upgrade</code>, <code>utf8::downgrade</code>, <code>utf8::encode</code> and
<code>utf8::decode</code>. Note that in the Perl 5.8.0 and 5.8.1 implementation
the functions utf8::is_utf8, utf8::valid, utf8::encode, utf8::decode,
utf8::upgrade, and utf8::downgrade are always available, without a
<code>require utf8</code> statement; this may change in future releases.</p>
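<p>The difference between <code>utf8::encode</code> and <code>utf8::upgrade</code> shows up
in what <code>length()</code> reports afterwards. The following is a minimal sketch,
assuming an ASCII based platform, with hypothetical variable names; it also
round-trips a string through <code>utf8::decode</code>.</p>
<pre>
use strict;
use warnings;

my $up = "\xE9";
utf8::upgrade($up);                 # still one character, flag on
my $up_len = length($up);

my $en = "\xE9";
utf8::encode($en);                  # now the two octets 0xC3 0xA9, flag off
my $en_len  = length($en);
my $en_flag = utf8::is_utf8($en) ? 1 : 0;

my $text  = "\x{E9}\x{263A}";       # two characters
my $bytes = $text;
utf8::encode($bytes);               # 2 + 3 octets of UTF-8
my $n_bytes = length($bytes);

my $back = $bytes;
my $ok   = utf8::decode($back) ? 1 : 0;    # octets back to characters
my $same = ($back eq $text) ? 1 : 0;

my $bad    = "\xFF\xFE";            # not well-formed UTF-8
my $bad_ok = utf8::decode($bad) ? 1 : 0;   # returns false

print "upgrade length: $up_len, encode length: $en_len\n";
print "encoded octets: $n_bytes, round trip ok: $ok, same: $same\n";
print "decode of invalid input: $bad_ok\n";
</pre>
<p>Checking the return value of <code>utf8::decode</code> is worthwhile whenever the
octets come from outside the program, since invalid input is reported only
through that value.</p>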
<p>
</p>
<hr />
<h1><a name="bugs">BUGS</a></h1>
<p>One can have Unicode in identifier names, but not in package/class or
subroutine names. While some limited functionality towards this does
exist as of Perl 5.8.0, that is more accidental than designed; use of
Unicode for the said purposes is unsupported.</p>
<p>One reason this remains unfinished is its (currently) inherent
lack of portability: since both package names and subroutine names may need
to be mapped to file and directory names, the Unicode capability of
the filesystem becomes important, and unfortunately there are no
portable answers.</p>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p><a href="../lib/Pod/perluniintro.html">the perluniintro manpage</a>, <a href="../lib/encoding.html">the encoding manpage</a>, <a href="../lib/Pod/perlrun.html">the perlrun manpage</a>, <a href="../lib/bytes.html">the bytes manpage</a>, <a href="../lib/Pod/perlunicode.html">the perlunicode manpage</a></p>
</body>
</html>