<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../displayToc.js"></script>
<script language="JavaScript" src="../tocParas.js"></script>
<script language="JavaScript" src="../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../scineplex.css">
<title>Encode - character encodings</title>
<link rel="stylesheet" href="../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>
<body>
<script>writelinks('__top__',1);</script>
<h1><a>Encode - character encodings</a></h1>
<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->
<ul>
<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<ul>
<li><a href="#table_of_contents">Table of Contents</a></li>
</ul>
<li><a href="#description">DESCRIPTION</a></li>
<ul>
<li><a href="#terminology">TERMINOLOGY</a></li>
</ul>
<li><a href="#perl_encoding_api">PERL ENCODING API</a></li>
<ul>
<li><a href="#listing_available_encodings">Listing available encodings</a></li>
<li><a href="#defining_aliases">Defining Aliases</a></li>
</ul>
<li><a href="#encoding_via_perlio">Encoding via PerlIO</a></li>
<li><a href="#handling_malformed_data">Handling Malformed Data</a></li>
<ul>
<li><a href="#coderef_for_check">coderef for CHECK</a></li>
</ul>
<li><a href="#defining_encodings">Defining Encodings</a></li>
<li><a href="#the_utf8_flag">The UTF-8 flag</a></li>
<ul>
<li><a href="#messing_with_perl_s_internals">Messing with Perl's Internals</a></li>
</ul>
<li><a href="#utf8_vs__utf8">UTF-8 vs. utf8</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#maintainer">MAINTAINER</a></li>
</ul>
<!-- INDEX END -->
<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Encode - character encodings</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
<span class="keyword">use</span> <span class="variable">Encode</span><span class="operator">;</span>
</pre>
<p>
</p>
<h2><a name="table_of_contents">Table of Contents</a></h2>
<p>Encode consists of a collection of modules whose details are too big
to fit in one document. This POD itself explains the top-level APIs
and general topics at a glance. For other topics and more details,
see the PODs below:</p>
<pre>
Name                Description
--------------------------------------------------------
Encode::Alias       Alias definitions to encodings
Encode::Encoding    Encode Implementation Base Class
Encode::Supported   List of Supported Encodings
Encode::CN          Simplified Chinese Encodings
Encode::JP          Japanese Encodings
Encode::KR          Korean Encodings
Encode::TW          Traditional Chinese Encodings
--------------------------------------------------------</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Encode</code> module provides the interfaces between Perl's strings
and the rest of the system. Perl strings are sequences of
<strong>characters</strong>.</p>
<p>The repertoire of characters that Perl can represent is at least that
defined by the Unicode Consortium. On most platforms the ordinal
values of the characters (as returned by <a href="../lib/Pod/perlfunc.html#item_ord"><code>ord(ch)</code></a>) are the "Unicode
codepoints" for the characters (the exceptions are those platforms where
the legacy encoding is some variant of EBCDIC rather than a superset
of ASCII - see <a href="../lib/Pod/perlebcdic.html">the perlebcdic manpage</a>).</p>
<p>Traditionally, computer data has been moved around in 8-bit chunks
often called "bytes". These chunks are also known as "octets" in
networking standards. Perl is widely used to manipulate data of many
types - not only strings of characters representing human or computer
languages but also "binary" data, that is, the machine's representation of
numbers, pixels in an image, or just about anything else.</p>
<p>When Perl is processing "binary data", the programmer wants Perl to
process "sequences of bytes". This is not a problem for Perl - as a
byte has 256 possible values, it easily fits in Perl's much larger
"logical character".</p>
<p>
</p>
<h2><a name="terminology">TERMINOLOGY</a></h2>
<ul>
<li>
<p><em>character</em>: a character in the range 0..(2**32-1) (or more).
(What Perl's strings are made of.)</p>
</li>
<li>
<p><em>byte</em>: a character in the range 0..255.
(A special case of a Perl character.)</p>
</li>
<li>
<p><em>octet</em>: 8 bits of data, with ordinal values 0..255
(Term for bytes passed to or from a non-Perl context, e.g. a disk file.)</p>
</li>
</ul>
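<p>The distinction between characters and octets can be seen by comparing
lengths before and after encoding. The following is a minimal sketch, not
part of the original documentation: the sample character U+263A and the
variable names <code>$char_count</code> and <code>$octet_count</code> are
illustrative assumptions.</p>

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character (U+263A, WHITE SMILING FACE). Its ordinal is above 255,
# so it is a "character" but not a "byte" in the sense defined above.
my $string = "\x{263A}";

# Encoding produces octets suitable for a non-Perl context such as a file.
my $octets = encode("UTF-8", $string);

my $char_count  = length $string;   # counts characters
my $octet_count = length $octets;   # counts octets

print "characters: $char_count, octets: $octet_count\n";
# prints "characters: 1, octets: 3"
```

<p>Every element of <code>$octets</code> has an ordinal in the range 0..255,
which is why the encoded form can be written to disk unchanged.</p>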
<p>
</p>
<hr />
<h1><a name="perl_encoding_api">PERL ENCODING API</a></h1>
<dl>
<dt><strong><a name="item_encode">$octets = encode(ENCODING, $string [, CHECK])</a></strong>
<dd>
<p>Encodes a string from Perl's internal form into <em>ENCODING</em> and returns
a sequence of octets. ENCODING can be either a canonical name or
an alias. For encoding names and aliases, see <a href="#defining_aliases">Defining Aliases</a>.
For CHECK, see <a href="#handling_malformed_data">Handling Malformed Data</a>.</p>
</dd>
<dd>
<p>For example, to convert a string from Perl's internal format to
iso-8859-1 (also known as Latin1):</p>
</dd>
<dd>
<pre>
<span class="variable">$octets</span> <span class="operator">=</span> <span class="variable">encode</span><span class="operator">(</span><span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p><strong>CAVEAT</strong>: When you run <a href="#item_encode"><code>$octets = encode("utf8", $string)</code></a>, $octets
<strong>may not be equal to</strong> $string. Though they both contain the same data, the utf8 flag
for $octets is <strong>always</strong> off. When you encode anything, the utf8 flag of
the result is always off, even when it contains a completely valid utf8
string. See <a href="#the_utf8_flag">The UTF-8 flag</a> below.</p>
</dd>
<dd>
<p>If the $string is <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> then <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> is returned.</p>
</dd>
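<dd>
<p>As a hedged illustration of the points above (the sample string and
variable names are assumptions made for this sketch), the same four-character
string yields a different number of octets depending on the target encoding,
the result never carries the utf8 flag, and an undefined input produces
<code>undef</code>:</p>
</dd>
<dd>

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Four characters; the last one is U+00E9 (e with acute accent).
my $string = "caf\x{E9}";

my $latin1 = encode("iso-8859-1", $string);   # one octet per character
my $utf8   = encode("UTF-8",      $string);   # U+00E9 becomes two octets

my $latin1_len = length $latin1;              # 4
my $utf8_len   = length $utf8;                # 5

# The utf8 flag of an encoded result is off.
my $flag = is_utf8($utf8) ? 1 : 0;            # 0

# An undefined string yields undef.
my $nothing = encode("UTF-8", undef);

print "latin1: $latin1_len octets, utf8: $utf8_len octets, flag: $flag\n";
```

</dd>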
</li>
<dt><strong><a name="item_decode">$string = decode(ENCODING, $octets [, CHECK])</a></strong>
<dd>
<p>Decodes a sequence of octets assumed to be in <em>ENCODING</em> into Perl's
internal form and returns the resulting string. As in encode(),
ENCODING can be either a canonical name or an alias. For encoding names
and aliases, see <a href="#defining_aliases">Defining Aliases</a>. For CHECK, see
<a href="#handling_malformed_data">Handling Malformed Data</a>.</p>
</dd>
<dd>
<p>For example, to convert ISO-8859-1 data to a string in Perl's internal format:</p>
</dd>
<dd>
<pre>
<span class="variable">$string</span> <span class="operator">=</span> <span class="variable">decode</span><span class="operator">(</span><span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="variable">$octets</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p><strong>CAVEAT</strong>: When you run <a href="#item_decode"><code>$string = decode("utf8", $octets)</code></a>, $string
<strong>may not be equal to</strong> $octets. Though they both contain the same data,
the utf8 flag for $string is on unless $octets consists entirely of
ASCII data (or EBCDIC on EBCDIC machines). See <a href="#the_utf8_flag">The UTF-8 flag</a>
below.</p>
</dd>
<dd>
<p>If $octets is <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> then <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> is returned.</p>
</dd>
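<dd>
<p>A minimal sketch of decoding, assuming the same text arrives once as
ISO-8859-1 octets and once as UTF-8 octets (the sample data and variable
names are illustrative, not part of the original documentation). Once
decoded, both inputs become the same four-character Perl string:</p>
</dd>
<dd>

```perl
use strict;
use warnings;
use Encode qw(decode);

my $latin1_octets = "caf\xE9";        # 4 octets, ISO-8859-1
my $utf8_octets   = "caf\xC3\xA9";    # 5 octets, UTF-8

my $from_latin1 = decode("iso-8859-1", $latin1_octets);
my $from_utf8   = decode("UTF-8",      $utf8_octets);

# Both results are character strings of length 4 ending in U+00E9.
my $same      = ($from_latin1 eq $from_utf8) ? 1 : 0;
my $length    = length $from_utf8;
my $last_char = ord(substr($from_utf8, 3, 1));

printf "same: %d, length: %d, last: U+%04X\n", $same, $length, $last_char;
# prints "same: 1, length: 4, last: U+00E9"
```

</dd>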
</li>
<dt><strong><a name="item_from_to">[$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])</a></strong>
<dd>
<p>Converts <strong>in-place</strong> data between two encodings. The data in $octets
must be encoded as octets and not as characters in Perl's internal
format. For example, to convert ISO-8859-1 data to Microsoft's CP1250
encoding:</p>
</dd>
<dd>
<pre>
<span class="variable">from_to</span><span class="operator">(</span><span class="variable">$octets</span><span class="operator">,</span> <span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="string">"cp1250"</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p>and to convert it back:</p>
</dd>
<dd>
<pre>
<span class="variable">from_to</span><span class="operator">(</span><span class="variable">$octets</span><span class="operator">,</span> <span class="string">"cp1250"</span><span class="operator">,</span> <span class="string">"iso-8859-1"</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p>Note that because the conversion happens in place, the data to be
converted cannot be a string constant; it must be a scalar variable.</p>
</dd>
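<dd>
<p>The in-place behaviour is easier to observe with a target encoding that
changes the octet count. The sketch below is an illustration under assumed
sample data; it uses UTF-8 rather than CP1250 as the target so that the
change in <code>$octets</code> is visible:</p>
</dd>
<dd>

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Must be a scalar variable, because from_to() overwrites its first argument.
my $octets = "caf\xE9";                               # ISO-8859-1 octets

my $length = from_to($octets, "iso-8859-1", "UTF-8");
my $after_forward = $octets;                          # now "caf\xC3\xA9"

# Convert the same variable back again.
from_to($octets, "UTF-8", "iso-8859-1");
my $after_back = $octets;                             # "caf\xE9" again

print "converted length: $length\n";
```

</dd>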
<dd>
<p><a href="#item_from_to"><code>from_to()</code></a> returns the length of the converted string in octets on