<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../displayToc.js"></script>
<script language="JavaScript" src="../tocParas.js"></script>
<script language="JavaScript" src="../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../scineplex.css">
<title>Encode - character encodings</title>
<link rel="stylesheet" href="../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>

<body>

<script>writelinks('__top__',1);</script>
<h1><a>Encode - character encodings</a></h1>
<p><a name="__index__"></a></p>

<!-- INDEX BEGIN -->

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#synopsis">SYNOPSIS</a></li>
	<ul>

		<li><a href="#table_of_contents">Table of Contents</a></li>
	</ul>

	<li><a href="#description">DESCRIPTION</a></li>
	<ul>

		<li><a href="#terminology">TERMINOLOGY</a></li>
	</ul>

	<li><a href="#perl_encoding_api">PERL ENCODING API</a></li>
	<ul>

		<li><a href="#listing_available_encodings">Listing available encodings</a></li>
		<li><a href="#defining_aliases">Defining Aliases</a></li>
	</ul>

	<li><a href="#encoding_via_perlio">Encoding via PerlIO</a></li>
	<li><a href="#handling_malformed_data">Handling Malformed Data</a></li>
	<ul>

		<li><a href="#coderef_for_check">coderef for CHECK</a></li>
	</ul>

	<li><a href="#defining_encodings">Defining Encodings</a></li>
	<li><a href="#the_utf8_flag">The UTF-8 flag</a></li>
	<ul>

		<li><a href="#messing_with_perl_s_internals">Messing with Perl's Internals</a></li>
	</ul>

	<li><a href="#utf8_vs__utf8">UTF-8 vs. utf8</a></li>
	<li><a href="#see_also">SEE ALSO</a></li>
	<li><a href="#maintainer">MAINTAINER</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Encode - character encodings</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
    <span class="keyword">use</span> <span class="variable">Encode</span><span class="operator">;</span>
</pre>
<p>
</p>
<h2><a name="table_of_contents">Table of Contents</a></h2>
<p>Encode consists of a collection of modules whose details are too
extensive to fit in one document.  This POD explains the top-level APIs
and general topics at a glance.  For other topics and more details,
see the PODs below:</p>
<pre>
  Name                          Description
  --------------------------------------------------------
  Encode::Alias         Alias definitions to encodings
  Encode::Encoding      Encode Implementation Base Class
  Encode::Supported     List of Supported Encodings
  Encode::CN            Simplified Chinese Encodings
  Encode::JP            Japanese Encodings
  Encode::KR            Korean Encodings
  Encode::TW            Traditional Chinese Encodings
  --------------------------------------------------------</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Encode</code> module provides the interfaces between Perl's strings
and the rest of the system.  Perl strings are sequences of
<strong>characters</strong>.</p>
<p>The repertoire of characters that Perl can represent is at least that
defined by the Unicode Consortium. On most platforms the ordinal
value of a character (as returned by <a href="../lib/Pod/perlfunc.html#item_ord"><code>ord(ch)</code></a>) is the &quot;Unicode
codepoint&quot; for that character (the exceptions are those platforms where
the legacy encoding is some variant of EBCDIC rather than a superset
of ASCII - see <a href="../lib/Pod/perlebcdic.html">the perlebcdic manpage</a>).</p>
<p>Traditionally, computer data has been moved around in 8-bit chunks
often called &quot;bytes&quot;. These chunks are also known as &quot;octets&quot; in
networking standards. Perl is widely used to manipulate data of many
types - not only strings of characters representing human or computer
languages but also &quot;binary&quot; data, that is, the machine's representation of
numbers, pixels in an image, or just about anything.</p>
<p>When Perl is processing &quot;binary data&quot;, the programmer wants Perl to
process &quot;sequences of bytes&quot;. This is not a problem for Perl - as a
byte has 256 possible values, it easily fits in Perl's much larger
&quot;logical character&quot;.</p>
<p>
</p>
<h2><a name="terminology">TERMINOLOGY</a></h2>
<ul>
<li>
<p><em>character</em>: a character in the range 0..(2**32-1) (or more).
(What Perl's strings are made of.)</p>
</li>
<li>
<p><em>byte</em>: a character in the range 0..255.
(A special case of a Perl character.)</p>
</li>
<li>
<p><em>octet</em>: 8 bits of data, with ordinal values 0..255
(Term for bytes passed to or from a non-Perl context, e.g. a disk file.)</p>
</li>
</ul>
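<p>The distinction between characters and octets can be illustrated with a
minimal sketch. The sample character U+263A and the choice of UTF-8 as the
target encoding are assumptions made for this illustration only; the
<code>encode()</code> function used here is described in the next section.</p>
<pre>
  use strict;
  use warnings;
  use Encode qw(encode);

  # One Perl character (U+263A, WHITE SMILING FACE); an arbitrary sample.
  my $string = "\x{263A}";

  # The same character expressed as octets in UTF-8 (E2 98 BA).
  my $octets = encode("UTF-8", $string);

  printf "characters: %d\n", length $string;   # counts characters
  printf "octets:     %d\n", length $octets;   # counts octets
</pre>
<p>Here <code>length</code> reports one character for $string and three octets
for $octets, because this particular character needs three octets in UTF-8.</p>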
<p>
</p>
<hr />
<h1><a name="perl_encoding_api">PERL ENCODING API</a></h1>
<dl>
<dt><strong><a name="item_encode">$octets  = encode(ENCODING, $string [, CHECK])</a></strong>

<dd>
<p>Encodes a string from Perl's internal form into <em>ENCODING</em> and returns
a sequence of octets.  ENCODING can be either a canonical name or
an alias.  For encoding names and aliases, see <a href="#defining_aliases">Defining Aliases</a>.
For CHECK, see <a href="#handling_malformed_data">Handling Malformed Data</a>.</p>
</dd>
<dd>
<p>For example, to convert a string from Perl's internal format to
iso-8859-1 (also known as Latin1):</p>
</dd>
<dd>
<pre>
  <span class="variable">$octets</span> <span class="operator">=</span> <span class="variable">encode</span><span class="operator">(</span><span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p><strong>CAVEAT</strong>: When you run <a href="#item_encode"><code>$octets = encode(&quot;utf8&quot;, $string)</code></a>, $octets
<strong>may not be equal to</strong> $string.  Though they both contain the same data, the utf8 flag
for $octets is <strong>always</strong> off.  When you encode anything, the utf8 flag of
the result is always off, even when it contains a completely valid utf8
string. See <a href="#the_utf8_flag">The UTF-8 flag</a> below.</p>
</dd>
<dd>
<p>If the $string is <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> then <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> is returned.</p>
</dd>
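<dd>
<p>The caveat above can be checked with a short sketch. The sample string is
an assumption chosen for illustration (it contains U+263A, so Perl stores it
with the utf8 flag on), and <code>is_utf8</code> is imported from Encode only
to inspect the flag.</p>
</dd>
<dd>
<pre>
  use strict;
  use warnings;
  use Encode qw(encode is_utf8);

  # Sample text; the character above 0xFF means the utf8 flag is on.
  my $string = "caf\x{e9} \x{263A}";
  my $octets = encode("utf8", $string);

  printf "string flag: %s\n", is_utf8($string) ? "on" : "off";
  printf "octets flag: %s\n", is_utf8($octets) ? "on" : "off";

  # An undefined input yields an undefined result.
  my $nothing = encode("utf8", undef);
  print "undef in, undef out\n" unless defined $nothing;
</pre>
</dd>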
</li>
<dt><strong><a name="item_decode">$string = decode(ENCODING, $octets [, CHECK])</a></strong>

<dd>
<p>Decodes a sequence of octets assumed to be in <em>ENCODING</em> into Perl's
internal form and returns the resulting string.  As in encode(),
ENCODING can be either a canonical name or an alias. For encoding names
and aliases, see <a href="#defining_aliases">Defining Aliases</a>.  For CHECK, see
<a href="#handling_malformed_data">Handling Malformed Data</a>.</p>
</dd>
<dd>
<p>For example, to convert ISO-8859-1 data to a string in Perl's internal format:</p>
</dd>
<dd>
<pre>
  <span class="variable">$string</span> <span class="operator">=</span> <span class="variable">decode</span><span class="operator">(</span><span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="variable">$octets</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p><strong>CAVEAT</strong>: When you run <a href="#item_decode"><code>$string = decode(&quot;utf8&quot;, $octets)</code></a>, $string
<strong>may not be equal to</strong> $octets.  Though they both contain the same data,
the utf8 flag for $string is on unless $octets consists entirely of
ASCII data (or EBCDIC on EBCDIC machines).  See <a href="#the_utf8_flag">The UTF-8 flag</a>
below.</p>
</dd>
<dd>
<p>If $octets is <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a>, then <a href="../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> is returned.</p>
</dd>
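<dd>
<p>As a rough illustration of decoding, the sketch below turns three UTF-8
octets into a single Perl character. The octet values are an assumption
chosen for the example (they are the UTF-8 form of U+263A).</p>
</dd>
<dd>
<pre>
  use strict;
  use warnings;
  use Encode qw(decode);

  # Three octets, e.g. as read from a file: the UTF-8 form of U+263A.
  my $octets = "\xE2\x98\xBA";
  my $string = decode("utf8", $octets);

  printf "%d octets, %d character, U+%04X\n",
      length $octets, length $string, ord $string;
</pre>
</dd>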
</li>
<dt><strong><a name="item_from_to">[$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])</a></strong>

<dd>
<p>Converts data between two encodings <strong>in place</strong>. The data in $octets
must be encoded as octets and not as characters in Perl's internal
format. For example, to convert ISO-8859-1 data to Microsoft's CP1250
encoding:</p>
</dd>
<dd>
<pre>
  <span class="variable">from_to</span><span class="operator">(</span><span class="variable">$octets</span><span class="operator">,</span> <span class="string">"iso-8859-1"</span><span class="operator">,</span> <span class="string">"cp1250"</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p>and to convert it back:</p>
</dd>
<dd>
<pre>
  <span class="variable">from_to</span><span class="operator">(</span><span class="variable">$octets</span><span class="operator">,</span> <span class="string">"cp1250"</span><span class="operator">,</span> <span class="string">"iso-8859-1"</span><span class="operator">);</span>
</pre>
</dd>
<dd>
<p>Note that because the conversion happens in place, the data to be
converted cannot be a string constant; it must be a scalar variable.</p>
</dd>
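<dd>
<p>A minimal sketch of the in-place behaviour follows. It converts to UTF-8
rather than CP1250, an assumption made here only so that the change is
visible in the octet count; the variable <code>$as_utf8</code> is a
hypothetical name used to keep a copy of the intermediate result.</p>
</dd>
<dd>
<pre>
  use strict;
  use warnings;
  use Encode qw(from_to);

  # "cafe" with e-acute, held as ISO-8859-1 octets in a scalar variable.
  my $octets = "caf\xE9";
  printf "before: %d octets\n", length $octets;

  # $octets itself is overwritten with the UTF-8 form.
  from_to($octets, "iso-8859-1", "UTF-8");
  my $as_utf8 = $octets;
  printf "after:  %d octets\n", length $as_utf8;

  # Converting back restores the original octets.
  from_to($octets, "UTF-8", "iso-8859-1");
</pre>
</dd>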
<dd>
<p><a href="#item_from_to"><code>from_to()</code></a> returns the length of the converted string in octets on
