<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../../displayToc.js"></script>
<script language="JavaScript" src="../../tocParas.js"></script>
<script language="JavaScript" src="../../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../../scineplex.css">
<title>Unicode::Normalize - Unicode Normalization Forms</title>
<link rel="stylesheet" href="../../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>
<body>
<script>writelinks('__top__',2);</script>
<h1><a>Unicode::Normalize - Unicode Normalization Forms</a></h1>
<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->
<ul>
<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<ul>
<li><a href="#normalization_forms">Normalization Forms</a></li>
<li><a href="#decomposition_and_composition">Decomposition and Composition</a></li>
<li><a href="#quick_check">Quick Check</a></li>
<li><a href="#character_data">Character Data</a></li>
</ul>
<li><a href="#export">EXPORT</a></li>
<li><a href="#caveats">CAVEATS</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
</ul>
<!-- INDEX END -->
<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Unicode::Normalize - Unicode Normalization Forms</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<p>(1) using function names exported by default:</p>
<pre>
<span class="keyword">use</span> <span class="variable">Unicode::Normalize</span><span class="operator">;</span>
</pre>
<pre>
<span class="variable">$NFD_string</span> <span class="operator">=</span> <span class="variable">NFD</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form D</span>
<span class="variable">$NFC_string</span> <span class="operator">=</span> <span class="variable">NFC</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form C</span>
<span class="variable">$NFKD_string</span> <span class="operator">=</span> <span class="variable">NFKD</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form KD</span>
<span class="variable">$NFKC_string</span> <span class="operator">=</span> <span class="variable">NFKC</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form KC</span>
</pre>
<p>(2) using function names exported on request:</p>
<pre>
<span class="keyword">use</span> <span class="variable">Unicode::Normalize</span> <span class="string">'normalize'</span><span class="operator">;</span>
</pre>
<pre>
<span class="variable">$NFD_string</span> <span class="operator">=</span> <span class="variable">normalize</span><span class="operator">(</span><span class="string">'D'</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form D</span>
<span class="variable">$NFC_string</span> <span class="operator">=</span> <span class="variable">normalize</span><span class="operator">(</span><span class="string">'C'</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form C</span>
<span class="variable">$NFKD_string</span> <span class="operator">=</span> <span class="variable">normalize</span><span class="operator">(</span><span class="string">'KD'</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form KD</span>
<span class="variable">$NFKC_string</span> <span class="operator">=</span> <span class="variable">normalize</span><span class="operator">(</span><span class="string">'KC'</span><span class="operator">,</span> <span class="variable">$string</span><span class="operator">);</span> <span class="comment"># Normalization Form KC</span>
</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>Parameters:</p>
<p><code>$string</code> is used as a string under character semantics
(see <em>perlunicode</em>).</p>
<p><code>$codepoint</code> should be an unsigned integer
representing a Unicode code point.</p>
<p>Note: XSUB and pure Perl differ in how they interpret
<code>$codepoint</code> as a decimal number:
XSUB converts <code>$codepoint</code> to an unsigned integer, but pure Perl does not.
Do not pass a floating-point or negative value as <code>$codepoint</code>.</p>
<p>
</p>
<h2><a name="normalization_forms">Normalization Forms</a></h2>
<dl>
<dt><strong><a name="item_nfd"><code>$NFD_string = NFD($string)</code></a></strong>
<dd>
<p>returns the Normalization Form D (formed by canonical decomposition).</p>
</dd>
</li>
<dt><strong><a name="item_nfc"><code>$NFC_string = NFC($string)</code></a></strong>
<dd>
<p>returns the Normalization Form C (formed by canonical decomposition
followed by canonical composition).</p>
</dd>
</li>
<dt><strong><a name="item_nfkd"><code>$NFKD_string = NFKD($string)</code></a></strong>
<dd>
<p>returns the Normalization Form KD (formed by compatibility decomposition).</p>
</dd>
</li>
<dt><strong><a name="item_nfkc"><code>$NFKC_string = NFKC($string)</code></a></strong>
<dd>
<p>returns the Normalization Form KC (formed by compatibility decomposition
followed by <strong>canonical</strong> composition).</p>
</dd>
</li>
<dt><strong><a name="item_fcd"><code>$FCD_string = FCD($string)</code></a></strong>
<dd>
<p>If the given string is in FCD ("Fast C or D" form; cf. UTN #5),
returns it without modification; otherwise returns an FCD string.</p>
</dd>
<dd>
<p>Note: FCD is not always unique, so several distinct forms may be equivalent
to each other. <a href="#item_fcd"><code>FCD()</code></a> will return one of these equivalent forms.</p>
</dd>
</li>
<dt><strong><a name="item_fcc"><code>$FCC_string = FCC($string)</code></a></strong>
<dd>
<p>returns the FCC form ("Fast C Contiguous"; cf. UTN #5).</p>
</dd>
<dd>
<p>Note: FCC is unique, as are the four normalization forms (NF*).</p>
</dd>
</li>
<dt><strong><a name="item_normalize"><code>$normalized_string = normalize($form_name, $string)</code></a></strong>
<dd>
<p><code>$form_name</code> must be one of the following names:</p>
</dd>
<dd>
<pre>
    'C'  or 'NFC'  for Normalization Form C  (UAX #15)
    'D'  or 'NFD'  for Normalization Form D  (UAX #15)
    'KC' or 'NFKC' for Normalization Form KC (UAX #15)
    'KD' or 'NFKD' for Normalization Form KD (UAX #15)</pre>
</dd>
<dd>
<pre>
    'FCD' for "Fast C or D" Form  (UTN #5)
    'FCC' for "Fast C Contiguous" (UTN #5)</pre>
</dd>
</li>
</dl>
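<p>As an illustration of the difference between the canonical and compatibility forms,
here is a minimal sketch. The sample strings and variable names are chosen for this
example only and do not come from the module itself.</p>
<pre>
    use strict;
    use warnings;
    use Unicode::Normalize qw(NFC NFD NFKC NFKD normalize);

    # Sample data (assumption for this sketch):
    my $composed   = "\x{E9}";      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    my $decomposed = "e\x{301}";    # "e" followed by U+0301 COMBINING ACUTE ACCENT

    # Canonical decomposition splits the precomposed letter;
    # canonical composition puts it back together.
    print "NFD decomposes\n" if NFD($composed)   eq $decomposed;
    print "NFC composes\n"   if NFC($decomposed) eq $composed;

    # U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition,
    # so the canonical forms leave it alone and the K forms expand it.
    my $ligature = "\x{FB01}";
    print "NFC keeps the ligature\n"    if NFC($ligature)  eq $ligature;
    print "NFKC expands the ligature\n" if NFKC($ligature) eq "fi";

    # normalize() with a form name should agree with the dedicated function.
    print "normalize('NFKD', ...) agrees with NFKD()\n"
        if normalize('NFKD', $ligature) eq NFKD($ligature);
</pre>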
<p>
</p>
<h2><a name="decomposition_and_composition">Decomposition and Composition</a></h2>
<dl>
<dt><strong><a name="item_decompose"><code>$decomposed_string = decompose($string)</code></a></strong>
<dt><strong><code>$decomposed_string = decompose($string, $useCompatMapping)</code></strong>
<dd>
<p>Decomposes the specified string and returns the result.</p>
</dd>
<dd>
<p>If the second parameter (a boolean) is omitted or false, decomposes it
using the Canonical Decomposition Mapping.
If true, decomposes it using the Compatibility Decomposition Mapping.</p>
</dd>
<dd>
<p>The string returned is not always in NFD/NFKD.
Reordering may be required.</p>
</dd>
<dd>
<pre>
<span class="variable">$NFD_string</span> <span class="operator">=</span> <span class="variable">reorder</span><span class="operator">(</span><span class="variable">decompose</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">));</span> <span class="comment"># eq. to NFD()</span>
<span class="variable">$NFKD_string</span> <span class="operator">=</span> <span class="variable">reorder</span><span class="operator">(</span><span class="variable">decompose</span><span class="operator">(</span><span class="variable">$string</span><span class="operator">,</span> <span class="variable">TRUE</span><span class="operator">));</span> <span class="comment"># eq. to NFKD()</span>
</pre>
</dd>
</li>
<dt><strong><a name="item_reorder"><code>$reordered_string = reorder($string)</code></a></strong>
<dd>
<p>Reorders the combining characters and the like into canonical order
and returns the result.</p>
</dd>
<dd>
<p>E.g., when you have a list of NFD/NFKD strings,
you can get the concatenated NFD/NFKD string from them by writing:</p>
</dd>
<dd>
<pre>
<span class="variable">$concat_NFD</span> <span class="operator">=</span> <span class="variable">reorder</span><span class="operator">(</span><span class="keyword">join</span> <span class="string">''</span><span class="operator">,</span> <span class="variable">@NFD_strings</span><span class="operator">);</span>
<span class="variable">$concat_NFKD</span> <span class="operator">=</span> <span class="variable">reorder</span><span class="operator">(</span><span class="keyword">join</span> <span class="string">''</span><span class="operator">,</span> <span class="variable">@NFKD_strings</span><span class="operator">);</span>
</pre>
</dd>
</li>
<dt><strong><a name="item_compose"><code>$composed_string = compose($string)</code></a></strong>
<dd>
<p>Returns the string where composable pairs are composed.</p>
</dd>
<dd>
<p>E.g., when you have an NFD/NFKD string,
you can get its NFC/NFKC string by writing:</p>
</dd>
<dd>
<pre>
<span class="variable">$NFC_string</span> <span class="operator">=</span> <span class="variable">compose</span><span class="operator">(</span><span class="variable">$NFD_string</span><span class="operator">);</span>
<span class="variable">$NFKC_string</span> <span class="operator">=</span> <span class="variable">compose</span><span class="operator">(</span><span class="variable">$NFKD_string</span><span class="operator">);</span>
</pre>
</dd>
</li>
</dl>
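<p>The following sketch shows how the three functions above might be combined,
and why reordering matters after concatenation. The sample strings and variable
names are assumptions made for this example.</p>
<pre>
    use strict;
    use warnings;
    use Unicode::Normalize qw(NFC NFD decompose reorder compose);

    # U+1E69 LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE (sample input)
    my $string = "\x{1E69}";

    # Decompose, then reorder: this should give the same result as NFD().
    my $nfd = reorder(decompose($string));
    print "reorder(decompose()) matches NFD()\n" if $nfd eq NFD($string);

    # Composing the NFD string should give the same result as NFC().
    print "compose() matches NFC()\n" if compose($nfd) eq NFC($string);

    # Each part below is in NFD on its own, but plain concatenation puts
    # U+0307 (combining class 230) before U+0323 (combining class 220),
    # which is not canonical order.
    my @NFD_strings = ("a\x{307}", "\x{323}");
    my $joined      = join '', @NFD_strings;
    print "reorder() restores canonical order\n"
        if reorder($joined) eq "a\x{323}\x{307}";
</pre>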
<p>
</p>
<h2><a name="quick_check">Quick Check</a></h2>
<p>(see Annex 8, UAX #15; and <em>DerivedNormalizationProps.txt</em>)</p>
<p>The following functions check whether the string is in that normalization form.</p>
<p>The result returned will be:</p>
<pre>
    YES     The string is in that normalization form.
    NO      The string is not in that normalization form.
    MAYBE   Dubious. Maybe yes, maybe no.</pre>
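<p>A minimal sketch of how a caller might tell the three results apart, using
<code>checkNFC()</code> as described in the list that follows. The helper names
<code>nfc_status</code> and <code>ensure_nfc</code> are hypothetical and are not
part of the module.</p>
<pre>
    use strict;
    use warnings;
    use Unicode::Normalize qw(NFC checkNFC);

    # Hypothetical helper: map the quick check result to a label.
    sub nfc_status {
        my ($string) = @_;
        my $result = checkNFC($string);
        return 'MAYBE' unless defined $result;   # undef: the check is inconclusive
        return $result ? 'YES' : 'NO';           # 1 or empty string
    }

    # Hypothetical helper: skip normalization only when the answer is YES.
    # Both NO (empty string) and MAYBE (undef) are false, so both fall
    # through to the full NFC() call.
    sub ensure_nfc {
        my ($string) = @_;
        return checkNFC($string) ? $string : NFC($string);
    }

    print nfc_status("abc"), "\n";
    print nfc_status("e\x{301}"), "\n";
</pre>
<p>Treating <code>MAYBE</code> the same as <code>NO</code> is the conservative choice:
it costs an extra normalization pass but never returns an unnormalized string.</p>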
<dl>
<dt><strong><a name="item_checknfd"><code>$result = checkNFD($string)</code></a></strong>
<dd>
<p>returns true (<code>1</code>) if <code>YES</code>; false (<code>empty string</code>) if <code>NO</code>.</p>
</dd>
</li>
<dt><strong><a name="item_checknfc"><code>$result = checkNFC($string)</code></a></strong>
<dd>
<p>returns true (<code>1</code>) if <code>YES</code>; false (<code>empty string</code>) if <code>NO</code>;
<a href="../../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> if <code>MAYBE</code>.</p>
</dd>
</li>
<dt><strong><a name="item_checknfkd"><code>$result = checkNFKD($string)</code></a></strong>
<dd>
<p>returns true (<code>1</code>) if <code>YES</code>; false (<code>empty string</code>) if <code>NO</code>.</p>
</dd>
</li>
<dt><strong><a name="item_checknfkc"><code>$result = checkNFKC($string)</code></a></strong>
<dd>
<p>returns true (<code>1</code>) if <code>YES</code>; false (<code>empty string</code>) if <code>NO</code>;
<a href="../../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> if <code>MAYBE</code>.</p>