<html><head><title>Unicode (Perl in a Nutshell, 2nd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Stephen Spainhour" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596002416L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl in a Nutshell, 2nd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Java and XSLT" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_10.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch04_12.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><h2 class="sect1">4.11. Unicode</h2><p><a name="INDEX-792" />Unicode provides a unique number for every character, regardless of the computing platform, program, or programming language. This is particularly important because without a standard such as Unicode, computers would continue to use different character encodings, many of which conflict with one another when used together.</p><p><a name="INDEX-793" />Unicode support was introduced to Perl with Perl 5.6. Although it still does not adhere completely to the Unicode spec, Unicode support has matured significantly under Perl 5.8. You can now use Unicode reliably with file I/O and with regular expressions. 
With regular expressions, the pattern adapts to the data and automatically switches to the correct Unicode character scheme.</p><p>Perl's Unicode implementation falls into the following categories:</p><dl><dt><i>I/O</i></dt><dd><p>Perl 5.6 had no way to mark data that's read from or written to a file as being of type Unicode (utf8). Perl 5.8 supports this through I/O layers such as <tt class="literal">:utf8</tt>, which you specify when opening the filehandle.</p></dd><dt><i>Regular expressions</i></dt><dd><p>In Perl 5.6, the decision whether to match Unicode characters was made when the pattern was compiled, based on whether the pattern itself contained Unicode characters, and not when matching happened at runtime. As of Perl 5.8, the pattern adapts to the data at runtime.</p></dd><dt><b><tt class="literal">use utf8</tt></b></dt><dd><p>The utf8 module is still needed to enable a few Unicode features. The <tt class="literal">utf8</tt> pragma, as implemented by the utf8 module, provides tables used for Unicode support. You must load the <tt class="literal">utf8</tt> pragma explicitly to enable recognition of UTF-8 encoded literals and identifiers in the source text.</p></dd><dt><i>Byte and character semantics</i></dt><dd><p>As of 5.6.0, Perl uses logically wide characters to represent strings internally, and this internal representation uses the UTF-8 encoding. Future versions of Perl are intended to work with characters rather than bytes. The design was a deliberate decision that lets programs written for Perl 5.6 move from byte semantics to character semantics. Perl switches to character semantics when it finds that the input data contains characters on which it can safely operate as UTF-8. You can disable character semantics by using the <tt class="literal">bytes</tt> pragma, as explained in <a href="ch08_01.htm">Chapter 8, "Standard Modules"</a>.</p>
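<p>The difference between the two semantics can be sketched with <tt class="literal">length</tt>. This is a minimal illustration rather than part of the original text: the variable names and the sample character (U+263A) are chosen here for the example, and it assumes Perl 5.8 or later.</p><blockquote><pre class="code">
use strict;
use warnings;

# A one-character string whose ordinal value is above 255
# (U+263A; the choice of character is an assumption for this sketch).
my $str = "\x{263A}";

# Character semantics: length counts characters.
my $chars = length($str);

my $bytes;
{
    # The bytes pragma is lexically scoped, so byte semantics
    # apply only inside this block.
    use bytes;
    $bytes = length($str);   # counts the bytes of the internal UTF-8 form
}

print "characters: $chars, bytes: $bytes\n";   # characters: 1, bytes: 3
</pre></blockquote><p>Because <tt class="literal">bytes</tt> is lexically scoped, only the inner block sees the byte count; the rest of the program keeps character semantics.</p>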
<p>Character semantics have the following effects:</p><ul><li><p>Strings and patterns may contain characters that have an ordinal value larger than 255.</p></li><li><p>Identifiers within a Perl program may contain Unicode alphanumeric characters.</p></li><li><p>Regular expressions match characters and not bytes.</p></li><li><p>Character classes in regular expressions match characters and not bytes.</p></li><li><p>Named Unicode properties and block ranges may be used as character classes with the <tt class="literal">\p</tt> and <tt class="literal">\P</tt> constructs.</p></li><li><p><tt class="literal">\X</tt> matches any extended Unicode sequence.</p></li><li><p><tt class="literal">tr//</tt> matches characters instead of bytes.</p></li><li><p>Case translation operators use the Unicode case translation tables when provided character input.</p></li><li><p>Most operators that deal with positions or lengths in a string switch to using character positions.</p></li><li><p><tt class="literal">pack( )</tt> and <tt class="literal">unpack( )</tt> do not change.</p></li><li><p>Bit operators work on characters.</p></li><li><p><tt class="literal">scalar reverse( )</tt> reverses characters and not bytes.</p></li></ul></dd></dl><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_10.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0" /></a></td><td align="right" valign="top" width="228"><a href="ch04_12.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr><tr><td align="left" valign="top" width="228">4.10. Signals</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0" /></a></td><td align="right" valign="top" width="228">4.12. 
Formats</td></tr></table></div><hr width="684" align="left" /><img src="../gifs/navbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>