Mappings/SpecialCasing\fR is implemented.
.IP "\(bu" 4
String Collation
.Sp
People like to see their strings nicely sorted\*(--or as Unicode
parlance goes, collated.  But again, what do you mean by collate?
.Sp
(Does \f(CW\*(C`LATIN CAPITAL LETTER A WITH ACUTE\*(C'\fR come before or after
\&\f(CW\*(C`LATIN CAPITAL LETTER A WITH GRAVE\*(C'\fR?)
.Sp
The short answer is that by default, Perl compares strings (\f(CW\*(C`lt\*(C'\fR,
\&\f(CW\*(C`le\*(C'\fR, \f(CW\*(C`cmp\*(C'\fR, \f(CW\*(C`ge\*(C'\fR, \f(CW\*(C`gt\*(C'\fR) based only on the code points of the
characters.  In the above case, the answer is \*(L"after\*(R", since
\&\f(CW0x00C1\fR > \f(CW0x00C0\fR.
.Sp
The long answer is that \*(L"it depends\*(R", and a good answer cannot be
given without knowing (at the very least) the language context.
See Unicode::Collate, and \fIUnicode Collation Algorithm\fR
http://www.unicode.org/unicode/reports/tr10/
.Sh "Miscellaneous"
.IX Subsection "Miscellaneous"
.IP "\(bu" 4
Character Ranges and Classes
.Sp
Character ranges in regular expression character classes (\f(CW\*(C`/[a\-z]/\*(C'\fR)
and in the \f(CW\*(C`tr///\*(C'\fR (also known as \f(CW\*(C`y///\*(C'\fR) operator are not magically
Unicode-aware.  What this means is that \f(CW\*(C`[A\-Za\-z]\*(C'\fR will not magically start
to mean \*(L"all alphabetic letters\*(R" (not that it means that even for
8\-bit characters; for those, you should be using \f(CW\*(C`/[[:alpha:]]/\*(C'\fR).
.Sp
For specifying character classes like that in regular expressions,
you can use the various Unicode properties\*(--\f(CW\*(C`\epL\*(C'\fR, or perhaps
\&\f(CW\*(C`\ep{Alphabetic}\*(C'\fR, in this particular case.  You can use Unicode
code points as the end points of character ranges, but there is no
magic associated with specifying a certain range.
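As an illustration, the sketch below tests the same words against an ASCII range and against the Alphabetic property. The helper classify() and the sample words are made up for this example. The non-ASCII letters are written as hexadecimal escapes so that the script itself stays ASCII.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Returns two flags for a word: does it consist only of [A-Za-z],
# and does it consist only of characters with the Unicode
# Alphabetic property?
sub classify {
    my ($word) = @_;
    my $ascii_range = $word =~ /^[A-Za-z]+\z/       ? 1 : 0;
    my $alphabetic  = $word =~ /^\p{Alphabetic}+\z/ ? 1 : 0;
    return ($ascii_range, $alphabetic);
}

# Sample words (hypothetical data); non-ASCII letters written as escapes.
my @words = (
    'apple',
    "na\x{EF}ve",                                  # i with diaeresis
    "\x{395}\x{3BB}\x{3BB}\x{3AC}\x{3B4}\x{3B1}",  # a Greek word
    'x1',                                          # contains a digit
);

for my $word (@words) {
    my ($ascii_range, $alphabetic) = classify($word);
    printf "%-8s [A-Za-z]: %d  \\p{Alphabetic}: %d\n",
        $word, $ascii_range, $alphabetic;
}
```

Only apple passes both tests. The accented word and the Greek word pass only the property test. x1 passes neither, because the digit 1 is not alphabetic.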
For further information\*(--there are dozens of Unicode character classes\*(--see
perlunicode.
.IP "\(bu" 4
String-To-Number Conversions
.Sp
Unicode does define several other decimal\*(--and numeric\*(--characters
besides the familiar 0 to 9, such as the Arabic and Indic digits.
Perl does not support string-to-number conversion for digits other
than \s-1ASCII\s0 0 to 9 (and \s-1ASCII\s0 a to f for hexadecimal).
.Sh "Questions With Answers"
.IX Subsection "Questions With Answers"
.IP "\(bu" 4
Will My Old Scripts Break?
.Sp
Very probably not.  Unless you are generating Unicode characters
somehow, old behaviour should be preserved.  About the only behaviour
that has changed and which could start generating Unicode is the old
behaviour of \f(CW\*(C`chr()\*(C'\fR where supplying an argument more than 255
produced a character modulo 255.  \f(CW\*(C`chr(300)\*(C'\fR, for example, was equal
to \f(CW\*(C`chr(45)\*(C'\fR or \*(L"\-\*(R" (in \s-1ASCII\s0); now it is \s-1LATIN\s0 \s-1CAPITAL\s0 \s-1LETTER\s0 I \s-1WITH\s0
\&\s-1BREVE\s0.
.IP "\(bu" 4
How Do I Make My Scripts Work With Unicode?
.Sp
Very little work should be needed since nothing changes until you
generate Unicode data.  The most important thing is getting input as
Unicode; for that, see the earlier I/O discussion.
.IP "\(bu" 4
How Do I Know Whether My String Is In Unicode?
.Sp
You shouldn't care.  No, you really shouldn't.  No, really.  If you
have to care\*(--beyond the cases described above\*(--it means that we
didn't get the transparency of Unicode quite right.
.Sp
Okay, if you insist:
.Sp
.Vb 1
\&    print utf8::is_utf8($string) ? 1 : 0, "\en";
.Ve
.Sp
But note that this doesn't mean that any of the characters in the
string are necessarily \s-1UTF\-8\s0 encoded, or that any of the characters have
code points greater than 0xFF (255) or even 0x80 (128), or that the
string has any characters at all.  All \f(CW\*(C`is_utf8()\*(C'\fR does is
return the value of the internal \*(L"utf8ness\*(R" flag attached to
\&\f(CW$string\fR.
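A small experiment shows that the flag records how a string is stored, not what it contains. This sketch uses the core utf8::upgrade() function to change only the internal representation. The variable names and sample text are arbitrary.

```perl
use strict;
use warnings;

# The same four characters, stored two different ways.
my $native   = "caf\x{E9}";   # all code points < 0x100: stored as single bytes
my $upgraded = $native;
utf8::upgrade($upgraded);      # same characters, now stored internally as UTF-8

print utf8::is_utf8($native)   ? 1 : 0, "\n";   # 0
print utf8::is_utf8($upgraded) ? 1 : 0, "\n";   # 1

# Perl compares and measures characters, not the internal bytes.
print $native eq $upgraded ? "equal\n" : "different\n";   # equal
print length($native), ' ', length($upgraded), "\n";     # 4 4

# The flag can be on even when every character is plain ASCII.
my $ascii = 'abc';
utf8::upgrade($ascii);
print utf8::is_utf8($ascii) ? 1 : 0, "\n";   # 1
```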
If the flag is off, the bytes in the scalar are interpreted
as a single byte encoding.  If the flag is on, the bytes in the scalar
are interpreted as the (multi-byte, variable-length) \s-1UTF\-8\s0 encoded code
points of the characters.  Bytes added to a \s-1UTF\-8\s0 encoded string are
automatically upgraded to \s-1UTF\-8\s0.  If mixed non\-UTF\-8 and \s-1UTF\-8\s0 scalars
are merged (double-quoted interpolation, explicit concatenation, and
printf/sprintf parameter substitution), the result will be \s-1UTF\-8\s0 encoded
as if copies of the byte strings were upgraded to \s-1UTF\-8:\s0 for example,
.Sp
.Vb 3
\&    $a = "ab\ex80c";
\&    $b = "\ex{100}";
\&    print "$a = $b\en";
.Ve
.Sp
the output string will be UTF\-8\-encoded \f(CW\*(C`ab\ex80c = \ex{100}\en\*(C'\fR, but
\&\f(CW$a\fR will stay byte-encoded.
.Sp
Sometimes you might really need to know the byte length of a string
instead of the character length.  For that use either the
\&\f(CW\*(C`Encode::encode_utf8()\*(C'\fR function or the \f(CW\*(C`bytes\*(C'\fR pragma and its only
defined function \f(CW\*(C`length()\*(C'\fR:
.Sp
.Vb 7
\&    my $unicode = chr(0x100);
\&    print length($unicode), "\en"; # will print 1
\&    require Encode;
\&    print length(Encode::encode_utf8($unicode)), "\en"; # will print 2
\&    use bytes;
\&    print length($unicode), "\en"; # will also print 2
\&                                  # (the 0xC4 0x80 of the UTF\-8)
.Ve
.IP "\(bu" 4
How Do I Detect Data That's Not Valid In a Particular Encoding?
.Sp
Use the \f(CW\*(C`Encode\*(C'\fR package to try converting it.
For example,
.Sp
.Vb 6
\&    use Encode \*(Aqdecode_utf8\*(Aq;
\&    if (eval { decode_utf8($string, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 }) {
\&        # $string is valid utf8
\&    } else {
\&        # $string is not valid utf8
\&    }
.Ve
.Sp
Passing \f(CW\*(C`Encode::LEAVE_SRC\*(C'\fR stops \f(CW\*(C`decode_utf8()\*(C'\fR from overwriting
\&\f(CW$string\fR with the unprocessed remainder of its input while it checks it.
.Sp
Or use \f(CW\*(C`unpack\*(C'\fR to try decoding it:
.Sp
.Vb 2
\&    use warnings;
\&    @chars = unpack("C0U*", $string_of_bytes_that_I_think_is_utf8);
.Ve
.Sp
If invalid, a \f(CW\*(C`Malformed UTF\-8 character\*(C'\fR warning is produced.
The \*(L"C0\*(R" means
\&\*(L"process the string character per character\*(R".  Without that, the
\&\f(CW\*(C`unpack("U*", ...)\*(C'\fR would work in \f(CW\*(C`U0\*(C'\fR mode (the default if the format
string starts with \f(CW\*(C`U\*(C'\fR) and it would return the bytes making up the \s-1UTF\-8\s0
encoding of the target string, something that will always work.
.IP "\(bu" 4
How Do I Convert Binary Data Into a Particular Encoding, Or Vice Versa?
.Sp
This probably isn't as useful as you might think.
Normally, you shouldn't need to.
.Sp
In one sense, what you are asking doesn't make much sense: encodings
are for characters, and binary data are not \*(L"characters\*(R", so converting
\&\*(L"data\*(R" into some encoding isn't meaningful unless you know what
character set and encoding the binary data is in, in which case it's
not just binary data, now is it?
.Sp
If you have a raw sequence of bytes that you know should be
interpreted via a particular encoding, you can use \f(CW\*(C`Encode\*(C'\fR:
.Sp
.Vb 2
\&    use Encode \*(Aqfrom_to\*(Aq;
\&    from_to($data, "iso\-8859\-1", "utf\-8"); # from latin\-1 to utf\-8
.Ve
.Sp
The call to \f(CW\*(C`from_to()\*(C'\fR changes the bytes in \f(CW$data\fR, but nothing
material about the nature of the string has changed as far as Perl is
concerned.  Both before and after the call, the string \f(CW$data\fR
contains just a bunch of 8\-bit bytes.  As far as Perl is concerned,
the encoding of the string remains as \*(L"system-native 8\-bit bytes\*(R".
.Sp
You might relate this to a fictional 'Translate' module:
.Sp
.Vb 4
\&   use Translate;
\&   my $phrase = "Yes";
\&   Translate::from_to($phrase, \*(Aqenglish\*(Aq, \*(Aqdeutsch\*(Aq);
\&   ## phrase now contains "Ja"
.Ve
.Sp
The contents of the string change, but not the nature of the string.
Perl doesn't know any more after the call than before that the
contents of the string indicate the affirmative.
.Sp
Back to converting data.  If you have (or want) data in your system's
native 8\-bit encoding (e.g.
Latin\-1, \s-1EBCDIC\s0, etc.), you can use
pack/unpack to convert to/from Unicode.
.Sp
.Vb 2
\&    $native_string  = pack("W*", unpack("U*", $Unicode_string));
\&    $Unicode_string = pack("U*", unpack("W*", $native_string));
.Ve
.Sp
If you have a sequence of bytes you \fBknow\fR is valid \s-1UTF\-8\s0,
but Perl doesn't know it yet, you can make Perl a believer, too:
.Sp
.Vb 2
\&    use Encode \*(Aqdecode_utf8\*(Aq;
\&    $Unicode = decode_utf8($bytes);
.Ve
.Sp
or:
.Sp
.Vb 1
\&    $Unicode = pack("U0a*", $bytes);
.Ve
.Sp
You can convert well-formed \s-1UTF\-8\s0 to a sequence of bytes, but if
you just want to convert random binary data into \s-1UTF\-8\s0, you can't.
\&\fBNot every random collection of bytes is well-formed \s-1UTF\-8\s0\fR.  You can
use \f(CW\*(C`unpack("C*", $string)\*(C'\fR for the former, and you can create
well-formed Unicode data by \f(CW\*(C`pack("U*", 0xff, ...)\*(C'\fR.
.IP "\(bu" 4
How Do I Display Unicode?  How Do I Input Unicode?
.Sp
See http://www.alanwood.net/unicode/ and
http://www.cl.cam.ac.uk/~mgk25/unicode.html
.IP "\(bu" 4
How Does Unicode Work With Traditional Locales?
.Sp
In Perl, not very well.  Avoid combining Unicode with locales enabled
through the \f(CW\*(C`locale\*(C'\fR pragma; use only one or the other.  But see perlrun for the
description of the \f(CW\*(C`\-C\*(C'\fR switch and its environment counterpart,
\&\f(CW$ENV{PERL_UNICODE}\fR to see how to enable various Unicode features,
for example by using locale settings.
.Sh "Hexadecimal Notation"
.IX Subsection "Hexadecimal Notation"
The Unicode standard prefers using hexadecimal notation because
that more clearly shows the division of Unicode into blocks of 256 characters.
Hexadecimal is also simply shorter than decimal.  You can use decimal
notation, too, but learning to use hexadecimal just makes life easier
with the Unicode standard.  The \f(CW\*(C`U+HHHH\*(C'\fR notation uses hexadecimal,
for example.
.PP
The \f(CW\*(C`0x\*(C'\fR prefix means a hexadecimal number; the digits are 0\-9 \fIand\fR
a\-f (or A\-F, case doesn't matter).
Each hexadecimal digit represents
four bits, or half a byte.  \f(CW\*(C`print 0x..., "\en"\*(C'\fR will show a
hexadecimal number in decimal, and \f(CW\*(C`printf "%x\en", $decimal\*(C'\fR will
show a decimal number in hexadecimal.  If you have just the
\&\*(L"hex digits\*(R" of a hexadecimal number, you can use the \f(CW\*(C`hex()\*(C'\fR function.
.PP
.Vb 6
\&    print 0x0009, "\en";    # 9
\&    print 0x000a, "\en";    # 10
\&    print 0x000f, "\en";    # 15
\&    print 0x0010, "\en";    # 16
\&    print 0x0011, "\en";    # 17
\&    print 0x0100, "\en";    # 256
\&
\&    print 0x0041, "\en";    # 65
\&
\&    printf "%x\en",  65;    # 41
\&    printf "%#x\en", 65;    # 0x41
\&
\&    print hex("41"), "\en"; # 65
.Ve
.Sh "Further Resources"
.IX Subsection "Further Resources"
.IP "\(bu" 4
Unicode Consortium
.Sp
http://www.unicode.org/
.IP "\(bu" 4
Unicode \s-1FAQ\s0
.Sp
http://www.unicode.org/unicode/faq/
.IP "\(bu" 4
Unicode Glossary
.Sp
http://www.unicode.org/glossary/
.IP "\(bu" 4
Unicode Useful Resources
.Sp
http://www.unicode.org/unicode/onlinedat/resources.html
.IP "\(bu" 4
Unicode and Multilingual Support in \s-1HTML\s0, Fonts, Web Browsers and Other Applications
.Sp
http://www.alanwood.net/unicode/
.IP "\(bu" 4
\&\s-1UTF\-8\s0 and Unicode \s-1FAQ\s0 for Unix/Linux
.Sp
http://www.cl.cam.ac.uk/~mgk25/unicode.html
.IP "\(bu" 4
Legacy Character Sets
.Sp
http://www.czyborra.com/
http://www.eki.ee/letter/
.IP "\(bu" 4
The Unicode support files live within the Perl installation in the
directory
.Sp
.Vb 1
\&    $Config{installprivlib}/unicore
.Ve
.Sp
in Perl 5.8.0 or newer, and
.Sp
.Vb 1
\&    $Config{installprivlib}/unicode
.Ve
.Sp
in the Perl 5.6 series.  (The renaming to \fIlib/unicore\fR was done to
avoid naming conflicts with lib/Unicode in case-insensitive filesystems.)
The main Unicode data file is \fIUnicodeData.txt\fR (or \fIUnicode.301\fR in
Perl 5.6.1.)
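Inside a script, the same directory can be computed with the core Config module, which exposes the %Config hash mentioned above. The following is a minimal sketch. It only prints the path and whether the directory exists, because the set of data files installed there varies between Perl versions.

```perl
use strict;
use warnings;
use Config;

# installprivlib is where this perl's core library was installed;
# the Unicode support files live in its unicore subdirectory
# (Perl 5.8.0 and later).
my $unicore = "$Config{installprivlib}/unicore";

print "$unicore\n";
print -d $unicore ? "found\n" : "not found\n";
```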
You can find the \f(CW$Config{installprivlib}\fR by
.Sp
.Vb 1
\&    perl "\-V:installprivlib"
.Ve
.Sp
You can explore various information from the Unicode data files using
the \f(CW\*(C`Unicode::UCD\*(C'\fR module.
.SH "UNICODE IN OLDER PERLS"
.IX Header "UNICODE IN OLDER PERLS"
If you cannot upgrade your Perl to 5.8.0 or later, you can still
do some Unicode processing by using the modules \f(CW\*(C`Unicode::String\*(C'\fR,
\&\f(CW\*(C`Unicode::Map8\*(C'\fR, and \f(CW\*(C`Unicode::Map\*(C'\fR, available from \s-1CPAN\s0.
If you have the \s-1GNU\s0 recode installed, you can also use the
Perl front-end \f(CW\*(C`Convert::Recode\*(C'\fR for character conversions.
.PP
The following are fast conversions from \s-1ISO\s0 8859\-1 (Latin\-1) bytes
to \s-1UTF\-8\s0 bytes and back; the code works even with older Perl 5 versions.
.PP
.Vb 2
\&    # ISO 8859\-1 to UTF\-8
\&    s/([\ex80\-\exFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
\&
\&    # UTF\-8 to ISO 8859\-1
\&    s/([\exC2\exC3])([\ex80\-\exBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
.Ve
.SH "SEE ALSO"
.IX Header "SEE ALSO"
perlunitut, perlunicode, Encode, open, utf8, bytes,
perlretut, perlrun, Unicode::Collate, Unicode::Normalize,
Unicode::UCD
.SH "ACKNOWLEDGMENTS"
.IX Header "ACKNOWLEDGMENTS"
Thanks to the kind readers of the perl5\-porters@perl.org,
perl\-unicode@perl.org, linux\-utf8@nl.linux.org, and unicore@unicode.org
mailing lists for their valuable feedback.
.SH "AUTHOR, COPYRIGHT, AND LICENSE"
.IX Header "AUTHOR, COPYRIGHT, AND LICENSE"
Copyright 2001\-2002 Jarkko Hietaniemi <jhi@iki.fi>
.PP
This document may be distributed under the same terms as Perl itself.