.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "encoding 3"
.TH encoding 3 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
encoding \- allows you to write your script in non\-ascii or non\-utf8
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&  use encoding "greek";  # Perl like Greek to you?
\&  use encoding "euc\-jp"; # Jperl!
\&
\&  # or you can even do this if your shell supports your native encoding
\&
\&  perl \-Mencoding=latin2 \-e \*(Aq...\*(Aq # Feeling centrally European?
\&  perl \-Mencoding=euc\-kr \-e \*(Aq...\*(Aq # Or Korean?
\&
\&  # more control
\&
\&  # A simple euc\-cn => utf\-8 converter
\&  use encoding "euc\-cn", STDOUT => "utf8";  while(<>){print};
\&
\&  # "no encoding;" supported (but not scoped!)
\&  no encoding;
\&
\&  # an alternate way, Filter
\&  use encoding "euc\-jp", Filter=>1;
\&  # now you can use kanji identifiers \-\- in euc\-jp!
\&
\&  # switch on locale \-
\&  # note that unless you have complete control over every environment the
\&  # application will ever run in, you should NOT use the feature of the
\&  # encoding pragma that lets you write your script in any recognized
\&  # encoding, because changing locale settings will wreck the script; you
\&  # can of course still use the other features of the pragma.
\&  use encoding \*(Aq:locale\*(Aq;
.Ve
.SH "ABSTRACT"
.IX Header "ABSTRACT"
Let's start with a bit of history: Perl 5.6.0 introduced Unicode
support.  You could apply \f(CW\*(C`substr()\*(C'\fR and regexes even to complex \s-1CJK\s0
characters \*(-- so long as the script was written in \s-1UTF\-8\s0.
But back then, text editors that supported \s-1UTF\-8\s0 were still rare and many users
instead chose to write scripts in legacy encodings, giving up a whole
new feature of Perl 5.6.
.PP
Rewind to the future: starting from perl 5.8.0 with the \fBencoding\fR
pragma, you can write your script in any encoding you like (so long
as the \f(CW\*(C`Encode\*(C'\fR module supports it) and still enjoy Unicode support.
This pragma achieves that by doing the following:
.IP "\(bu" 4
Internally converts all literals (\f(CW\*(C`q//,qq//,qr//,qw//, qx//\*(C'\fR) from
the encoding specified to utf8.  In Perl 5.8.1 and later, literals in
\&\f(CW\*(C`tr///\*(C'\fR and the \f(CW\*(C`DATA\*(C'\fR pseudo-filehandle are also converted.
.IP "\(bu" 4
Changes the PerlIO layers of \f(CW\*(C`STDIN\*(C'\fR and \f(CW\*(C`STDOUT\*(C'\fR to the encoding specified.
.Sh "Literal Conversions"
.IX Subsection "Literal Conversions"
You can write code in EUC-JP as follows:
.PP
.Vb 3
\&  my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
\&               #<\-char\-><\-char\->   # 4 octets
\&  s/\ebCamel\eb/$Rakuda/;
.Ve
.PP
And with \f(CW\*(C`use encoding "euc\-jp"\*(C'\fR in effect, it is the same thing as
the code in \s-1UTF\-8:\s0
.PP
.Vb 2
\&  my $Rakuda = "\ex{99F1}\ex{99DD}"; # two Unicode Characters
\&  s/\ebCamel\eb/$Rakuda/;
.Ve
.ie n .Sh "PerlIO layers for ""STD(IN|OUT)"""
.el .Sh "PerlIO layers for \f(CWSTD(IN|OUT)\fP"
.IX Subsection "PerlIO layers for STD(IN|OUT)"
The \fBencoding\fR pragma also modifies the filehandle layers of
\&\s-1STDIN\s0 and \s-1STDOUT\s0 to the specified encoding.
Therefore,
.PP
.Vb 5
\&  use encoding "euc\-jp";
\&  my $message = "Camel is the symbol of perl.\en";
\&  my $Rakuda = "\exF1\exD1\exF1\exCC"; # Camel in Kanji
\&  $message =~ s/\ebCamel\eb/$Rakuda/;
\&  print $message;
.Ve
.PP
This will print \*(L"\exF1\exD1\exF1\exCC is the symbol of perl.\en\*(R",
not \*(L"\ex{99F1}\ex{99DD} is the symbol of perl.\en\*(R".
.PP
You can override this by giving extra arguments; see below.
.Sh "Implicit upgrading for byte strings"
.IX Subsection "Implicit upgrading for byte strings"
By default, if strings operating under byte semantics and strings
with Unicode character data are concatenated, the new string will
be created by decoding the byte strings as \fI\s-1ISO\s0 8859\-1 (Latin\-1)\fR.
.PP
The \fBencoding\fR pragma changes this to use the specified encoding
instead.  For example:
.PP
.Vb 5
\&    use encoding \*(Aqutf8\*(Aq;
\&    my $string = chr(20000); # a Unicode string
\&    utf8::encode($string);   # now it\*(Aqs a UTF\-8 encoded byte string
\&    # concatenate with another Unicode string
\&    print length($string . chr(20000));
.Ve
.PP
This will print \f(CW2\fR, because \f(CW$string\fR is upgraded as \s-1UTF\-8\s0.
Without
\&\f(CW\*(C`use encoding \*(Aqutf8\*(Aq;\*(C'\fR, it will print \f(CW4\fR instead, since \f(CW$string\fR
is three octets when interpreted as Latin\-1.
.Sh "Side effects"
.IX Subsection "Side effects"
If the \f(CW\*(C`encoding\*(C'\fR pragma is in scope then the lengths returned by
\&\f(CW\*(C`chomp\*(C'\fR are calculated from the length of \f(CW$/\fR in Unicode characters,
which is not always the same as the length of \f(CW$/\fR in the native encoding.
.PP
This pragma affects utf8::upgrade, but not utf8::downgrade.
.SH "FEATURES THAT REQUIRE 5.8.1"
.IX Header "FEATURES THAT REQUIRE 5.8.1"
Some of the features offered by this pragma require perl 5.8.1.  Most
of these were implemented by Inaba Hiroto.  All other features and changes
work with 5.8.0.
.ie n .IP """NON-EUC"" doublebyte encodings" 4
.el .IP "``NON-EUC'' doublebyte encodings" 4
.IX Item "NON-EUC doublebyte encodings"
Because perl needs to parse the script before applying this pragma, such
encodings as Shift_JIS and Big\-5 that may contain '\e' (\s-1BACKSLASH\s0;
\&\ex5c) in the second byte fail because the second byte may
accidentally escape the quoting character that follows.
Perl 5.8.1 or later fixes this problem.
.IP "tr//" 4
.IX Item "tr//"
\&\f(CW\*(C`tr//\*(C'\fR was overlooked by Perl 5 porters when they released perl 5.8.0.
See the section below for details.
.IP "\s-1DATA\s0 pseudo-filehandle" 4
.IX Item "DATA pseudo-filehandle"
Another feature that was overlooked was \f(CW\*(C`DATA\*(C'\fR.
.SH "USAGE"
.IX Header "USAGE"
.IP "use encoding [\fI\s-1ENCNAME\s0\fR] ;" 4
.IX Item "use encoding [ENCNAME] ;"
Sets the script encoding to \fI\s-1ENCNAME\s0\fR.  Unless ${^UNICODE}
exists and is non-zero, the PerlIO layers of \s-1STDIN\s0 and \s-1STDOUT\s0 are also set to
":encoding(\fI\s-1ENCNAME\s0\fR)".
.Sp