📄 encode.3
.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Encode 3"
.TH Encode 3 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Encode \- character encodings
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Encode;
.Ve
.Sh "Table of Contents"
.IX Subsection "Table of Contents"
Encode consists of a collection of modules whose details are too big
to fit in one document. This \s-1POD\s0 itself explains the top-level APIs
and general topics at a glance.
For other topics and more details,
see the PODs below:
.PP
.Vb 10
\&  Name                        Description
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  Encode::Alias               Alias definitions to encodings
\&  Encode::Encoding            Encode Implementation Base Class
\&  Encode::Supported           List of Supported Encodings
\&  Encode::CN                  Simplified Chinese Encodings
\&  Encode::JP                  Japanese Encodings
\&  Encode::KR                  Korean Encodings
\&  Encode::TW                  Traditional Chinese Encodings
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The \f(CW\*(C`Encode\*(C'\fR module provides the interfaces between Perl's strings
and the rest of the system. Perl strings are sequences of
\&\fBcharacters\fR.
.PP
The repertoire of characters that Perl can represent is at least that
defined by the Unicode Consortium. On most platforms the ordinal
value of a character (as returned by \f(CW\*(C`ord(ch)\*(C'\fR) is the \*(L"Unicode
codepoint\*(R" for that character (the exceptions are those platforms where
the legacy encoding is some variant of \s-1EBCDIC\s0 rather than a super-set
of \s-1ASCII\s0 \- see perlebcdic).
.PP
Traditionally, computer data has been moved around in 8\-bit chunks
often called \*(L"bytes\*(R". These chunks are also known as \*(L"octets\*(R" in
networking standards. Perl is widely used to manipulate data of many
types \- not only strings of characters representing human or computer
languages but also \*(L"binary\*(R" data, that is, the machine's representation of
numbers, pixels in an image \- or just about anything.
.PP
When Perl is processing \*(L"binary data\*(R", the programmer wants Perl to
process \*(L"sequences of bytes\*(R".
This is not a problem for Perl \- as a
byte has 256 possible values, it easily fits in Perl's much larger
\&\*(L"logical character\*(R".
.Sh "\s-1TERMINOLOGY\s0"
.IX Subsection "TERMINOLOGY"
.IP "\(bu" 2
\&\fIcharacter\fR: a character in the range 0..(2**32\-1) (or more).
(What Perl's strings are made of.)
.IP "\(bu" 2
\&\fIbyte\fR: a character in the range 0..255.
(A special case of a Perl character.)
.IP "\(bu" 2
\&\fIoctet\fR: 8 bits of data, with ordinal values 0..255.
(Term for bytes passed to or from a non-Perl context, e.g. a disk file.)
.SH "PERL ENCODING API"
.IX Header "PERL ENCODING API"
.ie n .IP "$octets\fR = encode(\s-1ENCODING\s0, \f(CW$string [, \s-1CHECK\s0])" 2
.el .IP "\f(CW$octets\fR = encode(\s-1ENCODING\s0, \f(CW$string\fR [, \s-1CHECK\s0])" 2
.IX Item "$octets = encode(ENCODING, $string [, CHECK])"
Encodes a string from Perl's internal form into \fI\s-1ENCODING\s0\fR and returns
a sequence of octets. \s-1ENCODING\s0 can be either a canonical name or
an alias. For encoding names and aliases, see \*(L"Defining Aliases\*(R".
For \s-1CHECK\s0, see \*(L"Handling Malformed Data\*(R".
.Sp
For example, to convert a string from Perl's internal format to
iso\-8859\-1 (also known as Latin1):
.Sp
.Vb 1
\&  $octets = encode("iso\-8859\-1", $string);
.Ve
.Sp
\&\fB\s-1CAVEAT\s0\fR: When you run \f(CW\*(C`$octets = encode("utf8", $string)\*(C'\fR, then
\&\f(CW$octets\fR \fBmay not be equal to\fR \f(CW$string\fR. Though they both contain the
same data, the \s-1UTF8\s0 flag for \f(CW$octets\fR is \fBalways\fR off. When you
encode anything, the \s-1UTF8\s0 flag of the result is always off, even when it
contains a completely valid \s-1UTF\-8\s0 string.
See \*(L"The \s-1UTF8\s0 flag\*(R" below.
.Sp
If the \f(CW$string\fR is \f(CW\*(C`undef\*(C'\fR then \f(CW\*(C`undef\*(C'\fR is returned.
.ie n .IP "$string\fR = decode(\s-1ENCODING\s0, \f(CW$octets [, \s-1CHECK\s0])" 2
.el .IP "\f(CW$string\fR = decode(\s-1ENCODING\s0, \f(CW$octets\fR [, \s-1CHECK\s0])" 2
.IX Item "$string = decode(ENCODING, $octets [, CHECK])"
Decodes a sequence of octets assumed to be in \fI\s-1ENCODING\s0\fR into Perl's
internal form and returns the resulting string. As in \fIencode()\fR,
\&\s-1ENCODING\s0 can be either a canonical name or an alias. For encoding names
and aliases, see \*(L"Defining Aliases\*(R". For \s-1CHECK\s0, see
\&\*(L"Handling Malformed Data\*(R".
.Sp
For example, to convert \s-1ISO\-8859\-1\s0 data to a string in Perl's internal format:
.Sp
.Vb 1
\&  $string = decode("iso\-8859\-1", $octets);
.Ve
.Sp
\&\fB\s-1CAVEAT\s0\fR: When you run \f(CW\*(C`$string = decode("utf8", $octets)\*(C'\fR, then \f(CW$string\fR
\&\fBmay not be equal to\fR \f(CW$octets\fR. Though they both contain the same data,
the \s-1UTF8\s0 flag for \f(CW$string\fR is on unless \f(CW$octets\fR consists entirely of
\&\s-1ASCII\s0 data (or \s-1EBCDIC\s0 on \s-1EBCDIC\s0 machines). See \*(L"The \s-1UTF8\s0 flag\*(R"
below.
.Sp
If \f(CW$octets\fR is \f(CW\*(C`undef\*(C'\fR then \f(CW\*(C`undef\*(C'\fR is returned.
.IP "[$obj =] find_encoding(\s-1ENCODING\s0)" 2
.IX Item "[$obj =] find_encoding(ENCODING)"
Returns the \fIencoding object\fR corresponding to \s-1ENCODING\s0.
Returns
undef if no matching \s-1ENCODING\s0 is found.
.Sp
This object is what does the actual (en|de)coding.
.Sp
.Vb 1
\&  $utf8 = decode($name, $bytes);
.Ve
.Sp
is in fact
.Sp
.Vb 5
\&  $utf8 = do{
\&    $obj = find_encoding($name);
\&    croak qq(encoding "$name" not found) unless ref $obj;
\&    $obj\->decode($bytes)
\&  };
.Ve
.Sp
with more error checking.
.Sp
Therefore you can save time by reusing this object as follows:
.Sp
.Vb 5
\&  my $enc = find_encoding("iso\-8859\-1");
\&  while(<>){
\&     my $utf8 = $enc\->decode($_);
\&     # and do something with $utf8;
\&  }
.Ve
.Sp
Besides \f(CW\*(C`\->decode\*(C'\fR and \f(CW\*(C`\->encode\*(C'\fR, other methods are
available as well. For instance, \f(CW\*(C`\->name\*(C'\fR returns the canonical
name of the encoding object.
.Sp
.Vb 1
\&  find_encoding("latin1")\->name; # iso\-8859\-1
.Ve
.Sp
See Encode::Encoding for details.
.IP "[$length =] from_to($octets, \s-1FROM_ENC\s0, \s-1TO_ENC\s0 [, \s-1CHECK\s0])" 2
.IX Item "[$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])"
Converts data \fBin place\fR between two encodings. The data in \f(CW$octets\fR
must be encoded as octets and not as characters in Perl's internal
format. For example, to convert \s-1ISO\-8859\-1\s0 data to Microsoft's \s-1CP1250\s0
encoding:
.Sp
.Vb 1
\&  from_to($octets, "iso\-8859\-1", "cp1250");
.Ve
.Sp
and to convert it back:
.Sp
.Vb 1
\&  from_to($octets, "cp1250", "iso\-8859\-1");
.Ve
.Sp
Note that because the conversion happens in place, the data to be
converted cannot be a string constant; it must be a scalar variable.
.Sp
\&\fIfrom_to()\fR returns the length of the converted string in octets on
success, \fIundef\fR on error.
.Sp
\&\fB\s-1CAVEAT\s0\fR: The following operations look the same but are not:
.Sp
.Vb 2
\&  from_to($data, "iso\-8859\-1", "utf8"); #1
\&  $data = decode("iso\-8859\-1", $data);  #2
.Ve
.Sp
Both #1 and #2 make \f(CW$data\fR consist of a completely valid \s-1UTF\-8\s0 string
but only #2 turns the \s-1UTF8\s0 flag on.
#1 is equivalent to
.Sp
.Vb 1
\&  $data = encode("utf8", decode("iso\-8859\-1", $data));
.Ve
.Sp
See \*(L"The \s-1UTF8\s0 flag\*(R" below.
.Sp
Also note that
.Sp
.Vb 1
\&  from_to($octets, $from, $to, $check);
.Ve
.Sp
is equivalent to
.Sp
.Vb 1
\&  $octets = encode($to, decode($from, $octets), $check);
.Ve
.Sp
Note that \f(CW$check\fR is applied only to the encoding step, not to decoding.
This is deliberate. If you need finer control, \f(CW\*(C`decode\*(C'\fR
then \f(CW\*(C`encode\*(C'\fR as follows:
.Sp
.Vb 1
\&  $octets = encode($to, decode($from, $octets, $check_from), $check_to);
.Ve
.ie n .IP "$octets = encode_utf8($string);" 2
.el .IP "\f(CW$octets\fR = encode_utf8($string);" 2
.IX Item "$octets = encode_utf8($string);"
Equivalent to \f(CW\*(C`$octets = encode("utf8", $string);\*(C'\fR. The characters
that comprise \f(CW$string\fR are encoded in Perl's internal format and the
result is returned as a sequence of octets. All possible
characters have a \s-1UTF\-8\s0 representation so this function cannot fail.
.ie n .IP "$string = decode_utf8($octets [, \s-1CHECK\s0]);" 2
.el .IP "\f(CW$string\fR = decode_utf8($octets [, \s-1CHECK\s0]);" 2
.IX Item "$string = decode_utf8($octets [, CHECK]);"
Equivalent to \f(CW\*(C`$string = decode("utf8", $octets [, CHECK])\*(C'\fR.
The sequence of octets represented by
\&\f(CW$octets\fR is decoded from \s-1UTF\-8\s0 into a sequence of logical
characters. Not all sequences of octets form valid \s-1UTF\-8\s0 encodings, so
it is possible for this call to fail. For \s-1CHECK\s0, see
\&\*(L"Handling Malformed Data\*(R".
.Sh "Listing available encodings"
.IX Subsection "Listing available encodings"
.Vb 2
\&  use Encode;
\&  @list = Encode\->encodings();
.Ve
.PP
Returns a list of the canonical names of the available encodings that
are loaded. To get a list of all available encodings, including the
ones that are not loaded yet, use:
.PP
.Vb 1
\&  @all_encodings = Encode\->encodings(":all");
.Ve
.PP
Or you can give the name of a specific module:
.PP
.Vb 1
\&  @with_jp = Encode\->encodings("Encode::JP");