
.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Encode::Guess 3"
.TH Encode::Guess 3 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Encode::Guess \-\- Guesses encoding from data
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  # if you are sure $data won\*(Aqt contain anything bogus
\&
\&  use Encode;
\&  use Encode::Guess qw/euc\-jp shiftjis 7bit\-jis/;
\&  my $utf8 = decode("Guess", $data);
\&  my $data = encode("Guess", $utf8);   # this doesn\*(Aqt work!
\&
\&  # more elaborate way
\&  use Encode::Guess;
\&  my $enc = guess_encoding($data, qw/euc\-jp shiftjis 7bit\-jis/);
\&  ref($enc) or die "Can\*(Aqt guess: $enc"; # trap error this way
\&  $utf8 = $enc\->decode($data);
\&  # or
\&  $utf8 = decode($enc\->name, $data);
.Ve
.SH "ABSTRACT"
.IX Header "ABSTRACT"
Encode::Guess tries to guess the encoding in which a given piece of
data is encoded.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
By default, it checks only ascii, utf8 and \s-1UTF\-16/32\s0 with \s-1BOM\s0.
.PP
.Vb 1
\&  use Encode::Guess; # ascii/utf8/BOMed UTF
.Ve
.PP
To use it more practically, you have to give the names of the encodings
to check (called \fIsuspects\fR below).  Suspect names can be either
canonical names or aliases.
.PP
\&\s-1CAVEAT:\s0 Unlike \s-1UTF\-\s0(16|32), \s-1BOM\s0 in utf8 is \s-1NOT\s0 \s-1AUTOMATICALLY\s0 \s-1STRIPPED\s0.
.PP
.Vb 2
\&  # tries all major Japanese Encodings as well
\&  use Encode::Guess qw/euc\-jp shiftjis 7bit\-jis/;
.Ve
.PP
If the \f(CW$Encode::Guess::NoUTFAutoGuess\fR variable is set to a true
value, no heuristics will be applied to \s-1UTF8/16/32\s0, and the result
will be limited to the suspects and \f(CW\*(C`ascii\*(C'\fR.
.IP "Encode::Guess\->set_suspects" 4
.IX Item "Encode::Guess->set_suspects"
You can also change the internal suspects list via the
\&\f(CW\*(C`set_suspects\*(C'\fR method.
.Sp
.Vb 2
\&  use Encode::Guess;
\&  Encode::Guess\->set_suspects(qw/euc\-jp shiftjis 7bit\-jis/);
.Ve
.IP "Encode::Guess\->add_suspects" 4
.IX Item "Encode::Guess->add_suspects"
Or you can use the \f(CW\*(C`add_suspects\*(C'\fR method.  The difference is that
\&\f(CW\*(C`set_suspects\*(C'\fR flushes the current suspects list while
\&\f(CW\*(C`add_suspects\*(C'\fR adds to it.
.Sp
.Vb 5
\&  use Encode::Guess;
\&  Encode::Guess\->add_suspects(qw/euc\-jp shiftjis 7bit\-jis/);
\&  # now the suspects are euc\-jp,shiftjis,7bit\-jis, AND
\&  # euc\-kr,euc\-cn, and big5\-eten
\&  Encode::Guess\->add_suspects(qw/euc\-kr euc\-cn big5\-eten/);
.Ve
.ie n .IP "Encode::decode(""Guess"" ...)" 4
.el .IP "Encode::decode(``Guess'' ...)" 4
.IX Item "Encode::decode(Guess ...)"
When you are content with the suspects list, you can now do this:
.Sp
.Vb 1
\&  my $utf8 = Encode::decode("Guess", $data);
.Ve
.IP "Encode::Guess\->guess($data)" 4
.IX Item "Encode::Guess->guess($data)"
But it will croak if:
.RS 4
.IP "\(bu" 4
Two or more suspects remain
.IP "\(bu" 4
No suspects are left
.RE
.RS 4
.Sp
So you should instead try this:
.Sp
.Vb 1
\&  my $decoder = Encode::Guess\->guess($data);
.Ve
.Sp
On success, \f(CW$decoder\fR is an object that is documented in
Encode::Encoding.  So you can now do this:
.Sp
.Vb 1
\&  my $utf8 = $decoder\->decode($data);
.Ve
.Sp
On failure, \f(CW$decoder\fR contains an error message instead, so the
whole thing would be as follows:
.Sp
.Vb 3
\&  my $decoder = Encode::Guess\->guess($data);
\&  die $decoder unless ref($decoder);
\&  my $utf8 = $decoder\->decode($data);
.Ve
.RE
.IP "guess_encoding($data, [, \fIlist of suspects\fR])" 4
.IX Item "guess_encoding($data, [, list of suspects])"
You can also try the \f(CW\*(C`guess_encoding\*(C'\fR function, which is exported by
default.  It takes \f(CW$data\fR to check and, optionally, a list of
suspects.  The optional suspect list is \fInot reflected\fR in
the internal suspects list.
.Sp
.Vb 5
\&  my $decoder = guess_encoding($data, qw/euc\-jp euc\-kr euc\-cn/);
\&  die $decoder unless ref($decoder);
\&  my $utf8 = $decoder\->decode($data);
\&  # check only ascii and utf8
\&  my $decoder = guess_encoding($data);
.Ve
.SH "CAVEATS"
.IX Header "CAVEATS"
.IP "\(bu" 4
Because of the algorithm used, the \s-1ISO\-8859\s0 series and other single-byte
encodings do not work well unless a single \s-1ISO\-8859\s0 encoding is the
only suspect (besides ascii and utf8).
.Sp
.Vb 5
\&  use Encode::Guess;
\&  # perhaps ok
\&  my $decoder = guess_encoding($data, \*(Aqlatin1\*(Aq);
\&  # definitely NOT ok
\&  my $decoder = guess_encoding($data, qw/latin1 greek/);
.Ve
.Sp
The reason is that Encode::Guess guesses the encoding by trial and error.
It first splits \f(CW$data\fR into lines and tries to decode each line with
each suspect.  It keeps going until all but one encoding has been
eliminated from the suspects list.  The \s-1ISO\-8859\s0 series is just too
successful in most cases (because it fills almost all code points in
\ex00\-\exff).
.IP "\(bu" 4
Do not mix national standard encodings and the corresponding vendor
encodings.
.Sp
.Vb 3
\&  # a very bad idea
\&  my $decoder
\&     = guess_encoding($data, qw/shiftjis MacJapanese cp932/);
.Ve
.Sp
The reason is that a vendor encoding is usually a superset of the
national standard, so the guess becomes too ambiguous in most cases.
.IP "\(bu" 4
On the other hand, mixing various national standard encodings
automagically works unless \f(CW$data\fR is too short to allow for guessing.
.Sp
.Vb 6
\&  # This is ok if $data is long enough
\&  my $decoder =
\&   guess_encoding($data, qw/euc\-cn
\&                            euc\-jp shiftjis 7bit\-jis
\&                            euc\-kr
\&                            big5\-eten/);
.Ve
.IP "\(bu" 4
\&\s-1DO\s0 \s-1NOT\s0 \s-1PUT\s0 \s-1TOO\s0 \s-1MANY\s0 \s-1SUSPECTS\s0!  Do not try something like this!
.Sp
.Vb 2
\&  my $decoder = guess_encoding($data,
\&                               Encode\->encodings(":all"));
.Ve
.PP
It is, after all, just a guess.  You should always be explicit when it
comes to encodings.  But in some environments, especially Japanese
ones, guessing the encoding is a must.  Use this module with care.
.SH "TO DO"
.IX Header "TO DO"
Encode::Guess does not work on \s-1EBCDIC\s0 platforms.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Encode, Encode::Encoding
