.\" encode::supported.3
One possible workaround is
.Sp
.Vb 3
\&   $gsm =~ s/\ex00\ez/\ex00\ex00/;
\&   $uni = decode("gsm0338", $gsm);
\&   $uni .= "\exA0" if $gsm =~ /\ex1B\ez/;
.Ve
.Sp
Note that the Encode implementation of \s-1GSM0338\s0 does not implement the
reuse of Latin capital letters as Greek capital letters (for example,
0x5A is U+005A (\s-1LATIN\s0 \s-1CAPITAL\s0 \s-1LETTER\s0 Z), not U+0396 (\s-1GREEK\s0 \s-1CAPITAL\s0
\&\s-1LETTER\s0 \s-1ZETA\s0)).
.Sp
\&\s-1GSM0338\s0 is also covered in Encode::Byte even though it is not
an \*(L"extended \s-1ASCII\s0\*(R" encoding.
.Sh "\s-1CJK:\s0 Chinese, Japanese, Korean (Multibyte)"
.IX Subsection "CJK: Chinese, Japanese, Korean (Multibyte)"
Note that Vietnamese is listed above. Also read \*(L"Encoding vs Charset\*(R"
below. These encodings are implemented in separate modules by
country because of size concerns (simplified Chinese is mapped
to '\s-1CN\s0', continental China, while traditional Chinese is mapped to
\&'\s-1TW\s0', Taiwan). Please refer to their respective documentation pages.
.IP "Encode::CN \*(-- Continental China" 2
.IX Item "Encode::CN Continental China"
.Vb 9
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  euc\-cn [1]            MacChineseSimp
\&  (gbk)         cp936 [2]
\&  gb12345\-raw                              { GB12345 without CES }
\&  gb2312\-raw                               { GB2312  without CES }
\&  hz
\&  iso\-ir\-165
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&
\&  [1] GB2312 is aliased to this.  See "Microsoft\-related naming mess".
\&  [2] gbk is aliased to this.
\&      See "Microsoft\-related naming mess".
.Ve
.IP "Encode::JP \*(-- Japan" 2
.IX Item "Encode::JP Japan"
.Vb 11
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  euc\-jp
\&  shiftjis      cp932   macJapanese
\&  7bit\-jis
\&  iso\-2022\-jp                                            [RFC1468]
\&  iso\-2022\-jp\-1                                          [RFC2237]
\&  jis0201\-raw  { JIS X 0201 (roman + halfwidth kana) without CES }
\&  jis0208\-raw  { JIS X 0208 (Kanji + fullwidth kana) without CES }
\&  jis0212\-raw  { JIS X 0212 (Extended Kanji)         without CES }
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::KR \*(-- Korea" 2
.IX Item "Encode::KR Korea"
.Vb 8
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  euc\-kr                MacKorean                        [RFC1557]
\&                cp949 [1]
\&  iso\-2022\-kr                                            [RFC1557]
\&  johab                                  [KS X 1001:1998, Annex 3]
\&  ksc5601\-raw                              { KSC5601 without CES }
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&
\&  [1] ks_c_5601\-1987, (x\-)?windows\-949, and uhc are aliased to this.
\&      See below.
.Ve
.IP "Encode::TW \*(-- Taiwan" 2
.IX Item "Encode::TW Taiwan"
.Vb 5
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  big5\-eten     cp950   MacChineseTrad {big5 aliased to big5\-eten}
\&  big5\-hkscs
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::HanExtra \*(-- More Chinese via \s-1CPAN\s0" 2
.IX Item "Encode::HanExtra More Chinese via CPAN"
Because of size concerns, the additional Chinese encodings below are
distributed separately on \s-1CPAN\s0, under the name
Encode::HanExtra.
.Sp
.Vb 8
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  big5ext                                   CMEX\*(Aqs Big5e Extension
\&  big5plus                                  CMEX\*(Aqs Big5+ Extension
\&  cccii         Chinese Character Code for Information Interchange
\&  euc\-tw                             EUC (Extended Unix Character)
\&  gb18030                          GBK with Traditional Characters
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::JIS2K \*(-- \s-1JIS\s0 X 0213 encodings via \s-1CPAN\s0" 2
.IX Item "Encode::JIS2K JIS X 0213 encodings via CPAN"
Because of size concerns, the additional Japanese encodings below are
distributed separately on \s-1CPAN\s0, under the name Encode::JIS2K.
.Sp
.Vb 8
\&  Standard      DOS/Win Macintosh                Comment/Reference
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  euc\-jisx0213
\&  shiftjisx0213
\&  iso\-2022\-jp\-3
\&  jis0213\-1\-raw
\&  jis0213\-2\-raw
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.Sh "Miscellaneous encodings"
.IX Subsection "Miscellaneous encodings"
.IP "Encode::EBCDIC" 2
.IX Item "Encode::EBCDIC"
See perlebcdic for details.
.Sp
.Vb 8
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  cp37
\&  cp500
\&  cp875
\&  cp1026
\&  cp1047
\&  posix\-bc
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::Symbols" 2
.IX Item "Encode::Symbols"
For symbols and dingbats.
.Sp
.Vb 7
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  symbol
\&  dingbats
\&  MacDingbats
\&  AdobeZdingbat
\&  AdobeSymbol
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::MIME::Header" 2
.IX Item "Encode::MIME::Header"
Strictly speaking, the \s-1MIME\s0 header encoding documented in \s-1RFC\s0 2047 is more
an encapsulation than an encoding. However, supporting it is imperative
in the modern world, so it is supported.
.Sp
.Vb 5
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  MIME\-Header                                            [RFC2047]
\&  MIME\-B                                                 [RFC2047]
\&  MIME\-Q                                                 [RFC2047]
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.IP "Encode::Guess" 2
.IX Item "Encode::Guess"
This is not the name of an encoding but a utility that lets you pick
the most appropriate encoding for a piece of data out of the given
\&\fIsuspects\fR. See Encode::Guess for details.
.SH "Unsupported encodings"
.IX Header "Unsupported encodings"
The following encodings are not yet supported: some because they
are rarely used, some because of technical difficulties. They may
be supported by external modules via \s-1CPAN\s0 in the future, however.
.IP "\s-1ISO\-2022\-JP\-2\s0 [\s-1RFC1554\s0]" 2
.IX Item "ISO-2022-JP-2 [RFC1554]"
Not very popular yet. Needs the Unicode Database or an equivalent to
implement \fIencode()\fR, because it includes \s-1JIS\s0 X 0208/0212, \s-1KSC5601\s0, and
\&\s-1GB2312\s0 simultaneously, whose code points in Unicode overlap. You
therefore need to look up the database to determine to which character
set a given Unicode character should belong.
.IP "\s-1ISO\-2022\-CN\s0 [\s-1RFC1922\s0]" 2
.IX Item "ISO-2022-CN [RFC1922]"
Not very popular. Needs \s-1CNS\s0 11643\-1 and \-2, which are not available in
this module.
\&\s-1CNS\s0 11643 is supported (via euc-tw) in Encode::HanExtra.
Autrijus Tang may add support for this encoding in his module in the future.
.IP "Various HP-UX encodings" 2
.IX Item "Various HP-UX encodings"
The following are unsupported due to the lack of mapping data.
.Sp
.Vb 2
\&  \*(Aq8\*(Aq  \- arabic8, greek8, hebrew8, kana8, thai8, and turkish8
\&  \*(Aq15\*(Aq \- japanese15, korean15, and roi15
.Ve
.IP "Cyrillic encoding \s-1ISO\-IR\-111\s0" 2
.IX Item "Cyrillic encoding ISO-IR-111"
Anton Tagunov doubts its usefulness.
.IP "\s-1ISO\-8859\-8\-1\s0 [Hebrew]" 2
.IX Item "ISO-8859-8-1 [Hebrew]"
None of the Encode team knows Hebrew well enough (\s-1ISO\-8859\-8\s0, cp1255, and
MacHebrew are supported only because mappings were
available at <http://www.unicode.org/>). Contributions welcome.
.IP "\s-1ISIRI\s0 3342, Iran System, \s-1ISIRI\s0 2900 [Farsi]" 2
.IX Item "ISIRI 3342, Iran System, ISIRI 2900 [Farsi]"
Ditto.
.IP "Thai encoding \s-1TCVN\s0" 2
.IX Item "Thai encoding TCVN"
Ditto.
.IP "Vietnamese encodings \s-1VPS\s0" 2
.IX Item "Vietnamese encodings VPS"
Though Jungshik Shin has reported that Mozilla supports this encoding,
it was too late for us to add it before 5.8.0. In the future, it
may be available via a separate module.
See
<http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvlatin/vps.uf>
and
<http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvlatin/vps.ut>
if you are interested in helping us.
.IP "Various Mac encodings" 2
.IX Item "Various Mac encodings"
The following are unsupported due to the lack of mapping data.
.Sp
.Vb 5
\&  MacArmenian,  MacBengali,   MacBurmese,   MacEthiopic
\&  MacExtArabic, MacGeorgian,  MacKannada,   MacKhmer
\&  MacLaotian,   MacMalayalam, MacMongolian, MacOriya
\&  MacSinhalese, MacTamil,     MacTelugu,    MacTibetan
\&  MacVietnamese
.Ve
.Sp
The rest, which are already available, are based upon the vendor mappings
at <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/> .
.IP "(Mac) Indic encodings" 2
.IX Item "(Mac) Indic encodings"
The maps for the following are available at <http://www.unicode.org/>
but remain unsupported because those encodings need an algorithmic
approach, which \fIenc2xs\fR currently does not support:
.Sp
.Vb 3
\&  MacDevanagari
\&  MacGurmukhi
\&  MacGujarati
.Ve
.Sp
For details, please see \f(CW\*(C`Unicode mapping issues and notes:\*(C'\fR at
<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT> .
.Sp
I believe this issue is prevalent not only for Mac Indics but also in
other Indic encodings, but the above were the only Indic encoding
maps that I could find at <http://www.unicode.org/> .
.SH "Encoding vs. Charset \*(-- terminology"
.IX Header "Encoding vs. Charset terminology"
We are used to using the terms (character) \fIencoding\fR and \fIcharacter
set\fR interchangeably. But just as confusing the terms byte and
character is dangerous and the terms should be differentiated when
needed, we need to differentiate \fIencoding\fR and \fIcharacter set\fR.
.PP
To understand that, here is a description of how we make computers
grok our characters.
.IP "\(bu" 2
First we start with which characters to include.
We call this collection of characters the \fIcharacter repertoire\fR.
.IP "\(bu" 2
Then we have to give each character a unique \s-1ID\s0 so your computer can
tell the difference between 'a' and 'A'. This itemized character
repertoire is now a \fIcharacter set\fR.
.IP "\(bu" 2
If your computer can grok the character set without further
processing, you can go ahead and use it. This is called a \fIcoded
character set\fR (\s-1CCS\s0) or \fIraw character encoding\fR. \s-1ASCII\s0 is used this
way for most cases.
.IP "\(bu" 2
But in many cases, especially multi-byte \s-1CJK\s0 encodings, you have to
tweak a little more. Your network connection may not accept any data
with the Most Significant Bit set, and your computer may not be able to
tell if a given byte is a whole character or just half of it. So you
have to \fIencode\fR the character set to use it.
.Sp
A \fIcharacter encoding scheme\fR (\s-1CES\s0) determines how to encode a given
character set, or a set of multiple character sets. 7bit \s-1ISO\-2022\s0 is
an example of a \s-1CES\s0. You switch between character sets via \fIescape
sequences\fR.
.PP
Technically, or mathematically, speaking, a character set encoded in
such a \s-1CES\s0 that maps character by character may form a \s-1CCS\s0. \s-1EUC\s0 is such
an example. The \s-1CES\s0 of \s-1EUC\s0 is as follows:
.IP "\(bu" 2
Map \s-1ASCII\s0 unchanged.
.IP "\(bu" 2
Map a character set that consists of 94 or 96 to the power of N
members by adding 0x80 to each byte.
.IP "\(bu" 2
You can also use 0x8e and 0x8f to indicate that the following sequence of
characters belongs to yet another character set. To each following byte
is added the value 0x80.
.PP
By carefully looking at the encoded byte sequence, you can see that each
byte sequence corresponds to a unique number. In that sense, \s-1EUC\s0 is a \s-1CCS\s0
generated by the \s-1CES\s0 above from up to four \s-1CCS\s0 (complicated?). \s-1UTF\-8\s0
falls into this category.
See \*(L"\s-1UTF\-8\s0\*(R" in perlunicode to find out how
\&\s-1UTF\-8\s0 maps Unicode to a byte sequence.
.PP
You may also have found out by now why 7bit \s-1ISO\-2022\s0 cannot form
a \s-1CCS\s0. If you look at a byte sequence \ex21\ex21, you can't tell if
it is two !'s or \s-1IDEOGRAPHIC\s0 \s-1SPACE\s0. \s-1EUC\s0 maps the latter to \exA1\exA1
so you have no trouble differentiating between \*(L"!!\*(R" and \*(L"\ \ \*(R".
.SH "Encoding Classification (by Anton Tagunov and Dan Kogai)"
.IX Header "Encoding Classification (by Anton Tagunov and Dan Kogai)"
This section tries to classify the supported encodings by their
applicability for information exchange over the Internet and to
choose the most suitable aliases to name them in the context of
such communication.
.IP "\(bu" 2
To (en|de)code encodings marked by \f(CW\*(C`(**)\*(C'\fR, you need
\&\f(CW\*(C`Encode::HanExtra\*(C'\fR, available from \s-1CPAN\s0.
.PP
Encoding names
.PP
.Vb 3
\&  US\-ASCII    UTF\-8    ISO\-8859\-*  KOI8\-R
\&  Shift_JIS   EUC\-JP   ISO\-2022\-JP ISO\-2022\-JP\-1
\&  EUC\-KR      Big5     GB2312
.Ve
.PP
are registered with \s-1IANA\s0 as preferred \s-1MIME\s0 names and may
be used over the Internet.
.PP
\&\f(CW\*(C`Shift_JIS\*(C'\fR has been made official by \s-1JIS\s0 X 0208:1997.
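.PP
As a minimal sketch, not part of the tables above, the relationship
between an alias, Encode's canonical name, and the \s-1MIME\s0 name can be
inspected from Perl. The aliases in the list are illustrative picks from
this document, and the \f(CW\*(C`mime_name\*(C'\fR method is assumed to be present,
which may not hold for older Encode releases.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&    use Encode ();
\&
\&    # Aliases are illustrative picks from the tables in this document.
\&    for my $alias (qw(shiftjis euc\-jp euc\-kr big5)) {
\&        my $enc = Encode::find_encoding($alias) or next;  # skip unknown names
\&        my $mime = $enc\->mime_name;                       # may be undef
\&        printf "%\-10s canonical=%\-10s mime=%s\en",
\&            $alias, $enc\->name, defined $mime ? $mime : "(none)";
\&    }
.Ve
.PP
Names that Encode does not recognize are skipped rather than reported,
so an empty result for an alias only means it is not installed or not known.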