\&\*(L"Microsoft-related naming mess\*(R" gives details.
.PP
\&\f(CW\*(C`GB2312\*(C'\fR is the \s-1IANA\s0 name for \f(CW\*(C`EUC\-CN\*(C'\fR.
See \*(L"Microsoft-related naming mess\*(R" for details.
.PP
\&\f(CW\*(C`GB_2312\-80\*(C'\fR \fIraw\fR encoding is available as \f(CW\*(C`gb2312\-raw\*(C'\fR
with Encode. See Encode::CN for details.
.PP
.Vb 2
\&  EUC\-CN
\&  KOI8\-U        [RFC2319]
.Ve
.PP
have not been registered with \s-1IANA\s0 (as of March 2002) but
seem to be supported by major web browsers.
The \s-1IANA\s0 name for \f(CW\*(C`EUC\-CN\*(C'\fR is \f(CW\*(C`GB2312\*(C'\fR.
.PP
.Vb 1
\&  KS_C_5601\-1987
.Ve
.PP
is heavily misused.
See \*(L"Microsoft-related naming mess\*(R" for details.
.PP
\&\f(CW\*(C`KS_C_5601\-1987\*(C'\fR \fIraw\fR encoding is available as \f(CW\*(C`kcs5601\-raw\*(C'\fR
with Encode. See Encode::KR for details.
.PP
.Vb 1
\&  UTF\-16 UTF\-16BE UTF\-16LE
.Ve
.PP
are IANA-registered \f(CW\*(C`charset\*(C'\fRs. See [\s-1RFC\s0 2781] for details.
Jungshik Shin reports that \s-1UTF\-16\s0 with a \s-1BOM\s0 is well accepted
by \s-1MS\s0 \s-1IE\s0 5/6 and \s-1NS\s0 4/6. Beware however that
.IP "\(bu" 2
\&\f(CW\*(C`UTF\-16\*(C'\fR support in any software you're going to be
using/interoperating with has probably been less tested
than \f(CW\*(C`UTF\-8\*(C'\fR support
.IP "\(bu" 2
\&\f(CW\*(C`UTF\-8\*(C'\fR coded data seamlessly passes traditional
command piping (\f(CW\*(C`cat\*(C'\fR, \f(CW\*(C`more\*(C'\fR, etc.) while \f(CW\*(C`UTF\-16\*(C'\fR coded
data is likely to cause confusion (with its zero bytes,
for example)
.IP "\(bu" 2
it is beyond the power of words to describe the way \s-1HTML\s0 browsers
encode non\-\f(CW\*(C`ASCII\*(C'\fR form data.
To get a general impression, visit
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form\-i18n.html>.
While encoding of form data has stabilized for \f(CW\*(C`UTF\-8\*(C'\fR encoded pages
(at least \s-1IE\s0 5/6, \s-1NS\s0 6, and Opera 6 behave consistently), be sure to
expect fun (and cross-browser discrepancies) with \f(CW\*(C`UTF\-16\*(C'\fR encoded
pages!
.PP
The rule of thumb is to use \f(CW\*(C`UTF\-8\*(C'\fR unless you know what
you're doing and unless you really benefit from using \f(CW\*(C`UTF\-16\*(C'\fR.
.PP
.Vb 5
\&  ISO\-IR\-165      [RFC1345]
\&  VISCII
\&  GB 12345
\&  GB 18030 (**)  (see links below)
\&  EUC\-TW   (**)
.Ve
.PP
are totally valid encodings but not registered at \s-1IANA\s0.
The names under which they are listed here are probably the
most widely-known names for these encodings and are recommended
names.
.PP
.Vb 1
\&  BIG5PLUS (**)
.Ve
.PP
is a proprietary name.
.Sh "Microsoft-related naming mess"
.IX Subsection "Microsoft-related naming mess"
Microsoft products misuse the following names:
.IP "\s-1KS_C_5601\-1987\s0" 2
.IX Item "KS_C_5601-1987"
Microsoft extension to \f(CW\*(C`EUC\-KR\*(C'\fR.
.Sp
Proper names: \f(CW\*(C`CP949\*(C'\fR, \f(CW\*(C`UHC\*(C'\fR, \f(CW\*(C`x\-windows\-949\*(C'\fR (as used by Mozilla).
.Sp
See <http://lists.w3.org/Archives/Public/ietf\-charsets/2001AprJun/0033.html>
for details.
.Sp
Encode aliases \f(CW\*(C`KS_C_5601\-1987\*(C'\fR to \f(CW\*(C`cp949\*(C'\fR to reflect this common
misuse. \fIRaw\fR \f(CW\*(C`KS_C_5601\-1987\*(C'\fR encoding is available as
\&\f(CW\*(C`kcs5601\-raw\*(C'\fR.
.Sp
See Encode::KR for details.
.IP "\s-1GB2312\s0" 2
.IX Item "GB2312"
Microsoft extension to \f(CW\*(C`EUC\-CN\*(C'\fR.
.Sp
Proper names: \f(CW\*(C`CP936\*(C'\fR, \f(CW\*(C`GBK\*(C'\fR.
.Sp
\&\f(CW\*(C`GB2312\*(C'\fR has been registered in the \f(CW\*(C`EUC\-CN\*(C'\fR meaning at
\&\s-1IANA\s0.
This has partially repaired the situation: Microsoft's
\&\f(CW\*(C`GB2312\*(C'\fR has become a superset of the official \f(CW\*(C`GB2312\*(C'\fR.
.Sp
Encode aliases \f(CW\*(C`GB2312\*(C'\fR to \f(CW\*(C`euc\-cn\*(C'\fR in full agreement with
\&\s-1IANA\s0 registration. \f(CW\*(C`cp936\*(C'\fR is supported separately.
\&\fIRaw\fR \f(CW\*(C`GB_2312\-80\*(C'\fR encoding is available as \f(CW\*(C`gb2312\-raw\*(C'\fR.
.Sp
See Encode::CN for details.
.IP "Big5" 2
.IX Item "Big5"
Microsoft extension to \f(CW\*(C`Big5\*(C'\fR.
.Sp
Proper name: \f(CW\*(C`CP950\*(C'\fR.
.Sp
Encode separately supports \f(CW\*(C`Big5\*(C'\fR and \f(CW\*(C`cp950\*(C'\fR.
.IP "Shift_JIS" 2
.IX Item "Shift_JIS"
Microsoft's understanding of \f(CW\*(C`Shift_JIS\*(C'\fR.
.Sp
\&\s-1JIS\s0 has not endorsed the full Microsoft standard however.
The official \f(CW\*(C`Shift_JIS\*(C'\fR includes only \s-1JIS\s0 X 0201 and \s-1JIS\s0 X 0208
character sets, while Microsoft has always used \f(CW\*(C`Shift_JIS\*(C'\fR
to encode a wider character repertoire. See \f(CW\*(C`IANA\*(C'\fR registration for
\&\f(CW\*(C`Windows\-31J\*(C'\fR.
.Sp
As a historical predecessor, Microsoft's variant
probably has more rights for the name, though it may be objected
that Microsoft shouldn't have used \s-1JIS\s0 as part of the name
in the first place.
.Sp
Unambiguous name: \f(CW\*(C`CP932\*(C'\fR. \f(CW\*(C`IANA\*(C'\fR name (also used by Mozilla, and
provided as an alias by Encode): \f(CW\*(C`Windows\-31J\*(C'\fR.
.Sp
Encode separately supports \f(CW\*(C`Shift_JIS\*(C'\fR and \f(CW\*(C`cp932\*(C'\fR.
.SH "Glossary"
.IX Header "Glossary"
.IP "character repertoire" 2
.IX Item "character repertoire"
A collection of unique characters. A \fIcharacter\fR set in the strictest
sense.
At this stage, characters are not numbered.
.IP "coded character set (\s-1CCS\s0)" 2
.IX Item "coded character set (CCS)"
A character set that is mapped in a way computers can use directly.
Many character encodings, including \s-1EUC\s0, fall in this category.
.IP "character encoding scheme (\s-1CES\s0)" 2
.IX Item "character encoding scheme (CES)"
An algorithm to map a character set to a byte sequence. You don't
have to be able to tell which character set a given byte sequence
belongs to. 7\-bit \s-1ISO\-2022\s0 is a \s-1CES\s0 but it cannot be a \s-1CCS\s0. \s-1EUC\s0 is an
example of being both a \s-1CCS\s0 and \s-1CES\s0.
.IP "charset (in \s-1MIME\s0 context)" 2
.IX Item "charset (in MIME context)"
has long been used in the meaning of \f(CW\*(C`encoding\*(C'\fR, \s-1CES\s0.
.Sp
While the word combination \f(CW\*(C`character set\*(C'\fR has lost this meaning
in \s-1MIME\s0 context since [\s-1RFC\s0 2130], the \f(CW\*(C`charset\*(C'\fR abbreviation has
retained it. This is how [\s-1RFC\s0 2277] and [\s-1RFC\s0 2278] bless \f(CW\*(C`charset\*(C'\fR:
.Sp
.Vb 7
\& This document uses the term "charset" to mean a set of rules for
\& mapping from a sequence of octets to a sequence of characters, such
\& as the combination of a coded character set and a character encoding
\& scheme; this is also what is used as an identifier in MIME "charset="
\& parameters, and registered in the IANA charset registry ...  (Note
\& that this is NOT a term used by other standards bodies, such as ISO).
\& [RFC 2277]
.Ve
.IP "\s-1EUC\s0" 2
.IX Item "EUC"
Extended Unix Character. See \s-1ISO\-2022\s0.
.IP "\s-1ISO\-2022\s0" 2
.IX Item "ISO-2022"
A \s-1CES\s0 that was carefully designed to coexist with \s-1ASCII\s0. There are a 7
bit version and an 8 bit version.
.Sp
The 7 bit version switches character set via escape sequence so it
cannot form a \s-1CCS\s0.
Since this is more difficult to handle in programs
than the 8 bit version, the 7 bit version is not very popular except for
iso\-2022\-jp, the \fIde facto\fR standard \s-1CES\s0 for e\-mails.
.Sp
The 8 bit version can form a \s-1CCS\s0. \s-1EUC\s0 and \s-1ISO\-8859\s0 are two examples
thereof. Pre\-5.6 perl could use them as string literals.
.IP "\s-1UCS\s0" 2
.IX Item "UCS"
Short for \fIUniversal Character Set\fR. When you say just \s-1UCS\s0, it means
\&\fIUnicode\fR.
.IP "\s-1UCS\-2\s0" 2
.IX Item "UCS-2"
\&\s-1ISO/IEC\s0 10646 encoding form: Universal Character Set coded in two
octets.
.IP "Unicode" 2
.IX Item "Unicode"
A character set that aims to include all character repertoires of the
world. Many character sets in various national as well as industrial
standards have become, in a way, just subsets of Unicode.
.IP "\s-1UTF\s0" 2
.IX Item "UTF"
Short for \fIUnicode Transformation Format\fR. Determines how to map a
Unicode character into a byte sequence.
.IP "\s-1UTF\-16\s0" 2
.IX Item "UTF-16"
A \s-1UTF\s0 in 16\-bit encoding. Can either be in big endian or little
endian.
The big endian version is called \s-1UTF\-16BE\s0 (equal to \s-1UCS\-2\s0 + surrogate
support) and the little endian version is called \s-1UTF\-16LE\s0.
.SH "See Also"
.IX Header "See Also"
Encode, Encode::Byte, Encode::CN, Encode::JP, Encode::KR, Encode::TW,
Encode::EBCDIC, Encode::Symbol,
Encode::MIME::Header, Encode::Guess
.SH "References"
.IX Header "References"
.IP "\s-1ECMA\s0" 2
.IX Item "ECMA"
European Computer Manufacturers Association
<http://www.ecma.ch>
.RS 2
.ie n .IP "\s-1ECMA\-035\s0 (eq ""ISO\-2022"")" 2
.el .IP "\s-1ECMA\-035\s0 (eq \f(CWISO\-2022\fR)" 2
.IX Item "ECMA-035 (eq ISO-2022)"
<http://www.ecma.ch/ecma1/STAND/ECMA\-035.HTM>
.Sp
The specification of \s-1ISO\-2022\s0 is available from the link above.
.RE
.RS 2
.RE
.IP "\s-1IANA\s0" 2
.IX Item "IANA"
Internet Assigned Numbers Authority
<http://www.iana.org/>
.RS 2
.IP "Assigned Charset Names by \s-1IANA\s0" 2
.IX Item "Assigned Charset Names by IANA"
<http://www.iana.org/assignments/character\-sets>
.Sp
Most of the \f(CW\*(C`canonical names\*(C'\fR in Encode derive from this list
so you can directly apply the string you have extracted from the \s-1MIME\s0
headers of mail and web pages.
.RE
.RS 2
.RE
.IP "\s-1ISO\s0" 2
.IX Item "ISO"
International Organization for Standardization
<http://www.iso.ch/>
.IP "\s-1RFC\s0" 2
.IX Item "RFC"
Request For Comments \*(-- need I say more?
<http://www.rfc\-editor.org/>, <http://www.rfc.net/>,
<http://www.faqs.org/rfcs/>
.IP "\s-1UC\s0" 2
.IX Item "UC"
Unicode Consortium
<http://www.unicode.org/>
.RS 2
.IP "Unicode Glossary" 2
.IX Item "Unicode Glossary"
<http://www.unicode.org/glossary/>
.Sp
The glossary of this document is based upon this site.
.RE
.RS 2
.RE
.Sh "Other Notable Sites"
.IX Subsection "Other Notable Sites"
.IP "czyborra.com" 2
.IX Item "czyborra.com"
<http://czyborra.com/>
.Sp
Contains a lot of useful information, especially gory details of \s-1ISO\s0 vs.
vendor mappings.
.IP "\s-1CJK\s0.inf" 2
.IX Item "CJK.inf"
<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>
.Sp
Somewhat obsolete (last update in 1996), but still useful. Also try
.Sp
<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>
.Sp
You will find brief info on \f(CW\*(C`EUC\-CN\*(C'\fR, \f(CW\*(C`GBK\*(C'\fR and mostly on \f(CW\*(C`GB 18030\*(C'\fR.
.IP "Jungshik Shin's Hangul \s-1FAQ\s0" 2
.IX Item "Jungshik Shin's Hangul FAQ"
<http://jshin.net/faq>
.Sp
And especially its subject 8.
.Sp
<http://jshin.net/faq/qa8.html>
.Sp
A comprehensive overview of the Korean (\f(CW\*(C`KS *\*(C'\fR) standards.
.ie n .IP "debian.org: ""Introduction to i18n""" 2
.el .IP "debian.org: ``Introduction to i18n''" 2
.IX Item "debian.org: Introduction to i18n"
A brief description of most of the mentioned \s-1CJK\s0 encodings is
contained in
<http://www.debian.org/doc/manuals/intro\-i18n/ch\-codes.en.html>
.Sh "Offline sources"
.IX Subsection "Offline sources"
.ie n .IP """CJKV Information Processing"" by Ken Lunde" 2
.el .IP "\f(CWCJKV Information Processing\fR by Ken Lunde" 2
.IX Item "CJKV Information Processing by Ken Lunde"
\&\s-1CJKV\s0 Information Processing
1999 O'Reilly & Associates, \s-1ISBN\s0 : 1\-56592\-224\-7
.Sp
The modern successor of \f(CW\*(C`CJK.inf\*(C'\fR.
.Sp
Features comprehensive coverage of \s-1CJKV\s0 character sets and
encodings along with many other issues faced by anyone trying
to better support \s-1CJKV\s0 languages/scripts in all the areas of
information processing.
.Sp
To purchase this book, visit
<http://www.oreilly.com/catalog/cjkvinfo/>
or your favourite bookstore.