See also \*(L"Blocks versus Scripts\*(R".
.PP
If supplied with an argument that can't be a code point, \fIcharscript()\fR tries
to do the opposite and interpret the argument as a character script. The
return value is a \fIrange\fR: an anonymous list of lists that contain
\&\fIstart-of-range\fR, \fIend-of-range\fR code point pairs. You can test whether a
code point is in a range using the \*(L"charinrange\*(R" function. If the
argument is not a known character script, \f(CW\*(C`undef\*(C'\fR is returned.
.Sh "charblocks"
.IX Subsection "charblocks"
.Vb 1
\&    use Unicode::UCD \*(Aqcharblocks\*(Aq;
\&
\&    my $charblocks = charblocks();
.Ve
.PP
\&\fIcharblocks()\fR returns a reference to a hash with the known block names
as the keys, and the code point ranges (see \*(L"charblock\*(R") as the values.
.PP
See also \*(L"Blocks versus Scripts\*(R".
.Sh "charscripts"
.IX Subsection "charscripts"
.Vb 1
\&    use Unicode::UCD \*(Aqcharscripts\*(Aq;
\&
\&    my $charscripts = charscripts();
.Ve
.PP
\&\fIcharscripts()\fR returns a reference to a hash with the known script
names as the keys, and the code point ranges (see \*(L"charscript\*(R") as
the values.
.PP
See also \*(L"Blocks versus Scripts\*(R".
.Sh "Blocks versus Scripts"
.IX Subsection "Blocks versus Scripts"
The difference between a block and a script is that scripts are closer
to the linguistic notion of a set of characters required to present
languages, while block is more of an artifact of the Unicode character
numbering and separation into blocks of (mostly) 256 characters.
.PP
For example the Latin \fBscript\fR is spread over several \fBblocks\fR, such
as \f(CW\*(C`Basic Latin\*(C'\fR, \f(CW\*(C`Latin 1 Supplement\*(C'\fR, \f(CW\*(C`Latin Extended\-A\*(C'\fR, and
\&\f(CW\*(C`Latin Extended\-B\*(C'\fR.
On the other hand, the Latin script does not
contain all the characters of the \f(CW\*(C`Basic Latin\*(C'\fR block (also known as
the \s-1ASCII\s0): it includes only the letters, and not, for example, the digits
or the punctuation.
.PP
For blocks see http://www.unicode.org/Public/UNIDATA/Blocks.txt
.PP
For scripts see \s-1UTR\s0 #24: http://www.unicode.org/unicode/reports/tr24/
.Sh "Matching Scripts and Blocks"
.IX Subsection "Matching Scripts and Blocks"
Scripts are matched with the regular-expression construct
\&\f(CW\*(C`\ep{...}\*(C'\fR (e.g. \f(CW\*(C`\ep{Tibetan}\*(C'\fR matches characters of the Tibetan script),
while \f(CW\*(C`\ep{In...}\*(C'\fR is used for blocks (e.g. \f(CW\*(C`\ep{InTibetan}\*(C'\fR matches
any of the 256 code points in the Tibetan block).
.Sh "Code Point Arguments"
.IX Subsection "Code Point Arguments"
A \fIcode point argument\fR is either a decimal or a hexadecimal scalar
designating a Unicode character, or \f(CW\*(C`U+\*(C'\fR followed by hexadecimals
designating a Unicode character. In other words, if you want a code
point to be interpreted as a hexadecimal number, you must prefix it
with either \f(CW\*(C`0x\*(C'\fR or \f(CW\*(C`U+\*(C'\fR, because a string like e.g. \f(CW123\fR will
be interpreted as a decimal code point.
Also note that Unicode is
\&\fBnot\fR limited to 16 bits (the number of Unicode characters is
open-ended, in theory unlimited): you may have more than 4 hexdigits.
.Sh "charinrange"
.IX Subsection "charinrange"
In addition to using the \f(CW\*(C`\ep{In...}\*(C'\fR and \f(CW\*(C`\eP{In...}\*(C'\fR constructs, you
can also test whether a code point is in the \fIrange\fR as returned by
\&\*(L"charblock\*(R" and \*(L"charscript\*(R" or as the values of the hash returned
by \*(L"charblocks\*(R" and \*(L"charscripts\*(R" by using \fIcharinrange()\fR:
.PP
.Vb 1
\&    use Unicode::UCD qw(charscript charinrange);
\&
\&    $range = charscript(\*(AqHiragana\*(Aq);
\&    print "looks like hiragana\en" if charinrange($range, $codepoint);
.Ve
.Sh "general_categories"
.IX Subsection "general_categories"
.Vb 1
\&    use Unicode::UCD \*(Aqgeneral_categories\*(Aq;
\&
\&    my $categories = general_categories();
.Ve
.PP
The \fIgeneral_categories()\fR returns a reference to a hash which has short
general category names (such as \f(CW\*(C`Lu\*(C'\fR, \f(CW\*(C`Nd\*(C'\fR, \f(CW\*(C`Zs\*(C'\fR, \f(CW\*(C`S\*(C'\fR) as keys and long
names (such as \f(CW\*(C`UppercaseLetter\*(C'\fR, \f(CW\*(C`DecimalNumber\*(C'\fR, \f(CW\*(C`SpaceSeparator\*(C'\fR,
\&\f(CW\*(C`Symbol\*(C'\fR) as values. The hash is reversible in case you need to go
from the long names to the short names. The general category is the
one returned from \fIcharinfo()\fR under the \f(CW\*(C`category\*(C'\fR key.
.Sh "bidi_types"
.IX Subsection "bidi_types"
.Vb 1
\&    use Unicode::UCD \*(Aqbidi_types\*(Aq;
\&
\&    my $categories = bidi_types();
.Ve
.PP
The \fIbidi_types()\fR returns a reference to a hash which has the short
bidi (bidirectional) type names (such as \f(CW\*(C`L\*(C'\fR, \f(CW\*(C`R\*(C'\fR) as keys and long
names (such as \f(CW\*(C`Left\-to\-Right\*(C'\fR, \f(CW\*(C`Right\-to\-Left\*(C'\fR) as values. The
hash is reversible in case you need to go from the long names to the
short names.
The bidi type is the one returned from \fIcharinfo()\fR
under the \f(CW\*(C`bidi\*(C'\fR key. For the exact meaning of the various bidi classes
the Unicode \s-1TR9\s0 is recommended reading:
http://www.unicode.org/reports/tr9/tr9\-17.html
(as of Unicode 5.0.0)
.Sh "compexcl"
.IX Subsection "compexcl"
.Vb 1
\&    use Unicode::UCD \*(Aqcompexcl\*(Aq;
\&
\&    my $compexcl = compexcl("09dc");
.Ve
.PP
The \fIcompexcl()\fR returns the composition exclusion (that is, if the
character should not be produced during a precomposition) of the
character specified by a \fBcode point argument\fR.
.PP
If there is a composition exclusion for the character, true is
returned. Otherwise, false is returned.
.Sh "casefold"
.IX Subsection "casefold"
.Vb 1
\&    use Unicode::UCD \*(Aqcasefold\*(Aq;
\&
\&    my $casefold = casefold("00DF");
.Ve
.PP
The \fIcasefold()\fR returns the locale-independent case folding of the
character specified by a \fBcode point argument\fR.
.PP
If there is a case folding for that character, a reference to a hash
with the following fields is returned:
.PP
.Vb 1
\&    key
\&
\&    code             code point with at least four hexdigits
\&    status           "C", "F", "S", or "I"
\&    mapping          one or more codes separated by spaces
.Ve
.PP
The meaning of the \fIstatus\fR is as follows:
.PP
.Vb 10
\&   C                 common case folding, common mappings shared
\&                     by both simple and full mappings
\&   F                 full case folding, mappings that cause strings
\&                     to grow in length.
\&                     Multiple characters are separated
\&                     by spaces
\&   S                 simple case folding, mappings to single characters
\&                     where different from F
\&   I                 special case for dotted uppercase I and
\&                     dotless lowercase i
\&                     \- If this mapping is included, the result is
\&                       case\-insensitive, but dotless and dotted I\*(Aqs
\&                       are not distinguished
\&                     \- If this mapping is excluded, the result is not
\&                       fully case\-insensitive, but dotless and dotted
\&                       I\*(Aqs are distinguished
.Ve
.PP
If there is no case folding for that character, \f(CW\*(C`undef\*(C'\fR is returned.
.PP
For more information about case mappings see
http://www.unicode.org/unicode/reports/tr21/
.Sh "casespec"
.IX Subsection "casespec"
.Vb 1
\&    use Unicode::UCD \*(Aqcasespec\*(Aq;
\&
\&    my $casespec = casespec("FB00");
.Ve
.PP
The \fIcasespec()\fR returns the potentially locale-dependent case mapping
of the character specified by a \fBcode point argument\fR. The mapping
may change the length of the string (which the basic Unicode case
mappings as returned by \fIcharinfo()\fR never do).
.PP
If there is a case mapping for that character, a reference to a hash
with the following fields is returned:
.PP
.Vb 1
\&    key
\&
\&    code             code point with at least four hexdigits
\&    lower            lowercase
\&    title            titlecase
\&    upper            uppercase
\&    condition        condition list (may be undef)
.Ve
.PP
The \f(CW\*(C`condition\*(C'\fR is optional. Where present, it consists of one or
more \fIlocales\fR or \fIcontexts\fR, separated by spaces (other than as
used to separate elements, spaces are to be ignored). A condition
list overrides the normal behavior if all of the listed conditions are
true.
Case distinctions in the condition list are not significant.
Conditions preceded by \*(L"\s-1NON_\s0\*(R" represent the negation of the condition.
.PP
Note that when there are multiple case folding definitions for a
single code point because of different locales, the value returned by
\&\fIcasespec()\fR is a hash reference which has the locales as the keys and
hash references as described above as the values.
.PP
A \fIlocale\fR is defined as a 2\-letter \s-1ISO\s0 3166 country code, possibly
followed by a \*(L"_\*(R" and a 2\-letter \s-1ISO\s0 language code (possibly followed
by a \*(L"_\*(R" and a variant code). For lists of those codes,
see Locale::Country and Locale::Language.
.PP
A \fIcontext\fR is one of the following choices:
.PP
.Vb 4
\&    FINAL            The letter is not followed by a letter of
\&                     general category L (e.g. Ll, Lt, Lu, Lm, or Lo)
\&    MODERN           The mapping is only used for modern text
\&    AFTER_i          The last base character was "i" (U+0069)
.Ve
.PP
For more information about case mappings see
http://www.unicode.org/unicode/reports/tr21/
.Sh "\fInamedseq()\fP"
.IX Subsection "namedseq()"
.Vb 1
\&    use Unicode::UCD \*(Aqnamedseq\*(Aq;
\&
\&    my $namedseq = namedseq("KATAKANA LETTER AINU P");
\&    my @namedseq = namedseq("KATAKANA LETTER AINU P");
\&    my %namedseq = namedseq();
.Ve
.PP
If used with a single argument in a scalar context, returns the string
consisting of the code points of the named sequence, or \f(CW\*(C`undef\*(C'\fR if no
named sequence by that name exists. If used with a single argument in
a list context, returns a list of the code points. If used with no
arguments in a list context, returns a hash with the names of the
named sequences as the keys and the named sequences as strings as
the values.
Otherwise, returns \f(CW\*(C`undef\*(C'\fR or an empty list depending
on the context.
.PP
(New from Unicode 4.1.0)
.Sh "Unicode::UCD::UnicodeVersion"
.IX Subsection "Unicode::UCD::UnicodeVersion"
\&\fIUnicode::UCD::UnicodeVersion()\fR returns the version of the Unicode
Character Database, in other words, the version of the Unicode
standard the database implements. The version is a string
of numbers delimited by dots (\f(CW\*(Aq.\*(Aq\fR).
.Sh "Implementation Note"
.IX Subsection "Implementation Note"
The first use of \fIcharinfo()\fR opens a read-only filehandle to the Unicode
Character Database (the database is included in the Perl distribution).
The filehandle is then kept open for further queries. In other words,
if you are wondering where one of your filehandles went, that's where.
.SH "BUGS"
.IX Header "BUGS"
Does not yet support \s-1EBCDIC\s0 platforms.
.SH "AUTHOR"
.IX Header "AUTHOR"
Jarkko Hietaniemi