.IP "ignoreName" 4
.IX Item "ignoreName"
.PD
\&\-\- see 3.2.2 Variable Weighting, \s-1UTS\s0 #10.
.Sp
Makes the entry in the table completely ignorable;
i.e. as if the weights were zero at all levels.
.Sp
Through \f(CW\*(C`ignoreChar\*(C'\fR, any character matching \f(CW\*(C`qr/$ignoreChar/\*(C'\fR
will be ignored. Through \f(CW\*(C`ignoreName\*(C'\fR, any character whose name
(given in the \f(CW\*(C`table\*(C'\fR file as a comment) matches \f(CW\*(C`qr/$ignoreName/\*(C'\fR
will be ignored.
.Sp
E.g. when 'a' and 'e' are ignorable,
\&'element' is equal to 'lament' (or 'lmnt').
.IP "katakana_before_hiragana" 4
.IX Item "katakana_before_hiragana"
\&\-\- see 7.3.1 Tertiary Weight Table, \s-1UTS\s0 #10.
.Sp
By default, hiragana is before katakana.
If the parameter is made true, this is reversed.
.Sp
\&\fB\s-1NOTE\s0\fR: This parameter simply assumes that any hiragana/katakana
distinctions occur at level 3, and that their weights at level 3 are
the same as those mentioned in 7.3.1, \s-1UTS\s0 #10.
If you define collation elements that violate this requirement,
this parameter does not work correctly.
.IP "level" 4
.IX Item "level"
\&\-\- see 4.3 Form Sort Key, \s-1UTS\s0 #10.
.Sp
Set the maximum level.
Levels higher than the specified one are ignored.
.Sp
.Vb 4
\&  Level 1: alphabetic ordering
\&  Level 2: diacritic ordering
\&  Level 3: case ordering
\&  Level 4: tie\-breaking (e.g. in the case when variable is \*(Aqshifted\*(Aq)
\&
\&  ex.level => 2,
.Ve
.Sp
If omitted, the maximum is the 4th.
.IP "normalization" 4
.IX Item "normalization"
\&\-\- see 4.1 Normalize, \s-1UTS\s0 #10.
.Sp
If specified, strings are normalized before preparation of sort keys
(the normalization is executed after preprocess).
.Sp
A form name that \f(CW\*(C`Unicode::Normalize::normalize()\*(C'\fR accepts will be applied
as \f(CW$normalization_form\fR.
Acceptable names include \f(CW\*(AqNFD\*(Aq\fR, \f(CW\*(AqNFC\*(Aq\fR, \f(CW\*(AqNFKD\*(Aq\fR, and \f(CW\*(AqNFKC\*(Aq\fR.
See \f(CW\*(C`Unicode::Normalize::normalize()\*(C'\fR for details.
If omitted, \f(CW\*(AqNFD\*(Aq\fR is used.
.Sp
\&\f(CW\*(C`normalization\*(C'\fR is performed after \f(CW\*(C`preprocess\*(C'\fR (if defined).
.Sp
Furthermore, the special values \f(CW\*(C`undef\*(C'\fR and \f(CW"prenormalized"\fR can be used,
though they are not concerned with \f(CW\*(C`Unicode::Normalize::normalize()\*(C'\fR.
.Sp
If \f(CW\*(C`undef\*(C'\fR (not a string \f(CW"undef"\fR) is passed explicitly
as the value for this key,
no normalization is carried out (this may make tailoring easier
if no normalization is desired). Under \f(CW\*(C`(normalization => undef)\*(C'\fR,
only contiguous contractions are resolved;
e.g.
even if \f(CW\*(C`A\-ring\*(C'\fR (and \f(CW\*(C`A\-ring\-cedilla\*(C'\fR) is ordered after \f(CW\*(C`Z\*(C'\fR,
\&\f(CW\*(C`A\-cedilla\-ring\*(C'\fR would be primary equal to \f(CW\*(C`A\*(C'\fR.
In this respect,
\&\f(CW\*(C`(normalization => undef, preprocess => sub { NFD(shift) })\*(C'\fR
\&\fBis not\fR equivalent to \f(CW\*(C`(normalization => \*(AqNFD\*(Aq)\*(C'\fR.
.Sp
In the case of \f(CW\*(C`(normalization => "prenormalized")\*(C'\fR,
no normalization is performed, but
non-contiguous contractions with combining characters are resolved.
Therefore
\&\f(CW\*(C`(normalization => \*(Aqprenormalized\*(Aq, preprocess => sub { NFD(shift) })\*(C'\fR
\&\fBis\fR equivalent to \f(CW\*(C`(normalization => \*(AqNFD\*(Aq)\*(C'\fR.
If source strings are already prenormalized,
\&\f(CW\*(C`(normalization => \*(Aqprenormalized\*(Aq)\*(C'\fR may save the time spent on normalization.
.Sp
Except under \f(CW\*(C`(normalization => undef)\*(C'\fR,
\&\fBUnicode::Normalize\fR is required (see also \fB\s-1CAVEAT\s0\fR).
.IP "overrideCJK" 4
.IX Item "overrideCJK"
\&\-\- see 7.1 Derived Collation Elements, \s-1UTS\s0 #10.
.Sp
By default, \s-1CJK\s0 Unified Ideographs are ordered in Unicode codepoint order,
but \f(CW\*(C`CJK Unified Ideographs\*(C'\fR (if \f(CW\*(C`UCA_Version\*(C'\fR is 8 to 11, its range is
\&\f(CW\*(C`U+4E00..U+9FA5\*(C'\fR; if \f(CW\*(C`UCA_Version\*(C'\fR is 14, its range is \f(CW\*(C`U+4E00..U+9FBB\*(C'\fR)
are less than \f(CW\*(C`CJK Unified Ideographs Extension\*(C'\fR (its range is
\&\f(CW\*(C`U+3400..U+4DB5\*(C'\fR and \f(CW\*(C`U+20000..U+2A6D6\*(C'\fR).
.Sp
Through \f(CW\*(C`overrideCJK\*(C'\fR, ordering of \s-1CJK\s0 Unified Ideographs can be overridden.
.Sp
ex.
\s-1CJK\s0 Unified Ideographs in the \s-1JIS\s0 code point order.
.Sp
.Vb 7
\&  overrideCJK => sub {
\&      my $u = shift;             # get a Unicode codepoint
\&      my $b = pack(\*(Aqn\*(Aq, $u);   # to UTF\-16BE
\&      my $s = your_unicode_to_sjis_converter($b); # convert
\&      my $n = unpack(\*(Aqn\*(Aq, $s); # convert sjis to short
\&      [ $n, 0x20, 0x2, $u ];     # return the collation element
\&  },
.Ve
.Sp
ex. ignores all \s-1CJK\s0 Unified Ideographs.
.Sp
.Vb 1
\&  overrideCJK => sub {()}, # CODEREF returning empty list
\&
\&   # where \->eq("Pe\ex{4E00}rl", "Perl") is true
\&   # as U+4E00 is a CJK Unified Ideograph and to be ignorable.
.Ve
.Sp
If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
weights for \s-1CJK\s0 Unified Ideographs are treated as undefined.
But assignment of weights for \s-1CJK\s0 Unified Ideographs
in the table or \f(CW\*(C`entry\*(C'\fR is still valid.
.IP "overrideHangul" 4
.IX Item "overrideHangul"
\&\-\- see 7.1 Derived Collation Elements, \s-1UTS\s0 #10.
.Sp
By default, Hangul Syllables are decomposed into Hangul Jamo,
even under \f(CW\*(C`(normalization => undef)\*(C'\fR.
But the mapping of Hangul Syllables may be overridden.
.Sp
This parameter works like \f(CW\*(C`overrideCJK\*(C'\fR, so see there for examples.
.Sp
If you want to override the mapping of Hangul Syllables,
\&\s-1NFD\s0, \s-1NFKD\s0, and \s-1FCD\s0 are not appropriate,
since they will decompose Hangul Syllables before overriding.
.Sp
If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
the weight for Hangul Syllables is treated as undefined
without decomposition into Hangul Jamo.
But definition of weights for Hangul Syllables
in the table or \f(CW\*(C`entry\*(C'\fR is still valid.
.IP "preprocess" 4
.IX Item "preprocess"
\&\-\- see 5.1 Preprocessing, \s-1UTS\s0 #10.
.Sp
If specified, the coderef is used to preprocess strings
before the formation of sort keys.
.Sp
ex.
dropping English articles, such as \*(L"a\*(R" or \*(L"the\*(R".
Then, \*(L"the pen\*(R" is before \*(L"a pencil\*(R".
.Sp
.Vb 5
\&     preprocess => sub {
\&           my $str = shift;
\&           $str =~ s/\eb(?:an?|the)\es+//gi;
\&           return $str;
\&        },
.Ve
.Sp
\&\f(CW\*(C`preprocess\*(C'\fR is performed before \f(CW\*(C`normalization\*(C'\fR (if defined).
.IP "rearrange" 4
.IX Item "rearrange"
\&\-\- see 3.1.3 Rearrangement, \s-1UTS\s0 #10.
.Sp
Characters that are not coded in logical order and are to be rearranged.
If \f(CW\*(C`UCA_Version\*(C'\fR is less than or equal to 11, the default is:
.Sp
.Vb 1
\&    rearrange => [ 0x0E40..0x0E44, 0x0EC0..0x0EC4 ],
.Ve
.Sp
If you want to disallow any rearrangement, pass \f(CW\*(C`undef\*(C'\fR or \f(CW\*(C`[]\*(C'\fR
(a reference to an empty list) as the value for this key.
.Sp
If \f(CW\*(C`UCA_Version\*(C'\fR is equal to 14, the default is \f(CW\*(C`[]\*(C'\fR (i.e. no rearrangement).
.Sp
\&\fBAccording to version 9 of \s-1UCA\s0, this parameter shall not be used;
but no warning is issued at present.\fR
.IP "table" 4
.IX Item "table"
\&\-\- see 3.2 Default Unicode Collation Element Table, \s-1UTS\s0 #10.
.Sp
You can use another collation element table if desired.
.Sp
The table file should be located in the \fIUnicode/Collate\fR directory
on \f(CW@INC\fR.
Say, if the filename is \fIFoo.txt\fR,
the table file is searched for as \fIUnicode/Collate/Foo.txt\fR in \f(CW@INC\fR.
.Sp
By default, \fIallkeys.txt\fR (as the filename of \s-1DUCET\s0) is used.
If you prepare your own table file, any name other than \fIallkeys.txt\fR
may be better to avoid namespace conflict.
.Sp
If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
no file is read (but you can define collation elements via \f(CW\*(C`entry\*(C'\fR).
.Sp
A typical way to define a collation element table
without any table file:
.Sp
.Vb 11
\&   $onlyABC = Unicode::Collate\->new(
\&       table => undef,
\&       entry => << \*(AqENTRIES\*(Aq,
\&0061 ; [.0101.0020.0002.0061] # LATIN SMALL LETTER A
\&0041 ; [.0101.0020.0008.0041] # LATIN CAPITAL LETTER A
\&0062 ; [.0102.0020.0002.0062] # LATIN SMALL LETTER B
\&0042 ; [.0102.0020.0008.0042] # LATIN CAPITAL LETTER B
\&0063 ; [.0103.0020.0002.0063] # LATIN SMALL LETTER C
\&0043 ; [.0103.0020.0008.0043] # LATIN CAPITAL LETTER C
\&ENTRIES
\&    );
.Ve
.Sp
If \f(CW\*(C`ignoreName\*(C'\fR or \f(CW\*(C`undefName\*(C'\fR is used, character names should be
specified as a comment (following \f(CW\*(C`#\*(C'\fR) on each line.
.IP "undefChar" 4
.IX Item "undefChar"
.PD 0
.IP "undefName" 4
.IX Item "undefName"
.PD
\&\-\- see 6.3.4 Reducing the Repertoire, \s-1UTS\s0 #10.
.Sp
Undefines the collation element as if it were unassigned in the table.
This reduces the size of the table.
If an unassigned character appears in the string to be collated,
the sort key is made from its codepoint
as a single-character collation element,
as it is greater than any other assigned collation elements
(in the codepoint order among the unassigned characters).
But it would be better to ignore characters
that are unfamiliar to you and may never be used.
.Sp
Through \f(CW\*(C`undefChar\*(C'\fR, any character matching \f(CW\*(C`qr/$undefChar/\*(C'\fR
will be undefined.
Through \f(CW\*(C`undefName\*(C'\fR, any character whose name
(given in the \f(CW\*(C`table\*(C'\fR file as a comment) matches \f(CW\*(C`qr/$undefName/\*(C'\fR
will be undefined.
.Sp
ex. Collation weights for beyond-BMP characters are not stored in the object:
.Sp
.Vb 1
\&    undefChar => qr/[^\e0\-\ex{fffd}]/,
.Ve
.IP "upper_before_lower" 4
.IX Item "upper_before_lower"
\&\-\- see 6.6 Case Comparisons, \s-1UTS\s0 #10.
.Sp
By default, lowercase is before uppercase.
If the parameter is made true, this is reversed.
.Sp
\&\fB\s-1NOTE\s0\fR: This parameter simply assumes that any lowercase/uppercase
distinctions occur at level 3, and that their weights at level 3 are
the same as those mentioned in 7.3.1, \s-1UTS\s0 #10.
If you define collation elements that differ from this requirement,
this parameter does not work correctly.
.IP "variable" 4
.IX Item "variable"
\&\-\- see 3.2.2 Variable Weighting, \s-1UTS\s0 #10.
.Sp
This key controls variable weighting for variable collation elements,
which are marked with an \s-1ASTERISK\s0 in the table
(\s-1NOTE:\s0 Many punctuation marks and symbols are variable in \fIallkeys.txt\fR).
.Sp
.Vb 1
\&   variable => \*(Aqblanked\*(Aq, \*(Aqnon\-ignorable\*(Aq, \*(Aqshifted\*(Aq, or \*(Aqshift\-trimmed\*(Aq.
.Ve
.Sp
These names are case-insensitive.
By default (if the specification is omitted), 'shifted' is adopted.
.Sp
.Vb 2
\&   \*(AqBlanked\*(Aq        Variable elements are made ignorable at levels 1 through 3;
\&                    considered at the 4th level.
\&
\&   \*(AqNon\-Ignorable\*(Aq  Variable elements are not reset to ignorable.
\&
\&   \*(AqShifted\*(Aq        Variable elements are made ignorable at levels 1 through 3;
\&                    their level 4 weight is replaced by the old level 1 weight.
\&                    Level 4 weight for Non\-Variable elements is 0xFFFF.
\&
\&   \*(AqShift\-Trimmed\*(Aq  Same as \*(Aqshifted\*(Aq, but all FFFF\*(Aqs at the 4th level
\&                    are trimmed.
.Ve
.Sh "Methods for Collation"
.IX Subsection "Methods for Collation"
.ie n .IP """@sorted = $Collator\->sort(@not_sorted)""" 4
.el .IP "\f(CW@sorted = $Collator\->sort(@not_sorted)\fR" 4
.IX Item "@sorted = $Collator->sort(@not_sorted)"
Sorts a list of strings.
.ie n .IP """$result = $Collator\->cmp($a, $b)""" 4
.el .IP "\f(CW$result = $Collator\->cmp($a, $b)\fR" 4
.IX Item "$result = $Collator->cmp($a, $b)"
Returns 1 (when \f(CW$a\fR is greater than \f(CW$b\fR)
or 0 (when \f(CW$a\fR is equal to \f(CW$b\fR)
or \-1 (when \f(CW$a\fR is less than \f(CW$b\fR).
.ie n .IP """$result = $Collator\->eq($a, $b)""" 4
.el .IP "\f(CW$result = $Collator\->eq($a, $b)\fR" 4
.IX Item "$result = $Collator->eq($a, $b)"
.PD 0
.ie n .IP """$result = $Collator\->ne($a, $b)""" 4
.el .IP "\f(CW$result = $Collator\->ne($a, $b)\fR" 4
.IX Item "$result = $Collator->ne($a, $b)"
.ie n .IP """$result = $Collator\->lt($a, $b)""" 4
.el .IP "\f(CW$result = $Collator\->lt($a, $b)\fR" 4
.IX Item "$result = $Collator->lt($a, $b)"