📄 ucd.pm

while C<\p{In...}> is used for blocks (e.g. C<\p{InTibetan}> matches
any of the 256 code points in the Tibetan block).

=head2 Code Point Arguments

A I<code point argument> is either a decimal or a hexadecimal scalar
designating a Unicode character, or C<U+> followed by hexadecimals
designating a Unicode character.  In other words, if you want a code
point to be interpreted as a hexadecimal number, you must prefix it
with either C<0x> or C<U+>, because a string such as C<123> will
be interpreted as a decimal code point.  Also note that Unicode is
B<not> limited to 16 bits (code points go up to U+10FFFF): you may
have more than 4 hex digits.

=head2 charinrange

In addition to using the C<\p{In...}> and C<\P{In...}> constructs, you
can also test whether a code point is in the I<range> as returned by
L</charblock> and L</charscript> or as the values of the hash returned
by L</charblocks> and L</charscripts> by using charinrange():

    use Unicode::UCD qw(charscript charinrange);

    $range = charscript('Hiragana');
    print "looks like hiragana\n" if charinrange($range, $codepoint);

=cut

my %GENERAL_CATEGORIES = (
    'L'  =>         'Letter',
    'LC' =>         'CasedLetter',
    'Lu' =>         'UppercaseLetter',
    'Ll' =>         'LowercaseLetter',
    'Lt' =>         'TitlecaseLetter',
    'Lm' =>         'ModifierLetter',
    'Lo' =>         'OtherLetter',
    'M'  =>         'Mark',
    'Mn' =>         'NonspacingMark',
    'Mc' =>         'SpacingMark',
    'Me' =>         'EnclosingMark',
    'N'  =>         'Number',
    'Nd' =>         'DecimalNumber',
    'Nl' =>         'LetterNumber',
    'No' =>         'OtherNumber',
    'P'  =>         'Punctuation',
    'Pc' =>         'ConnectorPunctuation',
    'Pd' =>         'DashPunctuation',
    'Ps' =>         'OpenPunctuation',
    'Pe' =>         'ClosePunctuation',
    'Pi' =>         'InitialPunctuation',
    'Pf' =>         'FinalPunctuation',
    'Po' =>         'OtherPunctuation',
    'S'  =>         'Symbol',
    'Sm' =>         'MathSymbol',
    'Sc' =>         'CurrencySymbol',
    'Sk' =>         'ModifierSymbol',
    'So' =>         'OtherSymbol',
    'Z'  =>         'Separator',
    'Zs' =>         'SpaceSeparator',
    'Zl' =>         'LineSeparator',
    'Zp' =>         'ParagraphSeparator',
    'C'  =>         'Other',
    'Cc' =>         'Control',
    'Cf' =>         'Format',
    'Cs' =>         'Surrogate',
    'Co' =>         'PrivateUse',
    'Cn' =>         'Unassigned',
);

sub general_categories {
    # Return a deep copy so callers cannot modify the master table.
    return dclone \%GENERAL_CATEGORIES;
}

=head2 general_categories

    use Unicode::UCD 'general_categories';

    my $categories = general_categories();

general_categories() returns a reference to a hash which has short
general category names (such as C<Lu>, C<Nd>, C<Zs>, C<S>) as keys and long
names (such as C<UppercaseLetter>, C<DecimalNumber>, C<SpaceSeparator>,
C<Symbol>) as values.  The hash is reversible in case you need to go
from the long names to the short names.  The general category is the
one returned from charinfo() under the C<category> key.

=cut

my %BIDI_TYPES = (
   'L'   => 'Left-to-Right',
   'LRE' => 'Left-to-Right Embedding',
   'LRO' => 'Left-to-Right Override',
   'R'   => 'Right-to-Left',
   'AL'  => 'Right-to-Left Arabic',
   'RLE' => 'Right-to-Left Embedding',
   'RLO' => 'Right-to-Left Override',
   'PDF' => 'Pop Directional Format',
   'EN'  => 'European Number',
   'ES'  => 'European Number Separator',
   'ET'  => 'European Number Terminator',
   'AN'  => 'Arabic Number',
   'CS'  => 'Common Number Separator',
   'NSM' => 'Non-Spacing Mark',
   'BN'  => 'Boundary Neutral',
   'B'   => 'Paragraph Separator',
   'S'   => 'Segment Separator',
   'WS'  => 'Whitespace',
   'ON'  => 'Other Neutrals',
);

sub bidi_types {
    # Return a deep copy so callers cannot modify the master table.
    return dclone \%BIDI_TYPES;
}

=head2 bidi_types

    use Unicode::UCD 'bidi_types';

    my $categories = bidi_types();

bidi_types() returns a reference to a hash which has the short
bidi (bidirectional) type names (such as C<L>, C<R>) as keys and
long names (such as C<Left-to-Right>, C<Right-to-Left>) as values.  The
hash is reversible in case you need to go from the long names to the
short names.  The bidi type is the one returned from charinfo()
under the C<bidi> key.  For the exact meaning of the various bidi classes,
Unicode TR9 is recommended reading:
http://www.unicode.org/reports/tr9/tr9-17.html
(as of Unicode 5.0.0)

=cut

=head2 compexcl

    use Unicode::UCD 'compexcl';

    my $compexcl = compexcl("09dc");

compexcl() returns the composition exclusion (that is, whether the
character should not be produced during a precomposition) of the
character specified by a B<code point argument>.

If there is a composition exclusion for the character, true is
returned.  Otherwise, false is returned.

=cut

my %COMPEXCL;

sub _compexcl {
    # Load CompositionExclusions.txt once; only the keys matter.
    unless (%COMPEXCL) {
        if (openunicode(\$COMPEXCLFH, "CompositionExclusions.txt")) {
            local $_;
            while (<$COMPEXCLFH>) {
                if (/^([0-9A-F]+)\s+\#\s+/) {
                    my $code = hex($1);
                    $COMPEXCL{$code} = undef;
                }
            }
            close($COMPEXCLFH);
        }
    }
}

sub compexcl {
    my $arg  = shift;
    my $code = _getcode($arg);
    croak __PACKAGE__, "::compexcl: unknown code '$arg'"
        unless defined $code;

    _compexcl() unless %COMPEXCL;

    return exists $COMPEXCL{$code};
}

=head2 casefold

    use Unicode::UCD 'casefold';

    my $casefold = casefold("00DF");

casefold() returns the locale-independent case folding of the
character specified by a B<code point argument>.

If there is a case folding for that character, a reference to a hash
with the following fields is returned:

    key

    code             code point with at least four hexdigits
    status           "C", "F", "S", or "I"
    mapping          one or more codes separated by spaces

The meaning of the I<status> is as follows:

   C                 common case folding, common mappings shared
                     by both simple and full mappings
   F                 full case folding, mappings that cause strings
                     to grow in length. Multiple characters are separated
                     by spaces
   S                 simple case folding, mappings to single characters
                     where different from F
   I                 special case for dotted uppercase I and
                     dotless lowercase i
                     - If this mapping is included, the result is
                       case-insensitive, but dotless and dotted I's
                       are not distinguished
                     - If this mapping is excluded, the result is not
                       fully case-insensitive, but dotless and dotted
                       I's are distinguished

If there is no case folding for that character, C<undef> is returned.

For more information about case mappings see
http://www.unicode.org/unicode/reports/tr21/

=cut

my %CASEFOLD;

sub _casefold {
    # Load CaseFolding.txt once, keyed by numeric code point.
    unless (%CASEFOLD) {
        if (openunicode(\$CASEFOLDFH, "CaseFolding.txt")) {
            local $_;
            while (<$CASEFOLDFH>) {
                if (/^([0-9A-F]+); ([CFSI]); ([0-9A-F]+(?: [0-9A-F]+)*);/) {
                    my $code = hex($1);
                    $CASEFOLD{$code} = { code    => $1,
                                         status  => $2,
                                         mapping => $3 };
                }
            }
            close($CASEFOLDFH);
        }
    }
}

sub casefold {
    my $arg  = shift;
    my $code = _getcode($arg);
    croak __PACKAGE__, "::casefold: unknown code '$arg'"
        unless defined $code;

    _casefold() unless %CASEFOLD;

    return $CASEFOLD{$code};
}

=head2 casespec

    use Unicode::UCD 'casespec';

    my $casespec = casespec("FB00");

casespec() returns the potentially locale-dependent case mapping
of the character specified by a B<code point argument>.
The mapping may change the length of the string (which the basic Unicode
case mappings as returned by charinfo() never do).

If there is a case mapping for that character, a reference to a hash
with the following fields is returned:

    key

    code             code point with at least four hexdigits
    lower            lowercase
    title            titlecase
    upper            uppercase
    condition        condition list (may be undef)

The C<condition> is optional.  Where present, it consists of one or
more I<locales> or I<contexts>, separated by spaces (other than as
used to separate elements, spaces are to be ignored).  A condition
list overrides the normal behavior if all of the listed conditions are
true.  Case distinctions in the condition list are not significant.
Conditions preceded by "NON_" represent the negation of the condition.

Note that when there are multiple case mapping definitions for a
single code point because of different locales, the value returned by
casespec() is a hash reference which has the locales as the keys and
hash references as described above as the values.

A I<locale> is defined as a 2-letter ISO 639 language code, possibly
followed by a "_" and a 2-letter ISO 3166 country code (possibly followed
by a "_" and a variant code).  For lists of those codes,
see L<Locale::Language> and L<Locale::Country>.

A I<context> is one of the following choices:

    FINAL            The letter is not followed by a letter of
                     general category L (e.g. Ll, Lt, Lu, Lm, or Lo)
    MODERN           The mapping is only used for modern text
    AFTER_i          The last base character was "i" (U+0069)

For more information about case mappings see
http://www.unicode.org/unicode/reports/tr21/

=cut

my %CASESPEC;

sub _casespec {
    # Load SpecialCasing.txt once, keyed by numeric code point.
    unless (%CASESPEC) {
        if (openunicode(\$CASESPECFH, "SpecialCasing.txt")) {
            local $_;
            while (<$CASESPECFH>) {
                if (/^([0-9A-F]+); ([0-9A-F]+(?: [0-9A-F]+)*)?; ([0-9A-F]+(?: [0-9A-F]+)*)?; ([0-9A-F]+(?: [0-9A-F]+)*)?; (\w+(?: \w+)*)?/) {
                    my ($hexcode, $lower, $title, $upper, $condition) =
                        ($1, $2, $3, $4, $5);
                    my $code = hex($hexcode);
                    if (exists $CASESPEC{$code}) {
                        # A second entry for the same code point: switch
                        # to a hash keyed by locale.
                        if (exists $CASESPEC{$code}->{code}) {
                            my ($oldlower,
                                $oldtitle,
                                $oldupper,
                                $oldcondition) =
                                    @{$CASESPEC{$code}}{qw(lower
                                                           title
                                                           upper
                                                           condition)};
                            if (defined $oldcondition) {
                                my ($oldlocale) =
                                    ($oldcondition =~ /^([a-z][a-z](?:_\S+)?)/);
                                delete $CASESPEC{$code};
                                $CASESPEC{$code}->{$oldlocale} =
                                { code      => $hexcode,
                                  lower     => $oldlower,
                                  title     => $oldtitle,
                                  upper     => $oldupper,
                                  condition => $oldcondition };
                            }
                        }
                        my ($locale) =
                            ($condition =~ /^([a-z][a-z](?:_\S+)?)/);
                        $CASESPEC{$code}->{$locale} =
                        { code      => $hexcode,
                          lower     => $lower,
                          title     => $title,
                          upper     => $upper,
                          condition => $condition };
                    } else {
                        $CASESPEC{$code} =
                        { code      => $hexcode,
                          lower     => $lower,
                          title     => $title,
                          upper     => $upper,
                          condition => $condition };
                    }
                }
            }
            close($CASESPECFH);
        }
    }
}

sub casespec {
    my $arg  = shift;
    my $code = _getcode($arg);
    croak __PACKAGE__, "::casespec: unknown code '$arg'"
        unless defined $code;

    _casespec() unless %CASESPEC;

    # Hand out a copy so callers cannot modify the cached entry.
    return ref $CASESPEC{$code} ? dclone $CASESPEC{$code} : $CASESPEC{$code};
}

=head2 namedseq()

    use Unicode::UCD 'namedseq';

    my $namedseq = namedseq("KATAKANA LETTER AINU P");
    my @namedseq = namedseq("KATAKANA LETTER AINU P");
    my %namedseq = namedseq();

If used with a single argument in a scalar context, returns the string
consisting of the code points of the named sequence, or C<undef> if no
named sequence by that name exists.  If used with a single argument in
a list context, returns the list of the code points.  If used with no
arguments in a list context, returns a hash with the names of the
named sequences as the keys and the named sequences as strings as
the values.  Otherwise, returns C<undef> or an empty list depending
on the context.

(New from Unicode 4.1.0)

=cut

my %NAMEDSEQ;

sub _namedseq {
    # Load NamedSequences.txt once; values are stored as strings.
    unless (%NAMEDSEQ) {
        if (openunicode(\$NAMEDSEQFH, "NamedSequences.txt")) {
            local $_;
            while (<$NAMEDSEQFH>) {
                if (/^(.+)\s*;\s*([0-9A-F]+(?: [0-9A-F]+)*)$/) {
                    my ($n, $s) = ($1, $2);
                    my @s = map { chr(hex($_)) } split(' ', $s);
                    $NAMEDSEQ{$n} = join("", @s);
                }
            }
            close($NAMEDSEQFH);
        }
    }
}

sub namedseq {
    _namedseq() unless %NAMEDSEQ;
    my $wantarray = wantarray();
    if (defined $wantarray) {
        if ($wantarray) {
            if (@_ == 0) {
                return %NAMEDSEQ;
            } elsif (@_ == 1) {
                my $s = $NAMEDSEQ{ $_[0] };
                return defined $s ? map { ord($_) } split('', $s) : ();
            }
        } elsif (@_ == 1) {
            return $NAMEDSEQ{ $_[0] };
        }
    }
    return;
}

=head2 Unicode::UCD::UnicodeVersion

Unicode::UCD::UnicodeVersion() returns the version of the Unicode
Character Database, in other words, the version of the Unicode
standard the database implements.
The version is a string
of numbers delimited by dots (C<'.'>).

=cut

my $UNICODEVERSION;

sub UnicodeVersion {
    unless (defined $UNICODEVERSION) {
        openunicode(\$VERSIONFH, "version");
        chomp($UNICODEVERSION = <$VERSIONFH>);
        close($VERSIONFH);
        croak __PACKAGE__, "::VERSION: strange version '$UNICODEVERSION'"
            unless $UNICODEVERSION =~ /^\d+(?:\.\d+)+$/;
    }
    return $UNICODEVERSION;
}

=head2 Implementation Note

The first use of charinfo() opens a read-only filehandle to the Unicode
Character Database (the database is included in the Perl distribution).
The filehandle is then kept open for further queries.  In other words,
if you are wondering where one of your filehandles went, that's where.

=head1 BUGS

Does not yet support EBCDIC platforms.

=head1 AUTHOR

Jarkko Hietaniemi

=cut

1;
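
=head1 EXAMPLES

The following is a minimal sketch, not part of the original
documentation, showing how several of the functions documented in this
file might be combined.  It assumes an installed Unicode::UCD that
exports these functions and can locate its Unicode data files.  The
values printed depend on the Unicode version in use, so no expected
output is given.

    use strict;
    use warnings;
    use Unicode::UCD qw(general_categories compexcl casefold namedseq);

    # Short-to-long category names, and the reverse lookup.
    my $categories = general_categories();
    my %short_for  = reverse %$categories;
    print "Lu means $categories->{Lu}\n";
    print "DecimalNumber is $short_for{DecimalNumber}\n";

    # "09dc" is not a plain decimal string, so it is read as hexadecimal.
    print "U+09DC is ", (compexcl("09dc") ? "" : "not "),
          "a composition exclusion\n";

    # casefold() returns undef when there is no folding, so guard the use.
    my $fold = casefold("00DF");
    print "U+$fold->{code} folds to $fold->{mapping} ",
          "(status $fold->{status})\n" if $fold;

    # In list context namedseq() returns the code points of the sequence.
    my @points = namedseq("KATAKANA LETTER AINU P");
    printf "U+%04X\n", $_ for @points;

    # UnicodeVersion() is called by its fully qualified name.
    print "Unicode version: ", Unicode::UCD::UnicodeVersion(), "\n";

=cut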