perlunicode.pod

=item Scripts

The script names which can be used by C<\p{...}> and C<\P{...}>,
such as in C<\p{Latin}> or C<\p{Cyrillic}>, are as follows:

    Arabic
    Armenian
    Balinese
    Bengali
    Bopomofo
    Braille
    Buginese
    Buhid
    CanadianAboriginal
    Cherokee
    Coptic
    Cuneiform
    Cypriot
    Cyrillic
    Deseret
    Devanagari
    Ethiopic
    Georgian
    Glagolitic
    Gothic
    Greek
    Gujarati
    Gurmukhi
    Han
    Hangul
    Hanunoo
    Hebrew
    Hiragana
    Inherited
    Kannada
    Katakana
    Kharoshthi
    Khmer
    Lao
    Latin
    Limbu
    LinearB
    Malayalam
    Mongolian
    Myanmar
    NewTaiLue
    Nko
    Ogham
    OldItalic
    OldPersian
    Oriya
    Osmanya
    PhagsPa
    Phoenician
    Runic
    Shavian
    Sinhala
    SylotiNagri
    Syriac
    Tagalog
    Tagbanwa
    TaiLe
    Tamil
    Telugu
    Thaana
    Thai
    Tibetan
    Tifinagh
    Ugaritic
    Yi

=item Extended property classes

Extended property classes can supplement the basic
properties, defined by the F<PropList> Unicode database:

    ASCIIHexDigit
    BidiControl
    Dash
    Deprecated
    Diacritic
    Extender
    HexDigit
    Hyphen
    Ideographic
    IDSBinaryOperator
    IDSTrinaryOperator
    JoinControl
    LogicalOrderException
    NoncharacterCodePoint
    OtherAlphabetic
    OtherDefaultIgnorableCodePoint
    OtherGraphemeExtend
    OtherIDStart
    OtherIDContinue
    OtherLowercase
    OtherMath
    OtherUppercase
    PatternSyntax
    PatternWhiteSpace
    QuotationMark
    Radical
    SoftDotted
    STerm
    TerminalPunctuation
    UnifiedIdeograph
    VariationSelector
    WhiteSpace

and there are further derived properties:

    Alphabetic  =  Lu + Ll + Lt + Lm + Lo + Nl + OtherAlphabetic
    Lowercase   =  Ll + OtherLowercase
    Uppercase   =  Lu + OtherUppercase
    Math        =  Sm + OtherMath

    IDStart     =  Lu + Ll + Lt + Lm + Lo + Nl + OtherIDStart
    IDContinue  =  IDStart + Mn + Mc + Nd + Pc + OtherIDContinue

    DefaultIgnorableCodePoint
                =  OtherDefaultIgnorableCodePoint
                   + Cf + Cc + Cs
                   + Noncharacters + VariationSelector
                   - WhiteSpace - FFF9..FFFB (Annotation Characters)

    Any         =  Any code points (i.e. U+0000 to U+10FFFF)
    Assigned    =  Any non-Cn code points (i.e. synonym for \P{Cn})
    Unassigned  =  Synonym for \p{Cn}
    ASCII       =  ASCII (i.e. U+0000 to U+007F)

    Common      =  Any character (or unassigned code point)
                   not explicitly assigned to a script

=item Use of "Is" Prefix

For backward compatibility (with Perl 5.6), all properties mentioned
so far may have C<Is> prepended to their name, so C<\P{IsLu}>, for
example, is equal to C<\P{Lu}>.

=item Blocks

In addition to B<scripts>, Unicode also defines B<blocks> of
characters.  The difference between scripts and blocks is that the
concept of scripts is closer to natural languages, while the concept
of blocks is more of an artificial grouping based on groups of 256
Unicode characters. For example, the C<Latin> script contains letters
from many blocks but does not contain all the characters from those
blocks. It does not, for example, contain digits, because digits are
shared across many scripts. Digits and similar groups, like
punctuation, are in a category called C<Common>.

For more about scripts, see the UAX#24 "Script Names":

   http://www.unicode.org/reports/tr24/

For more about blocks, see:

   http://www.unicode.org/Public/UNIDATA/Blocks.txt

Block names are given with the C<In> prefix. For example, the
Katakana block is referenced via C<\p{InKatakana}>.
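
The script/block distinction described above can be sketched with a few
characters. This is only an illustration: the helper name C<classify> and
the choice of test characters are assumptions made for this example, not
part of Perl or of the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of a few script and block
# properties a single character matches.
sub classify {
    my ($char) = @_;
    my %tests = (
        'Latin'            => qr/\p{Latin}/,             # script
        'Common'           => qr/\p{Common}/,            # script
        'InBasicLatin'     => qr/\p{InBasicLatin}/,      # block
        'InLatinExtendedA' => qr/\p{InLatinExtendedA}/,  # block
    );
    my @names = grep { $char =~ $tests{$_} } sort keys %tests;
    return @names;
}

# "a" and "5" share a block but not a script; U+0101 shares the
# Latin script with "a" but lives in a different block.
for my $char ("a", "5", "\x{101}") {
    printf "U+%04X: %s\n", ord($char), join(", ", classify($char));
}
# prints:
# U+0061: InBasicLatin, Latin
# U+0035: Common, InBasicLatin
# U+0101: InLatinExtendedA, Latin
```

The digit falls in the Basic Latin block yet belongs to the C<Common>
script, which is the case the paragraph above singles out.
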
The C<In> prefix may be omitted if there is no naming conflict with a script
or any other property, but it is recommended that C<In> always be used
for block tests to avoid confusion.

These block names are supported:

    InAegeanNumbers
    InAlphabeticPresentationForms
    InAncientGreekMusicalNotation
    InAncientGreekNumbers
    InArabic
    InArabicPresentationFormsA
    InArabicPresentationFormsB
    InArabicSupplement
    InArmenian
    InArrows
    InBalinese
    InBasicLatin
    InBengali
    InBlockElements
    InBopomofo
    InBopomofoExtended
    InBoxDrawing
    InBraillePatterns
    InBuginese
    InBuhid
    InByzantineMusicalSymbols
    InCJKCompatibility
    InCJKCompatibilityForms
    InCJKCompatibilityIdeographs
    InCJKCompatibilityIdeographsSupplement
    InCJKRadicalsSupplement
    InCJKStrokes
    InCJKSymbolsAndPunctuation
    InCJKUnifiedIdeographs
    InCJKUnifiedIdeographsExtensionA
    InCJKUnifiedIdeographsExtensionB
    InCherokee
    InCombiningDiacriticalMarks
    InCombiningDiacriticalMarksSupplement
    InCombiningDiacriticalMarksforSymbols
    InCombiningHalfMarks
    InControlPictures
    InCoptic
    InCountingRodNumerals
    InCuneiform
    InCuneiformNumbersAndPunctuation
    InCurrencySymbols
    InCypriotSyllabary
    InCyrillic
    InCyrillicSupplement
    InDeseret
    InDevanagari
    InDingbats
    InEnclosedAlphanumerics
    InEnclosedCJKLettersAndMonths
    InEthiopic
    InEthiopicExtended
    InEthiopicSupplement
    InGeneralPunctuation
    InGeometricShapes
    InGeorgian
    InGeorgianSupplement
    InGlagolitic
    InGothic
    InGreekExtended
    InGreekAndCoptic
    InGujarati
    InGurmukhi
    InHalfwidthAndFullwidthForms
    InHangulCompatibilityJamo
    InHangulJamo
    InHangulSyllables
    InHanunoo
    InHebrew
    InHighPrivateUseSurrogates
    InHighSurrogates
    InHiragana
    InIPAExtensions
    InIdeographicDescriptionCharacters
    InKanbun
    InKangxiRadicals
    InKannada
    InKatakana
    InKatakanaPhoneticExtensions
    InKharoshthi
    InKhmer
    InKhmerSymbols
    InLao
    InLatin1Supplement
    InLatinExtendedA
    InLatinExtendedAdditional
    InLatinExtendedB
    InLatinExtendedC
    InLatinExtendedD
    InLetterlikeSymbols
    InLimbu
    InLinearBIdeograms
    InLinearBSyllabary
    InLowSurrogates
    InMalayalam
    InMathematicalAlphanumericSymbols
    InMathematicalOperators
    InMiscellaneousMathematicalSymbolsA
    InMiscellaneousMathematicalSymbolsB
    InMiscellaneousSymbols
    InMiscellaneousSymbolsAndArrows
    InMiscellaneousTechnical
    InModifierToneLetters
    InMongolian
    InMusicalSymbols
    InMyanmar
    InNKo
    InNewTaiLue
    InNumberForms
    InOgham
    InOldItalic
    InOldPersian
    InOpticalCharacterRecognition
    InOriya
    InOsmanya
    InPhagspa
    InPhoenician
    InPhoneticExtensions
    InPhoneticExtensionsSupplement
    InPrivateUseArea
    InRunic
    InShavian
    InSinhala
    InSmallFormVariants
    InSpacingModifierLetters
    InSpecials
    InSuperscriptsAndSubscripts
    InSupplementalArrowsA
    InSupplementalArrowsB
    InSupplementalMathematicalOperators
    InSupplementalPunctuation
    InSupplementaryPrivateUseAreaA
    InSupplementaryPrivateUseAreaB
    InSylotiNagri
    InSyriac
    InTagalog
    InTagbanwa
    InTags
    InTaiLe
    InTaiXuanJingSymbols
    InTamil
    InTelugu
    InThaana
    InThai
    InTibetan
    InTifinagh
    InUgaritic
    InUnifiedCanadianAboriginalSyllabics
    InVariationSelectors
    InVariationSelectorsSupplement
    InVerticalForms
    InYiRadicals
    InYiSyllables
    InYijingHexagramSymbols

=back

=head2 User-Defined Character Properties

You can define your own character properties by defining subroutines
whose names begin with "In" or "Is".  The subroutines can be defined in
any package.  The user-defined properties can be used in the regular
expression C<\p> and C<\P> constructs; if you are using a user-defined
property from a package other than the one you are in, you must specify
its package in the C<\p> or C<\P> construct.

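
As a runnable sketch of the package-qualification rule just described,
the following defines a property in one package and uses it from two.
The property body (a single tab-separated range, U+0400 to U+04FF) and
the helper names C<has_foreign> and C<has_foreign_from_main> are
assumptions chosen only for illustration; the format of the returned
string is covered later in this section.

```perl
use strict;
use warnings;

package Lang;

# Hypothetical user-defined property: one tab-separated hexadecimal
# range, U+0400..U+04FF, picked purely as an example of "foreign" text.
sub IsForeign {
    return "0400\t04FF\n";
}

# Inside package Lang the property needs no package prefix.
sub has_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{IsForeign}/ ? 1 : 0;
}

package main;

# From any other package the property must be package-qualified.
sub has_foreign_from_main {
    my ($txt) = @_;
    return $txt =~ /\p{Lang::IsForeign}/ ? 1 : 0;
}

for my $txt ("abc", "ab\x{416}") {
    printf "%d %d\n", Lang::has_foreign($txt), has_foreign_from_main($txt);
}
```

Both helpers should agree on every input, since they test the same
property and differ only in how they name it.
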

    # assuming property IsForeign defined in Lang::
    package main;  # property package name required
    if ($txt =~ /\p{Lang::IsForeign}+/) { ... }

    package Lang;  # property package name not required
    if ($txt =~ /\p{IsForeign}+/) { ... }

Note that the effect is compile-time and immutable once defined.

The subroutines must return a specially-formatted string, with one
or more newline-separated lines.  Each line must be one of the following:

=over 4

=item *

A single hexadecimal number denoting a Unicode code point to include.

=item *

Two hexadecimal numbers separated by horizontal whitespace (space or
tab characters) denoting a range of Unicode code points to include.

=item *

Something to include, prefixed by "+": a built-in character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.

=item *

Something to exclude, prefixed by "-": an existing character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.

=item *

Something to negate, prefixed by "!": an existing character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters not in that property; two hexadecimal
code points for a range; or a single hexadecimal code point.

=item *

Something to intersect with, prefixed by "&": an existing character
property (prefixed by "utf8::") or a user-defined character property,
which removes every character except those in that property; two
hexadecimal code points for a range; or a single hexadecimal code point.

=back

For example, to define a property that covers both the Japanese
syllabaries (hiragana and katakana), you can define

    sub InKana {
        return <<END;
    3040\t309F
    30A0\t30FF
    END
    }

Imagine that the here-doc end marker is at the beginning of the line.
Now you
can use C<\p{InKana}> and C<\P{InKana}>.

You could also have used the existing block property names:

    sub InKana {
        return <<'END';
    +utf8::InHiragana
    +utf8::InKatakana
