perlunicode.pod

=item Scripts

The script names which can be used by C<\p{...}> and C<\P{...}>,
such as in C<\p{Latin}> or C<\p{Cyrillic}>, are as follows:

    Arabic
    Armenian
    Balinese
    Bengali
    Bopomofo
    Braille
    Buginese
    Buhid
    CanadianAboriginal
    Cherokee
    Coptic
    Cuneiform
    Cypriot
    Cyrillic
    Deseret
    Devanagari
    Ethiopic
    Georgian
    Glagolitic
    Gothic
    Greek
    Gujarati
    Gurmukhi
    Han
    Hangul
    Hanunoo
    Hebrew
    Hiragana
    Inherited
    Kannada
    Katakana
    Kharoshthi
    Khmer
    Lao
    Latin
    Limbu
    LinearB
    Malayalam
    Mongolian
    Myanmar
    NewTaiLue
    Nko
    Ogham
    OldItalic
    OldPersian
    Oriya
    Osmanya
    PhagsPa
    Phoenician
    Runic
    Shavian
    Sinhala
    SylotiNagri
    Syriac
    Tagalog
    Tagbanwa
    TaiLe
    Tamil
    Telugu
    Thaana
    Thai
    Tibetan
    Tifinagh
    Ugaritic
    Yi

=item Extended property classes

Extended property classes can supplement the basic
properties, defined by the F<PropList> Unicode database:

    ASCIIHexDigit
    BidiControl
    Dash
    Deprecated
    Diacritic
    Extender
    HexDigit
    Hyphen
    Ideographic
    IDSBinaryOperator
    IDSTrinaryOperator
    JoinControl
    LogicalOrderException
    NoncharacterCodePoint
    OtherAlphabetic
    OtherDefaultIgnorableCodePoint
    OtherGraphemeExtend
    OtherIDStart
    OtherIDContinue
    OtherLowercase
    OtherMath
    OtherUppercase
    PatternSyntax
    PatternWhiteSpace
    QuotationMark
    Radical
    SoftDotted
    STerm
    TerminalPunctuation
    UnifiedIdeograph
    VariationSelector
    WhiteSpace

and there are further derived properties:

    Alphabetic  =  Lu + Ll + Lt + Lm + Lo + Nl + OtherAlphabetic
    Lowercase   =  Ll + OtherLowercase
    Uppercase   =  Lu + OtherUppercase
    Math        =  Sm + OtherMath

    IDStart     =  Lu + Ll + Lt + Lm + Lo + Nl + OtherIDStart
    IDContinue  =  IDStart + Mn + Mc + Nd + Pc + OtherIDContinue

    DefaultIgnorableCodePoint
                =  OtherDefaultIgnorableCodePoint
                   + Cf + Cc + Cs + Noncharacters + VariationSelector
                   - WhiteSpace - FFF9..FFFB (Annotation Characters)

    Any         =  Any code points (i.e. U+0000 to U+10FFFF)
    Assigned    =  Any non-Cn code points (i.e. synonym for \P{Cn})
    Unassigned  =  Synonym for \p{Cn}
    ASCII       =  ASCII (i.e. U+0000 to U+007F)

    Common      =  Any character (or unassigned code point)
                   not explicitly assigned to a script

=item Use of "Is" Prefix

For backward compatibility (with Perl 5.6), all properties mentioned
so far may have C<Is> prepended to their name, so C<\P{IsLu}>, for
example, is equal to C<\P{Lu}>.

=item Blocks

In addition to B<scripts>, Unicode also defines B<blocks> of
characters.  The difference between scripts and blocks is that the
concept of scripts is closer to natural languages, while the concept
of blocks is more of an artificial grouping based on groups of 256
Unicode characters.  For example, the C<Latin> script contains letters
from many blocks but does not contain all the characters from those
blocks.  It does not, for example, contain digits, because digits are
shared across many scripts.  Digits and similar groups, like
punctuation, are in a category called C<Common>.

For more about scripts, see the UAX#24 "Script Names":

    http://www.unicode.org/reports/tr24/

For more about blocks, see:

    http://www.unicode.org/Public/UNIDATA/Blocks.txt

Block names are given with the C<In> prefix.  For example, the
Katakana block is referenced via C<\p{InKatakana}>.
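
The difference between scripts and blocks can be checked directly.  The
following is a minimal sketch rather than part of the original
documentation; the helper name C<classify> is hypothetical.  It tests a
letter, a digit, and an accented letter against the C<Latin> and
C<Common> scripts and against the C<InBasicLatin> and
C<InLatin1Supplement> blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: report which script and block tests a single
# character passes.  Each value is 1 (matches) or 0 (does not match).
sub classify {
    my ($char) = @_;
    return {
        latin_script  => ($char =~ /\A\p{Latin}\z/)              ? 1 : 0,
        common_script => ($char =~ /\A\p{Common}\z/)             ? 1 : 0,
        basic_latin   => ($char =~ /\A\p{InBasicLatin}\z/)       ? 1 : 0,
        latin1_supp   => ($char =~ /\A\p{InLatin1Supplement}\z/) ? 1 : 0,
    };
}

# "A" (U+0041), "1" (U+0031), and e-acute (U+00E9).
for my $char ("A", "1", "\x{E9}") {
    my $r = classify($char);
    printf "U+%04X  Latin=%d Common=%d InBasicLatin=%d InLatin1Supplement=%d\n",
        ord($char),
        @{$r}{qw(latin_script common_script basic_latin latin1_supp)};
}
```

The digit falls inside the Basic Latin block yet belongs to the
C<Common> script, while the accented letter belongs to the C<Latin>
script yet lies outside the Basic Latin block.
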
The C<In> prefix may be omitted if there is no naming conflict with a script
or any other property, but it is recommended that C<In> always be used
for block tests to avoid confusion.

These block names are supported:

    InAegeanNumbers
    InAlphabeticPresentationForms
    InAncientGreekMusicalNotation
    InAncientGreekNumbers
    InArabic
    InArabicPresentationFormsA
    InArabicPresentationFormsB
    InArabicSupplement
    InArmenian
    InArrows
    InBalinese
    InBasicLatin
    InBengali
    InBlockElements
    InBopomofo
    InBopomofoExtended
    InBoxDrawing
    InBraillePatterns
    InBuginese
    InBuhid
    InByzantineMusicalSymbols
    InCJKCompatibility
    InCJKCompatibilityForms
    InCJKCompatibilityIdeographs
    InCJKCompatibilityIdeographsSupplement
    InCJKRadicalsSupplement
    InCJKStrokes
    InCJKSymbolsAndPunctuation
    InCJKUnifiedIdeographs
    InCJKUnifiedIdeographsExtensionA
    InCJKUnifiedIdeographsExtensionB
    InCherokee
    InCombiningDiacriticalMarks
    InCombiningDiacriticalMarksSupplement
    InCombiningDiacriticalMarksforSymbols
    InCombiningHalfMarks
    InControlPictures
    InCoptic
    InCountingRodNumerals
    InCuneiform
    InCuneiformNumbersAndPunctuation
    InCurrencySymbols
    InCypriotSyllabary
    InCyrillic
    InCyrillicSupplement
    InDeseret
    InDevanagari
    InDingbats
    InEnclosedAlphanumerics
    InEnclosedCJKLettersAndMonths
    InEthiopic
    InEthiopicExtended
    InEthiopicSupplement
    InGeneralPunctuation
    InGeometricShapes
    InGeorgian
    InGeorgianSupplement
    InGlagolitic
    InGothic
    InGreekExtended
    InGreekAndCoptic
    InGujarati
    InGurmukhi
    InHalfwidthAndFullwidthForms
    InHangulCompatibilityJamo
    InHangulJamo
    InHangulSyllables
    InHanunoo
    InHebrew
    InHighPrivateUseSurrogates
    InHighSurrogates
    InHiragana
    InIPAExtensions
    InIdeographicDescriptionCharacters
    InKanbun
    InKangxiRadicals
    InKannada
    InKatakana
    InKatakanaPhoneticExtensions
    InKharoshthi
    InKhmer
    InKhmerSymbols
    InLao
    InLatin1Supplement
    InLatinExtendedA
    InLatinExtendedAdditional
    InLatinExtendedB
    InLatinExtendedC
    InLatinExtendedD
    InLetterlikeSymbols
    InLimbu
    InLinearBIdeograms
    InLinearBSyllabary
    InLowSurrogates
    InMalayalam
    InMathematicalAlphanumericSymbols
    InMathematicalOperators
    InMiscellaneousMathematicalSymbolsA
    InMiscellaneousMathematicalSymbolsB
    InMiscellaneousSymbols
    InMiscellaneousSymbolsAndArrows
    InMiscellaneousTechnical
    InModifierToneLetters
    InMongolian
    InMusicalSymbols
    InMyanmar
    InNKo
    InNewTaiLue
    InNumberForms
    InOgham
    InOldItalic
    InOldPersian
    InOpticalCharacterRecognition
    InOriya
    InOsmanya
    InPhagspa
    InPhoenician
    InPhoneticExtensions
    InPhoneticExtensionsSupplement
    InPrivateUseArea
    InRunic
    InShavian
    InSinhala
    InSmallFormVariants
    InSpacingModifierLetters
    InSpecials
    InSuperscriptsAndSubscripts
    InSupplementalArrowsA
    InSupplementalArrowsB
    InSupplementalMathematicalOperators
    InSupplementalPunctuation
    InSupplementaryPrivateUseAreaA
    InSupplementaryPrivateUseAreaB
    InSylotiNagri
    InSyriac
    InTagalog
    InTagbanwa
    InTags
    InTaiLe
    InTaiXuanJingSymbols
    InTamil
    InTelugu
    InThaana
    InThai
    InTibetan
    InTifinagh
    InUgaritic
    InUnifiedCanadianAboriginalSyllabics
    InVariationSelectors
    InVariationSelectorsSupplement
    InVerticalForms
    InYiRadicals
    InYiSyllables
    InYijingHexagramSymbols

=back

=head2 User-Defined Character Properties

You can define your own character properties by defining subroutines
whose names begin with "In" or "Is".  The subroutines can be defined in
any package.  The user-defined properties can be used in the regular
expression C<\p> and C<\P> constructs; if you are using a user-defined
property from a package other than the one you are in, you must specify
its package in the C<\p> or C<\P> construct.

    # assuming property IsForeign defined in Lang::
    package main;  # property package name required
    if ($txt =~ /\p{Lang::IsForeign}+/) { ... }

    package Lang;  # property package name not required
    if ($txt =~ /\p{IsForeign}+/) { ... }

Note that the effect is compile-time and immutable once defined.

The subroutines must return a specially-formatted string, with one
or more newline-separated lines.
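
The package rule described above can be tried out with a small,
self-contained script.  This is only a sketch under stated assumptions:
the property C<Lang::IsForeign> is given a hypothetical definition that
covers the code point ranges 0370..03FF and 0400..04FF, and the helper
names C<has_foreign> and C<main_has_foreign> are invented for the
illustration.

```perl
use strict;
use warnings;

package Lang;

# Hypothetical property definition: two ranges, one per line, each
# written as two hexadecimal code points separated by a tab.
sub IsForeign {
    return "0370\t03FF\n0400\t04FF\n";
}

# Inside package Lang the property name needs no package qualifier.
sub has_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{IsForeign}+/ ? 1 : 0;
}

package main;

# Outside package Lang the property must be package-qualified.
sub main_has_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{Lang::IsForeign}+/ ? 1 : 0;
}

print main_has_foreign("abc \x{3B1}\x{3B2}"), "\n";   # text with U+03B1, U+03B2
print main_has_foreign("plain ascii"), "\n";          # no code point in range
print Lang::has_foreign("\x{416}"), "\n";             # U+0416, unqualified name
```
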
Each line must be one of the following:

=over 4

=item *

A single hexadecimal number denoting a Unicode code point to include.

=item *

Two hexadecimal numbers separated by horizontal whitespace (space or
tab characters) denoting a range of Unicode code points to include.

=item *

Something to include, prefixed by "+": a built-in character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.

=item *

Something to exclude, prefixed by "-": an existing character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.

=item *

Something to negate, prefixed by "!": an existing character
property (prefixed by "utf8::") or a user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.

=item *

Something to intersect with, prefixed by "&": an existing character
property (prefixed by "utf8::") or a user-defined character property,
to keep only the characters that are also in that property; two
hexadecimal code points for a range; or a single hexadecimal code point.

=back

For example, to define a property that covers both the Japanese
syllabaries (hiragana and katakana), you can define

    sub InKana {
        return <<END;
    3040\t309F
    30A0\t30FF
    END
    }

Imagine that the here-doc end marker is at the beginning of the line.
Now you can use C<\p{InKana}> and C<\P{InKana}>.

You could also have used the existing block property names:

    sub InKana {
        return <<'END';
    +utf8::InHiragana
    +utf8::InKatakana