.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PERLRECHARCLASS 1"
.TH PERLRECHARCLASS 1 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
perlrecharclass \- Perl Regular Expression Character Classes
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The top-level documentation about Perl regular expressions
is found in perlre.
.PP
This manual page discusses the syntax and use of character
classes in Perl Regular Expressions.
.PP
A character class is a way of denoting a set of characters,
in such a way that one character of the set is matched.
It's important to remember that matching a character class
consumes exactly one character in the source string. (The source
string is the string the regular expression is matched against.)
.PP
There are three types of character classes in Perl regular
expressions: the dot, backslashed sequences, and the bracketed form.
.Sh "The dot"
.IX Subsection "The dot"
The dot (or period), \f(CW\*(C`.\*(C'\fR, is probably the most used, and certainly
the best-known, character class. By default, a dot matches any
character except the newline. The default can be changed so that
the dot also matches the newline by using the \fIsingle line\fR modifier: either
for the entire regular expression using the \f(CW\*(C`/s\*(C'\fR modifier, or
locally using \f(CW\*(C`(?s)\*(C'\fR.
.PP
Here are some examples:
.PP
.Vb 7
\&    "a"  =~  /./       # Match
\&    "."  =~  /./       # Match
\&    ""   =~  /./       # No match (dot has to match a character)
\&    "\en" =~  /./       # No match (dot does not match a newline)
\&    "\en" =~  /./s      # Match (global \*(Aqsingle line\*(Aq modifier)
\&    "\en" =~  /(?s:.)/  # Match (local \*(Aqsingle line\*(Aq modifier)
\&    "ab" =~  /^.$/     # No match (dot matches one character)
.Ve
.Sh "Backslashed sequences"
.IX Subsection "Backslashed sequences"
Perl regular expressions contain many backslashed sequences that
constitute a character class. That is, they will match a single
character, if that character belongs to a specific set of characters
(defined by the sequence). A backslashed sequence is a sequence of
characters starting with a backslash.
Not all backslashed sequences
are character classes; for a full list, see perlrebackslash.
.PP
Here's a list of the backslashed sequences that are character classes;
each is discussed in more detail below.
.PP
.Vb 12
\&    \ed             Match a digit character.
\&    \eD             Match a non\-digit character.
\&    \ew             Match a "word" character.
\&    \eW             Match a non\-"word" character.
\&    \es             Match a white space character.
\&    \eS             Match a non\-white space character.
\&    \eh             Match a horizontal white space character.
\&    \eH             Match a character that isn\*(Aqt horizontal white space.
\&    \ev             Match a vertical white space character.
\&    \eV             Match a character that isn\*(Aqt vertical white space.
\&    \epP, \ep{Prop}  Match a character matching a Unicode property.
\&    \ePP, \eP{Prop}  Match a character that doesn\*(Aqt match a Unicode property.
.Ve
.PP
\fIDigits\fR
.IX Subsection "Digits"
.PP
\&\f(CW\*(C`\ed\*(C'\fR matches a single character that is considered to be a \fIdigit\fR.
What is considered a digit depends on the internal encoding of
the source string. If the source string is in \s-1UTF\-8\s0 format, \f(CW\*(C`\ed\*(C'\fR
matches not only the digits '0' \- '9', but also Arabic and Devanagari digits,
and digits from other scripts. Otherwise, if there is a locale in effect,
it will match whatever characters the locale considers digits. Without
a locale, \f(CW\*(C`\ed\*(C'\fR matches the digits '0' to '9'.
See \*(L"Locale, Unicode and \s-1UTF\-8\s0\*(R".
.PP
Any character that isn't matched by \f(CW\*(C`\ed\*(C'\fR will be matched by \f(CW\*(C`\eD\*(C'\fR.
.PP
\fIWord characters\fR
.IX Subsection "Word characters"
.PP
\&\f(CW\*(C`\ew\*(C'\fR matches a single \fIword\fR character: an alphanumeric character
(that is, an alphabetic character, or a digit), or the underscore (\f(CW\*(C`_\*(C'\fR).
What is considered a word character depends on the internal encoding
of the string. If it's in \s-1UTF\-8\s0 format, \f(CW\*(C`\ew\*(C'\fR matches those characters
that are considered word characters in the Unicode database.
That is, it
matches not only \s-1ASCII\s0 letters, but also Thai letters, Greek letters, etc.
If the source string isn't in \s-1UTF\-8\s0 format, \f(CW\*(C`\ew\*(C'\fR matches those characters
that are considered word characters by the current locale. Without
a locale in effect, \f(CW\*(C`\ew\*(C'\fR matches the \s-1ASCII\s0 letters, digits and the
underscore.
.PP
Any character that isn't matched by \f(CW\*(C`\ew\*(C'\fR will be matched by \f(CW\*(C`\eW\*(C'\fR.
.PP
\fIWhite space\fR
.IX Subsection "White space"
.PP
\&\f(CW\*(C`\es\*(C'\fR matches any single character that is considered white space. In the
\&\s-1ASCII\s0 range, \f(CW\*(C`\es\*(C'\fR matches the horizontal tab (\f(CW\*(C`\et\*(C'\fR), the newline
(\f(CW\*(C`\en\*(C'\fR), the form feed (\f(CW\*(C`\ef\*(C'\fR), the carriage return (\f(CW\*(C`\er\*(C'\fR), and the
space (the vertical tab, \f(CW\*(C`\ecK\*(C'\fR, is not matched by \f(CW\*(C`\es\*(C'\fR). The exact set
of characters matched by \f(CW\*(C`\es\*(C'\fR depends on whether the source string is
in \s-1UTF\-8\s0 format. If it is, \f(CW\*(C`\es\*(C'\fR matches what is considered white space
in the Unicode database. Otherwise, if there is a locale in effect, \f(CW\*(C`\es\*(C'\fR
matches whatever is considered white space by the current locale. Without
a locale, \f(CW\*(C`\es\*(C'\fR matches the five characters mentioned at the beginning
of this paragraph. Perhaps the most notable difference is that \f(CW\*(C`\es\*(C'\fR
matches a non-breaking space only if the non-breaking space is in a
\&\s-1UTF\-8\s0 encoded string.
.PP
Any character that isn't matched by \f(CW\*(C`\es\*(C'\fR will be matched by \f(CW\*(C`\eS\*(C'\fR.
.PP
\&\f(CW\*(C`\eh\*(C'\fR will match any character that is considered horizontal white space;
this includes the space and the tab characters.
\&\f(CW\*(C`\eH\*(C'\fR will match any character
that is not considered horizontal white space.
.PP
\&\f(CW\*(C`\ev\*(C'\fR will match any character that is considered vertical white space;
this includes the carriage return and line feed characters (newline).
\&\f(CW\*(C`\eV\*(C'\fR will match any character that is not considered vertical white space.
.PP
\&\f(CW\*(C`\eR\*(C'\fR matches anything that can be considered a newline under Unicode
rules. It's not a character class, as it can match a multi-character
sequence. Therefore, it cannot be used inside a bracketed character
class. Details are discussed in perlrebackslash.
.PP
\&\f(CW\*(C`\eh\*(C'\fR, \f(CW\*(C`\eH\*(C'\fR, \f(CW\*(C`\ev\*(C'\fR, \f(CW\*(C`\eV\*(C'\fR, and \f(CW\*(C`\eR\*(C'\fR are new in perl 5.10.0.
.PP
Note that unlike \f(CW\*(C`\es\*(C'\fR, \f(CW\*(C`\ed\*(C'\fR and \f(CW\*(C`\ew\*(C'\fR, \f(CW\*(C`\eh\*(C'\fR and \f(CW\*(C`\ev\*(C'\fR always match
the same characters, regardless of whether the source string is in \s-1UTF\-8\s0
format or not. The set of characters they match is also not influenced
by locale.
.PP
One might think that \f(CW\*(C`\es\*(C'\fR is equivalent to \f(CW\*(C`[\eh\ev]\*(C'\fR. This is not true.
The vertical tab (\f(CW"\ex0b"\fR) is not matched by \f(CW\*(C`\es\*(C'\fR; it is, however,
considered vertical white space. Furthermore, if the source string is
not in \s-1UTF\-8\s0 format, the next line (\f(CW"\ex85"\fR) and the no-break space
(\f(CW"\exA0"\fR) are not matched by \f(CW\*(C`\es\*(C'\fR, but are by \f(CW\*(C`\ev\*(C'\fR and \f(CW\*(C`\eh\*(C'\fR respectively.
If the source string is in \s-1UTF\-8\s0 format, both the next line and the
no-break space are matched by \f(CW\*(C`\es\*(C'\fR.
.PP
The following table is a complete listing of characters matched by
\&\f(CW\*(C`\es\*(C'\fR, \f(CW\*(C`\eh\*(C'\fR and \f(CW\*(C`\ev\*(C'\fR.
.PP
The first column gives the code point of the character (in hex format),
the second column gives the (Unicode) name.
The third column indicates
by which class(es) the character is matched.
.PP
.Vb 10
\& 0x00009  CHARACTER TABULATION       h s
\& 0x0000a  LINE FEED (LF)              vs
\& 0x0000b  LINE TABULATION             v
\& 0x0000c  FORM FEED (FF)              vs
\& 0x0000d  CARRIAGE RETURN (CR)        vs
\& 0x00020  SPACE                      h s
\& 0x00085  NEXT LINE (NEL)             vs  [1]
\& 0x000a0  NO\-BREAK SPACE             h s  [1]
\& 0x01680  OGHAM SPACE MARK           h s
\& 0x0180e  MONGOLIAN VOWEL SEPARATOR  h s
\& 0x02000  EN QUAD                    h s
\& 0x02001  EM QUAD                    h s
\& 0x02002  EN SPACE                   h s
\& 0x02003  EM SPACE                   h s
\& 0x02004  THREE\-PER\-EM SPACE         h s
\& 0x02005  FOUR\-PER\-EM SPACE          h s
\& 0x02006  SIX\-PER\-EM SPACE           h s
\& 0x02007  FIGURE SPACE               h s
\& 0x02008  PUNCTUATION SPACE          h s
\& 0x02009  THIN SPACE                 h s
\& 0x0200a  HAIR SPACE                 h s
\& 0x02028  LINE SEPARATOR              vs
\& 0x02029  PARAGRAPH SEPARATOR         vs
\& 0x0202f  NARROW NO\-BREAK SPACE      h s
\& 0x0205f  MEDIUM MATHEMATICAL SPACE  h s
\& 0x03000  IDEOGRAPHIC SPACE          h s
.Ve
.IP "[1]" 4
.IX Item "[1]"
\&\s-1NEXT\s0 \s-1LINE\s0 and NO-BREAK \s-1SPACE\s0 only match \f(CW\*(C`\es\*(C'\fR if the source string is in
\&\s-1UTF\-8\s0 format.
.PP
It is worth noting that \f(CW\*(C`\ed\*(C'\fR, \f(CW\*(C`\ew\*(C'\fR, etc., match single characters, not
complete numbers or words. To match a number (that consists of digits),
use \f(CW\*(C`\ed+\*(C'\fR; to match a word, use \f(CW\*(C`\ew+\*(C'\fR.
.PP
\fIUnicode Properties\fR
.IX Subsection "Unicode Properties"
.PP
\&\f(CW\*(C`\epP\*(C'\fR and \f(CW\*(C`\ep{Prop}\*(C'\fR are character classes to match characters that
fit given Unicode classes. One-letter classes can be used in the \f(CW\*(C`\epP\*(C'\fR
form, with the class name following the \f(CW\*(C`\ep\*(C'\fR; otherwise, the property
name is enclosed in braces and follows the \f(CW\*(C`\ep\*(C'\fR. For instance, a
match for a number can be written as \f(CW\*(C`/\epN/\*(C'\fR or as \f(CW\*(C`/\ep{Number}/\*(C'\fR.
Lowercase letters are matched by the property \fILowercaseLetter\fR, whose
short form is \fILl\fR. They have to be written as \f(CW\*(C`/\ep{Ll}/\*(C'\fR or
\&\f(CW\*(C`/\ep{LowercaseLetter}/\*(C'\fR.
\&\f(CW\*(C`/\epLl/\*(C'\fR is valid, but means something different.
It matches a two-character string: a letter (Unicode property \f(CW\*(C`\epL\*(C'\fR),
followed by a lowercase \f(CW\*(C`l\*(C'\fR.
.PP
For a list of possible properties, see
\&\*(L"Unicode Character Properties\*(R" in perlunicode. It is also possible to
define your own properties. This is discussed in