By default, when a quantified subpattern does not allow the rest of the
overall pattern to match, Perl will backtrack. However, this behaviour is
sometimes undesirable. Thus Perl provides the \*(L"possessive\*(R" quantifier form
as well.
.PP
.Vb 6
\&    *+     Match 0 or more times and give nothing back
\&    ++     Match 1 or more times and give nothing back
\&    ?+     Match 0 or 1 time and give nothing back
\&    {n}+   Match exactly n times and give nothing back (redundant)
\&    {n,}+  Match at least n times and give nothing back
\&    {n,m}+ Match at least n but not more than m times and give nothing back
.Ve
.PP
For instance,
.PP
.Vb 1
\&   \*(Aqaaaa\*(Aq =~ /a++a/
.Ve
.PP
will never match, as the \f(CW\*(C`a++\*(C'\fR will gobble up all the \f(CW\*(C`a\*(C'\fR's in the
string and won't leave any for the remaining part of the pattern. This
feature can be extremely useful to give perl hints about where it
shouldn't backtrack. For instance, the typical \*(L"match a double-quoted
string\*(R" problem can be most efficiently performed when written as:
.PP
.Vb 1
\&   /"(?:[^"\e\e]++|\e\e.)*+"/
.Ve
.PP
as we know that if the final quote does not match, backtracking will not
help. See the independent subexpression \f(CW\*(C`(?>...)\*(C'\fR for more details;
possessive quantifiers are just syntactic sugar for that construct.
For instance, the above example could also be written as follows:
.PP
.Vb 1
\&   /"(?>(?:(?>[^"\e\e]+)|\e\e.)*)"/
.Ve
.PP
\fIEscape sequences\fR
.IX Subsection "Escape sequences"
.PP
Because patterns are processed as double-quoted strings, the following
also work:
.IX Xref "\et \en \er \ef \ee \ea \el \eu \eL \eU \eE \eQ \e0 \ec \eN \ex"
.PP
.Vb 10
\&    \et          tab                   (HT, TAB)
\&    \en          newline               (LF, NL)
\&    \er          return                (CR)
\&    \ef          form feed             (FF)
\&    \ea          alarm (bell)          (BEL)
\&    \ee          escape (think troff)  (ESC)
\&    \e033        octal char            (example: ESC)
\&    \ex1B        hex char              (example: ESC)
\&    \ex{263a}    long hex char         (example: Unicode SMILEY)
\&    \ecK         control char          (example: VT)
\&    \eN{name}    named Unicode character
\&    \el          lowercase next char (think vi)
\&    \eu          uppercase next char (think vi)
\&    \eL          lowercase till \eE (think vi)
\&    \eU          uppercase till \eE (think vi)
\&    \eE          end case modification (think vi)
\&    \eQ          quote (disable) pattern metacharacters till \eE
.Ve
.PP
If \f(CW\*(C`use locale\*(C'\fR is in effect, the case map used by \f(CW\*(C`\el\*(C'\fR, \f(CW\*(C`\eL\*(C'\fR, \f(CW\*(C`\eu\*(C'\fR
and \f(CW\*(C`\eU\*(C'\fR is taken from the current locale. See perllocale.
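.PP
The following script is a minimal sketch, assuming Perl 5.10 or later
(the release that introduced possessive quantifiers), that checks the
behaviour described above: \f(CW\*(C`a++\*(C'\fR refuses to give back the characters
that \f(CW\*(C`/a+a/\*(C'\fR backtracks into, the possessive and \f(CW\*(C`(?>...)\*(C'\fR spellings of
the double-quoted-string pattern capture the same text, and \f(CW\*(C`\eQ...\eE\*(C'\fR turns
the metacharacters of an interpolated variable into literals. The sample
strings and variable names are made up for illustration.

```perl
use 5.010;
use strict;
use warnings;

# Possessive quantifiers: a++ keeps every 'a' it takes.
my $greedy     = ('aaaa' =~ /a+a/)  ? 1 : 0;   # backtracks one 'a', so it matches
my $possessive = ('aaaa' =~ /a++a/) ? 1 : 0;   # gives nothing back: no match

# The double-quoted-string pattern, possessive form and (?>...) form.
my $input = 'say "a \"b\" c" now';
my ($m1) = $input =~ /("(?:[^"\\]++|\\.)*+")/;
my ($m2) = $input =~ /("(?>(?:(?>[^"\\]+)|\\.)*)")/;

# Escape sequences in a pattern: \x1B is ESC; \Q...\E quotes metacharacters.
my $esc_ok      = ("\e[0m" =~ /^\x1B\[0m$/) ? 1 : 0;
my $pat         = 'a.b*';
my $quoted_meta = ('a.b*'  =~ /^\Q$pat\E$/) ? 1 : 0;   # literal "a.b*"
my $no_quote    = ('axbbb' =~ /^$pat$/)     ? 1 : 0;   # "." and "*" stay special

print "greedy=$greedy possessive=$possessive\n";
print "m1=$m1\nm2=$m2\n";
```

Without \f(CW\*(C`\eQ...\eE\*(C'\fR the interpolated \f(CW\*(C`a.b*\*(C'\fR is treated as a pattern,
which is why \f(CW\*(C`axbbb\*(C'\fR matches in the last check.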
.PP
For documentation of \f(CW\*(C`\eN{name}\*(C'\fR, see charnames.
.PP
You cannot include a literal \f(CW\*(C`$\*(C'\fR or \f(CW\*(C`@\*(C'\fR within a \f(CW\*(C`\eQ\*(C'\fR sequence.
An unescaped \f(CW\*(C`$\*(C'\fR or \f(CW\*(C`@\*(C'\fR interpolates the corresponding variable,
while escaping will cause the literal string \f(CW\*(C`\e$\*(C'\fR to be matched.
You'll need to write something like \f(CW\*(C`m/\eQuser\eE\e@\eQhost/\*(C'\fR.
.PP
\fICharacter Classes and other Special Escapes\fR
.IX Subsection "Character Classes and other Special Escapes"
.PP
In addition, Perl defines the following:
.IX Xref "\ew \eW \es \eS \ed \eD \eX \ep \eP \eC \eg \ek \eN \eK \ev \eV \eh \eH word whitespace character class backreference"
.PP
.Vb 10
\&    \ew       Match a "word" character (alphanumeric plus "_")
\&    \eW       Match a non\-"word" character
\&    \es       Match a whitespace character
\&    \eS       Match a non\-whitespace character
\&    \ed       Match a digit character
\&    \eD       Match a non\-digit character
\&    \epP      Match P, named property.  Use \ep{Prop} for longer names.
\&    \ePP      Match non\-P
\&    \eX       Match eXtended Unicode "combining character sequence",
\&             equivalent to (?:\ePM\epM*)
\&    \eC       Match a single C char (octet) even under Unicode.
\&             NOTE: breaks up characters into their UTF\-8 bytes,
\&             so you may end up with malformed pieces of UTF\-8.
\&             Unsupported in lookbehind.
\&    \e1       Backreference to a specific group.
\&             \*(Aq1\*(Aq may actually be any positive integer.
\&    \eg1      Backreference to a specific or previous group,
\&    \eg{\-1}   number may be negative indicating a previous buffer and may
\&             optionally be wrapped in curly brackets for safer parsing.
\&    \eg{name} Named backreference
\&    \ek<name> Named backreference
\&    \eK       Keep the stuff left of the \eK, don\*(Aqt include it in $&
\&    \ev       Vertical whitespace
\&    \eV       Not vertical whitespace
\&    \eh       Horizontal whitespace
\&    \eH       Not horizontal whitespace
\&    \eR       Linebreak
.Ve
.PP
A \f(CW\*(C`\ew\*(C'\fR matches a single alphanumeric character (an alphabetic
character, or a decimal digit) or
\f(CW\*(C`_\*(C'\fR, not a whole word. Use \f(CW\*(C`\ew+\*(C'\fR
to match a string of Perl-identifier characters (which isn't the same
as matching an English word). If \f(CW\*(C`use locale\*(C'\fR is in effect, the list
of alphabetic characters generated by \f(CW\*(C`\ew\*(C'\fR is taken from the current
locale. See perllocale. You may use \f(CW\*(C`\ew\*(C'\fR, \f(CW\*(C`\eW\*(C'\fR, \f(CW\*(C`\es\*(C'\fR, \f(CW\*(C`\eS\*(C'\fR,
\&\f(CW\*(C`\ed\*(C'\fR, and \f(CW\*(C`\eD\*(C'\fR within character classes, but they aren't usable
as either end of a range. If any of them precedes or follows a \*(L"\-\*(R",
the \*(L"\-\*(R" is understood literally. If Unicode is in effect, \f(CW\*(C`\es\*(C'\fR also
matches \*(L"\ex{85}\*(R", \*(L"\ex{2028}\*(R", and \*(L"\ex{2029}\*(R". See perlunicode for more
details about \f(CW\*(C`\epP\*(C'\fR, \f(CW\*(C`\ePP\*(C'\fR, \f(CW\*(C`\eX\*(C'\fR and the possibility of defining
your own \f(CW\*(C`\ep\*(C'\fR and \f(CW\*(C`\eP\*(C'\fR properties, and perluniintro about Unicode
in general.
.IX Xref "\ew \eW word"
.PP
\&\f(CW\*(C`\eR\*(C'\fR will atomically match a linebreak, including the network line-ending
\&\*(L"\ex0D\ex0A\*(R". Specifically, it is exactly equivalent to
.IX Xref "\eR"
.PP
.Vb 1
\&    (?>\ex0D\ex0A?|[\ex0A\-\ex0C\ex85\ex{2028}\ex{2029}])
.Ve
.PP
\&\fBNote:\fR \f(CW\*(C`\eR\*(C'\fR has no special meaning inside of a character class;
use \f(CW\*(C`\ev\*(C'\fR instead (vertical whitespace).
.IX Xref "\eR"
.PP
The \s-1POSIX\s0 character class syntax
.IX Xref "character class"
.PP
.Vb 1
\&    [:class:]
.Ve
.PP
is also available.
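.PP
A minimal sketch, assuming Perl 5.10 or later (for \f(CW\*(C`\eR\*(C'\fR and \f(CW\*(C`\eh\*(C'\fR),
that exercises the escapes described above: \f(CW\*(C`\ew+\*(C'\fR picking out
identifier-like runs, \f(CW\*(C`\eR\*(C'\fR splitting text that mixes \s-1CRLF\s0, \s-1LF\s0 and
\s-1CR\s0 line endings, \f(CW\*(C`\eh\*(C'\fR refusing to cross a newline, and a \s-1POSIX\s0 class
used inside a bracketed character class. The sample strings and variable
names are made up for illustration.

```perl
use 5.010;
use strict;
use warnings;

# \w matches one "word" character; \w+ matches a run of them.
my @idents = ('foo_bar baz-9' =~ /(\w+)/g);      # ("foo_bar", "baz", "9")

# \R matches any linebreak and treats "\r\n" as a single unit.
my @lines = split /\R/, "one\r\ntwo\nthree\rfour";

# \h is horizontal whitespace only, so it does not match the newline.
my $same_line  = ("a \t b" =~ /^a\h+b$/) ? 1 : 0;
my $cross_line = ("a \n b" =~ /^a\h+b$/) ? 1 : 0;

# A POSIX class must sit inside a bracketed character class.
my @digits = ('x1y22z' =~ /([[:digit:]]+)/g);    # ("1", "22")

print join('|', @idents), "\n", join('|', @lines), "\n";
```

Because \f(CW\*(C`\eR\*(C'\fR consumes \f(CW\*(C`\ex0D\ex0A\*(C'\fR as one unit, the split leaves no empty
field between \f(CW\*(C`one\*(C'\fR and \f(CW\*(C`two\*(C'\fR.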
.PP
Note that the \f(CW\*(C`[\*(C'\fR and \f(CW\*(C`]\*(C'\fR brackets of \f(CW\*(C`[:class:]\*(C'\fR are \fIliteral\fR;
the construct must always be used within a character class expression.
.PP
.Vb 2
\&    # this is correct:
\&    $string =~ /[[:alpha:]]/;
\&
\&    # this is not, and will generate a warning:
\&    $string =~ /[:alpha:]/;
.Ve
.PP
The available classes and their backslash equivalents (if available) are
as follows:
.IX Xref "character class alpha alnum ascii blank cntrl digit graph lower print punct space upper word xdigit"
.PP
.Vb 10
\&    alpha
\&    alnum
\&    ascii
\&    blank               [1]
\&    cntrl
\&    digit       \ed
\&    graph
\&    lower
\&    print
\&    punct
\&    space       \es      [2]
\&    upper
\&    word        \ew      [3]
\&    xdigit
.Ve
.IP "[1]" 4
.IX Item "[1]"
A \s-1GNU\s0 extension equivalent to \f(CW\*(C`[ \et]\*(C'\fR, \*(L"all horizontal whitespace\*(R".
.IP "[2]" 4
.IX Item "[2]"
Not exactly equivalent to \f(CW\*(C`\es\*(C'\fR since \f(CW\*(C`[[:space:]]\*(C'\fR also includes
the (very rare) \*(L"vertical tabulator\*(R", \*(L"\ecK\*(R" or chr(11) in \s-1ASCII\s0.
.IP "[3]" 4
.IX Item "[3]"
A Perl extension, see above.
.PP
For example, use \f(CW\*(C`[:upper:]\*(C'\fR to match all the uppercase characters.
Note that the \f(CW\*(C`[]\*(C'\fR are part of the \f(CW\*(C`[::]\*(C'\fR construct, not part of the
whole character class.
For example:
.PP
.Vb 1
\&    [01[:alpha:]%]
.Ve
.PP
matches zero, one, any alphabetic character, and the percent sign.
.PP
The following equivalences to Unicode \ep{} constructs and equivalent
backslash character classes (if available) will hold:
.IX Xref "character class \ep \ep{}"
.PP
.Vb 1
\&    [[:...:]]   \ep{...}         backslash
\&
\&    alpha       IsAlpha
\&    alnum       IsAlnum
\&    ascii       IsASCII
\&    blank
\&    cntrl       IsCntrl
\&    digit       IsDigit        \ed
\&    graph       IsGraph
\&    lower       IsLower
\&    print       IsPrint
\&    punct       IsPunct
\&    space       IsSpace
\&                IsSpacePerl    \es
\&    upper       IsUpper
\&    word        IsWord
\&    xdigit      IsXDigit
.Ve
.PP
For example, \f(CW\*(C`[[:lower:]]\*(C'\fR and \f(CW\*(C`\ep{IsLower}\*(C'\fR are equivalent.
.PP
If the \f(CW\*(C`utf8\*(C'\fR pragma is not used but the \f(CW\*(C`locale\*(C'\fR pragma is, the
classes correlate with the usual \fIisalpha\fR\|(3) interface (except for
\&\*(L"word\*(R" and \*(L"blank\*(R").
.PP
The other named classes are:
.IP "cntrl" 4
.IX Xref "cntrl"
.IX Item "cntrl"
Any control character. These are usually characters that don't produce
output as such, but instead control the terminal somehow; for example,
newline and backspace are control characters. All characters with \fIord()\fR
less than 32 are usually classified as control characters (assuming \s-1ASCII\s0,
the \s-1ISO\s0 Latin character sets, and Unicode), as is the character with
the \fIord()\fR value of 127 (\f(CW\*(C`DEL\*(C'\fR).
.IP "graph" 4
.IX Xref "graph"
.IX Item "graph"
Any alphanumeric or punctuation (special) character.
.IP "print" 4
.IX Xref "print"
.IX Item "print"
Any alphanumeric or punctuation (special) character or the space character.
.IP "punct" 4
.IX Xref "punct"
.IX Item "punct"
Any punctuation (special) character.
.IP "xdigit" 4
.IX Xref "xdigit"
.IX Item "xdigit"
Any hexadecimal digit. Though this may feel silly ([0\-9A\-Fa\-f] would
work just fine), it is included for completeness.
.PP
You can negate the [::] character classes by prefixing the class name
with a '^'. This is a Perl extension.
For example:
.IX Xref "character class, negation"
.PP
.Vb 1
\&    POSIX         traditional  Unicode
\&
\&    [[:^digit:]]      \eD         \eP{IsDigit}
\&    [[:^space:]]      \eS         \eP{IsSpace}
\&    [[:^word:]]       \eW         \eP{IsWord}
.Ve
.PP
Perl respects the \s-1POSIX\s0 standard in that \s-1POSIX\s0 character classes are
only supported within a character class. The \s-1POSIX\s0 character classes
[.cc.] and [=cc=] are recognized but \fBnot\fR supported, and trying to
use them will cause an error.
.PP
\fIAssertions\fR
.IX Subsection "Assertions"
.PP
Perl defines the following zero-width assertions:
.IX Xref "zero-width assertion assertion regex, zero-width assertion regexp, zero-width assertion regular expression, zero-width assertion \eb \eB \eA \eZ \ez \eG"
.PP
.Vb 7
\&    \eb  Match a word boundary
\&    \eB  Match except at a word boundary
\&    \eA  Match only at beginning of string
\&    \eZ  Match only at end of string, or before newline at the end
\&    \ez  Match only at end of string
\&    \eG  Match only at pos() (e.g. at the end\-of\-match position
\&        of prior m//g)
.Ve
.PP
A word boundary (\f(CW\*(C`\eb\*(C'\fR) is a spot between two characters
that has a \f(CW\*(C`\ew\*(C'\fR on one side of it and a \f(CW\*(C`\eW\*(C'\fR on the other side
of it (in either order), counting the imaginary characters off the
beginning and end of the string as matching a \f(CW\*(C`\eW\*(C'\fR. (Within
character classes \f(CW\*(C`\eb\*(C'\fR represents backspace rather than a word
boundary, just as it normally does in any double-quoted string.)
The \f(CW\*(C`\eA\*(C'\fR and \f(CW\*(C`\eZ\*(C'\fR are just like \*(L"^\*(R" and \*(L"$\*(R", except that they
won't match multiple times when the \f(CW\*(C`/m\*(C'\fR modifier is used, while
\&\*(L"^\*(R" and \*(L"$\*(R" will match at every internal line boundary.
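.PP
The assertions above can be compared side by side in a short sketch (the
string and variable names are illustrative only): under \f(CW\*(C`/m\*(C'\fR the
\*(L"^\*(R" anchor matches at each line start while \f(CW\*(C`\eA\*(C'\fR matches only once,
\f(CW\*(C`\eZ\*(C'\fR accepts a trailing newline that \f(CW\*(C`\ez\*(C'\fR rejects, and \f(CW\*(C`\eb\*(C'\fR rejects
the \*(L"cat\*(R" embedded in \*(L"concat\*(R".

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# With /m, ^ matches at every line start; \A matches only at the very start.
my @caret = ($text =~ /^(\w+)/mg);     # ("first", "second")
my @abs   = ($text =~ /\A(\w+)/mg);    # ("first")

# \Z tolerates the trailing newline; \z does not.
my $end_Z = ($text =~ /line\Z/) ? 1 : 0;   # 1: "line" sits just before the final "\n"
my $end_z = ($text =~ /line\z/) ? 1 : 0;   # 0: the string really ends in "\n"

# \b needs \w on one side and \W (or the string edge) on the other.
my $count = () = ('cat concat cat.' =~ /\bcat\b/g);   # 2: no boundary before the "cat" in "concat"

print "caret=@caret abs=@abs Z=$end_Z z=$end_z count=$count\n";
```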
.PP
To match the actual end of the string and not ignore an optional trailing
newline, use \f(CW\*(C`\ez\*(C'\fR.
.IX Xref "\eb \eA \eZ \ez m"
.PP
The \f(CW\*(C`\eG\*(C'\fR assertion can be used to chain global matches (using
\&\f(CW\*(C`m//g\*(C'\fR), as described in \*(L"Regexp Quote-Like Operators\*(R" in perlop.
It is also useful when writing \f(CW\*(C`lex\*(C'\fR\-like scanners, when you have
several patterns that you want to match against consecutive substrings
of your string; see the previous reference. The actual location
where \f(CW\*(C`\eG\*(C'\fR will match can also be influenced by using \f(CW\*(C`pos()\*(C'\fR as
an lvalue: see \*(L"pos\*(R" in perlfunc. Note that the rule for zero-length