📄 perlre.1
\&                  \e(
\&                    (               # paren group 3 (contents of parens)
\&                    (?:
\&                     (?> [^()]+ )   # Non\-parens without backtracking
\&                    |
\&                     (?2)           # Recurse to start of paren group 2
\&                    )*
\&                    )
\&                  \e)
\&                )
\&              )
\&            }x;
.Ve
.Sp
If the pattern was used as follows
.Sp
.Vb 4
\&    \*(Aqfoo(bar(baz)+baz(bop))\*(Aq=~/$re/
\&        and print "\e$1 = $1\en",
\&                  "\e$2 = $2\en",
\&                  "\e$3 = $3\en";
.Ve
.Sp
the output produced should be the following:
.Sp
.Vb 3
\&    $1 = foo(bar(baz)+baz(bop))
\&    $2 = (bar(baz)+baz(bop))
\&    $3 = bar(baz)+baz(bop)
.Ve
.Sp
If there is no corresponding capture buffer defined, then it is a
fatal error. Recursing deeper than 50 times without consuming any input
string will also result in a fatal error. The maximum depth is compiled
into perl, so changing it requires a custom build.
.Sp
The following shows how using negative indexing can make it
easier to embed recursive patterns inside of a \f(CW\*(C`qr//\*(C'\fR construct
for later use:
.Sp
.Vb 4
\&    my $parens = qr/(\e((?:[^()]++|(?\-1))*+\e))/;
\&    if (/foo $parens \es+ \e+ \es+ bar $parens/x) {
\&       # do something here...
\&    }
.Ve
.Sp
\&\fBNote\fR that this pattern does not behave the same way as the equivalent
\&\s-1PCRE\s0 or Python construct of the same form. In Perl you can backtrack into
a recursed group; in \s-1PCRE\s0 and Python the recursed into group is treated
as atomic. Also, modifiers are resolved at compile time, so constructs
like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will
be processed.
.ie n .IP """(?&NAME)""" 10
.el .IP "\f(CW(?&NAME)\fR" 10
.IX Xref "(?&NAME)"
.IX Item "(?&NAME)"
Recurse to a named subpattern. Identical to \f(CW\*(C`(?PARNO)\*(C'\fR except that the
parenthesis to recurse to is determined by name.
If multiple parentheses have
the same name, then it recurses to the leftmost.
.Sp
It is an error to refer to a name that is not declared somewhere in the
pattern.
.Sp
\&\fB\s-1NOTE:\s0\fR In order to make things easier for programmers with experience
with the Python or \s-1PCRE\s0 regex engines the pattern \f(CW\*(C`(?P>NAME)\*(C'\fR
may be used instead of \f(CW\*(C`(?&NAME)\*(C'\fR.
.ie n .IP """(?(condition)yes\-pattern|no\-pattern)""" 10
.el .IP "\f(CW(?(condition)yes\-pattern|no\-pattern)\fR" 10
.IX Xref "(?()"
.IX Item "(?(condition)yes-pattern|no-pattern)"
.PD 0
.ie n .IP """(?(condition)yes\-pattern)""" 10
.el .IP "\f(CW(?(condition)yes\-pattern)\fR" 10
.IX Item "(?(condition)yes-pattern)"
.PD
Conditional expression. \f(CW\*(C`(condition)\*(C'\fR should be either an integer in
parentheses (which is valid if the corresponding pair of parentheses
matched), a look\-ahead/look\-behind/evaluate zero-width assertion, a
name in angle brackets or single quotes (which is valid if a buffer
with the given name matched), or the special symbol (R) (true when
evaluated inside of recursion or eval). Additionally the R may be
followed by a number, (which will be true when evaluated when recursing
inside of the appropriate group), or by \f(CW&NAME\fR, in which case it will
be true only when evaluated during recursion in the named group.
.Sp
Here's a summary of the possible predicates:
.RS 10
.IP "(1) (2) ..." 4
.IX Item "(1) (2) ..."
Checks if the numbered capturing buffer has matched something.
.IP "(<\s-1NAME\s0>) ('\s-1NAME\s0')" 4
.IX Item "(<NAME>) ('NAME')"
Checks if a buffer with the given name has matched something.
.IP "(?{ \s-1CODE\s0 })" 4
.IX Item "(?{ CODE })"
Treats the code block as the condition.
.IP "(R)" 4
.IX Item "(R)"
Checks if the expression has been evaluated inside of recursion.
.IP "(R1) (R2) ..." 4
.IX Item "(R1) (R2) ..."
Checks if the expression has been evaluated while executing directly
inside of the n\-th capture group.
This check is the regex equivalent of
.Sp
.Vb 1
\&  if ((caller(0))[3] eq \*(Aqsubname\*(Aq) { ... }
.Ve
.Sp
In other words, it does not check the full recursion stack.
.IP "(R&NAME)" 4
.IX Item "(R&NAME)"
Similar to \f(CW\*(C`(R1)\*(C'\fR, this predicate checks to see if we're executing
directly inside of the leftmost group with a given name (this is the same
logic used by \f(CW\*(C`(?&NAME)\*(C'\fR to disambiguate). It does not check the full
stack, but only the name of the innermost active recursion.
.IP "(\s-1DEFINE\s0)" 4
.IX Item "(DEFINE)"
In this case, the yes-pattern is never directly executed, and no
no-pattern is allowed. Similar in spirit to \f(CW\*(C`(?{0})\*(C'\fR but more efficient.
See below for details.
.RE
.RS 10
.Sp
For example:
.Sp
.Vb 4
\&    m{ ( \e( )?
\&       [^()]+
\&       (?(1) \e) )
\&     }x
.Ve
.Sp
matches a chunk of non-parentheses, possibly included in parentheses
themselves.
.Sp
A special form is the \f(CW\*(C`(DEFINE)\*(C'\fR predicate, which never executes its
yes-pattern directly, and does not allow a no-pattern. This allows you to define
subpatterns which will be executed only by using the recursion mechanism.
This way, you can define a set of regular expression rules that can be
bundled into any pattern you choose.
.Sp
It is recommended that for this usage you put the \s-1DEFINE\s0 block at the
end of the pattern, and that you name any subpatterns defined within it.
.Sp
Also, it's worth noting that patterns defined this way probably will
not be as efficient, as the optimiser is not very clever about
handling them.
.Sp
An example of how this might be used is as follows:
.Sp
.Vb 5
\&  /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
\&   (?(DEFINE)
\&     (?<NAME_PAT>....)
\&     (?<ADDRESS_PAT>....)
\&   )/x
.Ve
.Sp
Note that capture buffers matched inside of recursion are not accessible
after the recursion returns, so the extra layer of capturing buffers is
necessary.
Thus \f(CW$+{NAME_PAT}\fR would not be defined even though
\&\f(CW$+{NAME}\fR would be.
.RE
.ie n .IP """(?>pattern)""" 10
.el .IP "\f(CW(?>pattern)\fR" 10
.IX Xref "backtrack backtracking atomic possessive"
.IX Item "(?>pattern)"
An \*(L"independent\*(R" subexpression, one which matches the substring
that a \fIstandalone\fR \f(CW\*(C`pattern\*(C'\fR would match if anchored at the given
position, and it matches \fInothing other than this substring\fR. This
construct is useful for optimizations of what would otherwise be
\&\*(L"eternal\*(R" matches, because it will not backtrack (see \*(L"Backtracking\*(R").
It may also be useful in places where the \*(L"grab all you can, and do not
give anything back\*(R" semantic is desirable.
.Sp
For example: \f(CW\*(C`^(?>a*)ab\*(C'\fR will never match, since \f(CW\*(C`(?>a*)\*(C'\fR
(anchored at the beginning of string, as above) will match \fIall\fR
characters \f(CW\*(C`a\*(C'\fR at the beginning of string, leaving no \f(CW\*(C`a\*(C'\fR for
\&\f(CW\*(C`ab\*(C'\fR to match. In contrast, \f(CW\*(C`a*ab\*(C'\fR will match the same as \f(CW\*(C`a+b\*(C'\fR,
since the match of the subgroup \f(CW\*(C`a*\*(C'\fR is influenced by the following
group \f(CW\*(C`ab\*(C'\fR (see \*(L"Backtracking\*(R"). In particular, \f(CW\*(C`a*\*(C'\fR inside
\&\f(CW\*(C`a*ab\*(C'\fR will match fewer characters than a standalone \f(CW\*(C`a*\*(C'\fR, since
this makes the tail match.
.Sp
An effect similar to \f(CW\*(C`(?>pattern)\*(C'\fR may be achieved by writing
\&\f(CW\*(C`(?=(pattern))\e1\*(C'\fR.
This matches the same substring as a standalone
\&\f(CW\*(C`a+\*(C'\fR, and the following \f(CW\*(C`\e1\*(C'\fR eats the matched string; it therefore
makes a zero-length assertion into an analogue of \f(CW\*(C`(?>...)\*(C'\fR.
(The difference between these two constructs is that the second one
uses a capturing group, thus shifting ordinals of backreferences
in the rest of a regular expression.)
.Sp
Consider this pattern:
.Sp
.Vb 8
\&    m{ \e(
\&          (
\&            [^()]+              # x+
\&          |
\&            \e( [^()]* \e)
\&          )+
\&       \e)
\&     }x
.Ve
.Sp
That will efficiently match a nonempty group with matching parentheses
two levels deep or less. However, if there is no such group, it
will take virtually forever on a long string. That's because there
are so many different ways to split a long string into several
substrings. This is what \f(CW\*(C`(.+)+\*(C'\fR is doing, and \f(CW\*(C`(.+)+\*(C'\fR is similar
to a subpattern of the above pattern. Consider how the pattern
above detects no-match on \f(CW\*(C`((()aaaaaaaaaaaaaaaaaa\*(C'\fR in several
seconds, but that each extra letter doubles this time. This
exponential performance will make it appear that your program has
hung. However, a tiny change to this pattern
.Sp
.Vb 8
\&    m{ \e(
\&          (
\&            (?> [^()]+ )        # change x+ above to (?> x+ )
\&          |
\&            \e( [^()]* \e)
\&          )+
\&       \e)
\&     }x
.Ve
.Sp
which uses \f(CW\*(C`(?>...)\*(C'\fR matches exactly when the one above does (verifying
this yourself would be a productive exercise), but finishes in a fourth
of the time when used on a similar string with 1000000 \f(CW\*(C`a\*(C'\fRs. Be aware,
however, that this pattern currently triggers a warning message under
the \f(CW\*(C`use warnings\*(C'\fR pragma or \fB\-w\fR switch saying it
\&\f(CW"matches null string many times in regex"\fR.
.Sp
On simple groups, such as the pattern \f(CW\*(C`(?> [^()]+ )\*(C'\fR, a comparable
effect may be achieved by negative look-ahead, as in \f(CW\*(C`[^()]+ (?! [^()] )\*(C'\fR.
This was only 4 times slower on a string with 1000000 \f(CW\*(C`a\*(C'\fRs.
.Sp
The \*(L"grab all you can, and do not give anything back\*(R" semantic is desirable
in many situations where at first sight a simple \f(CW\*(C`()*\*(C'\fR looks like
the correct solution. Suppose we parse text with comments being delimited
by \f(CW\*(C`#\*(C'\fR followed by some optional (horizontal) whitespace. Contrary to
its appearance, \f(CW\*(C`#[ \et]*\*(C'\fR \fIis not\fR the correct subexpression to match
the comment delimiter, because it may \*(L"give up\*(R" some whitespace if
the remainder of the pattern can be made to match that way. The correct
answer is either one of these:
.Sp
.Vb 2
\&    (?>#[ \et]*)
\&    #[ \et]*(?![ \et])
.Ve
.Sp
For example, to grab non-empty comments into \f(CW$1\fR, one should use either
one of these:
.Sp
.Vb 2
\&    / (?> \e# [ \et]* ) (        .+ ) /x;
\&    / \e# [ \et]* ( [^ \et] .* ) /x;
.Ve
.Sp
Which one you pick depends on which of these expressions better reflects
the above specification of comments.
.Sp
In some literature this construct is called \*(L"atomic matching\*(R" or
\&\*(L"possessive matching\*(R".
.Sp
Possessive quantifiers are equivalent to putting the item they are applied
to inside of one of these constructs. The following equivalences apply:
.Sp
.Vb 6
\&    Quantifier Form     Bracketing Form
\&    \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-     \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&    PAT*+               (?>PAT*)
\&    PAT++               (?>PAT+)
\&    PAT?+               (?>PAT?)
\&    PAT{min,max}+       (?>PAT{min,max})
.Ve
.Sh "Special Backtracking Control Verbs"
.IX Subsection "Special Backtracking Control Verbs"
\&\fB\s-1WARNING:\s0\fR These patterns are experimental and subject to change or
removal in a future version of Perl. Their usage in production code should
be noted to avoid problems during upgrades.
.PP
These special patterns are generally of the form \f(CW\*(C`(*VERB:ARG)\*(C'\fR.
Unless
otherwise stated the \s-1ARG\s0 argument is optional; in some cases, it is
forbidden.
.PP
Any pattern containing a special backtracking verb that allows an argument
has the special behaviour that when executed it sets the current package's
\&\f(CW$REGERROR\fR and \f(CW$REGMARK\fR variables. When doing so the following
rules apply:
.PP
On failure, the \f(CW$REGERROR\fR variable will be set to the \s-1ARG\s0 value of the
verb pattern, if the verb was involved in the failure of the match. If the
\&\s-1ARG\s0 part of the pattern was omitted, then \f(CW$REGERROR\fR will be set to the
name of the last \f(CW\*(C`(*MARK:NAME)\*(C'\fR pattern executed, or to \s-1TRUE\s0 if there was
none. Also, the \f(CW$REGMARK\fR variable will be set to \s-1FALSE\s0.
.PP
On a successful match, the \f(CW$REGERROR\fR variable will be set to \s-1FALSE\s0, and
the \f(CW$REGMARK\fR variable will be set to the name of the last
\&\f(CW\*(C`(*MARK:NAME)\*(C'\fR pattern executed. See the explanation for the
\&\f(CW\*(C`(*MARK:NAME)\*(C'\fR verb below for more details.
.PP
\&\fB\s-1NOTE:\s0\fR \f(CW$REGERROR\fR and \f(CW$REGMARK\fR are not magic variables like \f(CW$1\fR
and most other regex related variables. They are not local to a scope, nor
readonly, but instead are volatile package variables similar to \f(CW$AUTOLOAD\fR.
Use \f(CW\*(C`local\*(C'\fR to localize changes to them to a