📄 pcre.3
  (?i:saturday|sunday)
  (?:(?i)saturday|sunday)

match exactly the same set of strings. Because alternative branches are tried
from left to right, and options are not reset until the end of the subpattern
is reached, an option setting in one branch does affect subsequent branches, so
the above patterns match "SUNDAY" as well as "Saturday".

.SH REPETITION
Repetition is specified by quantifiers, which can follow any of the following
items:

  a single character, possibly escaped
  the . metacharacter
  a character class
  a back reference (see next section)
  a parenthesized subpattern (unless it is an assertion - see below)

The general repetition quantifier specifies a minimum and maximum number of
permitted matches, by giving the two numbers in curly brackets (braces),
separated by a comma. The numbers must be less than 65536, and the first must
be less than or equal to the second. For example:

  z{2,4}

matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special
character. If the second number is omitted, but the comma is present, there is
no upper limit; if the second number and the comma are both omitted, the
quantifier specifies an exact number of required matches. Thus

  [aeiou]{3,}

matches at least 3 successive vowels, but may match many more, while

  \\d{8}

matches exactly 8 digits. An opening curly bracket that appears in a position
where a quantifier is not allowed, or one that does not match the syntax of a
quantifier, is taken as a literal character. For example, {,6} is not a
quantifier, but a literal string of four characters.

The quantifier {0} is permitted, causing the expression to behave as if the
previous item and the quantifier were not present.

For convenience (and historical compatibility) the three most common
quantifiers have single-character abbreviations:

  *    is equivalent to {0,}
  +    is equivalent to {1,}
  ?    is equivalent to {0,1}

It is possible to construct infinite loops by following a subpattern that can
match no characters with a quantifier that has no upper limit, for example:

  (a?)*

Earlier versions of Perl and PCRE used to give an error at compile time for
such patterns. However, because there are cases where this can be useful, such
patterns are now accepted, but if any repetition of the subpattern does in fact
match no characters, the loop is forcibly broken.

By default, the quantifiers are "greedy", that is, they match as much as
possible (up to the maximum number of permitted times), without causing the
rest of the pattern to fail. The classic example of where this gives problems
is in trying to match comments in C programs. These appear between the
sequences /* and */ and within the sequence, individual * and / characters may
appear. An attempt to match C comments by applying the pattern

  /\\*.*\\*/

to the string

  /* first comment */  not comment  /* second comment */

fails, because it matches the entire string due to the greediness of the .*
item.

However, if a quantifier is followed by a question mark, it ceases to be
greedy, and instead matches the minimum number of times possible, so the
pattern

  /\\*.*?\\*/

does the right thing with the C comments. The meaning of the various
quantifiers is not otherwise changed, just the preferred number of matches.
Do not confuse this use of question mark with its use as a quantifier in its
own right. Because it has two uses, it can sometimes appear doubled, as in

  \\d??\\d

which matches one digit by preference, but can match two if that is the only
way the rest of the pattern matches.

If the PCRE_UNGREEDY option is set (an option which is not available in Perl),
the quantifiers are not greedy by default, but individual ones can be made
greedy by following them with a question mark.
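
The difference between greedy and non-greedy matching described above can be
checked with a short script. The sketch below is an illustration only: it uses
Python's re module as a stand-in for PCRE (an assumption, since the two are
separate implementations that happen to agree on these particular patterns),
and the helper name first_match is hypothetical. Backslashes appear singly
here because the doubling in this page is troff escaping.

```python
import re

# Assumption: Python's re module stands in for PCRE; the two agree on
# greedy and non-greedy quantifiers for the patterns used here.
SUBJECT = "/* first comment */  not comment  /* second comment */"


def first_match(pattern: str, subject: str) -> str:
    """Return the text of the first match, or '' when there is none."""
    m = re.search(pattern, subject)
    return m.group(0) if m else ""


# Greedy .* runs on to the last */ in the subject.
greedy = first_match(r"/\*.*\*/", SUBJECT)

# Non-greedy .*? stops at the first */ it can reach.
lazy = first_match(r"/\*.*?\*/", SUBJECT)

# \d??\d prefers a single digit when nothing forces it to take two.
doubled = first_match(r"\d??\d", "12")

# When the whole subject must be consumed, the same pattern takes both digits.
forced = re.fullmatch(r"\d??\d", "12")

if __name__ == "__main__":
    print(repr(greedy))
    print(repr(lazy))
    print(repr(doubled))
    print(forced is not None)
```

The greedy pattern returns the whole subject string, while the non-greedy one
returns only the first comment.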
In other words, PCRE_UNGREEDY inverts the default behaviour.

When a parenthesized subpattern is quantified with a minimum repeat count that
is greater than 1 or with a limited maximum, more store is required for the
compiled pattern, in proportion to the size of the minimum or maximum.

If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
to Perl's /s) is set, thus allowing the . to match newlines, the pattern is
implicitly anchored, because whatever follows will be tried against every
character position in the subject string, so there is no point in retrying the
overall match at any position after the first. PCRE treats such a pattern as
though it were preceded by \\A. In cases where it is known that the subject
string contains no newlines, it is worth setting PCRE_DOTALL when the pattern
begins with .* in order to obtain this optimization, or alternatively using ^
to indicate anchoring explicitly.

When a capturing subpattern is repeated, the value captured is the substring
that matched the final iteration. For example, after

  (tweedle[dume]{3}\\s*)+

has matched "tweedledum tweedledee" the value of the captured substring is
"tweedledee". However, if there are nested capturing subpatterns, the
corresponding captured values may have been set in previous iterations. For
example, after

  /(a|(b))+/

matches "aba" the value of the second captured substring is "b".

.SH BACK REFERENCES
Outside a character class, a backslash followed by a digit greater than 0 (and
possibly further digits) is a back reference to a capturing subpattern earlier
(i.e. to its left) in the pattern, provided there have been that many previous
capturing left parentheses.

However, if the decimal number following the backslash is less than 10, it is
always taken as a back reference, and causes an error only if there are not
that many capturing left parentheses in the entire pattern. In other words, the
parentheses that are referenced need not be to the left of the reference for
numbers less than 10.
See the section entitled "Backslash" above for further
details of the handling of digits following a backslash.

A back reference matches whatever actually matched the capturing subpattern in
the current subject string, rather than anything matching the subpattern
itself. So the pattern

  (sens|respons)e and \\1ibility

matches "sense and sensibility" and "response and responsibility", but not
"sense and responsibility". If caseful matching is in force at the time of the
back reference, the case of letters is relevant. For example,

  ((?i)rah)\\s+\\1

matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
capturing subpattern is matched caselessly.

There may be more than one back reference to the same subpattern. If a
subpattern has not actually been used in a particular match, any back
references to it always fail. For example, the pattern

  (a|(bc))\\2

always fails if it starts to match "a" rather than "bc". Because there may be
up to 99 back references, all digits following the backslash are taken
as part of a potential back reference number. If the pattern continues with a
digit character, some delimiter must be used to terminate the back reference.
If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty
comment can be used.

A back reference that occurs inside the parentheses to which it refers fails
when the subpattern is first used, so, for example, (a\\1) never matches.
However, such references can be useful inside repeated subpatterns. For
example, the pattern

  (a|b\\1)+

matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of
the subpattern, the back reference matches the character string corresponding
to the previous iteration. In order for this to work, the pattern must be such
that the first iteration does not need to match the back reference.
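
The back reference rules in this section can be exercised with a small
script. This is a sketch under stated assumptions, not PCRE itself: it uses
Python's re module as a stand-in, the helper name matches is hypothetical, and
the scoped form (?i:rah) replaces ((?i)rah) because Python's re treats an
option setting in the middle of a pattern differently from PCRE. The
(a|b\\1)+ pattern is left out because Python's re rejects a reference to a
group that is still open.

```python
import re

# Assumption: Python's re module stands in for PCRE. Backslashes are single
# here; the doubled form in the man page source is troff escaping.


def matches(pattern: str, subject: str) -> bool:
    """True when the pattern matches the whole subject string."""
    return re.fullmatch(pattern, subject) is not None


# The back reference must repeat the text that group 1 actually matched.
SENSE = r"(sens|respons)e and \1ibility"

# Group 1 is matched caselessly, but \1 is compared casefully.
# (?i:rah) is the scoped spelling used in place of ((?i)rah).
RAH = r"((?i:rah))\s+\1"

# Group 2 takes no part in the match when the first alternative is used,
# so the back reference to it fails.
UNSET = r"(a|(bc))\2"

if __name__ == "__main__":
    for subject in ("sense and sensibility",
                    "response and responsibility",
                    "sense and responsibility"):
        print(subject, matches(SENSE, subject))
    for subject in ("rah rah", "RAH RAH", "RAH rah"):
        print(subject, matches(RAH, subject))
    for subject in ("aa", "bcbc"):
        print(subject, matches(UNSET, subject))
```

Only "sense and responsibility", "RAH rah" and "aa" are reported as
non-matching.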
Such a first iteration can be arranged using alternation, as in (a|b\\1)+, or
by a quantifier with a minimum of zero.

.SH ASSERTIONS
An assertion is a test on the characters following or preceding the current
matching point that does not actually consume any characters. The simple
assertions coded as \\b, \\B, \\A, \\Z, \\z, ^ and $ are described above. More
complicated assertions are coded as subpatterns. There are two kinds: those
that look ahead of the current position in the subject string, and those that
look behind it.

An assertion subpattern is matched in the normal way, except that it does not
cause the current matching position to be changed. Lookahead assertions start
with (?= for positive assertions and (?! for negative assertions. For example,

  \\w+(?=;)

matches a word followed by a semicolon, but does not include the semicolon in
the match, and

  foo(?!bar)

matches any occurrence of "foo" that is not followed by "bar". Note that the
apparently similar pattern

  (?!foo)bar

does not find an occurrence of "bar" that is preceded by something other than
"foo"; it finds any occurrence of "bar" whatsoever, because the assertion
(?!foo) is always true when the next three characters are "bar". A
lookbehind assertion is needed to achieve this effect.

Lookbehind assertions start with (?<= for positive assertions and (?<! for
negative assertions. For example,

  (?<!foo)bar

does find an occurrence of "bar" that is not preceded by "foo". The contents of
a lookbehind assertion are restricted such that all the strings it matches must
have a fixed length. However, if there are several alternatives, they do not
all have to have the same fixed length. Thus

  (?<=bullock|donkey)

is permitted, but

  (?<!dogs?|cats?)

causes an error at compile time. Branches that match different length strings
are permitted only at the top level of a lookbehind assertion. This is an
extension compared with Perl 5.005, which requires all branches to match the
same length of string.
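
The contrast between (?!foo)bar and (?<!foo)bar is easy to get wrong, so a
quick check may help. The sketch below again uses Python's re module as a
stand-in for PCRE (an assumption), and the helper name spans is hypothetical.
Lookbehinds with branches of different lengths, such as (?<=bullock|donkey),
are not shown because Python's re requires a single fixed width for the whole
lookbehind, unlike the PCRE behaviour described above.

```python
import re

# Assumption: Python's re module stands in for PCRE for these assertions.


def spans(pattern: str, subject: str) -> list:
    """Return the (start, end) offsets of every match in the subject."""
    return [m.span() for m in re.finditer(pattern, subject)]


# Positive lookahead: the semicolon is required but is not part of the match.
words = re.findall(r"\w+(?=;)", "alpha; beta gamma;")

# Negative lookahead: only the "foo" that is not followed by "bar" matches.
foo_not_bar = spans(r"foo(?!bar)", "foobar foobaz")

SUBJECT = "foobar bazbar"

# (?!foo) is tested at the start of "bar" itself, so it can never fail there
# and both occurrences of "bar" are found.
lookahead_bar = spans(r"(?!foo)bar", SUBJECT)

# (?<!foo) inspects the three characters before "bar", so the first
# occurrence is rejected.
lookbehind_bar = spans(r"(?<!foo)bar", SUBJECT)

if __name__ == "__main__":
    print(words)
    print(foo_not_bar)
    print(lookahead_bar)
    print(lookbehind_bar)
```

Only the lookbehind form skips the "bar" that follows "foo".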
An assertion such as

  (?<=ab(c|de))

is not permitted, because its single top-level branch can match two different
lengths, but it is acceptable if rewritten to use two top-level branches:

  (?<=abc|abde)

The implementation of lookbehind assertions is, for each alternative, to
temporarily move the current position back by the fixed width and then try to
match. If there are insufficient characters before the current position, the
match is deemed to fail. Lookbehinds in conjunction with once-only subpatterns
can be particularly useful for matching at the ends of strings; an example is
given at the end of the section on once-only subpatterns.

Several assertions (of any sort) may occur in succession. For example,

  (?<=\\d{3})(?<!999)foo

matches "foo" preceded by three digits that are not "999". Notice that each of
the assertions is applied independently at the same point in the subject
string. First there is a check that the previous three characters are all
digits, and then there is a check that the same three characters are not "999".
This pattern does \fInot\fR match "foo" preceded by six characters, the first
three of which are digits and the last three of which are not "999". For
example, it doesn't match "123abcfoo". A pattern to do that is

  (?<=\\d{3}...)(?<!999)foo

This time the first assertion looks at the preceding six characters, checking
that the first three are digits, and then the second assertion checks that the
preceding three characters are not "999".

Assertions can be nested in any combination. For example,

  (?<=(?<!foo)bar)baz

matches an occurrence of "baz" that is preceded by "bar" which in turn is not
preceded by "foo", while

  (?<=\\d{3}(?!999)...)foo

is another pattern which matches "foo" preceded by three digits and any three
characters that are not "999".

Assertion subpatterns are not capturing subpatterns, and may not be repeated,
because it makes no