<h2 class="title">Square brackets</h2> <p class="para"> An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash. </p> <p class="para"> A character class matches a single character in the subject; the character must be in the set of characters defined by the class, unless the first character in the class is a circumflex, in which case the subject character must not be in the set defined by the class. If a circumflex is actually required as a member of the class, ensure it is not the first character, or escape it with a backslash. </p> <p class="para"> For example, the character class [aeiou] matches any lower case vowel, while [^aeiou] matches any character that is not a lower case vowel. Note that a circumflex is just a convenient notation for specifying the characters which are in the class by enumerating those that are not. It is not an assertion: it still consumes a character from the subject string, and fails if the current pointer is at the end of the string. </p> <p class="para"> When caseless matching is set, any letters in a class represent both their upper case and lower case versions, so for example, a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful version would. </p> <p class="para"> The newline character is never treated in any special way in character classes, whatever the setting of the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a> or <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> options is. A class such as [^a] will always match a newline. 
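</p> <p class="para"> The class rules above can be checked with a short sketch. It assumes PHP's <b>preg_match()</b> as the caller; the variable names are illustrative and do not come from the original text. </p>

```php
<?php
// A class matches exactly one subject character.
$vowel    = preg_match('/^[aeiou]$/', 'e');   // 1: "e" is in the set
$notVowel = preg_match('/^[^aeiou]$/', 'e');  // 0: the negated class excludes "e"

// With the i modifier, letters in a class stand for both cases.
$caseless    = preg_match('/^[aeiou]$/i', 'A');   // 1
$caselessNeg = preg_match('/^[^aeiou]$/i', 'A');  // 0
$casefulNeg  = preg_match('/^[^aeiou]$/', 'A');   // 1

// "]" placed first in the class is an ordinary member.
$bracket = preg_match('/^[]x]$/', ']');  // 1

// A negated class is not an assertion: it must consume a character.
$atEnd = preg_match('/a[^b]/', 'a');     // 0: nothing left after "a"

// Newline gets no special treatment inside a class.
$newline = preg_match('/[^a]/', "\n");   // 1
```

<p class="para"> Each call returns 1 for a match and 0 for no match, so the comments record the result that follows from the rules described above.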
</p> <p class="para"> The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class. </p> <p class="para"> It is not possible to have the literal character "]" as the end character of a range. A pattern such as [W-]46] is interpreted as a class of two characters ("W" and "-") followed by a literal string "46]", so it would match "W46]" or "-46]". However, if the "]" is escaped with a backslash it is interpreted as the end of range, so [W-\]46] is interpreted as a single class containing a range followed by two separate characters. The octal or hexadecimal representation of "]" can also be used to end a range. </p> <p class="para"> Ranges operate in ASCII collating sequence. They can also be used for characters specified numerically, for example [\000-\037]. If a range that includes letters is used when caseless matching is set, it matches the letters in either case. For example, [W-c] is equivalent to [][\^_`wxyzabc], matched caselessly, and if character tables for the "fr" locale are in use, [\xc8-\xcb] matches accented E characters in both cases. </p> <p class="para"> The character types \d, \D, \s, \S, \w, and \W may also appear in a character class, and add the characters that they match to the class. For example, [\dABCDEF] matches any hexadecimal digit. A circumflex can conveniently be used with the upper case character types to specify a more restricted set of characters than the matching lower case type. For example, the class [^\W_] matches any letter or digit, but not underscore. 
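</p> <p class="para"> As a minimal sketch of the range and character-type rules above, again assuming PHP's <b>preg_match()</b> and illustrative variable names: </p>

```php
<?php
// A range covers every character between its end points, inclusive.
$inRange  = preg_match('/^[d-m]$/', 'k');  // 1
$outRange = preg_match('/^[d-m]$/', 'n');  // 0

// A hyphen in last position cannot start a range, so it is literal.
$hyphen = preg_match('/^[a-]$/', '-');     // 1

// [W-]46] is the class [W-] followed by the literal string "46]".
$classThenLiteral = preg_match('/^[W-]46]$/', '-46]');  // 1

// Escaping "]" makes it the end of the range W..], which contains "Z".
$escapedRange = preg_match('/^[W-\]46]$/', 'Z');        // 1

// A range that includes letters matches either case when caseless.
$rangeCaseful  = preg_match('/^[W-c]$/', 'y');   // 0
$rangeCaseless = preg_match('/^[W-c]$/i', 'y');  // 1

// Character types add their members to the class.
$hexUpper = preg_match('/^[\dABCDEF]+$/', '7F3A');  // 1
$hexLower = preg_match('/^[\dABCDEF]+$/', '7f');    // 0: "f" is not listed

// [^\W_] is "word character, but not underscore".
$alnum      = preg_match('/^[^\W_]+$/', 'abc123');  // 1
$underscore = preg_match('/^[^\W_]+$/', 'a_b');     // 0
```

<p class="para"> The patterns are written in single-quoted PHP strings so that backslashes reach the regular expression engine unchanged.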
</p> <p class="para"> All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped. </p> </div> <div id="regexp.reference.verticalbar" class="section"> <h2 class="title">Vertical bar</h2> <p class="para"> Vertical bar characters are used to separate alternative patterns. For example, the pattern <i>gilbert|sullivan</i> matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. If the alternatives are within a subpattern (defined below), "succeeds" means matching the rest of the main pattern as well as the alternative in the subpattern. </p> </div> <div id="regexp.reference.internal-options" class="section"> <h2 class="title">Internal option setting</h2> <p class="para"> The settings of <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a>, <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a>, <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>, <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a>, <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a>, <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a> and PCRE_DUPNAMES can be changed from within the pattern by a sequence of Perl option letters enclosed between "(?" and ")". 
The option letters are: <table border="5"> <caption><b>Internal option letters</b></caption> <colgroup> <tbody valign="middle" class="tbody"> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>i</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>m</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>s</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>x</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>U</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>X</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a></td> </tr> <tr valign="middle"> <td colspan="1" rowspan="1" align="left"><i>J</i></td> <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DUPNAMES</a></td> </tr> </tbody> </colgroup> </table> </p> <p class="para"> For example, (?im) sets caseless, multiline matching. 
It is also possible to unset these options by preceding the letter with a hyphen, and a combined setting and unsetting such as (?im-sx), which sets <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> while unsetting <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a>, is also permitted. If a letter appears both before and after the hyphen, the option is unset. </p> <p class="para"> When an option change occurs at top level (that is, not inside subpattern parentheses), the change applies to the remainder of the pattern that follows. So <i>/ab(?i)c/</i> matches only "abc" and "abC". This behaviour has been changed in PCRE 4.0, which is bundled since PHP 4.3.3. Before those versions, <i>/ab(?i)c/</i> would perform as <i>/abc/i</i> (e.g. matching "ABC" and "aBc"). </p> <p class="para"> If an option change occurs inside a subpattern, the effect is different. This is a change of behaviour in Perl 5.005. An option change inside a subpattern affects only that part of the subpattern that follows it, so <i>(a(?i)b)c</i> matches abc and aBc and no other strings (assuming <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a> is not used). By this means, options can be made to have different settings in different parts of the pattern. Any changes made in one alternative do carry on into subsequent branches within the same subpattern. For example, <i>(a(?i)b|c)</i> matches "ab", "aB", "c", and "C", even though when matching "C" the first branch is abandoned before the option setting. This is because the effects of option settings happen at compile time. There would be some very weird behaviour otherwise. 
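</p> <p class="para"> The scoping rules above can be illustrated with a short sketch. It assumes PHP's <b>preg_match()</b> with a current PCRE library (so the post-4.0 behaviour applies); the variable names are illustrative. </p>

```php
<?php
// (?im) turns on caseless and multiline matching from that point on.
$multi = preg_match('/(?im)^b$/', "a\nB");       // 1
$single = preg_match('/(?i)^b$/', "a\nB");       // 0: without m, ^ is only at the start

// Top level: the change applies only to what follows it.
$topTail = preg_match('/^ab(?i)c$/', 'abC');     // 1
$topHead = preg_match('/^ab(?i)c$/', 'ABC');     // 0: "ab" is still caseful

// Inside a subpattern: the change ends with the subpattern.
$subInside  = preg_match('/^(a(?i)b)c$/', 'aBc'); // 1
$subOutside = preg_match('/^(a(?i)b)c$/', 'abC'); // 0: "c" is outside the group

// The setting carries into later branches of the same subpattern.
$laterBranch = preg_match('/^(a(?i)b|c)$/', 'C'); // 1
```

<p class="para"> The last call matches "C" even though the first branch is abandoned, because the option takes effect when the pattern is compiled.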
</p> <p class="para"> The PCRE-specific options <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a> can be changed in the same way as the Perl-compatible options by using the characters U and X respectively. The (?X) flag setting is special in that it must always occur earlier in the pattern than any of the additional features it turns on, even when it is at top level. It is best put at the start. </p> </div> <div id="regexp.reference.subpatterns" class="section"> <h2 class="title">Subpatterns</h2> <p class="para"> Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things: </p> <p class="para"> 1. It localizes a set of alternatives. For example, the pattern <i>cat(aract|erpillar|)</i> matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string. </p> <p class="para"> 2. It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the <em class="emphasis">ovector</em> argument of <b>pcre_exec()</b>. Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns. </p> <p class="para"> For example, if the string "the red king" is matched against the pattern <i>the ((red|white) (king|queen))</i> the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3. </p> <p class="para"> The fact that plain parentheses fulfil two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. 
If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern <i>the ((?:red|white) (king|queen))</i> the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 99, and the maximum number of all subpatterns, both capturing and non-capturing, is 200. </p> <p class="para"> As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns </p> <pre class="literallayout">
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
</pre> <p class="para"> match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday". </p> <p class="para"> Since PHP 4.3.3 it is possible to name a subpattern with <i>(?P&lt;name&gt;pattern)</i>. The array of matches then contains the match indexed by that name as well as the match indexed by its number. </p> </div> <div id="regexp.reference.repetition" class="section"> <h2 class="title">Repetition</h2> <p class="para"> Repetition is specified by quantifiers, which can follow any of the following items: <ul class="itemizedlist"> <li class="listitem"><span class="simpara">a single character, possibly escaped</span></li> <li class="listitem"><span class="simpara">the .
metacharacter</span></li> <li class="listitem"><span class="simpara">a character class</span></li> <li class="listitem"><span class="simpara">a back reference (see next section)</span></li> <li class="listitem"><span class="simpara">a parenthesized subpattern (unless it is an assertion - see below)</span></li> </ul> </p> <p class="para"> The general repetition quantifier specifies a minimum and maximum number of permitted matches, by giving the two numbers in curly brackets (braces), separated by a comma. The numbers must be less than 65536, and the first must be less than or equal to the second. For example: <i>z{2,4}</i> matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special character. If the second number is omitted, but the comma is present, there is no upper limit; if the second number and the comma are both omitted, the quantifier specifies an exact number of required matches. Thus <i>[aeiou]{3,}</i> matches at least 3 successive vowels, but may match many more, while <i>\d{8}</i> matches exactly 8 digits. An opening curly bracket that appears in a position where a quantifier is not allowed, or one that does not match the syntax of a quantifier, is taken as a literal character. For example, {,6} is not a quantifier, but a literal string of four characters. </p> <p class="para">
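The quantifier forms described above can be tried with a short sketch, assuming PHP's <b>preg_match()</b> as the caller; the variable names and sample subjects are illustrative. </p>

```php
<?php
// {min,max}: between two and four repetitions of "z".
$tooFew  = preg_match('/^z{2,4}$/', 'z');      // 0
$inside  = preg_match('/^z{2,4}$/', 'zzz');    // 1
$tooMany = preg_match('/^z{2,4}$/', 'zzzzz');  // 0

// {min,}: no upper limit.
$atLeastThree = preg_match('/^[aeiou]{3,}$/', 'aeiou');  // 1
$onlyTwo      = preg_match('/^[aeiou]{3,}$/', 'ae');     // 0

// {n}: exactly n repetitions.
$eightDigits = preg_match('/^\d{8}$/', '20240131');  // 1
$sevenDigits = preg_match('/^\d{8}$/', '2024013');   // 0

// A closing brace on its own is an ordinary character.
$closingBrace = preg_match('/^a}$/', 'a}');  // 1
```

<p class="para">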