<dt> <span class="term"><em class="emphasis">\Z</em></span> <dd> <span class="simpara"> end of subject or newline at end (independent of multiline mode) </span> </dd> </dt> <dt> <span class="term"><em class="emphasis">\z</em></span> <dd><span class="simpara">end of subject (independent of multiline mode)</span></dd> </dt> <dt> <span class="term"><em class="emphasis">\G</em></span> <dd><span class="simpara">first matching position in subject</span></dd> </dt> </dl> </p> <p class="para"> These assertions may not appear in character classes (but note that "<i>\b</i>" has a different meaning, namely the backspace character, inside a character class). </p> <p class="para"> A word boundary is a position in the subject string where the current character and the previous character do not both match <i>\w</i> or <i>\W</i> (i.e. one matches <i>\w</i> and the other matches <i>\W</i>), or the start or end of the string if the first or last character matches <i>\w</i>, respectively. </p> <p class="para"> The <i>\A</i>, <i>\Z</i>, and <i>\z</i> assertions differ from the traditional circumflex and dollar (described below) in that they only ever match at the very start and end of the subject string, whatever options are set. They are not affected by the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> or <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a> options. The difference between <i>\Z</i> and <i>\z</i> is that <i>\Z</i> matches before a newline that is the last character of the string as well as at the end of the string, whereas <i>\z</i> matches only at the end. </p> <p class="para"> The <i>\G</i> assertion is true only when the current matching position is at the start point of the match, as specified by the <i><tt class="parameter">offset</tt></i> argument of <a href="function.preg-match.html" class="function">preg_match()</a>. 
It differs from <i>\A</i> when the value of <i><tt class="parameter">offset</tt></i> is non-zero. It is available since PHP 4.3.3. </p> <p class="para"> <i>\Q</i> and <i>\E</i> can be used to ignore regexp metacharacters in the pattern since PHP 4.3.3. For example: <i>\w+\Q.$.\E$</i> will match one or more word characters, followed by literals <i>.$.</i> and anchored at the end of the string. </p> <p class="para"> <i>\K</i> can be used to reset the match start since PHP 5.2.4. For example, the pattern <i>foo\Kbar</i> matches "foobar", but reports that it has matched "bar". The use of <i>\K</i> does not interfere with the setting of captured substrings. For example, when the pattern <i>(foo)\Kbar</i> matches "foobar", the first substring is still set to "foo". </p> </div> <div id="regexp.reference.unicode" class="section"> <h2 class="title">Unicode character properties</h2> <p class="para"> Since PHP 4.4.0 and 5.1.0, three additional escape sequences to match generic character types are available when <em class="emphasis">UTF-8 mode</em> is selected. They are: </p> <dl> <dt> <span class="term"><em class="emphasis">\p{xx}</em></span> <dd><span class="simpara">a character with the xx property</span></dd> </dt> <dt> <span class="term"><em class="emphasis">\P{xx}</em></span> <dd><span class="simpara">a character without the xx property</span></dd> </dt> <dt> <span class="term"><em class="emphasis">\X</em></span> <dd><span class="simpara">an extended Unicode sequence</span></dd> </dt> </dl> <p class="para"> The property names represented by <i>xx</i> above are limited to the Unicode general category properties. Each character has exactly one such property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by including a circumflex between the opening brace and the property name. For example, <i>\p{^Lu}</i> is the same as <i>\P{Lu}</i>. 
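</p> <p class="para"> The equivalence of the two negated forms can be checked with a minimal sketch. It assumes UTF-8 mode is selected with the <i>u</i> modifier; the subject string and variable names are illustrative only. </p>

```php
<?php
// Illustrative subject: two lower case and two upper case letters.
$subject = 'aBcD';

// \p{Lu}: characters that have the "upper case letter" property.
preg_match_all('/\p{Lu}/u', $subject, $upper);

// \P{Lu} and \p{^Lu}: characters that do not have that property.
preg_match_all('/\P{Lu}/u', $subject, $notUpperA);
preg_match_all('/\p{^Lu}/u', $subject, $notUpperB);

var_dump($upper[0]);                        // the upper case letters B and D
var_dump($notUpperA[0] === $notUpperB[0]);  // both negated forms select the same characters
```

<p class="para"> In this sketch both negated patterns pick out the lower case letters, so either spelling can be used.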
</p> <p class="para"> If only one letter is specified with <i>\p</i> or <i>\P</i>, it includes all the properties that start with that letter. In this case, in the absence of negation, the curly brackets in the escape sequence are optional; these two examples have the same effect: </p> <pre class="literallayout"> \p{L} \pL </pre> <table border="5"> <caption><b>Supported property codes</b></caption> <colgroup> <tbody valign="middle" class="tbody"> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>C</i></td><td colspan="1" rowspan="1" align="left">Other</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cc</i></td><td colspan="1" rowspan="1" align="left">Control</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cf</i></td><td colspan="1" rowspan="1" align="left">Format</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cn</i></td><td colspan="1" rowspan="1" align="left">Unassigned</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Co</i></td><td colspan="1" rowspan="1" align="left">Private use</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cs</i></td><td colspan="1" rowspan="1" align="left">Surrogate</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>L</i></td><td colspan="1" rowspan="1" align="left">Letter</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Ll</i></td><td colspan="1" rowspan="1" align="left">Lower case letter</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lm</i></td><td colspan="1" rowspan="1" align="left">Modifier letter</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lo</i></td><td colspan="1" rowspan="1" align="left">Other letter</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lt</i></td><td colspan="1" rowspan="1" align="left">Title case letter</td></tr> <tr valign="middle"><td colspan="1" 
rowspan="1" align="left"><i>Lu</i></td><td colspan="1" rowspan="1" align="left">Upper case letter</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>M</i></td><td colspan="1" rowspan="1" align="left">Mark</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Mc</i></td><td colspan="1" rowspan="1" align="left">Spacing mark</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Me</i></td><td colspan="1" rowspan="1" align="left">Enclosing mark</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Mn</i></td><td colspan="1" rowspan="1" align="left">Non-spacing mark</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>N</i></td><td colspan="1" rowspan="1" align="left">Number</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Nd</i></td><td colspan="1" rowspan="1" align="left">Decimal number</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Nl</i></td><td colspan="1" rowspan="1" align="left">Letter number</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>No</i></td><td colspan="1" rowspan="1" align="left">Other number</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>P</i></td><td colspan="1" rowspan="1" align="left">Punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pc</i></td><td colspan="1" rowspan="1" align="left">Connector punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pd</i></td><td colspan="1" rowspan="1" align="left">Dash punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pe</i></td><td colspan="1" rowspan="1" align="left">Close punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pf</i></td><td colspan="1" rowspan="1" align="left">Final punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" 
align="left"><i>Pi</i></td><td colspan="1" rowspan="1" align="left">Initial punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Po</i></td><td colspan="1" rowspan="1" align="left">Other punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Ps</i></td><td colspan="1" rowspan="1" align="left">Open punctuation</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>S</i></td><td colspan="1" rowspan="1" align="left">Symbol</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sc</i></td><td colspan="1" rowspan="1" align="left">Currency symbol</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sk</i></td><td colspan="1" rowspan="1" align="left">Modifier symbol</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sm</i></td><td colspan="1" rowspan="1" align="left">Mathematical symbol</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>So</i></td><td colspan="1" rowspan="1" align="left">Other symbol</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Z</i></td><td colspan="1" rowspan="1" align="left">Separator</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zl</i></td><td colspan="1" rowspan="1" align="left">Line separator</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zp</i></td><td colspan="1" rowspan="1" align="left">Paragraph separator</td></tr> <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zs</i></td><td colspan="1" rowspan="1" align="left">Space separator</td></tr> </tbody> </colgroup> </table> <p class="para"> Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE. </p> <p class="para"> Specifying caseless matching does not affect these escape sequences. For example, <i>\p{Lu}</i> always matches only upper case letters. 
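</p> <p class="para"> A few of the property codes from the table can be tried out as follows. This is only a sketch: it assumes UTF-8 mode (the <i>u</i> modifier) and PHP 7.0 or later for the <i>\u{...}</i> string escape, and the subject string is made up for illustration. </p>

```php
<?php
// Illustrative subject; \u{20AC} is the euro sign (requires PHP 7.0+).
$subject = "Total: \u{20AC}42";

preg_match_all('/\p{Sc}/u', $subject, $currency);  // Sc: currency symbol
preg_match_all('/\p{Nd}+/u', $subject, $numbers);  // Nd: runs of decimal digits
preg_match_all('/\pP/u', $subject, $punct);        // P: any punctuation (braces omitted)
preg_match_all('/\p{Zs}/u', $subject, $spaces);    // Zs: space separator

var_dump($currency[0]);       // the euro sign
var_dump($numbers[0]);        // the digit run "42"
var_dump($punct[0]);          // the colon
var_dump(count($spaces[0]));  // one space
```

<p class="para"> The single-letter form <i>\pP</i> covers every punctuation code (<i>Pc</i>, <i>Pd</i>, <i>Po</i> and so on), which is why it finds the colon here.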
</p> <p class="para"> The <i>\X</i> escape matches any number of Unicode characters that form an extended Unicode sequence. <i>\X</i> is equivalent to <i>(?>\PM\pM*)</i>. </p> <p class="para"> That is, it matches a character without the "mark" property, followed by zero or more characters with the "mark" property, and treats the sequence as an atomic group (see below). Characters with the "mark" property are typically accents that affect the preceding character. </p> <p class="para"> Matching characters by Unicode property is not fast, because PCRE has to search a structure that contains data for over fifteen thousand characters. That is why the traditional escape sequences such as <i>\d</i> and <i>\w</i> do not use Unicode properties in PCRE. </p> </div> <div id="regexp.reference.circudollar" class="section"> <h2 class="title">Circumflex and dollar</h2> <p class="para"> Outside a character class, in the default matching mode, the circumflex character is an assertion which is true only if the current matching point is at the start of the subject string. Inside a character class, circumflex has an entirely different meaning (see below). </p> <p class="para"> Circumflex need not be the first character of the pattern if a number of alternatives are involved, but it should be the first thing in each alternative in which it appears if the pattern is ever to match that branch. If all possible alternatives start with a circumflex, that is, if the pattern is constrained to match only at the start of the subject, it is said to be an "anchored" pattern. (There are also other constructs that can cause a pattern to be anchored.) </p> <p class="para"> A dollar character is an assertion which is <b><tt>TRUE</tt></b> only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default). 
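</p> <p class="para"> The default behaviour of circumflex and dollar described above can be illustrated with a short sketch. The subject strings and variable names are illustrative, and no pattern modifiers are set. </p>

```php
<?php
// Subject that ends with a single newline.
$subject = "abc\n";

// Default mode: ^ matches only at the start, and $ matches at the very end
// or immediately before a newline that is the last character.
$beforeNewline = preg_match('/^abc$/', $subject);

// \z matches only at the very end, so the trailing newline prevents a match.
$strictEnd = preg_match('/^abc\z/', $subject);

// A newline that is not the last character does not satisfy $ by default.
$internalNewline = preg_match('/^abc$/', "abc\ndef");

var_dump($beforeNewline);    // int(1)
var_dump($strictEnd);        // int(0)
var_dump($internalNewline);  // int(0)
```

<p class="para">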
Dollar need not be the last character of the pattern if a number of alternatives are involved, but it should be the last item in any branch in which it appears. Dollar has no special meaning in a character class. </p> <p class="para"> The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a> option at compile or matching time. This does not affect the \Z assertion. </p> <p class="para"> The meanings of the circumflex and dollar characters are changed if the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> option is set. When this is the case, they match immediately after and immediately before an internal "\n" character, respectively, in addition to matching at the start and end of the subject string. For example, the pattern /^abc$/ matches the subject string "def\nabc" in multiline mode, but not otherwise. Consequently, patterns that are anchored in single line mode because all branches start with "^" are not anchored in multiline mode. The <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a> option is ignored if <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> is set. </p> <p class="para"> Note that the sequences \A, \Z, and \z can be used to match the start and end of the subject in both modes, and if all branches of a pattern start with \A it is always anchored, whether <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> is set or not. </p> </div> <div id="regexp.reference.dot" class="section"> <h2 class="title">Full stop</h2> <p class="para"> Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing character, but not (by default) newline. 
If the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a> option is set, then dots match newlines as well. The handling of dot is entirely independent of the handling of circumflex and dollar, the only relationship being that they both involve newline characters. Dot has no special meaning in a character class. </p> <p class="para"> <em class="emphasis">\C</em> can be used to match a single byte. It is useful in <em class="emphasis">UTF-8 mode</em>, where a full stop matches a whole character, which can consist of multiple bytes. </p> </div> <div id="regexp.reference.squarebrackets" class="section">