      <dt>       <span class="term"><em class="emphasis">\Z</em></span>       <dd>        <span class="simpara">         end of subject or newline at end (independent of         multiline mode)        </span>       </dd>      </dt>      <dt>       <span class="term"><em class="emphasis">\z</em></span>       <dd><span class="simpara">end of subject (independent of multiline mode)</span></dd>      </dt>      <dt>       <span class="term"><em class="emphasis">\G</em></span>       <dd><span class="simpara">first matching position in subject</span></dd>      </dt>     </dl>    </p>    <p class="para">     These assertions may not appear in  character  classes  (but     note that &quot;<i>\b</i>&quot; has a different meaning, namely the backspace     character, inside a character class).    </p>    <p class="para">     A word boundary is a position in the subject string where     the current character and the previous character do not both     match <i>\w</i> or <i>\W</i> (i.e. one matches      <i>\w</i> and  the  other  matches     <i>\W</i>), or the start or end of the string if the first     or last character matches <i>\w</i>, respectively.    </p>    <p class="para">     The <i>\A</i>, <i>\Z</i>, and     <i>\z</i> assertions differ  from  the  traditional     circumflex  and  dollar  (described below) in that they only     ever match at the very start and end of the subject  string,     whatever  options  are  set.  They  are  not affected by the     <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> or     <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a>     options. The  difference  between <i>\Z</i> and     <i>\z</i>  is that <i>\Z</i> matches before a     newline that is the last character of the string as well as at the end of     the string, whereas <i>\z</i> matches only at the end.     
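</p>
    <p class="para">
     As a minimal sketch of the difference between <i>\Z</i> and
     <i>\z</i>, the calls below use
     <a href="function.preg-match.html" class="function">preg_match()</a> on
     arbitrary example subjects (the strings are illustrative assumptions,
     not taken from this reference). The expected results are shown as
     comments.
    </p>
    <pre class="literallayout">
&lt;?php
// Subject that ends with a single trailing newline (illustrative value).
$subject = &quot;abc\n&quot;;

// \Z matches at the very end, or just before a final newline.
var_dump(preg_match('/abc\Z/', $subject));    // int(1)

// \z matches only at the very end, so the trailing newline prevents a match.
var_dump(preg_match('/abc\z/', $subject));    // int(0)

// Multiline mode (the m modifier) does not let \Z match before an
// internal newline.
var_dump(preg_match('/def\Z/m', &quot;def\nabc&quot;)); // int(0)
    </pre>
    <p class="para">
     The patterns are written in single-quoted PHP strings so that the
     backslashes reach PCRE unchanged.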
</p>     <p class="para">      The <i>\G</i> assertion is true only when the current      matching position is at the start point of the match, as specified by      the <i><tt class="parameter">offset</tt></i> argument of      <a href="function.preg-match.html" class="function">preg_match()</a>. It differs from <i>\A</i>      when the value of <i><tt class="parameter">offset</tt></i> is non-zero.      It is available since PHP 4.3.3.     </p>          <p class="para">      <i>\Q</i> and <i>\E</i> can be used to ignore      regexp metacharacters in the pattern since PHP 4.3.3. For example:      <i>\w+\Q.$.\E$</i> will match one or more word characters,      followed by literals <i>.$.</i> and anchored at the end of      the string.     </p>          <p class="para">      <i>\K</i> can be used to reset the match start since      PHP 5.2.4. For example, the pattern <i>foo\Kbar</i> matches      &quot;foobar&quot;, but reports that it has matched &quot;bar&quot;. The use of      <i>\K</i> does not interfere with the setting of captured      substrings. For example, when the pattern <i>(foo)\Kbar</i>      matches &quot;foobar&quot;, the first substring is still set to &quot;foo&quot;.     </p>         </div>     <div id="regexp.reference.unicode" class="section">     <h2 class="title">Unicode character properties</h2>     <p class="para">      Since PHP 4.4.0 and 5.1.0, three      additional escape sequences to match generic character types are available      when <em class="emphasis">UTF-8 mode</em> is selected. 
They are:     </p>     <dl>      <dt>       <span class="term"><em class="emphasis">\p{xx}</em></span>       <dd><span class="simpara">a character with the xx property</span></dd>      </dt>      <dt>       <span class="term"><em class="emphasis">\P{xx}</em></span>       <dd><span class="simpara">a character without the xx property</span></dd>      </dt>      <dt>       <span class="term"><em class="emphasis">\X</em></span>       <dd><span class="simpara">an extended Unicode sequence</span></dd>      </dt>     </dl>     <p class="para">      The property names represented by <i>xx</i> above are limited       to the Unicode general category properties. Each character has exactly one       such property, specified by a two-letter abbreviation. For compatibility with      Perl, negation can be specified by including a circumflex between the      opening brace and the property name. For example, <i>\p{^Lu}</i>       is the same as <i>\P{Lu}</i>.     </p>     <p class="para">      If only one letter is specified with <i>\p</i> or       <i>\P</i>, it includes all the properties that start with that      letter. 
In this case, in the absence of negation, the curly brackets in the       escape sequence are optional; these two examples have the same effect:     </p>     <pre class="literallayout">      \p{L}      \pL     </pre>     <table border="5">      <caption><b>Supported property codes</b></caption>      <colgroup>       <tbody valign="middle" class="tbody">        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>C</i></td><td colspan="1" rowspan="1" align="left">Other</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cc</i></td><td colspan="1" rowspan="1" align="left">Control</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cf</i></td><td colspan="1" rowspan="1" align="left">Format</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cn</i></td><td colspan="1" rowspan="1" align="left">Unassigned</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Co</i></td><td colspan="1" rowspan="1" align="left">Private use</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Cs</i></td><td colspan="1" rowspan="1" align="left">Surrogate</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>L</i></td><td colspan="1" rowspan="1" align="left">Letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Ll</i></td><td colspan="1" rowspan="1" align="left">Lower case letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lm</i></td><td colspan="1" rowspan="1" align="left">Modifier letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lo</i></td><td colspan="1" rowspan="1" align="left">Other letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Lt</i></td><td colspan="1" rowspan="1" align="left">Title case letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" 
align="left"><i>Lu</i></td><td colspan="1" rowspan="1" align="left">Upper case letter</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>M</i></td><td colspan="1" rowspan="1" align="left">Mark</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Mc</i></td><td colspan="1" rowspan="1" align="left">Spacing mark</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Me</i></td><td colspan="1" rowspan="1" align="left">Enclosing mark</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Mn</i></td><td colspan="1" rowspan="1" align="left">Non-spacing mark</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>N</i></td><td colspan="1" rowspan="1" align="left">Number</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Nd</i></td><td colspan="1" rowspan="1" align="left">Decimal number</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Nl</i></td><td colspan="1" rowspan="1" align="left">Letter number</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>No</i></td><td colspan="1" rowspan="1" align="left">Other number</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>P</i></td><td colspan="1" rowspan="1" align="left">Punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pc</i></td><td colspan="1" rowspan="1" align="left">Connector punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pd</i></td><td colspan="1" rowspan="1" align="left">Dash punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pe</i></td><td colspan="1" rowspan="1" align="left">Close punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pf</i></td><td colspan="1" rowspan="1" align="left">Final 
punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Pi</i></td><td colspan="1" rowspan="1" align="left">Initial punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Po</i></td><td colspan="1" rowspan="1" align="left">Other punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Ps</i></td><td colspan="1" rowspan="1" align="left">Open punctuation</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>S</i></td><td colspan="1" rowspan="1" align="left">Symbol</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sc</i></td><td colspan="1" rowspan="1" align="left">Currency symbol</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sk</i></td><td colspan="1" rowspan="1" align="left">Modifier symbol</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Sm</i></td><td colspan="1" rowspan="1" align="left">Mathematical symbol</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>So</i></td><td colspan="1" rowspan="1" align="left">Other symbol</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Z</i></td><td colspan="1" rowspan="1" align="left">Separator</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zl</i></td><td colspan="1" rowspan="1" align="left">Line separator</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zp</i></td><td colspan="1" rowspan="1" align="left">Paragraph separator</td></tr>        <tr valign="middle"><td colspan="1" rowspan="1" align="left"><i>Zs</i></td><td colspan="1" rowspan="1" align="left">Space separator</td></tr>       </tbody>      </colgroup>     </table>     <p class="para">      Extended properties such as &quot;Greek&quot; or &quot;InMusicalSymbols&quot; are not      supported by PCRE.     
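</p>
    <p class="para">
     The general category codes listed in the table above can be tried out
     along the following lines. This is only a sketch: it assumes the
     pattern is compiled with the <i>u</i> modifier so that UTF-8 mode is
     selected, and the subject string is an arbitrary example.
    </p>
    <pre class="literallayout">
&lt;?php
// &quot;\xC3\x89&quot; is the UTF-8 encoding of an upper case E with acute accent,
// followed by a lower case letter and a digit (illustrative subject).
$subject = &quot;\xC3\x89a1&quot;;

// Upper case letters only: the accented E.
var_dump(preg_match_all('/\p{Lu}/u', $subject, $m));  // int(1)

// Any letter; the braces are optional for a one-letter property.
var_dump(preg_match_all('/\pL/u', $subject, $m));     // int(2)

// Characters that are not letters: the digit.
var_dump(preg_match_all('/\P{L}/u', $subject, $m));   // int(1)

// Perl-style negation, equivalent to \P{Lu}: the &quot;a&quot; and the digit.
var_dump(preg_match_all('/\p{^Lu}/u', $subject, $m)); // int(2)
    </pre>
    <p class="para">
     Each call returns the number of characters in the subject that match
     the given property.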
</p>     <p class="para">      Specifying caseless matching does not affect these escape sequences.      For example, <i>\p{Lu}</i> always matches only upper case letters.     </p>     <p class="para">      The <i>\X</i> escape matches any number of Unicode characters       that form an extended Unicode sequence. <i>\X</i> is equivalent       to <i>(?&gt;\PM\pM*)</i>.     </p>     <p class="para">      That is, it matches a character without the &quot;mark&quot; property, followed      by zero or more characters with the &quot;mark&quot; property, and treats the      sequence as an atomic group (see below). Characters with the &quot;mark&quot;      property are typically accents that affect the preceding character.     </p>     <p class="para">      Matching characters by Unicode property is not fast, because PCRE has      to search a structure that contains data for over fifteen thousand      characters. That is why the traditional escape sequences such as       <i>\d</i> and <i>\w</i> do not use Unicode properties       in PCRE.     </p>    </div>    <div id="regexp.reference.circudollar" class="section">     <h2 class="title">Circumflex and dollar</h2>     <p class="para">      Outside a character class, in the default matching mode, the      circumflex  character  is an assertion which is true only if      the current matching point is at the start  of  the  subject      string. Inside a character class, circumflex has an entirely      different meaning (see below).     </p>     <p class="para">      Circumflex need not be the first character of the pattern if      a number of alternatives are involved, but it should be the      first thing in each alternative in which it appears  if  the      pattern is ever to match that branch. If all possible      alternatives start with a circumflex, that is, if the pattern is      constrained to match only at the start of the subject, it is      said to be an &quot;anchored&quot; pattern. 
(There are also other      constructs that can cause a pattern to be anchored.)     </p>     <p class="para">      A dollar character is an assertion which is <b><tt>TRUE</tt></b> only if the      current  matching point is at the end of the subject string,      or immediately before a newline character that is  the  last      character in the string (by default). Dollar need not be the      last character of the pattern if a  number  of  alternatives      are  involved,  but it should be the last item in any branch      in which it appears.  Dollar has no  special  meaning  in  a      character class.     </p>     <p class="para">      The meaning of dollar can be changed so that it matches only      at the very end of the string, by setting the      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a>      option at compile or matching time. This does not affect the \Z assertion.     </p>     <p class="para">      The meanings of the circumflex and dollar characters are      changed if the      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> option      is set. When this is the case, they match immediately after and      immediately before an internal &quot;\n&quot; character, respectively, in addition      to matching at the start and end of the subject string. For example, the      pattern /^abc$/ matches the subject string &quot;def\nabc&quot; in multiline mode,      but not otherwise. Consequently, patterns that are anchored in single      line mode because all branches start with &quot;^&quot; are not anchored in      multiline mode. The      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOLLAR_ENDONLY</a>      option is ignored if      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> is      set.     
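</p>
    <p class="para">
     The effect of these options on circumflex and dollar can be sketched
     as follows, assuming the PHP pattern modifiers <i>m</i>
     (PCRE_MULTILINE) and <i>D</i> (PCRE_DOLLAR_ENDONLY). The second
     subject string is an illustrative assumption.
    </p>
    <pre class="literallayout">
&lt;?php
$subject = &quot;def\nabc&quot;;

// Default mode: ^ matches only at the start of the subject.
var_dump(preg_match('/^abc$/', $subject));   // int(0)

// Multiline mode: ^ also matches right after the internal newline.
var_dump(preg_match('/^abc$/m', $subject));  // int(1)

// By default, $ also matches just before a final newline.
var_dump(preg_match('/abc$/', &quot;abc\n&quot;));     // int(1)

// With D set, $ matches only at the very end of the subject.
var_dump(preg_match('/abc$/D', &quot;abc\n&quot;));    // int(0)

// D is ignored when m is also set.
var_dump(preg_match('/abc$/mD', &quot;abc\n&quot;));   // int(1)
    </pre>
    <p class="para">
     The patterns are single-quoted so that PHP does not try to interpolate
     the dollar sign.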
</p>     <p class="para">      Note that the sequences \A, \Z, and \z can be used to match      the start and end of the subject in both modes, and if all      branches of a pattern start with \A it is always anchored,      whether <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a>        is set or not.     </p>    </div>    <div id="regexp.reference.dot" class="section">     <h2 class="title">Full stop</h2>     <p class="para">     Outside a character class, a dot in the pattern matches any     one character in the subject, including a non-printing     character, but not (by default) newline.  If the     <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>      option is set, then dots match newlines as well. The     handling of dot is entirely independent of the handling of     circumflex and dollar, the only relationship being that they     both involve newline characters.  Dot has no special meaning     in a character class.     </p>     <p class="para">      <em class="emphasis">\C</em> can be used to match a single byte. It is mainly useful      in <em class="emphasis">UTF-8 mode</em>, where a full stop matches a whole      character that may consist of multiple bytes.     </p>    </div>    <div id="regexp.reference.squarebrackets" class="section">