📄 regexp.reference.html

     <h2 class="title">Square brackets</h2>
     <p class="para">
      An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash.
     </p>
     <p class="para">
      A character class matches a single character in the subject; the character must be in the set of characters defined by the class, unless the first character in the class is a circumflex, in which case the subject character must not be in the set defined by the class. If a circumflex is required as a member of the class, make sure it is not the first character, or escape it with a backslash.
     </p>
     <p class="para">
      For example, the character class [aeiou] matches any lower case vowel, while [^aeiou] matches any character that is not a lower case vowel. Note that a circumflex is just a convenient notation for specifying the characters in the class by enumerating those that are not. A negated class is not an assertion: it still consumes a character from the subject string, and it fails if the current position is at the end of the string.
     </p>
     <p class="para">
      When caseless matching is set, any letters in a class represent both their upper case and lower case versions. For example, a caseless [aeiou] matches &quot;A&quot; as well as &quot;a&quot;, and a caseless [^aeiou] does not match &quot;A&quot;, whereas a caseful version would.
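</p>
     <p class="para">
      The following PHP sketch checks the character-class rules above with <b>preg_match()</b>, which returns 1 when the pattern matches and 0 when it does not. The subjects and the array keys are illustrative choices, not part of PCRE.
     </p>

```php
<?php
// Each entry records the return value of preg_match(): 1 = match, 0 = no match.
$checks = [
    'vowel'              => preg_match('/^[aeiou]$/', 'e'),   // 1
    'negated vowel'      => preg_match('/^[^aeiou]$/', 'e'),  // 0
    'caseless A'         => preg_match('/^[aeiou]$/i', 'A'),  // 1
    'caseless negated A' => preg_match('/^[^aeiou]$/i', 'A'), // 0
    'caseful negated A'  => preg_match('/^[^aeiou]$/', 'A'),  // 1
    // A negated class is not an assertion: it must consume a character,
    // so it fails once the subject is exhausted.
    'negated at end'     => preg_match('/x[^a]/', 'x'),       // 0
    // "]" is a literal class member when it comes first or is escaped.
    'leading ]'          => preg_match('/^[]a]$/', ']'),      // 1
    'escaped ]'          => preg_match('/^[a\]]$/', ']'),     // 1
];

foreach ($checks as $label => $result) {
    printf("%-20s %d\n", $label, $result);
}
```

     <p class="para">
      Anchoring each pattern with ^ and $ forces the class to account for the whole one-character subject, so every check isolates the behaviour of the class itself.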
</p>
     <p class="para">
      The newline character is never treated in any special way in character classes, whatever the setting of the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a> or <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> options. A class such as [^a] always matches a newline.
     </p>
     <p class="para">
      The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class.
     </p>
     <p class="para">
      It is not possible to have the literal character &quot;]&quot; as the end character of a range. A pattern such as [W-]46] is interpreted as a class of two characters (&quot;W&quot; and &quot;-&quot;) followed by the literal string &quot;46]&quot;, so it would match &quot;W46]&quot; or &quot;-46]&quot;. However, if the &quot;]&quot; is escaped with a backslash, it is interpreted as the end of the range, so [W-\]46] is interpreted as a single class containing a range followed by two separate characters. The octal or hexadecimal representation of &quot;]&quot; can also be used to end a range.
     </p>
     <p class="para">
      Ranges operate in ASCII collating sequence. They can also be used for characters specified numerically, for example [\000-\037]. If a range that includes letters is used when caseless matching is set, it matches the letters in either case.
      For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched caselessly, and if character tables for the &quot;fr&quot; locale are in use, [\xc8-\xcb] matches accented E characters in both cases.
     </p>
     <p class="para">
      The character types \d, \D, \s, \S, \w, and \W may also appear in a character class, and add the characters that they match to the class. For example, [\dABCDEFabcdef] matches any hexadecimal digit. A circumflex can conveniently be used with the upper case character types to specify a more restricted set of characters than the matching lower case type. For example, the class [^\W_] matches any letter or digit, but not underscore.
     </p>
     <p class="para">
      All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped.
     </p>
    </div>
    <div id="regexp.reference.verticalbar" class="section">
     <h2 class="title">Vertical bar</h2>
     <p class="para">
      Vertical bar characters are used to separate alternative patterns. For example, the pattern <i>gilbert|sullivan</i> matches either &quot;gilbert&quot; or &quot;sullivan&quot;. Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. If the alternatives are within a subpattern (defined below), &quot;succeeds&quot; means matching the rest of the main pattern as well as the alternative in the subpattern.
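</p>
     <p class="para">
      A short PHP sketch of character types inside classes and of left-to-right alternation. The helper name <i>isHexDigits</i> and the sample subjects are hypothetical; only <b>preg_match()</b> and <b>preg_match_all()</b> are real PHP functions.
     </p>

```php
<?php
// Hypothetical helper: true when $s is non-empty and made only of hexadecimal digits.
// \d contributes 0-9; the hex letters are listed in both cases.
function isHexDigits(string $s): bool
{
    return preg_match('/^[\dABCDEFabcdef]+$/', $s) === 1;
}

// [^\W_]: letters and digits, but not the underscore that \w would include.
$count = preg_match_all('/[^\W_]/', 'a_1-B', $alnum);  // 3; $alnum[0] is ['a', '1', 'B']

// With caseless matching the range W-c also covers w-z and A-C, and it always
// contains the punctuation between "Z" and "a", including the backslash.
$upperC    = preg_match('/^[W-c]$/i', 'C');    // 1
$backslash = preg_match('/^[W-c]$/i', '\\');   // 1 ('\\' is one backslash in PHP)

// Alternatives are tried from left to right and the first success is used,
// so "a" wins here even though "ab" could also match.
preg_match('/a|ab/', 'ab', $first);            // $first[0] is 'a'

// Inside a subpattern an alternative succeeds only if the rest of the pattern
// matches too, so the engine backtracks from "a" to "ab" to reach the "c".
preg_match('/(a|ab)c/', 'abc', $group);        // $group[1] is 'ab'

// An empty alternative matches the empty string.
$empty = preg_match('/^(x|)$/', '');           // 1

var_dump(isHexDigits('C0ffee'), isHexDigits('C0ffeeG'), $count, $first[0], $group[1], $empty);
```

     <p class="para">
      Because the first successful alternative wins, listing the longer alternative first (for example <i>ab|a</i>) is the usual way to prefer it when both could match at the same position.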
</p>
    </div>
    <div id="regexp.reference.internal-options" class="section">
     <h2 class="title">Internal option setting</h2>
     <p class="para">
      The settings of <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a>,
      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a>,
      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>,
      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a>,
      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a>,
      <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a>
      and PCRE_DUPNAMES can be changed from within the pattern by a sequence of Perl option letters enclosed between &quot;(?&quot; and &quot;)&quot;. The option letters are:
      <table border="5">
       <caption><b>Internal option letters</b></caption>
       <colgroup>
        <tbody valign="middle" class="tbody">
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>i</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>m</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>s</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>x</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>U</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>X</i></td>
          <td colspan="1" rowspan="1" align="left">for <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a></td>
         </tr>
         <tr valign="middle">
          <td colspan="1" rowspan="1" align="left"><i>J</i></td>
          <td colspan="1" rowspan="1" align="left">for PCRE_DUPNAMES (its use is reported by <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_INFO_JCHANGED</a>)</td>
         </tr>
        </tbody>
       </colgroup>
      </table>
     </p>
     <p class="para">
      For example, (?im) sets caseless, multiline matching. It is also possible to unset these options by preceding the letter with a hyphen, and a combined setting and unsetting such as (?im-sx), which sets <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a> while unsetting <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a>, is also permitted. If a letter appears both before and after the hyphen, the option is unset.
     </p>
     <p class="para">
      When an option change occurs at top level (that is, not inside subpattern parentheses), the change applies to the remainder of the pattern that follows. So <i>/ab(?i)c/</i> matches only &quot;abc&quot; and &quot;abC&quot;. This behaviour changed in PCRE 4.0, which has been bundled with PHP since PHP 4.3.3. Before those versions, <i>/ab(?i)c/</i> behaved like <i>/abc/i</i> (for example, matching &quot;ABC&quot; and &quot;aBc&quot;).
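</p>
     <p class="para">
      A minimal PHP sketch of top-level option changes, using <b>preg_match()</b>. The subjects, including the two-line string used for the multiline check, are arbitrary examples chosen for illustration.
     </p>

```php
<?php
// (?i) at top level affects only the rest of the pattern,
// so only the final "c" is matched caselessly.
$topLevel = [
    'abc' => preg_match('/ab(?i)c/', 'abc'), // 1
    'abC' => preg_match('/ab(?i)c/', 'abC'), // 1
    'aBc' => preg_match('/ab(?i)c/', 'aBc'), // 0
    'ABC' => preg_match('/ab(?i)c/', 'ABC'), // 0
];

// (?im-sx) sets caseless and multiline matching and unsets dotall and extended.
// With multiline set, ^ can match just after the internal newline.
$combined = preg_match('/(?im-sx)^abc$/', "first line\nABC"); // 1
$withoutM = preg_match('/(?i)^abc$/', "first line\nABC");     // 0

print_r($topLevel);
var_dump($combined, $withoutM);
```

     <p class="para">
      By contrast, a modifier placed after the closing delimiter, as in <i>/abc/i</i>, applies to the whole pattern; an inline setting lets different parts of one pattern use different options.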
</p>
     <p class="para">
      If an option change occurs inside a subpattern, the effect is different. (This is a change of behaviour in Perl 5.005.) An option change inside a subpattern affects only the part of that subpattern that follows it, so <i>(a(?i)b)c</i> matches abc and aBc and no other strings (assuming <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_CASELESS</a> is not used). By this means, options can be given different settings in different parts of the pattern. Any changes made in one alternative do carry on into subsequent branches within the same subpattern. For example, <i>(a(?i)b|c)</i> matches &quot;ab&quot;, &quot;aB&quot;, &quot;c&quot;, and &quot;C&quot;, even though when matching &quot;C&quot; the first branch is abandoned before the option setting is reached. This is because the effects of option settings happen at compile time; otherwise the behaviour would be very confusing.
     </p>
     <p class="para">
      The PCRE-specific options <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a> and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTRA</a> can be changed in the same way as the Perl-compatible options by using the characters U and X respectively. The (?X) flag setting is special in that it must always occur earlier in the pattern than any of the additional features it turns on, even when it is at top level. It is best put at the start.
     </p>
    </div>
    <div id="regexp.reference.subpatterns" class="section">
     <h2 class="title">Subpatterns</h2>
     <p class="para">
      Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:
     </p>
     <p class="para">
      1. It localizes a set of alternatives.
      For example, the pattern <i>cat(aract|erpillar|)</i> matches one of the words &quot;cat&quot;, &quot;cataract&quot;, or &quot;caterpillar&quot;. Without the parentheses, it would match &quot;cataract&quot;, &quot;erpillar&quot; or the empty string.
     </p>
     <p class="para">
      2. It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, the portion of the subject string that matched the subpattern is passed back to the caller via the <em class="emphasis">ovector</em> argument of <b>pcre_exec()</b>; in PHP, these captured substrings appear in the matches array filled in by functions such as <b>preg_match()</b>. Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.
     </p>
     <p class="para">
      For example, if the string &quot;the red king&quot; is matched against the pattern <i>the ((red|white) (king|queen))</i>, the captured substrings are &quot;red king&quot;, &quot;red&quot;, and &quot;king&quot;, and are numbered 1, 2, and 3.
     </p>
     <p class="para">
      The fact that plain parentheses fulfil two functions is not always helpful. Often a grouping subpattern is required without any capturing. If an opening parenthesis is followed by &quot;?:&quot;, the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string &quot;the white queen&quot; is matched against the pattern <i>the ((?:red|white) (king|queen))</i>, the captured substrings are &quot;white queen&quot; and &quot;queen&quot;, and are numbered 1 and 2. The maximum number of captured substrings is 99, and the maximum number of all subpatterns, both capturing and non-capturing, is 200.
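</p>
     <p class="para">
      The subpattern examples above can be reproduced in PHP as a sketch with <b>preg_match()</b>. In the resulting arrays, index 0 holds the whole match and indexes 1, 2, ... hold the captured subpatterns in the numbering just described; the variable names are illustrative.
     </p>

```php
<?php
// Capturing groups are numbered by their opening parentheses, left to right.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $m);
// $m is ['the red king', 'red king', 'red', 'king']

// (?:...) groups without capturing, so later groups are renumbered.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $n);
// $n is ['the white queen', 'white queen', 'queen']

// The parentheses confine the alternatives to the text after "cat".
preg_match('/cat(aract|erpillar|)/', 'caterpillar', $word);
// $word[0] is 'caterpillar'

// Without them the alternatives split the whole pattern; on this subject the
// leftmost success is the empty alternative at offset 0.
preg_match('/cataract|erpillar|/', 'caterpillar', $split, PREG_OFFSET_CAPTURE);
// $split[0] is ['', 0]: an empty match at offset 0

print_r([$m, $n, $word[0], $split[0]]);
```

     <p class="para">
      The last call shows why the grouping matters: without parentheses, the empty alternative lets the pattern succeed at the very start of almost any subject.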
</p>
     <p class="para">
      As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the &quot;?&quot; and the &quot;:&quot;. Thus the two patterns
     </p>
     <pre class="literallayout">
       (?i:saturday|sunday)
       (?:(?i)saturday|sunday)
     </pre>
     <p class="para">
      match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match &quot;SUNDAY&quot; as well as &quot;Saturday&quot;.
     </p>
     <p class="para">
      Since PHP 4.3.3, a subpattern can be named with the syntax <i>(?P&lt;name&gt;pattern)</i>. The array of matches then contains each named match twice: indexed by its name and indexed by its number.
     </p>
    </div>
    <div id="regexp.reference.repetition" class="section">
     <h2 class="title">Repetition</h2>
     <p class="para">
      Repetition is specified by quantifiers, which can follow any of the following items:
      <ul class="itemizedlist">
       <li class="listitem"><span class="simpara">a single character, possibly escaped</span></li>
       <li class="listitem"><span class="simpara">the . metacharacter</span></li>
       <li class="listitem"><span class="simpara">a character class</span></li>
       <li class="listitem"><span class="simpara">a back reference (see next section)</span></li>
       <li class="listitem"><span class="simpara">a parenthesized subpattern (unless it is an assertion; see below)</span></li>
      </ul>
     </p>
     <p class="para">
      The general repetition quantifier specifies a minimum and maximum number of permitted matches by giving the two numbers in curly brackets (braces), separated by a comma.
      The numbers must be less than 65536, and the first must be less than or equal to the second. For example, <i>z{2,4}</i> matches &quot;zz&quot;, &quot;zzz&quot;, or &quot;zzzz&quot;. A closing brace on its own is not a special character. If the second number is omitted but the comma is present, there is no upper limit; if both the second number and the comma are omitted, the quantifier specifies an exact number of required matches. Thus <i>[aeiou]{3,}</i> matches at least 3 successive vowels, but may match many more, while <i>\d{8}</i> matches exactly 8 digits. An opening curly bracket that appears in a position where a quantifier is not allowed, or one that does not match the syntax of a quantifier, is taken as a literal character. For example, {,6} is not a quantifier, but a literal string of four characters.
     </p>
     <p class="para">
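      The quantifier forms described above can be checked with a short PHP sketch using <b>preg_match()</b> and <b>preg_match_all()</b>; the subjects are arbitrary examples, not values taken from PCRE.
     </p>

```php
<?php
// {2,4}: two to four repetitions, taking as many as possible (greedy).
preg_match_all('/z{2,4}/', 'zzzzzz', $runs);              // $runs[0] is ['zzzz', 'zz']

// {3,}: at least three, with no upper limit.
$threeOrMore = preg_match('/^[aeiou]{3,}$/', 'aeiouae');  // 1
$tooFew      = preg_match('/^[aeiou]{3,}$/', 'ae');       // 0

// {8}: exactly eight.
$eightDigits = preg_match('/^\d{8}$/', '20240131');       // 1
$sevenDigits = preg_match('/^\d{8}$/', '2024013');        // 0

// A lone "}" and a "{" that does not begin valid quantifier syntax are literals.
$loneBrace   = preg_match('/^x}$/', 'x}');                // 1
$notQuantity = preg_match('/^a{x}$/', 'a{x}');            // 1

print_r($runs[0]);
var_dump($threeOrMore, $tooFew, $eightDigits, $sevenDigits, $loneBrace, $notQuantity);
```

     <p class="para">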