📄 regexp.reference.html

📁 PHP help documentation, covering PHP examples, basic syntax, and practical applications
     The quantifier {0} is permitted, causing the expression to
     behave as if the previous item and the quantifier were not
     present.
    </p>
    <p class="para">
     For convenience (and historical compatibility) the three
     most common quantifiers have single-character abbreviations:
     <table border="5">
      <caption><b>Single-character quantifiers</b></caption>
      <colgroup>
       <tbody valign="middle" class="tbody">
        <tr valign="middle">
         <td colspan="1" rowspan="1" align="left"><i>*</i></td>
         <td colspan="1" rowspan="1" align="left">equivalent to <i>{0,}</i></td>
        </tr>
        <tr valign="middle">
         <td colspan="1" rowspan="1" align="left"><i>+</i></td>
         <td colspan="1" rowspan="1" align="left">equivalent to <i>{1,}</i></td>
        </tr>
        <tr valign="middle">
         <td colspan="1" rowspan="1" align="left"><i>?</i></td>
         <td colspan="1" rowspan="1" align="left">equivalent to <i>{0,1}</i></td>
        </tr>
       </tbody>
      </colgroup>
     </table>
    </p>
    <p class="para">
     It is possible to construct infinite loops by following a
     subpattern that can match no characters with a quantifier
     that has no upper limit, for example:
     <i>(a?)*</i>
    </p>
    <p class="para">
     Earlier versions of Perl and PCRE used to give an error at
     compile time for such patterns. However, because there are
     cases where this can be useful, such patterns are now
     accepted, but if any repetition of the subpattern does in
     fact match no characters, the loop is forcibly broken.
    </p>
    <p class="para">
     By default, the quantifiers are &quot;greedy&quot;, that is, they
     match as much as possible (up to the maximum number of permitted
     times) without causing the rest of the pattern to fail.
     The classic example where this causes problems is
     matching comments in C programs.
     These appear between the sequences /* and */, and within the
     sequence individual * and / characters may appear. An attempt
     to match C comments by applying the pattern
     <i>/\*.*\*/</i>
     to the string
     <i>/* first comment */  not comment  /* second comment */</i>
     fails, because it matches the entire string due to the
     greediness of the .* item.
    </p>
    <p class="para">
     However, if a quantifier is followed by a question mark,
     it ceases to be greedy and instead matches the minimum
     number of times possible, so the pattern
     <i>/\*.*?\*/</i>
     does the right thing with the C comments. The meaning of the
     various quantifiers is not otherwise changed, only the preferred
     number of matches. Do not confuse this use of the
     question mark with its use as a quantifier in its own right.
     Because it has two uses, it can sometimes appear doubled, as
     in
     <i>\d??\d</i>
     which matches one digit by preference, but can match two if
     that is the only way the rest of the pattern matches.
    </p>
    <p class="para">
     If the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_UNGREEDY</a>
     option is set (an option which is not available in Perl),
     the quantifiers are not greedy by default, but individual ones
     can be made greedy by following them with a question mark. In
     other words, it inverts the default behaviour.
    </p>
    <p class="para">
     Quantifiers followed by <i>+</i> are &quot;possessive&quot;. They consume
     as many characters as possible and don&#039;t give any back to let the
     rest of the pattern match. Thus <i>.*abc</i> matches &quot;aabc&quot; but
     <i>.*+abc</i> doesn&#039;t, because <i>.*+</i> consumes the
     whole string and never backtracks. Possessive quantifiers are
     available since PHP 4.3.3 and can be used to speed up processing.
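The quantifier behaviour described above can be checked with <i>preg_match()</i>. This is only a minimal sketch, assuming a PHP build whose PCRE library supports possessive quantifiers (PHP 4.3.3 or later, as stated above). The <i>~</i> delimiter and the variable names are arbitrary choices for illustration, and <i>U</i> is the pattern-modifier letter for PCRE_UNGREEDY.

```php
<?php
// Subject string from the C-comment example above.
$subject = '/* first comment */  not comment  /* second comment */';

// Greedy: .* runs to the end of the subject and backtracks only as far as
// the LAST "*/", so the match spans both comments.
preg_match('~/\*.*\*/~', $subject, $greedy);

// Lazy: .*? stops at the FIRST "*/", so only the first comment matches.
preg_match('~/\*.*?\*/~', $subject, $lazy);

// The U modifier (PCRE_UNGREEDY) inverts the default, so a plain .* is lazy.
preg_match('~/\*.*\*/~U', $subject, $ungreedy);

// \d?? prefers to match nothing, but takes a digit when that is the only way
// for the rest of the pattern (one more digit, then end of subject) to match.
preg_match('/(\d??)\d$/', '12', $doubled);

// Possessive .*+ swallows the rest of the subject and never gives it back,
// so the literal "abc" can never match afterwards.
$backtracking = preg_match('/.*abc/', 'aabc');  // 1
$possessive   = preg_match('/.*+abc/', 'aabc'); // 0

echo $greedy[0], "\n";   // /* first comment */  not comment  /* second comment */
echo $lazy[0], "\n";     // /* first comment */
echo $ungreedy[0], "\n"; // /* first comment */
echo $doubled[1], "\n";  // 1
```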
    </p>
    <p class="para">
     When a parenthesized subpattern is quantified with a minimum
     repeat count that is greater than 1 or with a limited maximum,
     more memory is required for the compiled pattern, in
     proportion to the size of the minimum or maximum.
    </p>
    <p class="para">
     If a pattern starts with .* or .{0,} and the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>
     option (equivalent to Perl&#039;s /s) is set, thus allowing the .
     to match newlines, then the pattern is implicitly anchored,
     because whatever follows will be tried against every character
     position in the subject string, so there is no point in
     retrying the overall match at any position after the first.
     PCRE treats such a pattern as though it were preceded by \A.
     In cases where it is known that the subject string contains
     no newlines, it is worth setting <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>
     when the pattern begins with .* in order to obtain this
     optimization, or alternatively using ^ to indicate anchoring
     explicitly.
    </p>
    <p class="para">
     When a capturing subpattern is repeated, the value captured
     is the substring that matched the final iteration. For example, after
     <i>(tweedle[dume]{3}\s*)+</i>
     has matched &quot;tweedledum tweedledee&quot;, the value of the captured
     substring is &quot;tweedledee&quot;. However, if there are
     nested capturing subpatterns, the corresponding captured
     values may have been set in previous iterations. For example,
     after
     <i>/(a|(b))+/</i>
     matches &quot;aba&quot;, the value of the second captured substring is
     &quot;b&quot;.
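A short sketch of the two capture examples, assuming <i>preg_match()</i> with its optional third argument to collect the captured substrings. The variable names are illustrative only.

```php
<?php
// A repeated capturing group keeps only the text from its final iteration.
preg_match('/(tweedle[dume]{3}\s*)+/', 'tweedledum tweedledee', $m);
echo $m[1], "\n"; // tweedledee

// Group 2 is set to "b" by the second iteration; the third iteration takes
// the "a" branch and leaves group 2 untouched, so "b" survives.
preg_match('/(a|(b))+/', 'aba', $n);
echo implode(',', $n), "\n"; // aba,a,b
```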
    </p>
   </div>
   <div id="regexp.reference.back-references" class="section">
    <h2 class="title">Back references</h2>
    <p class="para">
     Outside a character class, a backslash followed by a digit
     greater than 0 (and possibly further digits) is a back
     reference to a capturing subpattern earlier (i.e. to its
     left) in the pattern, provided there have been that many
     previous capturing left parentheses.
    </p>
    <p class="para">
     However, if the decimal number following the backslash is
     less than 10, it is always taken as a back reference, and
     causes an error only if there are not that many capturing
     left parentheses in the entire pattern. In other words, the
     parentheses that are referenced need not be to the left of
     the reference for numbers less than 10. See the section
     entitled &quot;Backslash&quot; above for further details of the handling
     of digits following a backslash.
    </p>
    <p class="para">
     A back reference matches whatever actually matched the capturing
     subpattern in the current subject string, rather than
     anything matching the subpattern itself. So the pattern
     <i>(sens|respons)e and \1ibility</i>
     matches &quot;sense and sensibility&quot; and &quot;response and responsibility&quot;,
     but not &quot;sense and responsibility&quot;. If case-sensitive
     matching is in force at the time of the back reference, then
     the case of letters is relevant. For example,
     <i>((?i)rah)\s+\1</i>
     matches &quot;rah rah&quot; and &quot;RAH RAH&quot;, but not &quot;RAH rah&quot;, even
     though the original capturing subpattern is matched caselessly.
    </p>
    <p class="para">
     There may be more than one back reference to the same subpattern.
     If a subpattern has not actually been used in a
     particular match, then any back references to it always
     fail.
     For example, the pattern
     <i>(a|(bc))\2</i>
     always fails if it starts to match &quot;a&quot; rather than &quot;bc&quot;.
     Because there may be up to 99 back references, all digits
     following the backslash are taken as part of a potential
     back reference number. If the pattern continues with a digit
     character, then some delimiter must be used to terminate the
     back reference. If the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_EXTENDED</a>
     option is set, this can be whitespace. Otherwise an empty comment,
     <i>(?#)</i>, can be used.
    </p>
    <p class="para">
     A back reference that occurs inside the parentheses to which
     it refers fails when the subpattern is first used, so, for
     example, (a\1) never matches. However, such references can
     be useful inside repeated subpatterns. For example, the pattern
     <i>(a|b\1)+</i>
     matches any number of &quot;a&quot;s and also &quot;aba&quot;, &quot;ababaa&quot; etc. At
     each iteration of the subpattern, the back reference matches
     the character string corresponding to the previous iteration.
     In order for this to work, the pattern must be such
     that the first iteration does not need to match the back
     reference. This can be done using alternation, as in the
     example above, or by a quantifier with a minimum of zero.
    </p>
    <p class="para">
     Back references to named subpatterns can be written as
     <i>(?P=name)</i> or, since PHP 5.2.4, also as
     <i>\k&lt;name&gt;</i>, <i>\k&#039;name&#039;</i>,
     <i>\k{name}</i> or <i>\g{name}</i>.
    </p>
   </div>
   <div id="regexp.reference.assertions" class="section">
    <h2 class="title">Assertions</h2>
    <p class="para">
     An assertion is a test on the characters following or
     preceding the current matching point that does not actually
     consume any characters.
     The simple assertions coded as \b,
     \B, \A, \Z, \z, ^ and $ are described above. More complicated
     assertions are coded as subpatterns. There are two
     kinds: those that look ahead of the current position in the
     subject string, and those that look behind it.
    </p>
    <p class="para">
     An assertion subpattern is matched in the normal way, except
     that it does not cause the current matching position to be
     changed. Lookahead assertions start with (?= for positive
     assertions and (?! for negative assertions. For example,
     <i>\w+(?=;)</i>
     matches a word followed by a semicolon, but does not include
     the semicolon in the match, and
     <i>foo(?!bar)</i>
     matches any occurrence of &quot;foo&quot; that is not followed by
     &quot;bar&quot;. Note that the apparently similar pattern
     <i>(?!foo)bar</i>
     does not find an occurrence of &quot;bar&quot; that is preceded by
     something other than &quot;foo&quot;; it finds any occurrence of &quot;bar&quot;
     whatsoever, because the assertion (?!foo) is always <b><tt>TRUE</tt></b>
     when the next three characters are &quot;bar&quot;. A lookbehind
     assertion is needed to achieve this effect.
    </p>
    <p class="para">
     Lookbehind assertions start with (?&lt;= for positive assertions
     and (?&lt;! for negative assertions. For example,
     <i>(?&lt;!foo)bar</i>
     finds an occurrence of &quot;bar&quot; that is not preceded by
     &quot;foo&quot;. The contents of a lookbehind assertion are restricted
     such that all the strings it matches must have a fixed
     length. However, if there are several alternatives, they do
     not all have to have the same fixed length. Thus
     <i>(?&lt;=bullock|donkey)</i>
     is permitted, but
     <i>(?&lt;!dogs?|cats?)</i>
     causes an error at compile time. Branches that match strings of
     different lengths are permitted only at the top level of
     a lookbehind assertion.
     This is an extension compared with
     Perl 5.005, which requires all branches to match the same
     length of string. An assertion such as
     <i>(?&lt;=ab(c|de))</i>
     is not permitted, because its single top-level branch can
     match two different lengths, but it is acceptable if rewritten
     to use two top-level branches:
     <i>(?&lt;=abc|abde)</i>
     The implementation of lookbehind assertions is, for each
     alternative, to temporarily move the current position back
     by the fixed width and then try to match. If there are
     insufficient characters before the current position, the
     match is deemed to fail. Lookbehinds in conjunction with
     once-only subpatterns can be particularly useful for matching
     at the ends of strings; an example is given at the end
     of the section on once-only subpatterns.
    </p>
    <p class="para">
     Several assertions (of any sort) may occur in succession.
     For example,
     <i>(?&lt;=\d{3})(?&lt;!999)foo</i>
     matches &quot;foo&quot; preceded by three digits that are not &quot;999&quot;.
     Notice that each of the assertions is applied independently
     at the same point in the subject string. First there is a
     check that the previous three characters are all digits,
     then there is a check that the same three characters are not
     &quot;999&quot;. This pattern does not match &quot;foo&quot; preceded by six
     characters, the first three of which are digits and the last three
     of which are not &quot;999&quot;. For example, it doesn&#039;t match
     &quot;123abcfoo&quot;. A pattern to do that is
     <i>(?&lt;=\d{3}...)(?&lt;!999)foo</i>
    </p>
    <p class="para">
     This time the first assertion looks at the preceding six
     characters, checking that the first three are digits, and
     then the second assertion checks that the preceding three
     characters are not &quot;999&quot;.
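The sketch below runs several of the assertion patterns from this section through <i>preg_match()</i>, which returns 1 when the pattern matches and 0 when it does not. The subject strings other than 123abcfoo, and the literal cart after the bullock/donkey lookbehind, are made up for illustration. The sketch deliberately leaves out the patterns described above as compile-time errors.

```php
<?php
// Lookahead: \w+(?=;) matches a word followed by ";" without consuming it.
preg_match('/\w+(?=;)/', 'x = value;', $word);

// (?!foo)bar matches ANY "bar": when the next three characters are "bar",
// they cannot also be "foo", so the assertion is always true.
$lookaheadOnly = preg_match('/(?!foo)bar/', 'foobar');  // 1

// A negative lookbehind is what actually rejects "bar" preceded by "foo".
$lookbehind = preg_match('/(?<!foo)bar/', 'foobar');    // 0

// Top-level alternatives may have different fixed lengths (7 and 6 here).
$topLevel = preg_match('/(?<=bullock|donkey)cart/', 'donkeycart'); // 1

// Successive assertions are all tested at the same position, so both of
// these look at the same three characters before "foo" ("abc").
$sameThree = preg_match('/(?<=\d{3})(?<!999)foo/', '123abcfoo');    // 0

// Looking back six characters lets the digits sit before "abc".
$sixChars = preg_match('/(?<=\d{3}...)(?<!999)foo/', '123abcfoo');  // 1
$excluded = preg_match('/(?<=\d{3}...)(?<!999)foo/', '123999foo');  // 0

echo $word[0], "\n"; // value
```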
    </p>
    <p class="para">
     Assertions can be nested in any combination. For example,
     <i>(?&lt;=(?&lt;!foo)bar)baz</i>
     matches an occurrence of &quot;baz&quot; that is preceded by &quot;bar&quot;
     which in turn is not preceded by &quot;foo&quot;, while
     <i>(?&lt;=\d{3}...(?&lt;!999))foo</i>
     is another pattern which matches &quot;foo&quot; preceded by three
     digits and any three characters that are not &quot;999&quot;.
    </p>
    <p class="para">
     Assertion subpatterns are not capturing subpatterns, and may
     not be repeated, because it makes no sense to assert the
     same thing several times. If any kind of assertion contains
     capturing subpatterns within it, these are counted for the
     purposes of numbering the capturing subpatterns in the whole
     pattern. However, substring capturing is carried out only
     for positive assertions, because it doe