     While running the pattern match, an unknown item was
     encountered in the compiled pattern. This error could be
     caused by a bug in PCRE or by the compiled pattern having
     been overwritten.

       PCRE_ERROR_NOMEMORY       (-6)

     If a pattern contains back references, but the ovector that
     is passed to pcre_exec() is not big enough to remember the
     referenced substrings, PCRE gets a block of memory at the
     start of matching to use for this purpose. If the call via
     pcre_malloc() fails, this error is given. The memory is
     freed at the end of matching.

EXTRACTING CAPTURED SUBSTRINGS
     Captured substrings can be accessed directly by using the
     offsets returned by pcre_exec() in ovector. For convenience,
     the functions pcre_copy_substring(), pcre_get_substring(),
     and pcre_get_substring_list() are provided for extracting
     captured substrings as new, separate, zero-terminated
     strings. A substring that contains a binary zero is
     correctly extracted and has a further zero added on the end,
     but the result does not, of course, function as a C string.

     The first three arguments are the same for all three
     functions: subject is the subject string which has just been
     successfully matched, ovector is a pointer to the vector of
     integer offsets that was passed to pcre_exec(), and
     stringcount is the number of substrings that were captured
     by the match, including the substring that matched the
     entire regular expression. This is the value returned by
     pcre_exec() if it is greater than zero. If pcre_exec()
     returned zero, indicating that it ran out of space in
     ovector, the value passed as stringcount should be the size
     of the vector divided by three.

     The functions pcre_copy_substring() and pcre_get_substring()
     extract a single substring, whose number is given as
     stringnumber.
     A value of zero extracts the substring that matched the
     entire pattern, while higher values extract the captured
     substrings. For pcre_copy_substring(), the string is placed
     in buffer, whose length is given by buffersize, while for
     pcre_get_substring() a new block of memory is obtained via
     pcre_malloc, and its address is returned via stringptr. The
     yield of the function is the length of the string, not
     including the terminating zero, or one of

       PCRE_ERROR_NOMEMORY       (-6)

     The buffer was too small for pcre_copy_substring(), or the
     attempt to get memory failed for pcre_get_substring().

       PCRE_ERROR_NOSUBSTRING    (-7)

     There is no substring whose number is stringnumber.

     The pcre_get_substring_list() function extracts all
     available substrings and builds a list of pointers to them.
     All this is done in a single block of memory which is
     obtained via pcre_malloc. The address of the memory block is
     returned via listptr, which is also the start of the list of
     string pointers. The end of the list is marked by a NULL
     pointer. The yield of the function is zero if all went well,
     or

       PCRE_ERROR_NOMEMORY       (-6)

     if the attempt to get the memory block failed.

     When any of these functions encounters a substring that is
     unset, which can happen when capturing subpattern number n+1
     matches some part of the subject but subpattern n has not
     been used at all, it returns an empty string. This can be
     distinguished from a genuine zero-length substring by
     inspecting the appropriate offset in ovector, which is
     negative for unset substrings.

LIMITATIONS
     There are some size limitations in PCRE but it is hoped that
     they will never in practice be relevant. The maximum length
     of a compiled pattern is 65539 (sic) bytes. All values in
     repeating quantifiers must be less than 65536.
     The maximum number of capturing subpatterns is 99. The
     maximum number of all parenthesized subpatterns, including
     capturing subpatterns, assertions, and other types of
     subpattern, is 200.

     The maximum length of a subject string is the largest
     positive number that an integer variable can hold. However,
     PCRE uses recursion to handle subpatterns and indefinite
     repetition. This means that the available stack space may
     limit the size of a subject string that can be processed by
     certain patterns.

DIFFERENCES FROM PERL
     The differences described here are with respect to Perl
     5.005.

     1. By default, a whitespace character is any character that
     the C library function isspace() recognizes, though it is
     possible to compile PCRE with alternative character type
     tables. Normally isspace() matches space, formfeed, newline,
     carriage return, horizontal tab, and vertical tab. Perl 5 no
     longer includes vertical tab in its set of whitespace
     characters. The \v escape that was in the Perl documentation
     for a long time was never in fact recognized. However, the
     character itself was treated as whitespace at least up to
     5.002. In 5.004 and 5.005 it does not match \s.

     2. PCRE does not allow repeat quantifiers on lookahead
     assertions. Perl permits them, but they do not mean what you
     might think. For example, (?!a){3} does not assert that the
     next three characters are not "a". It just asserts that the
     next character is not "a" three times.

     3. Capturing subpatterns that occur inside negative
     lookahead assertions are counted, but their entries in the
     offsets vector are never set. Perl sets its numerical
     variables from any such patterns that are matched before the
     assertion fails to match something (thereby succeeding), but
     only if the negative lookahead assertion contains just one
     branch.

     4. Though binary zero characters are supported in the
     subject string, they are not allowed in a pattern string
     because it is passed as a normal C string, terminated by
     zero. The escape sequence "\0" can be used in the pattern to
     represent a binary zero.

     5. The following Perl escape sequences are not supported:
     \l, \u, \L, \U, \E, \Q. In fact these are implemented by
     Perl's general string-handling and are not part of its
     pattern matching engine.

     6. The Perl \G assertion is not supported as it is not
     relevant to single pattern matches.

     7. Fairly obviously, PCRE does not support the (?{code}) and
     (?p{code}) constructions. However, there is some
     experimental support for recursive patterns using the
     non-Perl item (?R).

     8. There are at the time of writing some oddities in Perl
     5.005_02 concerned with the settings of captured strings
     when part of a pattern is repeated. For example, matching
     "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
     "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
     unset. However, if the pattern is changed to
     /^(aa(b(b))?)+$/ then $2 (and $3) are set.

     In Perl 5.004 $2 is set in both cases, and that is also true
     of PCRE. If in the future Perl changes to a consistent state
     that is different, PCRE may change to follow.

     9. Another as yet unresolved discrepancy is that in Perl
     5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
     "a", whereas in PCRE it does not. However, in both Perl and
     PCRE /^(a)?a/ matched against "a" leaves $1 unset.

     10. PCRE provides some extensions to the Perl regular
     expression facilities:

     (a) Although lookbehind assertions must match fixed length
     strings, each alternative branch of a lookbehind assertion
     can match a different length of string.
     Perl 5.005 requires them all to have the same length.

     (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not
     set, the $ meta-character matches only at the very end of
     the string.

     (c) If PCRE_EXTRA is set, a backslash followed by a letter
     with no special meaning is faulted.

     (d) If PCRE_UNGREEDY is set, the greediness of the
     repetition quantifiers is inverted; that is, by default they
     are not greedy, but if followed by a question mark they are.

     (e) PCRE_ANCHORED can be used to force a pattern to be tried
     only at the start of the subject.

     (f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options
     for pcre_exec() have no Perl equivalents.

     (g) The (?R) construct allows for recursive pattern
     matching. (Perl 5.6 can do this using the (?p{code})
     construct, which PCRE of course cannot support.)

REGULAR EXPRESSION DETAILS
     The syntax and semantics of the regular expressions
     supported by PCRE are described below. Regular expressions
     are also described in the Perl documentation and in a number
     of other books, some of which have copious examples. Jeffrey
     Friedl's "Mastering Regular Expressions", published by
     O'Reilly (ISBN 1-56592-257), covers them in great detail.
     The description here is intended as reference documentation.

     A regular expression is a pattern that is matched against a
     subject string from left to right. Most characters stand for
     themselves in a pattern, and match the corresponding
     characters in the subject. As a trivial example, the pattern

       The quick brown fox

     matches a portion of a subject string that is identical to
     itself. The power of regular expressions comes from the
     ability to include alternatives and repetitions in the
     pattern.
     These are encoded in the pattern by the use of
     meta-characters, which do not stand for themselves but
     instead are interpreted in some special way.

     There are two different sets of meta-characters: those that
     are recognized anywhere in the pattern except within square
     brackets, and those that are recognized in square brackets.
     Outside square brackets, the meta-characters are as follows:

       \      general escape character with several uses
       ^      assert start of subject (or line, in multiline mode)
       $      assert end of subject (or line, in multiline mode)
       .      match any character except newline (by default)
       [      start character class definition
       |      start of alternative branch
       (      start subpattern
       )      end subpattern
       ?      extends the meaning of (
              also 0 or 1 quantifier
              also quantifier minimizer
       *      0 or more quantifier
       +      1 or more quantifier
       {      start min/max quantifier

     Part of a pattern that is in square brackets is called a
     "character class". In a character class the only
     meta-characters are:

       \      general escape character
       ^      negate the class, but only if it is the first character
       -      indicates character range
       ]      terminates the character class

     The following sections describe the use of each of the
     meta-characters.

BACKSLASH
     The backslash character has several uses. Firstly, if it is
     followed by a non-alphameric character, it takes away any
     special meaning that character may have. This use of
     backslash as an escape character applies both inside and
     outside character classes.

     For example, if you want to match a "*" character, you write
     "\*" in the pattern.
     This applies whether or not the following character would
     otherwise be interpreted as a meta-character, so it is
     always safe to precede a non-alphameric character with "\"
     to specify that it stands for itself. In particular, if you
     want to match a backslash, you write "\\".

     If a pattern is compiled with the PCRE_EXTENDED option,
     whitespace in the pattern (other than in a character class)
     and characters between a "#" outside a character class and
     the next newline character are ignored. An escaping
     backslash can be used to include a whitespace or "#"
     character as part of the pattern.

     A second use of backslash provides a way of encoding
     non-printing characters in patterns in a visible manner.
     There is no restriction on the appearance of non-printing
     characters, apart from the binary zero that terminates a
     pattern, but when a pattern is being prepared by text
     editing, it is usually easier to use one of the following
     escape sequences than the binary character it represents:

       \a     alarm, that is, the BEL character (hex 07)
       \cx    "control-x", where x is any character
       \e     escape (hex 1B)
       \f     formfeed (hex 0C)
       \n     newline (hex 0A)
       \r     carriage return (hex 0D)
       \t     tab (hex 09)
       \xhh   character with hex code hh
       \ddd   character with octal code ddd, or backreference

     The precise effect of "\cx" is as follows: if "x" is a
     lowercase letter, it is converted to uppercase. Then bit 6
     of the character (hex 40) is inverted. Thus "\cz" becomes
     hex 1A, but "\c{" becomes hex 3B, while "\c;" becomes hex 7B.