📄 pcre.txt

     tern that do not cause substrings to be captured.

     Captured substrings are returned to the caller via a vector of
     integer offsets whose address is passed in ovector. The number
     of elements in the vector is passed in ovecsize. The first
     two-thirds of the vector is used to pass back captured
     substrings, each substring using a pair of integers. The
     remaining third of the vector is used as workspace by
     pcre_exec() while matching capturing subpatterns, and is not
     available for passing back information. The length passed in
     ovecsize should always be a multiple of three. If it is not, it
     is rounded down.

     When a match has been successful, information about captured
     substrings is returned in pairs of integers, starting at the
     beginning of ovector, and continuing up to two-thirds of its
     length at the most. The first element of a pair is set to the
     offset of the first character in a substring, and the second is
     set to the offset of the first character after the end of a
     substring. The first pair, ovector[0] and ovector[1], identify
     the portion of the subject string matched by the entire
     pattern. The next pair is used for the first capturing
     subpattern, and so on. The value returned by pcre_exec() is the
     number of pairs that have been set. If there are no capturing
     subpatterns, the return value from a successful match is 1,
     indicating that just the first pair of offsets has been set.

     Some convenience functions are provided for extracting the
     captured substrings as separate strings. These are described in
     the following section.

     It is possible for a capturing subpattern number n+1 to match
     some part of the subject when subpattern n has not been used at
     all. For example, if the string "abc" is matched against the
     pattern (a|(z))(bc) subpatterns 1 and 3 are matched, but 2 is
     not.
     When this happens, both offset values corresponding to the
     unused subpattern are set to -1.

     If a capturing subpattern is matched repeatedly, it is the last
     portion of the string that it matched that gets returned.

     If the vector is too small to hold all the captured substrings,
     it is used as far as possible (up to two-thirds of its length),
     and the function returns a value of zero. In particular, if the
     substring offsets are not of interest, pcre_exec() may be
     called with ovector passed as NULL and ovecsize as zero.
     However, if the pattern contains back references and the
     ovector isn't big enough to remember the related substrings,
     PCRE has to get additional memory for use during matching. Thus
     it is usually advisable to supply an ovector.

     Note that pcre_info() can be used to find out how many
     capturing subpatterns there are in a compiled pattern. The
     smallest size for ovector that will allow for n captured
     substrings in addition to the offsets of the substring matched
     by the whole pattern is (n+1)*3.

     If pcre_exec() fails, it returns a negative number. The
     following are defined in the header file:

       PCRE_ERROR_NOMATCH        (-1)

     The subject string did not match the pattern.

       PCRE_ERROR_NULL           (-2)

     Either code or subject was passed as NULL, or ovector was NULL
     and ovecsize was not zero.

       PCRE_ERROR_BADOPTION      (-3)

     An unrecognized bit was set in the options argument.

       PCRE_ERROR_BADMAGIC       (-4)

     PCRE stores a 4-byte "magic number" at the start of the
     compiled code, to catch the case when it is passed a junk
     pointer. This is the error it gives when the magic number isn't
     present.

       PCRE_ERROR_UNKNOWN_NODE   (-5)

     While running the pattern match, an unknown item was
     encountered in the compiled pattern.
     This error could be caused by a bug in PCRE or by overwriting
     of the compiled pattern.

       PCRE_ERROR_NOMEMORY       (-6)

     If a pattern contains back references, but the ovector that is
     passed to pcre_exec() is not big enough to remember the
     referenced substrings, PCRE gets a block of memory at the start
     of matching to use for this purpose. If the call via
     pcre_malloc() fails, this error is given. The memory is freed
     at the end of matching.

EXTRACTING CAPTURED SUBSTRINGS
     Captured substrings can be accessed directly by using the
     offsets returned by pcre_exec() in ovector. For convenience,
     the functions pcre_copy_substring(), pcre_get_substring(), and
     pcre_get_substring_list() are provided for extracting captured
     substrings as new, separate, zero-terminated strings. A
     substring that contains a binary zero is correctly extracted
     and has a further zero added on the end, but the result does
     not, of course, function as a C string.

     The first three arguments are the same for all three functions:
     subject is the subject string which has just been successfully
     matched, ovector is a pointer to the vector of integer offsets
     that was passed to pcre_exec(), and stringcount is the number
     of substrings that were captured by the match, including the
     substring that matched the entire regular expression. This is
     the value returned by pcre_exec if it is greater than zero. If
     pcre_exec() returned zero, indicating that it ran out of space
     in ovector, the value passed as stringcount should be the size
     of the vector divided by three.

     The functions pcre_copy_substring() and pcre_get_substring()
     extract a single substring, whose number is given as
     stringnumber. A value of zero extracts the substring that
     matched the entire pattern, while higher values extract the
     captured substrings.
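
     To make the offset-pair layout concrete, here is a minimal
     sketch of copying one captured substring straight out of an
     ovector, without calling PCRE at all. The helper name
     copy_substring_sketch() and the SKETCH_ERROR_* macros are
     hypothetical; they only imitate the argument order and the
     -6 / -7 error values that this document gives for
     pcre_copy_substring(), and are not the library's own
     implementation. An unset pair (-1, -1) is assumed to yield an
     empty string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the values this document lists for
   PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copy substring number `stringnumber` from `subject` into `buffer`,
   using the offset pairs in `ovector`. Returns the length copied
   (excluding the terminating zero) or a negative error value. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    assert(subject != NULL && ovector != NULL && buffer != NULL);

    /* Only pairs 0 .. stringcount-1 were set by the match. */
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    int start = ovector[2 * stringnumber];      /* first character      */
    int end   = ovector[2 * stringnumber + 1];  /* one past the last    */

    /* An unset subpattern has both offsets equal to -1. */
    int length = (start < 0) ? 0 : end - start;

    /* Room is needed for the substring plus the terminating zero. */
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;

    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

     For the match of "abc" against (a|(z))(bc), the offset pairs
     would be (0,3), (0,1), (-1,-1) and (1,3), so asking this
     sketch for substring 3 gives "bc" and asking for substring 2
     gives an empty string.
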
     For pcre_copy_substring(), the string is placed in buffer,
     whose length is given by buffersize, while for
     pcre_get_substring() a new block of memory is obtained via
     pcre_malloc, and its address is returned via stringptr. The
     yield of the function is the length of the string, not
     including the terminating zero, or one of

       PCRE_ERROR_NOMEMORY       (-6)

     The buffer was too small for pcre_copy_substring(), or the
     attempt to get memory failed for pcre_get_substring().

       PCRE_ERROR_NOSUBSTRING    (-7)

     There is no substring whose number is stringnumber.

     The pcre_get_substring_list() function extracts all available
     substrings and builds a list of pointers to them. All this is
     done in a single block of memory which is obtained via
     pcre_malloc. The address of the memory block is returned via
     listptr, which is also the start of the list of string
     pointers. The end of the list is marked by a NULL pointer. The
     yield of the function is zero if all went well, or

       PCRE_ERROR_NOMEMORY       (-6)

     if the attempt to get the memory block failed.

     When any of these functions encounter a substring that is
     unset, which can happen when capturing subpattern number n+1
     matches some part of the subject, but subpattern n has not been
     used at all, they return an empty string. This can be
     distinguished from a genuine zero-length substring by
     inspecting the appropriate offset in ovector, which is negative
     for unset substrings.

     The two convenience functions pcre_free_substring() and
     pcre_free_substring_list() can be used to free the memory
     returned by a previous call of pcre_get_substring() or
     pcre_get_substring_list(), respectively. They do nothing more
     than call the function pointed to by pcre_free, which of course
     could be called directly from a C program.
     However, PCRE is used in some situations where it is linked via
     a special interface to another programming language which
     cannot use pcre_free directly; it is for these cases that the
     functions are provided.

LIMITATIONS
     There are some size limitations in PCRE but it is hoped that
     they will never in practice be relevant. The maximum length of
     a compiled pattern is 65539 (sic) bytes. All values in
     repeating quantifiers must be less than 65536. The maximum
     number of capturing subpatterns is 65535. There is no limit to
     the number of non-capturing subpatterns, but the maximum depth
     of nesting of all kinds of parenthesized subpattern, including
     capturing subpatterns, assertions, and other types of
     subpattern, is 200.

     The maximum length of a subject string is the largest positive
     number that an integer variable can hold. However, PCRE uses
     recursion to handle subpatterns and indefinite repetition. This
     means that the available stack space may limit the size of a
     subject string that can be processed by certain patterns.

DIFFERENCES FROM PERL
     The differences described here are with respect to Perl 5.005.

     1. By default, a whitespace character is any character that the
     C library function isspace() recognizes, though it is possible
     to compile PCRE with alternative character type tables.
     Normally isspace() matches space, formfeed, newline, carriage
     return, horizontal tab, and vertical tab. Perl 5 no longer
     includes vertical tab in its set of whitespace characters. The
     \v escape that was in the Perl documentation for a long time
     was never in fact recognized. However, the character itself was
     treated as whitespace at least up to 5.002. In 5.004 and 5.005
     it does not match \s.

     2. PCRE does not allow repeat quantifiers on lookahead
     assertions.
     Perl permits them, but they do not mean what you might think.
     For example, (?!a){3} does not assert that the next three
     characters are not "a". It just asserts that the next character
     is not "a" three times.

     3. Capturing subpatterns that occur inside negative lookahead
     assertions are counted, but their entries in the offsets vector
     are never set. Perl sets its numerical variables from any such
     patterns that are matched before the assertion fails to match
     something (thereby succeeding), but only if the negative
     lookahead assertion contains just one branch.

     4. Though binary zero characters are supported in the subject
     string, they are not allowed in a pattern string because it is
     passed as a normal C string, terminated by zero. The escape
     sequence "\0" can be used in the pattern to represent a binary
     zero.

     5. The following Perl escape sequences are not supported: \l,
     \u, \L, \U, \E, \Q. In fact these are implemented by Perl's
     general string-handling and are not part of its pattern
     matching engine.

     6. The Perl \G assertion is not supported as it is not relevant
     to single pattern matches.

     7. Fairly obviously, PCRE does not support the (?{code}) and
     (?p{code}) constructions. However, there is some experimental
     support for recursive patterns using the non-Perl item (?R).

     8. There are at the time of writing some oddities in Perl
     5.005_02 concerned with the settings of captured strings when
     part of a pattern is repeated. For example, matching "aba"
     against the pattern /^(a(b)?)+$/ sets $2 to the value "b", but
     matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset.
     However, if the pattern is changed to /^(aa(b(b))?)+$/ then $2
     (and $3) are set.

     In Perl 5.004 $2 is set in both cases, and that is also true of
     PCRE.
     If in the future Perl changes to a consistent state that is
     different, PCRE may change to follow.

     9. Another as yet unresolved discrepancy is that in Perl
     5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a",
     whereas in PCRE it does not. However, in both Perl and PCRE
     /^(a)?a/ matched against "a" leaves $1 unset.

     10. PCRE provides some extensions to the Perl regular
     expression facilities:

     (a) Although lookbehind assertions must match fixed length
     strings, each alternative branch of a lookbehind assertion can
     match a different length of string. Perl 5.005 requires them
     all to have the same length.

     (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not
     set, the $ metacharacter matches only at the very end of the
     string.

     (c) If PCRE_EXTRA is set, a backslash followed by a letter with
     no special meaning is faulted.

     (d) If PCRE_UNGREEDY is set, the greediness of the repetition
     quantifiers is inverted, that is, by default they are not
     greedy, but if followed by a question mark they are.

     (e) PCRE_ANCHORED can be used to force a pattern to be tried
     only at the start of the subject.

     (f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for
     pcre_exec() have no Perl equivalents.

     (g) The (?R) construct allows for recursive pattern matching
     (Perl 5.6 can do this using the (?p{code}) construct, which
     PCRE cannot of course support.)

REGULAR EXPRESSION DETAILS
     The syntax and semantics of the regular expressions supported
     by PCRE are described below. Regular expressions are
