📄 pcre.3

Note that \fBpcre_info()\fR can be used to find out how many capturing
subpatterns there are in a compiled pattern. The smallest size for
\fIovector\fR that will allow for \fIn\fR captured substrings in addition to
the offsets of the substring matched by the whole pattern is (\fIn\fR+1)*3.

If \fBpcre_exec()\fR fails, it returns a negative number. The following are
defined in the header file:

  PCRE_ERROR_NOMATCH        (-1)

The subject string did not match the pattern.

  PCRE_ERROR_NULL           (-2)

Either \fIcode\fR or \fIsubject\fR was passed as NULL, or \fIovector\fR was
NULL and \fIovecsize\fR was not zero.

  PCRE_ERROR_BADOPTION      (-3)

An unrecognized bit was set in the \fIoptions\fR argument.

  PCRE_ERROR_BADMAGIC       (-4)

PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch
the case when it is passed a junk pointer. This is the error it gives when the
magic number isn't present.

  PCRE_ERROR_UNKNOWN_NODE   (-5)

While running the pattern match, an unknown item was encountered in the
compiled pattern. This error could be caused by a bug in PCRE or by overwriting
of the compiled pattern.

  PCRE_ERROR_NOMEMORY       (-6)

If a pattern contains back references, but the \fIovector\fR that is passed to
\fBpcre_exec()\fR is not big enough to remember the referenced substrings, PCRE
gets a block of memory at the start of matching to use for this purpose. If the
call via \fBpcre_malloc()\fR fails, this error is given. The memory is freed at
the end of matching.

.SH EXTRACTING CAPTURED SUBSTRINGS
Captured substrings can be accessed directly by using the offsets returned by
\fBpcre_exec()\fR in \fIovector\fR. For convenience, the functions
\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
\fBpcre_get_substring_list()\fR are provided for extracting captured substrings
as new, separate, zero-terminated strings.
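As a rough illustration of the direct route, the sketch below copies one
captured substring out of the subject by reading its offsets from the vector.
It is not PCRE code: copy_capture(), ovector_size(), and the two error
constants are hypothetical names invented for this example. It also assumes
that the offsets are stored in pairs, with the start of substring n at
ovector[2*n] and the offset just past its end at ovector[2*n+1]. That layout
is described in the PCRE documentation for pcre_exec() rather than in this
section. The offsets would normally come from a successful pcre_exec() call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error values for this sketch only (not PCRE constants). */
enum { CAPTURE_TOO_SMALL = -1, CAPTURE_UNSET = -2 };

/* Smallest ovector size that leaves room for n captured substrings plus
   the whole match, using the (n+1)*3 rule quoted in the text. */
int ovector_size(int n)
{
    return (n + 1) * 3;
}

/* Copy captured substring number n from subject into buffer, adding a
   terminating zero. Assumes the offsets are stored in pairs: start at
   ovector[2*n], one past the end at ovector[2*n+1].
   Returns the substring length, or a negative value on failure. */
int copy_capture(const char *subject, const int *ovector, int n,
                 char *buffer, int buffersize)
{
    int start, end, len;

    assert(subject != NULL && ovector != NULL && buffer != NULL);

    start = ovector[2 * n];
    end = ovector[2 * n + 1];

    /* A negative offset marks a substring that was never set. */
    if (start < 0 || end < start)
        return CAPTURE_UNSET;

    len = end - start;
    if (len >= buffersize)          /* need room for the terminating zero */
        return CAPTURE_TOO_SMALL;

    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

With the subject "key=value" and a hand-written vector { 0, 9, 0, 3, 4, 9, ... }
standing in for the result of a match with two capturing subpatterns, a call
such as copy_capture(subject, ovector, 2, buf, sizeof buf) would place "value"
in buf and return 5. The vector for that case needs ovector_size(2), that is 9,
elements.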
A substring that contains a binary
zero is correctly extracted and has a further zero added on the end, but the
result does not, of course, function as a C string.

The first three arguments are the same for all three functions: \fIsubject\fR
is the subject string which has just been successfully matched, \fIovector\fR
is a pointer to the vector of integer offsets that was passed to
\fBpcre_exec()\fR, and \fIstringcount\fR is the number of substrings that
were captured by the match, including the substring that matched the entire
regular expression. This is the value returned by \fBpcre_exec()\fR if it
is greater than zero. If \fBpcre_exec()\fR returned zero, indicating that it
ran out of space in \fIovector\fR, the value passed as \fIstringcount\fR should
be the size of the vector divided by three.

The functions \fBpcre_copy_substring()\fR and \fBpcre_get_substring()\fR
extract a single substring, whose number is given as \fIstringnumber\fR. A
value of zero extracts the substring that matched the entire pattern, while
higher values extract the captured substrings. For \fBpcre_copy_substring()\fR,
the string is placed in \fIbuffer\fR, whose length is given by
\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of memory is
obtained via \fBpcre_malloc\fR, and its address is returned via
\fIstringptr\fR. The yield of the function is the length of the string, not
including the terminating zero, or one of

  PCRE_ERROR_NOMEMORY       (-6)

The buffer was too small for \fBpcre_copy_substring()\fR, or the attempt to get
memory failed for \fBpcre_get_substring()\fR.

  PCRE_ERROR_NOSUBSTRING    (-7)

There is no substring whose number is \fIstringnumber\fR.

The \fBpcre_get_substring_list()\fR function extracts all available substrings
and builds a list of pointers to them. All this is done in a single block of
memory which is obtained via \fBpcre_malloc\fR. The address of the memory block
is returned via \fIlistptr\fR, which is also the start of the list of string
pointers.
The end of the list is marked by a NULL pointer. The yield of the
function is zero if all went well, or

  PCRE_ERROR_NOMEMORY       (-6)

if the attempt to get the memory block failed.

When any of these functions encounter a substring that is unset, which can
happen when capturing subpattern number \fIn+1\fR matches some part of the
subject, but subpattern \fIn\fR has not been used at all, they return an empty
string. This can be distinguished from a genuine zero-length substring by
inspecting the appropriate offset in \fIovector\fR, which is negative for unset
substrings.

The two convenience functions \fBpcre_free_substring()\fR and
\fBpcre_free_substring_list()\fR can be used to free the memory returned by
a previous call of \fBpcre_get_substring()\fR or
\fBpcre_get_substring_list()\fR, respectively. They do nothing more than call
the function pointed to by \fBpcre_free\fR, which of course could be called
directly from a C program. However, PCRE is used in some situations where it is
linked via a special interface to another programming language which cannot use
\fBpcre_free\fR directly; it is for these cases that the functions are
provided.

.SH LIMITATIONS
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
The maximum length of a compiled pattern is 65539 (sic) bytes.
All values in repeating quantifiers must be less than 65536.
The maximum number of capturing subpatterns is 65535.
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.

The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, PCRE uses recursion to handle subpatterns
and indefinite repetition.
This means that the available stack space may limit
the size of a subject string that can be processed by certain patterns.

.SH DIFFERENCES FROM PERL
The differences described here are with respect to Perl 5.005.

1. By default, a whitespace character is any character that the C library
function \fBisspace()\fR recognizes, though it is possible to compile PCRE with
alternative character type tables. Normally \fBisspace()\fR matches space,
formfeed, newline, carriage return, horizontal tab, and vertical tab. Perl 5
no longer includes vertical tab in its set of whitespace characters. The \\v
escape that was in the Perl documentation for a long time was never in fact
recognized. However, the character itself was treated as whitespace at least
up to 5.002. In 5.004 and 5.005 it does not match \\s.

2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.

3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.

4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence "\\0" can be used in the pattern to
represent a binary zero.

5. The following Perl escape sequences are not supported: \\l, \\u, \\L, \\U,
\\E, \\Q. In fact these are implemented by Perl's general string-handling and
are not part of its pattern matching engine.

6. The Perl \\G assertion is not supported as it is not relevant to single
pattern matches.

7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is some experimental support for recursive
patterns using the non-Perl item (?R).

8. There are at the time of writing some oddities in Perl 5.005_02 concerned
with the settings of captured strings when part of a pattern is repeated. For
example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.

In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
future Perl changes to a consistent state that is different, PCRE may change to
follow.

9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern
/^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not.
However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.

10. PCRE provides some extensions to the Perl regular expression facilities:

(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string.
Perl 5.005 requires them all to have the same length.

(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.

(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.

(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.

(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the start
of the subject.

(f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for
\fBpcre_exec()\fR have no Perl equivalents.

(g) The (?R) construct allows for recursive pattern matching (Perl 5.6 can do
this using the (?p{code}) construct, which PCRE cannot of course support).

.SH REGULAR EXPRESSION DETAILS
The syntax and semantics of the regular expressions supported by PCRE are
described below. Regular expressions are also described in the Perl
documentation and in a number of other books, some of which have copious
examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
O'Reilly (ISBN 1-56592-257), covers them in great detail.

The description here is intended as reference documentation. The basic
operation of PCRE is on strings of bytes. However, there are the beginnings of
some support for UTF-8 character strings. To use this support you must
configure PCRE to include it, and then call \fBpcre_compile()\fR with the
PCRE_UTF8 option. How this affects the pattern matching is described in the
final section of this document.

A regular expression is a pattern that is matched against a subject string from
left to right. Most characters stand for themselves in a pattern, and match the
corresponding characters in the subject. As a trivial example, the pattern

  The quick brown fox

matches a portion of a subject string that is identical to itself. The power of
regular expressions comes from the ability to include alternatives and
repetitions in the pattern.
These are encoded in the pattern by the use of
\fImeta-characters\fR, which do not stand for themselves but instead are
interpreted in some special way.

There are two different sets of meta-characters: those that are recognized
anywhere in the pattern except within square brackets, and those that are
recognized in square brackets. Outside square brackets, the meta-characters are
as follows:

  \\      general escape character with several uses
  ^      assert start of subject (or line, in multiline mode)
  $      assert end of subject (or line, in multiline mode)
  .      match any character except newline (by default)
  [      start character class definition
  |      start of alternative branch
  (      start subpattern
  )      end subpattern
  ?      extends the meaning of (
         also 0 or 1 quantifier
         also quantifier minimizer
  *      0 or more quantifier
  +      1 or more quantifier
  {      start min/max quantifier

Part of a pattern that is in square brackets is called a "character class". In
a character class the only meta-characters are:

  \\      general escape character
  ^      negate the class, but only if the first character
  -      indicates character range
  ]      terminates the character class

The following sections describe the use of each of the meta-characters.

.SH BACKSLASH
The backslash character has several uses. Firstly, if it is followed by a
non-alphameric character, it takes away any special meaning that character may
have. This use of backslash as an escape character applies both inside and
outside character classes.

For example, if you want to match a "*" character, you write "\\*" in the
pattern. This applies whether or not the following character would otherwise be
interpreted as a meta-character, so it is always safe to precede a
non-alphameric with "\\" to specify that it stands for itself.
In particular,
if you want to match a backslash, you write "\\\\".

If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
pattern (other than in a character class) and characters between a "#" outside
a character class and the next newline character are ignored. An escaping
backslash can be used to include a whitespace or "#" character as part of the
pattern.

A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
