📄 pcre.3

📁 Apache V2.0.15 Alpha For Linux: httpd-2_0_15-alpha.tar.Z
memory failed for \fBpcre_get_substring()\fR.

  PCRE_ERROR_NOSUBSTRING    (-7)

There is no substring whose number is \fIstringnumber\fR.

The \fBpcre_get_substring_list()\fR function extracts all available substrings
and builds a list of pointers to them. All this is done in a single block of
memory which is obtained via \fBpcre_malloc\fR. The address of the memory block
is returned via \fIlistptr\fR, which is also the start of the list of string
pointers. The end of the list is marked by a NULL pointer. The yield of the
function is zero if all went well, or

  PCRE_ERROR_NOMEMORY       (-6)

if the attempt to get the memory block failed.

When any of these functions encounter a substring that is unset, which can
happen when capturing subpattern number \fIn+1\fR matches some part of the
subject, but subpattern \fIn\fR has not been used at all, they return an empty
string. This can be distinguished from a genuine zero-length substring by
inspecting the appropriate offset in \fIovector\fR, which is negative for unset
substrings.

.SH LIMITATIONS
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
The maximum length of a compiled pattern is 65539 (sic) bytes.
All values in repeating quantifiers must be less than 65536.
The maximum number of capturing subpatterns is 99.
The maximum number of all parenthesized subpatterns, including capturing
subpatterns, assertions, and other types of subpattern, is 200.

The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, PCRE uses recursion to handle subpatterns
and indefinite repetition. This means that the available stack space may limit
the size of a subject string that can be processed by certain patterns.

.SH DIFFERENCES FROM PERL
The differences described here are with respect to Perl 5.005.

1.
By default, a whitespace character is any character that the C library
function \fBisspace()\fR recognizes, though it is possible to compile PCRE with
alternative character type tables. Normally \fBisspace()\fR matches space,
formfeed, newline, carriage return, horizontal tab, and vertical tab. Perl 5
no longer includes vertical tab in its set of whitespace characters. The \\v
escape that was in the Perl documentation for a long time was never in fact
recognized. However, the character itself was treated as whitespace at least
up to 5.002. In 5.004 and 5.005 it does not match \\s.

2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.

3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.

4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence "\\0" can be used in the pattern to
represent a binary zero.

5. The following Perl escape sequences are not supported: \\l, \\u, \\L, \\U,
\\E, \\Q. In fact these are implemented by Perl's general string-handling and
are not part of its pattern matching engine.

6. The Perl \\G assertion is not supported as it is not relevant to single
pattern matches.

7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is some experimental support for recursive
patterns using the non-Perl item (?R).

8.
There are at the time of writing some oddities in Perl 5.005_02 concerned
with the settings of captured strings when part of a pattern is repeated. For
example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.

In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
future Perl changes to a consistent state that is different, PCRE may change to
follow.

9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern
/^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not.
However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset.

10. PCRE provides some extensions to the Perl regular expression facilities:

(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl 5.005 requires them all to have the same length.

(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.

(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.

(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.

(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the start
of the subject.

(f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for
\fBpcre_exec()\fR have no Perl equivalents.

(g) The (?R) construct allows for recursive pattern matching (Perl 5.6 can do
this using the (?p{code}) construct, which PCRE cannot of course support.)

.SH REGULAR EXPRESSION DETAILS
The syntax and semantics of the regular expressions supported by PCRE are
described below.
Regular expressions are also described in the Perl
documentation and in a number of other books, some of which have copious
examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
O'Reilly (ISBN 1-56592-257), covers them in great detail. The description
here is intended as reference documentation.

A regular expression is a pattern that is matched against a subject string from
left to right. Most characters stand for themselves in a pattern, and match the
corresponding characters in the subject. As a trivial example, the pattern

  The quick brown fox

matches a portion of a subject string that is identical to itself. The power of
regular expressions comes from the ability to include alternatives and
repetitions in the pattern. These are encoded in the pattern by the use of
\fImeta-characters\fR, which do not stand for themselves but instead are
interpreted in some special way.

There are two different sets of meta-characters: those that are recognized
anywhere in the pattern except within square brackets, and those that are
recognized in square brackets. Outside square brackets, the meta-characters are
as follows:

  \\      general escape character with several uses
  ^      assert start of subject (or line, in multiline mode)
  $      assert end of subject (or line, in multiline mode)
  .      match any character except newline (by default)
  [      start character class definition
  |      start of alternative branch
  (      start subpattern
  )      end subpattern
  ?      extends the meaning of (
         also 0 or 1 quantifier
         also quantifier minimizer
  *      0 or more quantifier
  +      1 or more quantifier
  {      start min/max quantifier

Part of a pattern that is in square brackets is called a "character class".
In a character class the only meta-characters are:

  \\      general escape character
  ^      negate the class, but only if the first character
  -      indicates character range
  ]      terminates the character class

The following sections describe the use of each of the meta-characters.

.SH BACKSLASH
The backslash character has several uses. Firstly, if it is followed by a
non-alphameric character, it takes away any special meaning that character may
have. This use of backslash as an escape character applies both inside and
outside character classes.

For example, if you want to match a "*" character, you write "\\*" in the
pattern. This applies whether or not the following character would otherwise be
interpreted as a meta-character, so it is always safe to precede a
non-alphameric with "\\" to specify that it stands for itself. In particular,
if you want to match a backslash, you write "\\\\".

If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
pattern (other than in a character class) and characters between a "#" outside
a character class and the next newline character are ignored. An escaping
backslash can be used to include a whitespace or "#" character as part of the
pattern.

A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner.
There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is usually easier to
use one of the following escape sequences than the binary character it
represents:

  \\a     alarm, that is, the BEL character (hex 07)
  \\cx    "control-x", where x is any character
  \\e     escape (hex 1B)
  \\f     formfeed (hex 0C)
  \\n     newline (hex 0A)
  \\r     carriage return (hex 0D)
  \\t     tab (hex 09)
  \\xhh   character with hex code hh
  \\ddd   character with octal code ddd, or backreference

The precise effect of "\\cx" is as follows: if "x" is a lower case letter, it
is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
Thus "\\cz" becomes hex 1A, but "\\c{" becomes hex 3B, while "\\c;" becomes hex
7B.

After "\\x", up to two hexadecimal digits are read (letters can be in upper or
lower case).

After "\\0" up to two further octal digits are read. In both cases, if there
are fewer than two digits, just those that are present are used. Thus the
sequence "\\0\\x\\07" specifies two binary zeros followed by a BEL character.
Make sure you supply two digits after the initial zero if the character that
follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is complicated.
Outside a character class, PCRE reads it and any following digits as a decimal
number. If the number is less than 10, or if there have been at least that many
previous capturing left parentheses in the expression, the entire sequence is
taken as a \fIback reference\fR. A description of how this works is given
later, following the discussion of parenthesized subpatterns.

Inside a character class, or if the decimal number is greater than 9 and there
have not been that many capturing subpatterns, PCRE re-reads up to three octal
digits following the backslash, and generates a single byte from the least
significant 8 bits of the value.
Any subsequent digits stand for themselves.
For example:

  \\040   is another way of writing a space
  \\40    is the same, provided there are fewer than 40
            previous capturing subpatterns
  \\7     is always a back reference
  \\11    might be a back reference, or another way of
            writing a tab
  \\011   is always a tab
  \\0113  is a tab followed by the character "3"
  \\113   is the character with octal code 113 (since there
            can be no more than 99 back references)
  \\377   is a byte consisting entirely of 1 bits
  \\81    is either a back reference, or a binary zero
            followed by the two characters "8" and "1"

Note that octal values of 100 or greater must not be introduced by a leading
zero, because no more than three octal digits are ever read.

All the sequences that define a single byte value can be used both inside and
outside character classes. In addition, inside a character class, the sequence
"\\b" is interpreted as the backspace character (hex 08). Outside a character
class it has a different meaning (see below).

The third use of backslash is for specifying generic character types:

  \\d     any decimal digit
  \\D     any character that is not a decimal digit
  \\s     any whitespace character
  \\S     any character that is not a whitespace character
  \\w     any "word" character
  \\W     any "non-word" character

Each pair of escape sequences partitions the complete set of characters into
two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is,
any character which can be part of a Perl "word". The definition of letters and
digits is controlled by PCRE's character tables, and may vary if
locale-specific matching is taking place (see "Locale support" above).
For example, in
the "fr" (French) locale, some character codes greater than 128 are used for
accented letters, and these are matched by \\w.

These character type sequences can appear both inside and outside character
classes. They each match one character of the appropriate type. If the current
matching point is at the end of the subject string, all of them fail, since
there is no character to match.

The fourth use of backslash is for certain simple assertions. An assertion
specifies a condition that has to be met at a particular point in a match,
without consuming any characters from the subject string. The use of
subpatterns for more complicated assertions is described below. The backslashed
assertions are
