📄 pcre.3

📁 Apache V2.0.15 Alpha For Linuxhttpd-2_0_15-alpha.tar.Z
  \\b     word boundary
  \\B     not a word boundary
  \\A     start of subject (independent of multiline mode)
  \\Z     end of subject or newline at end (independent of multiline mode)
  \\z     end of subject (independent of multiline mode)

These assertions may not appear in character classes (but note that "\\b" has a
different meaning, namely the backspace character, inside a character class).

A word boundary is a position in the subject string where the current character
and the previous character do not both match \\w or \\W (i.e. one matches
\\w and the other matches \\W), or the start or end of the string if the
first or last character matches \\w, respectively.

The \\A, \\Z, and \\z assertions differ from the traditional circumflex and
dollar (described below) in that they only ever match at the very start and end
of the subject string, whatever options are set. They are not affected by the
PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, \\A can never match. The difference between \\Z
and \\z is that \\Z matches before a newline that is the last character of the
string as well as at the end of the string, whereas \\z matches only at the
end.

.SH CIRCUMFLEX AND DOLLAR
Outside a character class, in the default matching mode, the circumflex
character is an assertion which is true only if the current matching point is
at the start of the subject string. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character
class, circumflex has an entirely different meaning (see below).

Circumflex need not be the first character of the pattern if a number of
alternatives are involved, but it should be the first thing in each alternative
in which it appears if the pattern is ever to match that branch. If all
possible alternatives start with a circumflex, that is, if the pattern is
constrained to match only at the start of the subject, it is said to be an
"anchored" pattern.
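The assertions described so far can be tried out with a minimal sketch. It assumes Python's standard re module as a stand-in for PCRE, which is not the library this page documents. One difference matters here: Python's \Z behaves like PCRE's \z, so PCRE's \Z is emulated with a lookahead. The helper names and the two constants are hypothetical.

```python
import re

# Assumption: Python's re stands in for PCRE. Python's \Z behaves like
# PCRE's \z; PCRE's \Z is emulated below with a lookahead.
PCRE_UPPER_Z = r"(?=\n?\Z)"   # end of subject, or just before a final newline
PCRE_LOWER_Z = r"\Z"          # very end of subject only


def boundary_offsets(subject):
    """Offsets in subject at which the \\b assertion is true."""
    return [m.start() for m in re.finditer(r"\b", subject)]


def found(pattern, subject):
    """True if pattern matches somewhere in subject."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    print(boundary_offsets("ab cd"))
    print(found("abc" + PCRE_UPPER_Z, "abc\n"))
    print(found("abc" + PCRE_LOWER_Z, "abc\n"))
    print(found(r"\Aabc", "def\nabc"))
```

In this sketch the emulated \Z accepts a subject that ends in a newline, the emulated \z does not, and \A never matches after an internal newline.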
(There are also other constructs that can cause a pattern to be anchored.)

A dollar character is an assertion which is true only if the current matching
point is at the end of the subject string, or immediately before a newline
character that is the last character in the string (by default). Dollar need
not be the last character of the pattern if a number of alternatives are
involved, but it should be the last item in any branch in which it appears.
Dollar has no special meaning in a character class.

The meaning of dollar can be changed so that it matches only at the very end of
the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching
time. This does not affect the \\Z assertion.

The meanings of the circumflex and dollar characters are changed if the
PCRE_MULTILINE option is set. When this is the case, they match immediately
after and immediately before an internal "\\n" character, respectively, in
addition to matching at the start and end of the subject string. For example,
the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode,
but not otherwise. Consequently, patterns that are anchored in single line mode
because all branches start with "^" are not anchored in multiline mode, and a
match for circumflex is possible when the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
PCRE_MULTILINE is set.

Note that the sequences \\A, \\Z, and \\z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
\\A, it is always anchored, whether PCRE_MULTILINE is set or not.

.SH FULL STOP (PERIOD, DOT)
Outside a character class, a dot in the pattern matches any one character in
the subject, including a non-printing character, but not (by default) newline.
If the PCRE_DOTALL option is set, dots match newlines as well.
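The effect of multiline and dotall matching on circumflex, dollar, and dot can be checked with a short sketch. It assumes Python's re module as a stand-in for PCRE, with re.MULTILINE and re.DOTALL playing the roles of PCRE_MULTILINE and PCRE_DOTALL. The found() helper is hypothetical. Python has no counterpart of PCRE_DOLLAR_ENDONLY, so that option is not shown.

```python
import re


def found(pattern, subject, flags=0):
    """True if pattern matches somewhere in subject under the given flags."""
    return re.search(pattern, subject, flags) is not None


if __name__ == "__main__":
    subject = "def\nabc"
    # Default mode: ^ only matches at the very start of the subject.
    print(found(r"^abc$", subject))
    # Multiline mode: ^ also matches just after the internal newline.
    print(found(r"^abc$", subject, re.MULTILINE))
    # By default, dollar also matches before a newline that ends the subject.
    print(found(r"abc$", "abc\n"))
    # Dot skips newline unless dotall matching is requested.
    print(found(r"a.b", "a\nb"), found(r"a.b", "a\nb", re.DOTALL))
```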
The handling of dot is entirely independent of the handling of circumflex and
dollar, the only relationship being that they both involve newline characters.
Dot has no special meaning in a character class.

.SH SQUARE BRACKETS
An opening square bracket introduces a character class, terminated by a closing
square bracket. A closing square bracket on its own is not special. If a
closing square bracket is required as a member of the class, it should be the
first data character in the class (after an initial circumflex, if present) or
escaped with a backslash.

A character class matches a single character in the subject; the character must
be in the set of characters defined by the class, unless the first character in
the class is a circumflex, in which case the subject character must not be in
the set defined by the class. If a circumflex is actually required as a member
of the class, ensure it is not the first character, or escape it with a
backslash.

For example, the character class [aeiou] matches any lower case vowel, while
[^aeiou] matches any character that is not a lower case vowel. Note that a
circumflex is just a convenient notation for specifying the characters which
are in the class by enumerating those that are not. It is not an assertion: it
still consumes a character from the subject string, and fails if the current
pointer is at the end of the string.

When caseless matching is set, any letters in a class represent both their
upper case and lower case versions, so for example, a caseless [aeiou] matches
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
caseful version would.

The newline character is never treated in any special way in character classes,
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class
such as [^a] will always match a newline.

The minus (hyphen) character can be used to specify a range of characters in a
character class. For example, [d-m] matches any letter between d and m,
inclusive.
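The class rules given so far (plain and negated classes, caseless matching, newline handling, and ranges) can be sketched as follows. This assumes Python's re module as a stand-in for PCRE, with re.IGNORECASE standing in for PCRE_CASELESS. The class_hits() helper is hypothetical.

```python
import re


def class_hits(char_class, subject, caseless=False):
    """Return every character of subject matched by the given class."""
    flags = re.IGNORECASE if caseless else 0
    return re.findall(char_class, subject, flags)


if __name__ == "__main__":
    print(class_hits(r"[aeiou]", "Audio"))          # lower case vowels only
    print(class_hits(r"[^aeiou]", "Audio"))         # everything else, "A" included
    print(class_hits(r"[^aeiou]", "Audio", True))   # caseless: "A" is excluded
    print(class_hits(r"[^a]", "\n"))                # newline is an ordinary member
    print(class_hits(r"[^aeiou]", ""))              # nothing to consume, no match
    print(class_hits(r"[d-m]", "clamp"))            # range d through m
```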
If a minus character is required in a class, it must be escaped with
a backslash or appear in a position where it cannot be interpreted as
indicating a range, typically as the first or last character in the class.

It is not possible to have the literal character "]" as the end character of a
range. A pattern such as [W-]46] is interpreted as a class of two characters
("W" and "-") followed by a literal string "46]", so it would match "W46]" or
"-46]". However, if the "]" is escaped with a backslash it is interpreted as
the end of range, so [W-\\]46] is interpreted as a single class containing a
range followed by two separate characters. The octal or hexadecimal
representation of "]" can also be used to end a range.

Ranges operate in ASCII collating sequence. They can also be used for
characters specified numerically, for example [\\000-\\037]. If a range that
includes letters is used when caseless matching is set, it matches the letters
in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched
caselessly, and if character tables for the "fr" locale are in use,
[\\xc8-\\xcb] matches accented E characters in both cases.

The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a
character class, and add the characters that they match to the class. For
example, [\\dABCDEF] matches any hexadecimal digit. A circumflex can
conveniently be used with the upper case character types to specify a more
restricted set of characters than the matching lower case type. For example,
the class [^\\W_] matches any letter or digit, but not underscore.

All non-alphameric characters other than \\, -, ^ (at the start) and the
terminating ] are non-special in character classes, but it does no harm if they
are escaped.

.SH POSIX CHARACTER CLASSES
Perl 5.6 (not yet released at the time of writing) is going to support the
POSIX notation for character classes, which uses names enclosed by [: and :]
within the enclosing square brackets. PCRE supports this notation.
For example,

  [01[:alpha:]%]

matches "0", "1", any alphabetic character, or "%". The supported class names
are

  alnum    letters and digits
  alpha    letters
  ascii    character codes 0 - 127
  cntrl    control characters
  digit    decimal digits (same as \\d)
  graph    printing characters, excluding space
  lower    lower case letters
  print    printing characters, including space
  punct    printing characters, excluding letters and digits
  space    white space (same as \\s)
  upper    upper case letters
  word     "word" characters (same as \\w)
  xdigit   hexadecimal digits

The names "ascii" and "word" are Perl extensions. Another Perl extension is
negation, which is indicated by a ^ character after the colon. For example,

  [12[:^digit:]]

matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
supported, and an error is given if they are encountered.

.SH VERTICAL BAR
Vertical bar characters are used to separate alternative patterns. For example,
the pattern

  gilbert|sullivan

matches either "gilbert" or "sullivan". Any number of alternatives may appear,
and an empty alternative is permitted (matching the empty string).
The matching process tries each alternative in turn, from left to right,
and the first one that succeeds is used. If the alternatives are within a
subpattern (defined below), "succeeds" means matching the rest of the main
pattern as well as the alternative in the subpattern.

.SH INTERNAL OPTION SETTING
The settings of PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED
can be changed from within the pattern by a sequence of Perl option letters
enclosed between "(?" and ")". The option letters are

  i  for PCRE_CASELESS
  m  for PCRE_MULTILINE
  s  for PCRE_DOTALL
  x  for PCRE_EXTENDED

For example, (?im) sets caseless, multiline matching.
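The left-to-right rule for alternatives and the (?im) setting can be illustrated with a small sketch. It assumes Python's re module as a stand-in for PCRE. Only an option setting at the very start of the pattern is used, because that placement is known to be accepted by both engines. The two helpers are hypothetical.

```python
import re


def first_match(pattern, subject):
    """Return the text of the leftmost match, or None if there is none."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


def all_matches(pattern, subject):
    """Return the text of every non-overlapping match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, subject)]


if __name__ == "__main__":
    print(first_match(r"gilbert|sullivan", "arthur sullivan"))
    # The first alternative that succeeds wins, even if a later one is longer.
    print(first_match(r"a|ab", "ab"))
    # Inside a subpattern, "succeeds" includes matching the rest of the pattern.
    print(first_match(r"(a|ab)c", "abc"))
    # An empty alternative matches the empty string.
    print(repr(first_match(r"x|", "y")))
    # (?im) at the start: caseless and multiline for the whole pattern.
    print(all_matches(r"(?im)^abc$", "ABC\nabc\nxabc"))
```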
It is also possible to
unset these options by preceding the letter with a hyphen, and a combined
setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and
PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is
unset.

The scope of these option changes depends on where in the pattern the setting
occurs. For settings that are outside any subpattern (defined below), the
effect is the same as if the options were set or unset at the start of
matching. The following patterns all behave in exactly the same way:

  (?i)abc
  a(?i)bc
  ab(?i)c
  abc(?i)

which in turn is the same as compiling the pattern abc with PCRE_CASELESS set.
In other words, such "top level" settings apply to the whole pattern (unless
there are other changes inside subpatterns). If there is more than one setting
of the same option at top level, the rightmost setting is used.

If an option change occurs inside a subpattern, the effect is different. This
is a change of behaviour in Perl 5.005. An option change inside a subpattern
affects only that part of the subpattern that follows it, so

  (a(?i)b)c

matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used).
By this means, options can be made to have different settings in different
parts of the pattern. Any changes made in one alternative do carry on
into subsequent branches within the same subpattern. For example,

  (a(?i)b|c)

matches "ab", "aB", "c", and "C", even though when matching "C" the first
branch is abandoned before the option setting. This is because the effects of
option settings happen at compile time. There would be some very weird
behaviour otherwise.

The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed in the
same way as the Perl-compatible options by using the characters U and X
respectively.
The (?X) flag setting is special in that it must always occur
earlier in the pattern than any of the additional features it turns on, even
when it is at top level. It is best put at the start.

.SH SUBPATTERNS
Subpatterns are delimited by parentheses (round brackets), which can be nested.
Marking part of a pattern as a subpattern does two things:

1. It localizes a set of alternatives. For example, the pattern

  cat(aract|erpillar|)

matches one of the words "cat", "cataract", or "caterpillar". Without the
parentheses, it would match "cataract", "erpillar" or the empty string.

2. It sets up the subpattern as a capturing subpattern (as defined above).
When the whole pattern matches, that portion of the subject string that matched
the subpattern is passed back to the caller via the \fIovector\fR argument of
\fBpcre_exec()\fR. Opening parentheses are counted from left to right (starting
from 1) to obtain the numbers of the capturing subpatterns.

For example, if the string "the red king" is matched against the pattern

  the ((red|white) (king|queen))

the captured substrings are "red king", "red", and "king", and are numbered 1,
2, and 3.

The fact that plain parentheses fulfil two functions is not always helpful.
There are often times when a grouping subpattern is required without a
capturing requirement. If an opening parenthesis is followed by "?:", the
subpattern does not do any capturing, and is not counted when computing the
number of any subsequent capturing subpatterns. For example, if the string "the
white queen" is matched against the pattern

  the ((?:red|white) (king|queen))

the captured substrings are "white queen" and "queen", and are numbered 1 and
2. The maximum number of captured substrings is 99, and the maximum number of
all subpatterns, both capturing and non-capturing, is 200.

As a convenient shorthand, if any option settings are required at the start of
a non-capturing subpattern, the option letters may appear between the "?" and
the ":". Thus the two patterns

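The grouping and capture numbering rules from the SUBPATTERNS section can be reproduced with a brief sketch. It assumes Python's re module as a stand-in for PCRE: the tuple returned by groups() plays the role that the ovector argument of pcre_exec() plays in the C interface. The captures() helper is hypothetical.

```python
import re


def captures(pattern, subject):
    """Return the captured substrings (groups 1..n), or None if no match.

    The whole subject must match, which mirrors the "matches one of the
    words" wording used for the cat(aract|erpillar|) example.
    """
    m = re.fullmatch(pattern, subject)
    return m.groups() if m else None


if __name__ == "__main__":
    # Parentheses localize the alternatives to the tail of the word.
    print(captures(r"cat(aract|erpillar|)", "caterpillar"))
    # Without them, the alternatives are whole words of their own.
    print(captures(r"cataract|erpillar|", "caterpillar"))
    # Capturing groups are numbered by their opening parenthesis.
    print(captures(r"the ((red|white) (king|queen))", "the red king"))
    # A (?: ... ) group is not captured and not counted.
    print(captures(r"the ((?:red|white) (king|queen))", "the white queen"))
```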