📄 pcre.3

non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is usually easier to
use one of the following escape sequences than the binary character it
represents:

  \\a     alarm, that is, the BEL character (hex 07)
  \\cx    "control-x", where x is any character
  \\e     escape (hex 1B)
  \\f     formfeed (hex 0C)
  \\n     newline (hex 0A)
  \\r     carriage return (hex 0D)
  \\t     tab (hex 09)
  \\xhh   character with hex code hh
  \\ddd   character with octal code ddd, or backreference

The precise effect of "\\cx" is as follows: if "x" is a lower case letter, it
is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
Thus "\\cz" becomes hex 1A, but "\\c{" becomes hex 3B, while "\\c;" becomes hex
7B.

After "\\x", up to two hexadecimal digits are read (letters can be in upper or
lower case).

After "\\0" up to two further octal digits are read. In both cases, if there
are fewer than two digits, just those that are present are used. Thus the
sequence "\\0\\x\\07" specifies two binary zeros followed by a BEL character.
Make sure you supply two digits after the initial zero if the character that
follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is complicated.
Outside a character class, PCRE reads it and any following digits as a decimal
number. If the number is less than 10, or if there have been at least that many
previous capturing left parentheses in the expression, the entire sequence is
taken as a \fIback reference\fR. A description of how this works is given
later, following the discussion of parenthesized subpatterns.

Inside a character class, or if the decimal number is greater than 9 and there
have not been that many capturing subpatterns, PCRE re-reads up to three octal
digits following the backslash, and generates a single byte from the least
significant 8 bits of the value.
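
As an illustration only, the "control-x" rule and the backslash-digit rule
described above can be sketched in Python. This is not PCRE's C code: the
function names, the tuple return values, and the groups_so_far parameter are
assumptions made for the sketch. The first function applies the "control-x"
rule; the second classifies the run of digits that follows a backslash, given
how many capturing left parentheses have been seen so far.

```python
OCTAL_DIGITS = "01234567"


def control_char(x):
    """Code of "control-x": upper-case a lower case letter, then flip hex 40."""
    if "a" <= x <= "z":
        x = x.upper()
    return ord(x) ^ 0x40


def classify_digits(digits, groups_so_far, in_class=False):
    """Classify the run of decimal digits that follows a backslash.

    Returns ("backref", n), or ("byte", value, rest) where rest holds the
    digits that were not consumed and so stand for themselves.
    """
    if not digits or any(d not in "0123456789" for d in digits):
        raise ValueError("expected a non-empty run of decimal digits")
    if digits[0] != "0" and not in_class:
        number = int(digits)
        if number < 10 or number <= groups_so_far:
            return ("backref", number)
    # Otherwise read at most three octal digits and keep the low 8 bits.
    value = 0
    used = 0
    while used < len(digits) and used < 3 and digits[used] in OCTAL_DIGITS:
        value = value * 8 + int(digits[used])
        used += 1
    return ("byte", value & 0xFF, digits[used:])
```

With no capturing groups seen, classify_digits("11", 0) yields a tab byte
(value 9); once eleven groups have been seen, classify_digits("11", 11) yields
back reference 11 instead.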
Any subsequent digits stand for themselves. For example:

  \\040   is another way of writing a space
  \\40    is the same, provided there are fewer than 40
            previous capturing subpatterns
  \\7     is always a back reference
  \\11    might be a back reference, or another way of
            writing a tab
  \\011   is always a tab
  \\0113  is a tab followed by the character "3"
  \\113   is the character with octal code 113 (since there
            can be no more than 99 back references)
  \\377   is a byte consisting entirely of 1 bits
  \\81    is either a back reference, or a binary zero
            followed by the two characters "8" and "1"

Note that octal values of 100 or greater must not be introduced by a leading
zero, because no more than three octal digits are ever read.

All the sequences that define a single byte value can be used both inside and
outside character classes. In addition, inside a character class, the sequence
"\\b" is interpreted as the backspace character (hex 08). Outside a character
class it has a different meaning (see below).

The third use of backslash is for specifying generic character types:

  \\d     any decimal digit
  \\D     any character that is not a decimal digit
  \\s     any whitespace character
  \\S     any character that is not a whitespace character
  \\w     any "word" character
  \\W     any "non-word" character

Each pair of escape sequences partitions the complete set of characters into
two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is,
any character which can be part of a Perl "word". The definition of letters and
digits is controlled by PCRE's character tables, and may vary if
locale-specific matching is taking place (see "Locale support" above).
For example, in the "fr" (French) locale, some character codes greater than 128
are used for accented letters, and these are matched by \\w.

These character type sequences can appear both inside and outside character
classes. They each match one character of the appropriate type. If the current
matching point is at the end of the subject string, all of them fail, since
there is no character to match.

The fourth use of backslash is for certain simple assertions. An assertion
specifies a condition that has to be met at a particular point in a match,
without consuming any characters from the subject string. The use of
subpatterns for more complicated assertions is described below. The backslashed
assertions are

  \\b     word boundary
  \\B     not a word boundary
  \\A     start of subject (independent of multiline mode)
  \\Z     end of subject or newline at end (independent of multiline mode)
  \\z     end of subject (independent of multiline mode)

These assertions may not appear in character classes (but note that "\\b" has a
different meaning, namely the backspace character, inside a character class).

A word boundary is a position in the subject string where the current character
and the previous character do not both match \\w or \\W (i.e. one matches
\\w and the other matches \\W), or the start or end of the string if the
first or last character matches \\w, respectively.

The \\A, \\Z, and \\z assertions differ from the traditional circumflex and
dollar (described below) in that they only ever match at the very start and end
of the subject string, whatever options are set. They are not affected by the
PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, \\A can never match.
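
The word-boundary definition above can be restated as a small Python sketch.
It assumes the default character tables with no locale, so a "word" character
is taken to be an ASCII letter, digit, or underscore. The helper names are
invented for the illustration and are not part of PCRE.

```python
import string

# Assumption: default tables, so only ASCII letters, digits and "_" count.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def is_word_boundary(subject, pos):
    """True if position pos (0 to len(subject)) is a word boundary."""
    # A missing neighbour at either end of the string counts as non-word.
    before = pos > 0 and subject[pos - 1] in WORD_CHARS
    after = pos < len(subject) and subject[pos] in WORD_CHARS
    # A boundary exists when exactly one side is a word character.
    return before != after


def word_boundaries(subject):
    """List every word-boundary position in subject."""
    return [p for p in range(len(subject) + 1) if is_word_boundary(subject, p)]
```

For the subject "ab cd" this reports positions 0, 2, 3 and 5: the start and
end of the string count because the first and last characters are word
characters.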
The difference between \\Z and \\z is that \\Z matches before a newline that is
the last character of the string as well as at the end of the string, whereas
\\z matches only at the end.
.SH CIRCUMFLEX AND DOLLAR
Outside a character class, in the default matching mode, the circumflex
character is an assertion which is true only if the current matching point is
at the start of the subject string. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character
class, circumflex has an entirely different meaning (see below).

Circumflex need not be the first character of the pattern if a number of
alternatives are involved, but it should be the first thing in each alternative
in which it appears if the pattern is ever to match that branch. If all
possible alternatives start with a circumflex, that is, if the pattern is
constrained to match only at the start of the subject, it is said to be an
"anchored" pattern. (There are also other constructs that can cause a pattern
to be anchored.)

A dollar character is an assertion which is true only if the current matching
point is at the end of the subject string, or immediately before a newline
character that is the last character in the string (by default). Dollar need
not be the last character of the pattern if a number of alternatives are
involved, but it should be the last item in any branch in which it appears.
Dollar has no special meaning in a character class.

The meaning of dollar can be changed so that it matches only at the very end of
the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching
time. This does not affect the \\Z assertion.

The meanings of the circumflex and dollar characters are changed if the
PCRE_MULTILINE option is set. When this is the case, they match immediately
after and immediately before an internal "\\n" character, respectively, in
addition to matching at the start and end of the subject string.
For example, the pattern /^abc$/ matches the subject string "def\\nabc" in
multiline mode, but not otherwise. Consequently, patterns that are anchored in
single line mode because all branches start with "^" are not anchored in
multiline mode, and a match for circumflex is possible when the
\fIstartoffset\fR argument of \fBpcre_exec()\fR is non-zero. The
PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.

Note that the sequences \\A, \\Z, and \\z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
\\A it is always anchored, whether PCRE_MULTILINE is set or not.
.SH FULL STOP (PERIOD, DOT)
Outside a character class, a dot in the pattern matches any one character in
the subject, including a non-printing character, but not (by default) newline.
If the PCRE_DOTALL option is set, dots match newlines as well. The handling of
dot is entirely independent of the handling of circumflex and dollar, the only
relationship being that they both involve newline characters. Dot has no
special meaning in a character class.
.SH SQUARE BRACKETS
An opening square bracket introduces a character class, terminated by a closing
square bracket. A closing square bracket on its own is not special. If a
closing square bracket is required as a member of the class, it should be the
first data character in the class (after an initial circumflex, if present) or
escaped with a backslash.

A character class matches a single character in the subject; the character must
be in the set of characters defined by the class, unless the first character in
the class is a circumflex, in which case the subject character must not be in
the set defined by the class. If a circumflex is actually required as a member
of the class, ensure it is not the first character, or escape it with a
backslash.

For example, the character class [aeiou] matches any lower case vowel, while
[^aeiou] matches any character that is not a lower case vowel.
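
A few of these class rules can be tried out with Python's re module, used here
only as a stand-in. It is a different engine from PCRE; the assumption is that
it treats these particular simple classes the same way. The helper name
in_class is made up for the example.

```python
import re


def in_class(char_class, ch):
    """True if the single character ch matches char_class, e.g. "[aeiou]"."""
    return re.fullmatch(char_class, ch) is not None


# Each pair is (class, subject character); all of these are expected to match.
examples = [
    ("[aeiou]", "e"),   # a lower case vowel
    ("[^aeiou]", "x"),  # anything that is not a lower case vowel
    ("[]a]", "]"),      # closing bracket as the first data character
    ("[^]a]", "b"),     # the same, after an initial circumflex
    ("[a^]", "^"),      # circumflex that is not first is an ordinary member
]
for char_class, ch in examples:
    print(char_class, repr(ch), in_class(char_class, ch))
```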
Note that a circumflex is just a convenient notation for specifying the
characters which are in the class by enumerating those that are not. It is not
an assertion: it still consumes a character from the subject string, and fails
if the current pointer is at the end of the string.

When caseless matching is set, any letters in a class represent both their
upper case and lower case versions, so for example, a caseless [aeiou] matches
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
caseful version would.

The newline character is never treated in any special way in character classes,
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class
such as [^a] will always match a newline.

The minus (hyphen) character can be used to specify a range of characters in a
character class. For example, [d-m] matches any letter between d and m,
inclusive. If a minus character is required in a class, it must be escaped with
a backslash or appear in a position where it cannot be interpreted as
indicating a range, typically as the first or last character in the class.

It is not possible to have the literal character "]" as the end character of a
range. A pattern such as [W-]46] is interpreted as a class of two characters
("W" and "-") followed by a literal string "46]", so it would match "W46]" or
"-46]". However, if the "]" is escaped with a backslash it is interpreted as
the end of range, so [W-\\]46] is interpreted as a single class containing a
range followed by two separate characters. The octal or hexadecimal
representation of "]" can also be used to end a range.

Ranges operate in ASCII collating sequence. They can also be used for
characters specified numerically, for example [\\000-\\037]. If a range that
includes letters is used when caseless matching is set, it matches the letters
in either case.
For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched
caselessly, and if character tables for the "fr" locale are in use,
[\\xc8-\\xcb] matches accented E characters in both cases.

The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a
character class, and add the characters that they match to the class. For
example, [\\dABCDEF] matches any hexadecimal digit. A circumflex can
conveniently be used with the upper case character types to specify a more
restricted set of characters than the matching lower case type. For example,
the class [^\\W_] matches any letter or digit, but not underscore.

All non-alphameric characters other than \\, -, ^ (at the start) and the
terminating ] are non-special in character classes, but it does no harm if they
are escaped.
.SH POSIX CHARACTER CLASSES
Perl 5.6 (not yet released at the time of writing) is going to support the
POSIX notation for character classes, which uses names enclosed by [: and :]
within the enclosing square brackets. PCRE supports this notation. For example,

  [01[:alpha:]%]

matches "0", "1", any alphabetic character, or "%". The supported class names
are

  alnum    letters and digits
  alpha    letters
  ascii    character codes 0 - 127
  cntrl    control characters
  digit    decimal digits (same as \\d)
  graph    printing characters, excluding space
  lower    lower case letters
  print    printing characters, including space
  punct    printing characters, excluding letters and digits
  space    white space (same as \\s)
  upper    upper case letters
  word     "word" characters (same as \\w)
  xdigit   hexadecimal digits

The names "ascii" and "word" are Perl extensions. Another Perl extension is
negation, which is indicated by a ^ character after the colon. For example,

  [12[:^digit:]]

matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
syntax [.ch.]
and [=ch=] where "ch" is a "collating element", but these are not
supported, and an error is given if they are encountered.
.SH VERTICAL BAR
Vertical bar characters are used to separate alternative patterns. For example,
the pattern

  gilbert|sullivan

matches either "gilbert" or "sullivan". Any number of alternatives may appear,
and an empty alternative is permitted (matching the empty string).
The matching process tries each alternative in turn, from left to right,
and the first one that succeeds is used. If the alternatives are within a
subpattern (defined below), "succeeds" means matching the rest of the main
pattern as well as the alternative in the subpattern.
.SH INTERNAL OPTION SETTING
The settings of PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED
