     A character class matches a single character in the subject;
     the character must be in the set of characters defined by the
     class, unless the first character in the class is a circumflex,
     in which case the subject character must not be in the set
     defined by the class. If a circumflex is actually required as a
     member of the class, ensure it is not the first character, or
     escape it with a backslash.

     For example, the character class [aeiou] matches any lower case
     vowel, while [^aeiou] matches any character that is not a lower
     case vowel. Note that a circumflex is just a convenient notation
     for specifying the characters which are in the class by
     enumerating those that are not. It is not an assertion: it still
     consumes a character from the subject string, and fails if the
     current pointer is at the end of the string.

     When caseless matching is set, any letters in a class represent
     both their upper case and lower case versions, so for example, a
     caseless [aeiou] matches "A" as well as "a", and a caseless
     [^aeiou] does not match "A", whereas a caseful version would.

     The newline character is never treated in any special way in
     character classes, whatever the setting of the PCRE_DOTALL or
     PCRE_MULTILINE options is. A class such as [^a] will always match
     a newline.

     The minus (hyphen) character can be used to specify a range of
     characters in a character class. For example, [d-m] matches any
     letter between d and m, inclusive. If a minus character is
     required in a class, it must be escaped with a backslash or
     appear in a position where it cannot be interpreted as indicating
     a range, typically as the first or last character in the class.

     It is not possible to have the literal character "]" as the end
     character of a range.
     A pattern such as [W-]46] is interpreted as a class of two
     characters ("W" and "-") followed by a literal string "46]", so it
     would match "W46]" or "-46]". However, if the "]" is escaped with
     a backslash it is interpreted as the end of the range, so
     [W-\]46] is interpreted as a single class containing a range
     followed by two separate characters. The octal or hexadecimal
     representation of "]" can also be used to end a range.

     Ranges operate in ASCII collating sequence. They can also be used
     for characters specified numerically, for example [\000-\037]. If
     a range that includes letters is used when caseless matching is
     set, it matches the letters in either case. For example, [W-c] is
     equivalent to [][\^_`wxyzabc], matched caselessly, and if
     character tables for the "fr" locale are in use, [\xc8-\xcb]
     matches accented E characters in both cases.

     The character types \d, \D, \s, \S, \w, and \W may also appear in
     a character class, and add the characters that they match to the
     class. For example, [\dABCDEF] matches any hexadecimal digit
     written with upper case letters. A circumflex can conveniently be
     used with the upper case character types to specify a more
     restricted set of characters than the matching lower case type.
     For example, the class [^\W_] matches any letter or digit, but
     not underscore.

     All non-alphanumeric characters other than \, -, ^ (at the start)
     and the terminating ] are non-special in character classes, but
     it does no harm if they are escaped.

POSIX CHARACTER CLASSES

     Perl 5.6 (not yet released at the time of writing) is going to
     support the POSIX notation for character classes, which uses
     names enclosed by [: and :] within the enclosing square brackets.
     PCRE supports this notation. For example,

       [01[:alpha:]%]

     matches "0", "1", any alphabetic character, or "%".
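The character-class rules above can be spot-checked with a small table of cases. This is a sketch only: it runs the patterns through Python's `re` module rather than PCRE, on the assumption that the two engines agree for these particular ASCII cases. Python's `re` does not implement the POSIX `[:name:]` notation just described, and its `\d` and `\w` are Unicode-aware by default. The `matches` helper and the `CASES` table are hypothetical names made up for illustration.

```python
import re

def matches(pattern: str, subject: str, flags: int = 0) -> bool:
    """True if `pattern` matches the whole of `subject` (Python's re, not PCRE)."""
    return re.fullmatch(pattern, subject, flags) is not None

# Hypothetical test table: (pattern, subject, flags, expected result).
CASES = [
    (r"[^aeiou]", "b", 0, True),            # negated class consumes one character
    (r"x[^aeiou]", "x", 0, False),          # ...so it fails at the end of the subject
    (r"[aeiou]", "A", re.IGNORECASE, True),
    (r"[^aeiou]", "A", re.IGNORECASE, False),
    (r"[^aeiou]", "A", 0, True),            # caseful: "A" is not a listed vowel
    (r"[^a]", "\n", 0, True),               # newline is ordinary inside a class
    (r"[d-m]", "h", 0, True),               # range
    (r"[a-]", "-", 0, True),                # trailing hyphen is literal
    (r"[W-]46]", "-46]", 0, True),          # class {W, -} followed by "46]"
    (r"[W-\]46]", "[", 0, True),            # range W..] (includes "["), plus 4 and 6
    (r"[\000-\037]", "\t", 0, True),        # numeric (octal) range
    (r"[W-c]", "x", re.IGNORECASE, True),   # caseless: range also covers w-z
    (r"[W-c]", "x", 0, False),
    (r"[\dABCDEF]+", "3F0A", 0, True),      # \d adds the digits to the class
    (r"[^\W_]", "_", 0, False),             # letters and digits, not underscore
    (r"[^\W_]", "7", 0, True),
]

if __name__ == "__main__":
    for pattern, subject, flags, expected in CASES:
        result = matches(pattern, subject, flags)
        print(f"{pattern:14} {subject!r:8} -> {result}")
        assert result == expected, pattern
```

Each row pairs a pattern from the text with one subject; adding rows is the easiest way to probe further edge cases before relying on them in PCRE itself.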
     The supported POSIX class names are

       alnum    letters and digits
       alpha    letters
       ascii    character codes 0 - 127
       cntrl    control characters
       digit    decimal digits (same as \d)
       graph    printing characters, excluding space
       lower    lower case letters
       print    printing characters, including space
       punct    printing characters, excluding letters and digits
       space    white space (same as \s)
       upper    upper case letters
       word     "word" characters (same as \w)
       xdigit   hexadecimal digits

     The names "ascii" and "word" are Perl extensions. Another Perl
     extension is negation, which is indicated by a ^ character after
     the colon. For example,

       [12[:^digit:]]

     matches "1", "2", or any non-digit. PCRE (and Perl) also
     recognize the POSIX syntax [.ch.] and [=ch=], where "ch" is a
     "collating element", but these are not supported, and an error
     is given if they are encountered.

VERTICAL BAR

     Vertical bar characters are used to separate alternative
     patterns. For example, the pattern

       gilbert|sullivan

     matches either "gilbert" or "sullivan". Any number of
     alternatives may appear, and an empty alternative is permitted
     (matching the empty string). The matching process tries each
     alternative in turn, from left to right, and the first one that
     succeeds is used. If the alternatives are within a subpattern
     (defined below), "succeeds" means matching the rest of the main
     pattern as well as the alternative in the subpattern.

INTERNAL OPTION SETTING

     The settings of PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
     PCRE_EXTENDED can be changed from within the pattern by a
     sequence of Perl option letters enclosed between "(?" and ")".
     The option letters are

       i  for PCRE_CASELESS
       m  for PCRE_MULTILINE
       s  for PCRE_DOTALL
       x  for PCRE_EXTENDED

     For example, (?im) sets caseless, multiline matching. Options can
     also be unset by preceding the letter with a hyphen, and a
     combined setting and unsetting such as (?im-sx), which sets
     PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and
     PCRE_EXTENDED, is permitted. If a letter appears both before and
     after the hyphen, the option is unset.

     The scope of these option changes depends on where in the pattern
     the setting occurs. For settings that are outside any subpattern
     (defined below), the effect is the same as if the options were
     set or unset at the start of matching. The following patterns all
     behave in exactly the same way:

       (?i)abc
       a(?i)bc
       ab(?i)c
       abc(?i)

     which in turn is the same as compiling the pattern abc with
     PCRE_CASELESS set. In other words, such "top level" settings
     apply to the whole pattern (unless there are other changes inside
     subpatterns). If there is more than one setting of the same
     option at top level, the rightmost setting is used.

     If an option change occurs inside a subpattern, the effect is
     different. This is a change of behaviour in Perl 5.005. An option
     change inside a subpattern affects only that part of the
     subpattern that follows it, so

       (a(?i)b)c

     matches abc and aBc and no other strings (assuming PCRE_CASELESS
     is not used). By this means, options can be made to have
     different settings in different parts of the pattern. Any changes
     made in one alternative do carry on into subsequent branches
     within the same subpattern. For example,

       (a(?i)b|c)

     matches "ab", "aB", "c", and "C", even though when matching "C"
     the first branch is abandoned before the option setting.
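The scoping rules above can be illustrated by rewriting each PCRE pattern into an equivalent for Python's `re`, which is used here only as a stand-in for PCRE. Python 3.11 and later reject a global inline flag such as `(?i)` anywhere except the very start of a pattern (earlier versions only warn), and Python has no "from here to the end of the group" switch, so the sketch rewrites each mid-pattern setting as either a leading flag or a scoped `(?i:...)` group covering exactly the text the PCRE setting affects. The `PY_EQUIVALENT` table and `accepts` helper are hypothetical, and the rewrites are a hand translation rather than anything PCRE or Python provides.

```python
import re

# Hypothetical table: PCRE pattern from the text -> hand-written Python
# pattern intended to accept the same strings.
PY_EQUIVALENT = {
    r"abc(?i)": r"(?i)abc",               # top-level setting: whole pattern
    r"(a(?i)b)c": r"(a(?i:b))c",          # only the rest of the subpattern
    r"(a(?i)b|c)": r"(a(?i:b)|(?i:c))",   # carries into later branches
}

def accepts(pcre_pattern: str, subject: str) -> bool:
    """True if the Python stand-in for `pcre_pattern` matches all of `subject`."""
    return re.fullmatch(PY_EQUIVALENT[pcre_pattern], subject) is not None

if __name__ == "__main__":
    subjects = ("abc", "ABC", "aBc", "Abc", "ab", "aB", "c", "C")
    for pattern in PY_EQUIVALENT:
        hits = [s for s in subjects if accepts(pattern, s)]
        print(f"{pattern:12} accepts {hits}")
```

Printing the accepted subjects for each pattern reproduces the sets given in the text: any case for abc(?i), only "abc" and "aBc" for (a(?i)b)c, and "ab", "aB", "c", "C" for (a(?i)b|c).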
     The (a(?i)b|c) pattern behaves this way because the effects of
     option settings happen at compile time; otherwise the behaviour
     would be very strange.

     The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be
     changed in the same way as the Perl-compatible options by using
     the characters U and X respectively. The (?X) flag setting is
     special in that it must always occur earlier in the pattern than
     any of the additional features it turns on, even when it is at
     top level. It is best put at the start.

SUBPATTERNS

     Subpatterns are delimited by parentheses (round brackets), which
     can be nested. Marking part of a pattern as a subpattern does two
     things:

     1. It localizes a set of alternatives. For example, the pattern

       cat(aract|erpillar|)

     matches one of the words "cat", "cataract", or "caterpillar".
     Without the parentheses, it would match "cataract", "erpillar" or
     the empty string.

     2. It sets up the subpattern as a capturing subpattern (as
     defined above). When the whole pattern matches, that portion of
     the subject string that matched the subpattern is passed back to
     the caller via the ovector argument of pcre_exec(). Opening
     parentheses are counted from left to right (starting from 1) to
     obtain the numbers of the capturing subpatterns.

     For example, if the string "the red king" is matched against the
     pattern

       the ((red|white) (king|queen))

     the captured substrings are "red king", "red", and "king", and
     are numbered 1, 2, and 3, respectively.

     The fact that plain parentheses fulfil two functions is not
     always helpful. Often a grouping subpattern is needed without any
     capturing. If an opening parenthesis is followed by "?:", the
     subpattern does not do any capturing, and is not counted when
     computing the number of any subsequent capturing subpatterns.
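Both roles of parentheses, together with the left-to-right alternation rule from the VERTICAL BAR section, can be observed with a short sketch. It uses Python's `re` as a stand-in for PCRE (an assumption: the two agree on these simple patterns); `re.search` scans the subject much like an unanchored pcre_exec() call, and the tuple from `groups()` corresponds to the captured substrings whose offsets PCRE reports in ovector. The `captures` helper is a hypothetical name for illustration. PCRE's limits of 99 captures and 200 subpatterns are not modelled.

```python
import re

def captures(pattern: str, subject: str):
    """Numbered captures (group 1, 2, ...) of the first match, or None."""
    m = re.search(pattern, subject)
    return m.groups() if m else None

if __name__ == "__main__":
    # The leftmost alternative that succeeds is used.
    print(re.search(r"a|ab", "ab").group())           # a
    # Inside a group, "succeeds" includes the rest of the pattern:
    # "a" leaves "b" facing "c", so the engine backtracks to "ab".
    print(captures(r"(a|ab)c", "abc"))                # ('ab',)

    # Parentheses localize alternatives.
    for word in ("cat", "cataract", "caterpillar", "erpillar"):
        print(word, re.fullmatch(r"cat(aract|erpillar|)", word) is not None)

    # Opening parentheses are numbered from the left, starting at 1;
    # (?:...) groups are not counted.
    print(captures(r"the ((red|white) (king|queen))", "the red king"))
    print(captures(r"the ((?:red|white) (king|queen))", "the white queen"))
```

The last two lines show the renumbering effect of "?:": dropping the capture on the colour group shifts (king|queen) from group 3 to group 2.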
     As an example of a non-capturing subpattern, if the string "the
     white queen" is matched against the pattern

       the ((?:red|white) (king|queen))

     the captured substrings are "white queen" and "queen", and are
     numbered 1 and 2. The maximum number of captured substrings is
     99, and the maximum number of all subpatterns, both capturing and
     non-capturing, is 200.

     As a convenient shorthand, if any option settings are required at
     the start of a non-capturing subpattern, the option letters may
     appear between the "?" and the ":". Thus the two patterns

       (?i:saturday|sunday)
       (?:(?i)saturday|sunday)

     match exactly the same set of strings. Because alternative
     branches are tried from left to right, and options are not reset
     until the end of the subpattern is reached, an option setting in
     one branch does affect subsequent branches, so the above patterns
     match "SUNDAY" as well as "Saturday".

REPETITION

     Repetition is specified by quantifiers, which can follow any of
     the following items:

       a single character, possibly escaped
       the . metacharacter
       a character class
       a back reference (see next section)
       a parenthesized subpattern (unless it is an assertion; see below)

     The general repetition quantifier specifies a minimum and maximum
     number of permitted matches, by giving the two numbers in curly
     brackets (braces), separated by a comma. The numbers must be less
     than 65536, and the first must be less than or equal to the
     second. For example:

       z{2,4}

     matches "zz", "zzz", or "zzzz". A closing brace on its own is not
     a special character. If the second number is omitted, but the
     comma is present, there is no upper limit; if the second number
     and the comma are both omitted, the quantifier specifies an exact
     number of required matches.
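The three brace forms just described ({min,max}, {min,} and {n}) can be exercised with a few whole-string matches. As before, this sketch runs on Python's `re`, not PCRE, and the `whole` helper is a hypothetical name. One difference matters here: Python reads {,6} as the quantifier {0,6}, whereas PCRE treats it as four literal characters, so the last line deliberately shows Python's behaviour, not PCRE's.

```python
import re

def whole(pattern: str, subject: str) -> bool:
    """True if `pattern` matches the entire `subject` (Python's re)."""
    return re.fullmatch(pattern, subject) is not None

if __name__ == "__main__":
    # {min,max}: between 2 and 4 z's.
    print([n for n in range(1, 7) if whole(r"z{2,4}", "z" * n)])      # [2, 3, 4]
    # {min,}: no upper limit.
    print(whole(r"[aeiou]{3,}", "aeiouaeiou"), whole(r"[aeiou]{3,}", "ae"))
    # {n}: exactly n.
    print(whole(r"\d{8}", "20000101"), whole(r"\d{8}", "2000010"))
    # A brace that cannot start a quantifier is literal, as is a lone "}".
    print(whole(r"a{x}", "a{x}"), whole(r"x}", "x}"))
    # Differs from PCRE: Python treats {,6} as {0,6}.
    print(whole(r"a{,6}", "aaa"))
```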
     For example,

       [aeiou]{3,}

     matches at least 3 successive vowels, but may match many more,
     while

       \d{8}

     matches exactly 8 digits. An opening curly bracket that appears
     in a position where a quantifier is not allowed, or one that does
     not match the syntax of a quantifier, is taken as a literal
     character. For instance, {,6} is not a quantifier, but a literal
     string of four characters.

     The quantifier {0} is permitted, causing the expression to behave