After "\x", up to two hexadecimal digits are read (letters can be in upper or lower case). After "\0", up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence "\0\x\07" specifies two binary zeros followed by a BEL character. Make sure you supply two digits after the initial zero if the character that follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class, PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a back reference. A description of how this works is given later, following the discussion of parenthesized subpatterns.

Inside a character class, or if the decimal number is greater than 9 and there have not been that many capturing subpatterns, PCRE re-reads up to three octal digits following the backslash, and generates a single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves.
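The rules for "\x", "\0" and a backslash followed by other digits can be summarized as a small decoding function. The C sketch below is an illustration under stated assumptions, not PCRE's own code. `decode_escape`, the `Escape` struct and the `caps_so_far` parameter (the number of capturing left parentheses seen so far) are hypothetical names. Escapes other than digits and "\x" are not modelled.

```c
#include <ctype.h>

/* Hypothetical types for this sketch: an escape decodes either to a
   single byte or to a back-reference number. */
typedef enum { ESC_BYTE, ESC_BACKREF } EscKind;

typedef struct {
    EscKind kind;
    int value;     /* byte value (0..255) or back-reference number */
    int consumed;  /* characters consumed after the backslash */
} Escape;

static int is_octal(char c) { return c >= '0' && c <= '7'; }

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* `p` points just past the backslash. `caps_so_far` (hypothetical) is the
   number of capturing left parentheses before this point; `in_class` is
   non-zero inside a character class. */
Escape decode_escape(const char *p, int caps_so_far, int in_class) {
    Escape e = { ESC_BYTE, 0, 0 };
    int n = 0;

    if (p[0] == '\0')                       /* trailing backslash: nothing to decode */
        return e;

    if (p[0] == 'x') {                      /* \x: up to two hex digits */
        for (n = 1; n <= 2 && isxdigit((unsigned char)p[n]); n++)
            e.value = e.value * 16 + hex_value(p[n]);
        e.consumed = n;
        return e;
    }

    if (p[0] == '0') {                      /* \0: up to two more octal digits */
        for (n = 1; n <= 2 && is_octal(p[n]); n++)
            e.value = e.value * 8 + (p[n] - '0');
        e.consumed = n;
        return e;
    }

    if (isdigit((unsigned char)p[0])) {
        if (!in_class) {
            /* Read the digits as a decimal number (capped to avoid overflow). */
            int number = 0;
            for (n = 0; isdigit((unsigned char)p[n]); n++)
                if (number < 100000) number = number * 10 + (p[n] - '0');
            if (number < 10 || number <= caps_so_far) {
                e.kind = ESC_BACKREF;
                e.value = number;
                e.consumed = n;
                return e;
            }
        }
        /* Re-read up to three octal digits and keep the low 8 bits;
           any further digits are left as literal characters. */
        for (n = 0; n < 3 && is_octal(p[n]); n++)
            e.value = e.value * 8 + (p[n] - '0');
        e.value &= 0xFF;
        e.consumed = n;
        return e;
    }

    /* Other escapes are outside the scope of this sketch: treat the
       character after the backslash as a literal. */
    e.value = (unsigned char)p[0];
    e.consumed = 1;
    return e;
}
```

Under these assumptions, `decode_escape("11", 0, 0)` yields the byte value 9 (a tab), while `decode_escape("11", 11, 0)` yields back reference 11. For "\81" with fewer than 81 captures, the function yields a zero byte with no digits consumed, which leaves "8" and "1" as literal characters.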
For example:

  \040   is another way of writing a space
  \40    is the same, provided there are fewer than 40
         previous capturing subpatterns
  \7     is always a back reference
  \11    might be a back reference, or another way of
         writing a tab
  \011   is always a tab
  \0113  is a tab followed by the character "3"
  \113   is the character with octal code 113 (since there
         can be no more than 99 back references)
  \377   is a byte consisting entirely of 1 bits
  \81    is either a back reference, or a binary zero
         followed by the two characters "8" and "1"

Note that octal values of 100 or greater must not be introduced by a leading zero, because no more than three octal digits are ever read.

All the sequences that define a single byte value can be used both inside and outside character classes. In addition, inside a character class, the sequence "\b" is interpreted as the backspace character (hex 08). Outside a character class it has a different meaning (see below).

The third use of backslash is for specifying generic character types:

  \d     any decimal digit
  \D     any character that is not a decimal digit
  \s     any whitespace character
  \S     any character that is not a whitespace character
  \w     any "word" character
  \W     any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place (see "Locale support" above).
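The C sketch below expresses the generic character types with the standard <ctype.h> functions, as a rough illustration of how each pair partitions the character set. This mapping is an assumption made for illustration. PCRE consults its own character tables, which can differ in detail from the C library's, for example in exactly which bytes count as whitespace. `matches_generic_type` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/* Illustrative sketch, not PCRE's implementation. Like PCRE's tables,
   <ctype.h> depends on the current locale (the "C" locale unless
   setlocale() has been called). `c` is a byte value in 0..255. */

static bool is_word_char(int c) { return isalnum(c) || c == '_'; }

/* Returns true if byte `c` matches the escape letter `type`
   ('d', 'D', 's', 'S', 'w' or 'W'); false for any other letter. */
bool matches_generic_type(char type, int c) {
    switch (type) {
    case 'd': return isdigit(c) != 0;
    case 'D': return isdigit(c) == 0;   /* exact complement of \d */
    case 's': return isspace(c) != 0;
    case 'S': return isspace(c) == 0;   /* exact complement of \s */
    case 'w': return is_word_char(c);
    case 'W': return !is_word_char(c);  /* exact complement of \w */
    default:  return false;
    }
}
```

Because each upper-case type is defined as the negation of its lower-case partner, every byte satisfies exactly one predicate of each pair.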
For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

These character type sequences can appear both inside and outside character classes. They each match one character of the appropriate type. If the current matching point is at the end of the subject string, all of them fail, since there is no character to match.

The fourth use of backslash is for certain simple assertions. An assertion specifies a condition that has to be met at a particular point in a match, without consuming any characters from the subject string. The use of subpatterns for more complicated assertions is described below. The backslashed assertions are:

  \b     word boundary
  \B     not a word boundary
  \A     start of subject (independent of multiline mode)
  \Z     end of subject or newline at end (independent of
         multiline mode)
  \z     end of subject (independent of multiline mode)

These assertions may not appear in character classes (but note that "\b" has a different meaning, namely the backspace character, inside a character class).

A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.

The \A, \Z, and \z assertions differ from the traditional circumflex and dollar (described below) in that they only ever match at the very start and end of the subject string, whatever options are set. They are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options. If the startoffset argument of pcre_exec() is non-zero, \A can never match.
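The backslashed assertions can be modelled as predicates on a position in the subject. Written this way, it is clear that none of them consumes a character. The following C sketch is illustrative only. The function names (`esc_b`, `esc_Z` and so on) are hypothetical, and no option flags appear because these assertions ignore them.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sketch: each assertion tests a position `pos` (0..len)
   in the subject `s` of length `len`; none consumes a character. */

static bool is_word_at(const char *s, size_t len, size_t i) {
    return i < len && (isalnum((unsigned char)s[i]) || s[i] == '_');
}

/* \b: exactly one of the characters either side of pos is a word
   character; positions outside the subject count as non-word. */
bool esc_b(const char *s, size_t len, size_t pos) {
    bool before = pos > 0 && is_word_at(s, len, pos - 1);
    bool after  = is_word_at(s, len, pos);
    return before != after;
}

/* \B: the negation of \b. */
bool esc_B(const char *s, size_t len, size_t pos) {
    return !esc_b(s, len, pos);
}

/* \A: only the very start of the subject. With a non-zero startoffset
   the matcher never tries position 0, so \A cannot succeed. */
bool esc_A(size_t pos) { return pos == 0; }

/* \Z: the end, or just before a newline that is the last character. */
bool esc_Z(const char *s, size_t len, size_t pos) {
    return pos == len || (len > 0 && pos == len - 1 && s[len - 1] == '\n');
}

/* \z: the very end only. */
bool esc_z(size_t len, size_t pos) { return pos == len; }
```

Take the subject "foo bar\n" as an example. `esc_b` holds at positions 0 and 3. `esc_Z` holds at position 7 (before the final newline) and also at 8, whereas `esc_z` holds only at 8.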
The difference between \Z and \z is that \Z matches before a newline that is the last character of the string as well as at the end of the string, whereas \z matches only at the end.

CIRCUMFLEX AND DOLLAR

Outside a character class, in the default matching mode, the circumflex character is an assertion which is true only if the current matching point is at the start of the subject string. If the startoffset argument of pcre_exec() is non-zero, circumflex can never match. Inside a character class, circumflex has an entirely different meaning (see below).

Circumflex need not be the first character of the pattern if a number of alternatives are involved, but it should be the first thing in each alternative in which it appears if the pattern is ever to match that branch. If all possible alternatives start with a circumflex, that is, if the pattern is constrained to match only at the start of the subject, it is said to be an "anchored" pattern. (There are also other constructs that can cause a pattern to be anchored.)

A dollar character is an assertion which is true only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default). Dollar need not be the last character of the pattern if a number of alternatives are involved, but it should be the last item in any branch in which it appears. Dollar has no special meaning in a character class.

The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching time. This does not affect the \Z assertion.

The meanings of the circumflex and dollar characters are changed if the PCRE_MULTILINE option is set.
When this is the case, they match immediately after and immediately before an internal "\n" character, respectively, in addition to matching at the start and end of the subject string. For example, the pattern /^abc$/ matches the subject string "def\nabc" in multiline mode, but not otherwise. Consequently, patterns that are anchored in single line mode because all branches start with "^" are not anchored in multiline mode, and a match for circumflex is possible when the startoffset argument of pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.

Note that the sequences \A, \Z, and \z can be used to match the start and end of the subject in both modes, and if all branches of a pattern start with \A, it is always anchored, whether PCRE_MULTILINE is set or not.

FULL STOP (PERIOD, DOT)

Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing character, but not (by default) newline. If the PCRE_DOTALL option is set, dots match newlines as well. The handling of dot is entirely independent of the handling of circumflex and dollar, the only relationship being that they both involve newline characters. Dot has no special meaning in a character class.

SQUARE BRACKETS

An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash.
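Parsing a class starts with locating where it ends. The C sketch below applies the rule just given. A "]" that is the first data character (after an optional "^") is a literal member, and "\]" is escaped. `find_class_end` is a hypothetical helper, not part of PCRE's interface, and for simplicity it ignores POSIX "[:name:]" items.

```c
#include <stddef.h>

/* Hypothetical helper: given a pattern and the index of an opening '[',
   return the index of the ']' that closes the class, or -1 if the class
   is unterminated. POSIX [:name:] items are not handled. */
long find_class_end(const char *pat, size_t open) {
    size_t i = open + 1;
    if (pat[i] == '^') i++;          /* negation marker */
    if (pat[i] == ']') i++;          /* ']' as first data char is literal */
    for (; pat[i] != '\0'; i++) {
        if (pat[i] == '\\') {        /* skip the escaped character */
            if (pat[i + 1] == '\0') return -1;
            i++;
        } else if (pat[i] == ']') {
            return (long)i;
        }
    }
    return -1;
}
```

For example, `find_class_end("[]abc]", 0)` returns 5, the index of the final "]", because the first "]" is treated as a data character. By the same rule, "[]" on its own is an unterminated class.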
A character class matches a single character in the subject; the character must be in the set of characters defined by the class, unless the first character in the class is a circumflex, in which case the subject character must not be in the set defined by the class. If a circumflex is actually required as a member of the class, ensure it is not the first character, or escape it with a backslash.

For example, the character class [aeiou] matches any lower case vowel, while [^aeiou] matches any character that is not a lower case vowel. Note that a circumflex is just a convenient notation for specifying the characters which are in the class by enumerating those that are not. It is not an assertion: it still consumes a character from the subject string, and fails if the current pointer is at the end of the string.

When caseless matching is set, any letters in a class represent both their upper case and lower case versions, so for example, a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful version would.

The newline character is never treated in any special way in character classes, whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class such as [^a] will always match a newline.

The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class.

It is not possible to have the literal character "]" as the end character of a range.
A pattern such as [W-]46] is interpreted as a class of two characters ("W" and "-") followed by a literal string "46]", so it would match "W46]" or "-46]". However, if the "]" is escaped with a backslash it is interpreted as the end of range, so [W-\]46] is interpreted as a single class containing a range followed by two separate characters. The octal or hexadecimal representation of "]" can also be used to end a range.

Ranges operate in ASCII collating sequence. They can also be used for characters specified numerically, for example [\000-\037]. If a range that includes letters is used when caseless matching is set, it matches the letters in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched caselessly, and if character tables for the "fr" locale are in use, [\xc8-\xcb] matches accented E characters in both cases.

The character types \d, \D, \s, \S, \w, and \W may also appear in a character class, and add the characters that they match to the class. For example, [\dABCDEFabcdef] matches any hexadecimal digit. A circumflex can conveniently be used with the upper case character types to specify a more restricted set of characters than the matching lower case type. For example, the class [^\W_] matches any letter or digit, but not underscore.

All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped.

POSIX CHARACTER CLASSES

Perl 5.6 (not yet released at the time of writing) is going to support the POSIX notation for character classes, which uses names enclosed by [: and :] within the enclosing square brackets. PCRE supports this notation. For example,

  [01[:alpha:]%]

matches "0", "1", any alphabetic character, or "%".
The supported class names are:

  alnum    letters and digits
  alpha    letters
  ascii    character codes 0 - 127
  cntrl    control characters
  digit    decimal digits (same as \d)
  graph    printing characters, excluding space
  lower    lower case letters
  print    printing characters, including space
  punct    printing characters, excluding letters and digits
  space    white space (same as \s)
  upper    upper case letters
  word     "word" characters (same as \w)
  xdigit   hexadecimal digits

The names "ascii" and "word" are Perl extensions. Another Perl extension is negation, which is indicated by a ^ character after the colon. For example,

  [12[:^digit:]]

matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not supported, and an error is given if they are encountered.
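The class names in the table map naturally onto the C library's <ctype.h> predicates. The sketch below is an approximation under that assumption, since PCRE builds and uses its own character tables. `posix_class_match` is a hypothetical helper that takes the text between "[:" and ":]", including the optional "^" negation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Approximate mapping of POSIX class names to <ctype.h> predicates;
   "ascii" and "word" (Perl extensions) need small helpers of their own. */
static int is_ascii_code(int c) { return c >= 0 && c <= 127; }
static int is_word_code(int c)  { return isalnum(c) || c == '_'; }

static const struct {
    const char *name;
    int (*test)(int);
} posix_classes[] = {
    { "alnum", isalnum }, { "alpha", isalpha }, { "ascii", is_ascii_code },
    { "cntrl", iscntrl }, { "digit", isdigit }, { "graph", isgraph },
    { "lower", islower }, { "print", isprint }, { "punct", ispunct },
    { "space", isspace }, { "upper", isupper }, { "word", is_word_code },
    { "xdigit", isxdigit },
};

/* `item` is the text between "[:" and ":]", e.g. "digit" or "^digit";
   `c` is a byte value in 0..255. Returns 1 if `c` is in the (possibly
   negated) class, 0 if not, and -1 for an unknown class name. */
int posix_class_match(const char *item, int c) {
    bool negate = item[0] == '^';
    const char *name = negate ? item + 1 : item;
    for (size_t i = 0; i < sizeof posix_classes / sizeof posix_classes[0]; i++) {
        if (strcmp(posix_classes[i].name, name) == 0) {
            bool in = posix_classes[i].test(c) != 0;
            return in != negate;
        }
    }
    return -1;
}
```

With this helper, membership in [12[:^digit:]] could be tested as `c == '1' || c == '2' || posix_class_match("^digit", c) == 1`.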