       PCRE library, approximately doubling its size. Only the general
       category properties such as Lu and Nd are supported. Details are
       given in the pcrepattern documentation.

CODE VALUE OF NEWLINE

       By default, PCRE treats character 10 (linefeed) as the newline
       character. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use character 13 (carriage
       return) instead by adding

         --enable-newline-is-cr

       to the configure command. For completeness there is also a
       --enable-newline-is-lf option, which explicitly specifies linefeed
       as the newline character.

BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses libtool to build both shared and
       static Unix libraries by default. You can suppress one of these by
       adding one of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the pcreposix
       documentation), additional working storage is required for holding
       the pointers to capturing substrings, because PCRE requires three
       integers per substring, whereas the POSIX interface provides only
       two. If the number of expected substrings is small, the wrapper
       function uses space on the stack, because this is faster than
       using malloc() for each call. The default threshold above which
       the stack is no longer used is 10; it can be changed by adding a
       setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls
       repeatedly (possibly recursively) when matching a pattern with the
       pcre_exec() function.
       By controlling the maximum number of times this function may be
       called during a single matching operation, a limit can be placed
       on the resources used by a single call to pcre_exec(). The limit
       can be changed at run time, as described in the pcreapi
       documentation. The default is 10 million, but this can be changed
       by adding a setting such as

         --with-match-limit=500000

       to the configure command. This setting has no effect on the
       pcre_dfa_exec() matching function.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used to point from
       one part to another (for example, from an opening parenthesis to
       an alternation metacharacter). By default, two-byte values are
       used for these offsets, leading to a maximum size for a compiled
       pattern of around 64K. This is sufficient to handle all but the
       most gigantic patterns. Nevertheless, some people do want to
       process enormous patterns, so it is possible to compile PCRE to
       use three-byte or four-byte offsets by adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4.
       Using longer offsets slows down the operation of PCRE because it
       has to load additional bytes when handling them.

       If you build PCRE with an increased link size, test 2 (and test 5
       if you are using UTF-8) will fail. Part of the output of these
       tests is a representation of the compiled pattern, and this
       changes with the link size.

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE implements
       backtracking by making recursive calls to an internal function
       called match(). In environments where the size of the stack is
       limited, this can severely limit PCRE's operation. (The Unix
       environment does not usually suffer from this problem.)
       An alternative approach that uses memory from the heap to remember
       data, instead of using recursive function calls, has been
       implemented to work round this problem. If you want to build a
       version of PCRE that works this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will use
       the pcre_stack_malloc and pcre_stack_free variables to call memory
       management functions. Separate functions are provided because the
       usage is very predictable: the block sizes requested are always
       the same, and the blocks are always freed in reverse order. A
       calling program might be able to implement optimized functions
       that perform better than the standard malloc() and free()
       functions. PCRE runs noticeably more slowly when built in this
       way. This option affects only the pcre_exec() function; it is not
       relevant for the pcre_dfa_exec() function.

USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment where
       the character code is ASCII (or Unicode, which is a superset of
       ASCII). PCRE can, however, be compiled to run in an EBCDIC
       environment by adding

         --enable-ebcdic

       to the configure command.

Last updated: 15 August 2005
Copyright (c) 1997-2005 University of Cambridge.
------------------------------------------------------------------------------

PCREMATCHING(3)                                                PCREMATCHING(3)

NAME
       PCRE - Perl-compatible regular expressions

PCRE MATCHING ALGORITHMS

       This document describes the two different algorithms that are
       available in PCRE for matching a compiled regular expression
       against a given subject string. The "standard" algorithm is the
       one provided by the pcre_exec() function. This works in the same
       way as Perl's matching function, and provides a Perl-compatible
       matching operation.

       An alternative algorithm is provided by the pcre_dfa_exec()
       function; this operates in a different way, and is not
       Perl-compatible. It has advantages and disadvantages compared with
       the standard algorithm, and these are described below.

       When there is only one possible way in which a given subject
       string can match a pattern, the two algorithms give the same
       answer. A difference arises, however, when there are multiple
       possibilities. For example, if the pattern

         ^<.*>

       is matched against the string

         <something> <something else> <something further>

       there are three possible answers. The standard algorithm finds
       only one of them, whereas the DFA algorithm finds all three.

REGULAR EXPRESSIONS AS TREES

       The set of strings that are matched by a regular expression can be
       represented as a tree structure. An unlimited repetition in the
       pattern makes the tree of infinite size, but it is still a tree.
       Matching the pattern to a given subject string (from a given
       starting point) can be thought of as a search of the tree. There
       are two standard ways to search a tree: depth-first and
       breadth-first, and these correspond to the two matching algorithms
       provided by PCRE.

THE STANDARD MATCHING ALGORITHM

       In the terminology of Jeffrey Friedl's book Mastering Regular
       Expressions, the standard algorithm is an "NFA algorithm". It
       conducts a depth-first search of the pattern tree. That is, it
       proceeds along a single path through the tree, checking that the
       subject matches what is required. When there is a mismatch, the
       algorithm tries any alternatives at the current point, and if they
       all fail, it backs up to the previous branch point in the tree,
       and tries the next alternative branch at that level.
       This often involves backing up (moving to the left) in the subject
       string as well. The order in which repetition branches are tried
       is controlled by the greedy or ungreedy nature of the quantifier.

       If a leaf node is reached, a matching string has been found, and
       at that point the algorithm stops. Thus, if there is more than one
       possible match, this algorithm returns the first one that it
       finds. Whether this is the shortest, the longest, or some
       intermediate length depends on the way the greedy and ungreedy
       repetition quantifiers are specified in the pattern.

       Because it ends up with a single path through the tree, it is
       relatively straightforward for this algorithm to keep track of the
       substrings that are matched by portions of the pattern in
       parentheses. This provides support for capturing parentheses and
       back references.

THE DFA MATCHING ALGORITHM

       DFA stands for "deterministic finite automaton", but you do not
       need to understand the origins of that name. This algorithm
       conducts a breadth-first search of the tree. Starting from the
       first matching point in the subject, it scans the subject string
       from left to right, once, character by character, and as it does
       this, it remembers all the paths through the tree that represent
       valid matches.

       The scan continues until either the end of the subject is reached,
       or there are no more unterminated paths. At this point, terminated
       paths represent the different matching possibilities (if there are
       none, the match has failed). Thus, if there is more than one
       possible match, this algorithm finds all of them, and in
       particular, it finds the longest. In PCRE, there is an option to
       stop the algorithm after the first match (which is necessarily the
       shortest) has been found.
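
       The difference between the two algorithms can be illustrated with
       a small Python sketch. It does not call PCRE: Python's re module
       is itself a backtracking matcher, so it stands in for the standard
       algorithm, and the set of matches that a DFA-style scan would
       report is imitated by brute force, testing every prefix of the
       subject. The helper names first_match and all_matches_at_start are
       hypothetical, and the prefix test is only a stand-in for the
       single left-to-right scan described above.

```python
import re

# Subject and pattern taken from the "^<.*>" example in this document.
SUBJECT = "<something> <something else> <something further>"


def first_match(pattern, subject):
    """Depth-first (backtracking) search: return the first match found."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def all_matches_at_start(pattern, subject):
    """Return every prefix of the subject that the whole pattern matches.

    Assumption: this brute-force loop only imitates the *result set* of a
    breadth-first matcher (all matches at one starting point, shortest
    first). It is not how pcre_dfa_exec() is implemented.
    """
    compiled = re.compile(pattern)
    return [
        subject[:end]
        for end in range(len(subject) + 1)
        if compiled.fullmatch(subject, 0, end)
    ]


if __name__ == "__main__":
    # Greedy quantifier: the backtracking search stops at the longest match.
    print(first_match(r"^<.*>", SUBJECT))
    # Ungreedy quantifier: the same search stops at the shortest match.
    print(first_match(r"^<.*?>", SUBJECT))
    # All matches that start at the beginning of the subject.
    for match in all_matches_at_start(r"^<.*>", SUBJECT):
        print(match)
```

       In this sketch the greedy and ungreedy forms each yield a single
       answer, while the prefix enumeration yields all three, and its
       result is the same whichever form of the quantifier is used.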

       Note that all the matches that are found start at the same point
       in the subject. If the pattern

         cat(er(pillar)?)?

       is matched against the string "the caterpillar catchment", the
       result will be the three strings "cat", "cater", and "caterpillar"
       that start at the fifth character of the subject. The algorithm
       does not automatically move on to find matches that start at later
       positions.

       There are a number of features of PCRE regular expressions that
       are not supported by the DFA matching algorithm. They are as
       follows:

       1. Because the algorithm finds all possible matches, the greedy or
       ungreedy nature of repetition quantifiers is not relevant. Greedy
       and ungreedy quantifiers are treated in exactly the same way.

       2. When dealing with multiple paths through the tree
       simultaneously, it is not straightforward to keep track of
       captured substrings for the different matching possibilities, and
       PCRE's implementation of this algorithm does not attempt to do
       this. This means that no captured substrings are available.

       3. Because no substrings are captured, back references within the
       pattern are not supported, and cause errors if encountered.

       4. For the same reason, conditional expressions that use a back
       reference as the condition are not supported.

       5. Callouts are supported, but the value of the capture_top field
       is always 1, and the value of the capture_last field is always -1.

       6. The \C escape sequence, which (in the standard algorithm)
       matches a single byte, even in UTF-8 mode, is not supported
       because the DFA algorithm moves through the subject string one
       character at a time, for all active paths through the tree.

ADVANTAGES OF THE DFA ALGORITHM

       Using the DFA matching algorithm provides the following
       advantages:

       1. All possible matches (at a single point in the subject) are
       automatically found, and in particular, the longest match is
       found. To find more than one match using the standard algorithm,
       you have to do kludgy things with callouts.

       2. There is much better support for partial matching. The
       restrictions on the content of the pattern that apply when using
       the standard algorithm for partial matching do not apply to the
       DFA algorithm. For non-anchored patterns, the starting position of
       a partial match is available.

       3. Because the DFA algorithm scans the subject string just once,
       and never needs to backtrack, it is possible to pass very long
       subject strings to the matching function in several pieces,
       checking for partial matching each time.

DISADVANTAGES OF THE DFA ALGORITHM