📄 pcre.txt

       function.

UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8 support allows PCRE to process character values greater than 255
       in the strings that it handles. On its own, however, it does not
       provide any facilities for accessing the properties of such
       characters. If you want to be able to use the pattern escapes \P, \p,
       and \X, which refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF-8 support, even if you have
       not explicitly requested it.

       Including Unicode property support adds around 90K of tables to the
       PCRE library, approximately doubling its size. Only the general
       category properties such as Lu and Nd are supported. Details are given
       in the pcrepattern documentation.

CODE VALUE OF NEWLINE

       By default, PCRE interprets character 10 (linefeed, LF) as indicating
       the end of a line. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use character 13 (carriage return,
       CR) instead, by adding

         --enable-newline-is-cr

       to the configure command. There is also a --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated
       by the two character sequence CRLF. If you want this, add

         --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

         --enable-newline-is-any

       which causes PCRE to recognize any Unicode newline sequence.

       Whatever line ending convention is selected when PCRE is built can be
       overridden when the library functions are called.
       At build time it is conventional to use the standard for your
       operating system.

BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses libtool to build both shared and static
       Unix libraries by default. You can suppress one of these by adding one
       of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the pcreposix
       documentation), additional working storage is required for holding the
       pointers to capturing substrings, because PCRE requires three integers
       per substring, whereas the POSIX interface provides only two. If the
       number of expected substrings is small, the wrapper function uses
       space on the stack, because this is faster than using malloc() for
       each call. The default threshold above which the stack is no longer
       used is 10; it can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used to point from one
       part to another (for example, from an opening parenthesis to an
       alternation metacharacter). By default, two-byte values are used for
       these offsets, leading to a maximum size for a compiled pattern of
       around 64K. This is sufficient to handle all but the most gigantic
       patterns. Nevertheless, some people do want to process enormous
       patterns, so it is possible to compile PCRE to use three-byte or
       four-byte offsets by adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4. Using
       longer offsets slows down the operation of PCRE because it has to load
       additional bytes when handling them.
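
       The 64K figure follows from the offset width. As a rough illustration
       in Python (not part of PCRE; the helper name max_offset_span is
       hypothetical), the sketch below computes how many distinct values an
       unsigned offset of each permitted link size can hold. These are upper
       bounds implied by the width alone; the library itself may impose a
       lower cap.

```python
def max_offset_span(link_size: int) -> int:
    """Return the number of distinct values an unsigned offset of
    `link_size` bytes can represent (hypothetical helper, not a PCRE API)."""
    if link_size not in (2, 3, 4):
        # The configure option accepts only 2, 3, or 4.
        raise ValueError("link size must be 2, 3, or 4")
    return 1 << (8 * link_size)


# Print the bound for each permitted --with-link-size value, in units of 1024.
for size in (2, 3, 4):
    print(f"--with-link-size={size}: about {max_offset_span(size) // 1024}K")
```

       With a link size of 2 the helper returns 65536, which corresponds to
       the "around 64K" limit on a compiled pattern described above.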

       If you build PCRE with an increased link size, test 2 (and test 5 if
       you are using UTF-8) will fail. Part of the output of these tests is a
       representation of the compiled pattern, and this changes with the link
       size.

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE implements
       backtracking by making recursive calls to an internal function called
       match(). In environments where the size of the stack is limited, this
       can severely limit PCRE's operation. (The Unix environment does not
       usually suffer from this problem, but it may sometimes be necessary to
       increase the maximum stack size. There is a discussion in the
       pcrestack documentation.) An alternative approach to recursion that
       uses memory from the heap to remember data, instead of using recursive
       function calls, has been implemented to work round the problem of
       limited stack size. If you want to build a version of PCRE that works
       this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will use the
       pcre_stack_malloc and pcre_stack_free variables to call memory
       management functions. Separate functions are provided because the
       usage is very predictable: the block sizes requested are always the
       same, and the blocks are always freed in reverse order. A calling
       program might be able to implement optimized functions that perform
       better than the standard malloc() and free() functions. PCRE runs
       noticeably more slowly when built in this way. This option affects
       only the pcre_exec() function; it is not relevant for the
       pcre_dfa_exec() function.

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls
       repeatedly (sometimes recursively) when matching a pattern with the
       pcre_exec() function.
       By controlling the maximum number of times this function may be called
       during a single matching operation, a limit can be placed on the
       resources used by a single call to pcre_exec(). The limit can be
       changed at run time, as described in the pcreapi documentation. The
       default is 10 million, but this can be changed by adding a setting
       such as

         --with-match-limit=500000

       to the configure command. This setting has no effect on the
       pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the depth of recursive
       calls of match() more strictly than the total number of calls, in
       order to restrict the maximum amount of stack (or heap, if
       --disable-stack-for-recursion is specified) that is used. A second
       limit controls this; it defaults to the value that is set for
       --with-match-limit, which imposes no additional constraints. However,
       you can set a lower limit by adding, for example,

         --with-match-limit-recursion=10000

       to the configure command. This value can also be overridden at run
       time.

USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment where the
       character code is ASCII (or Unicode, which is a superset of ASCII).
       PCRE can, however, be compiled to run in an EBCDIC environment by
       adding

         --enable-ebcdic

       to the configure command.

SEE ALSO

       pcreapi(3), pcre_config(3).

Last updated: 30 November 2006
Copyright (c) 1997-2006 University of Cambridge.
------------------------------------------------------------------------------


PCREMATCHING(3)                                                PCREMATCHING(3)


NAME
       PCRE - Perl-compatible regular expressions

PCRE MATCHING ALGORITHMS

       This document describes the two different algorithms that are
       available in PCRE for matching a compiled regular expression against a
       given subject string. The "standard" algorithm is the one provided by
       the pcre_exec() function. This works in the same way as Perl's
       matching function, and provides a Perl-compatible matching operation.

       An alternative algorithm is provided by the pcre_dfa_exec() function;
       this operates in a different way, and is not Perl-compatible. It has
       advantages and disadvantages compared with the standard algorithm, and
       these are described below.

       When there is only one possible way in which a given subject string
       can match a pattern, the two algorithms give the same answer. A
       difference arises, however, when there are multiple possibilities. For
       example, if the pattern

         ^<.*>

       is matched against the string

         <something> <something else> <something further>

       there are three possible answers. The standard algorithm finds only
       one of them, whereas the alternative algorithm finds all three.

REGULAR EXPRESSIONS AS TREES

       The set of strings that are matched by a regular expression can be
       represented as a tree structure. An unlimited repetition in the
       pattern makes the tree of infinite size, but it is still a tree.
       Matching the pattern to a given subject string (from a given starting
       point) can be thought of as a search of the tree. There are two ways
       to search a tree: depth-first and breadth-first, and these correspond
       to the two matching algorithms provided by PCRE.

THE STANDARD MATCHING ALGORITHM

       In the terminology of Jeffrey Friedl's book Mastering Regular
       Expressions, the standard algorithm is an "NFA algorithm". It conducts
       a depth-first search of the pattern tree. That is, it proceeds along a
       single path through the tree, checking that the subject matches what
       is required. When there is a mismatch, the algorithm tries any
       alternatives at the current point, and if they all fail, it backs up
       to the previous branch point in the tree, and tries the next
       alternative branch at that level. This often involves backing up
       (moving to the left) in the subject string as well. The order in which
       repetition branches are tried is controlled by the greedy or ungreedy
       nature of the quantifier.

       If a leaf node is reached, a matching string has been found, and at
       that point the algorithm stops. Thus, if there is more than one
       possible match, this algorithm returns the first one that it finds.
       Whether this is the shortest, the longest, or some intermediate length
       depends on the way the greedy and ungreedy repetition quantifiers are
       specified in the pattern.

       Because it ends up with a single path through the tree, it is
       relatively straightforward for this algorithm to keep track of the
       substrings that are matched by portions of the pattern in parentheses.
       This provides support for capturing parentheses and back references.

THE ALTERNATIVE MATCHING ALGORITHM

       This algorithm conducts a breadth-first search of the tree.
       Starting from the first matching point in the subject, it scans the
       subject string from left to right, once, character by character, and
       as it does this, it remembers all the paths through the tree that
       represent valid matches. In Friedl's terminology, this is a kind of
       "DFA algorithm", though it is not implemented as a traditional finite
       state machine (it keeps multiple states active simultaneously).

       The scan continues until either the end of the subject is reached, or
       there are no more unterminated paths. At this point, terminated paths
       represent the different matching possibilities (if there are none, the
       match has failed). Thus, if there is more than one possible match,
       this algorithm finds all of them, and in particular, it finds the
       longest. In PCRE, there is an option to stop the algorithm after the
       first match (which is necessarily the shortest) has been found.

       Note that all the matches that are found start at the same point in
       the subject. If the pattern