cat(er(pillar)?)? is matched against the string "the caterpillar catchment", the result will be the three strings "cat", "cater", and "caterpillar" that start at the fifth character of the subject. The algorithm does not automatically move on to find matches that start at later positions.

       There are a number of features of PCRE regular expressions that are not supported by the alternative matching algorithm. They are as follows:

       1. Because the algorithm finds all possible matches, the greedy or ungreedy nature of repetition quantifiers is not relevant. Greedy and ungreedy quantifiers are treated in exactly the same way. However, possessive quantifiers can make a difference when what follows could also match what is quantified, for example in a pattern like this:

         ^a++\w!

       This pattern matches "aaab!" but not "aaa!", which would be matched by a non-possessive quantifier. Similarly, if an atomic group is present, it is matched as if it were a standalone pattern at the current point, and the longest match is then "locked in" for the rest of the overall pattern.

       2. When dealing with multiple paths through the tree simultaneously, it is not straightforward to keep track of captured substrings for the different matching possibilities, and PCRE's implementation of this algorithm does not attempt to do this. This means that no captured substrings are available.

       3. Because no substrings are captured, back references within the pattern are not supported, and cause errors if encountered.

       4. For the same reason, conditional expressions that use a back reference as the condition or test for a specific group recursion are not supported.

       5. Callouts are supported, but the value of the capture_top field is always 1, and the value of the capture_last field is always -1.

       6. The \C escape sequence, which (in the standard algorithm) matches a single byte, even in UTF-8 mode, is not supported because the alternative algorithm moves through the subject string one character at a time, for all active paths through the tree.

ADVANTAGES OF THE ALTERNATIVE ALGORITHM

       Using the alternative matching algorithm provides the following advantages:

       1. All possible matches (at a single point in the subject) are automatically found, and in particular, the longest match is found. To find more than one match using the standard algorithm, you have to do kludgy things with callouts.

       2. There is much better support for partial matching. The restrictions on the content of the pattern that apply when using the standard algorithm for partial matching do not apply to the alternative algorithm. For non-anchored patterns, the starting position of a partial match is available.

       3. Because the alternative algorithm scans the subject string just once, and never needs to backtrack, it is possible to pass very long subject strings to the matching function in several pieces, checking for partial matching each time.

DISADVANTAGES OF THE ALTERNATIVE ALGORITHM

       The alternative algorithm suffers from a number of disadvantages:

       1. It is substantially slower than the standard algorithm. This is partly because it has to search for all possible matches, but is also because it is less susceptible to optimization.

       2. Capturing parentheses and back references are not supported.

       3. Although atomic groups are supported, their use does not provide the performance advantage that it does for the standard algorithm.

       Last updated: 24 November 2006
       Copyright (c) 1997-2006 University of Cambridge.
------------------------------------------------------------------------------

PCREAPI(3)                                                          PCREAPI(3)

NAME
       PCRE - Perl-compatible regular expressions

PCRE NATIVE API

       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr);
       pcre *pcre_compile2(const char *pattern, int options, int *errorcodeptr, const char **errptr, int *erroffset, const unsigned char *tableptr);
       pcre_extra *pcre_study(const pcre *code, int options, const char **errptr);
       int pcre_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, int options, int *ovector, int ovecsize);
       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, int options, int *ovector, int ovecsize, int *workspace, int wscount);
       int pcre_copy_named_substring(const pcre *code, const char *subject, int *ovector, int stringcount, const char *stringname, char *buffer, int buffersize);
       int pcre_copy_substring(const char *subject, int *ovector, int stringcount, int stringnumber, char *buffer, int buffersize);
       int pcre_get_named_substring(const pcre *code, const char *subject, int *ovector, int stringcount, const char *stringname, const char **stringptr);
       int pcre_get_stringnumber(const pcre *code, const char *name);
       int pcre_get_stringtable_entries(const pcre *code, const char *name, char **first, char **last);
       int pcre_get_substring(const char *subject, int *ovector, int stringcount, int stringnumber, const char **stringptr);
       int pcre_get_substring_list(const char *subject, int *ovector, int stringcount, const char ***listptr);
       void pcre_free_substring(const char *stringptr);
       void pcre_free_substring_list(const char **stringptr);
       const unsigned char *pcre_maketables(void);
       int pcre_fullinfo(const pcre *code, const pcre_extra *extra, int what, void *where);
       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
       int pcre_refcount(pcre *code, int adjust);
       int pcre_config(int what, void *where);
       char *pcre_version(void);
       void *(*pcre_malloc)(size_t);
       void (*pcre_free)(void *);
       void *(*pcre_stack_malloc)(size_t);
       void (*pcre_stack_free)(void *);
       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

       PCRE has its own native API, which is described in this document. There are also some wrapper functions that correspond to the POSIX regular expression API. These are described in the pcreposix documentation. Both of these APIs define a set of C function calls. A C++ wrapper is distributed with PCRE. It is documented in the pcrecpp page.

       The native API C function prototypes are defined in the header file pcre.h, and on Unix systems the library itself is called libpcre. It can normally be accessed by adding -lpcre to the command for linking an application that uses PCRE. The header file defines the macros PCRE_MAJOR and PCRE_MINOR to contain the major and minor release numbers for the library. Applications can use these to include support for different releases of PCRE.

       The functions pcre_compile(), pcre_compile2(), pcre_study(), and pcre_exec() are used for compiling and matching regular expressions in a Perl-compatible manner. A sample program that demonstrates the simplest way of using them is provided in the file called pcredemo.c in the source distribution. The pcresample documentation describes how to run it.

       A second matching function, pcre_dfa_exec(), which is not Perl-compatible, is also provided. This uses a different algorithm for the matching. The alternative algorithm finds all possible matches (at a given point in the subject), and scans the subject just once. However, this algorithm does not return captured substrings.
       A description of the two matching algorithms and their advantages and disadvantages is given in the pcrematching documentation.

       In addition to the main compiling and matching functions, there are convenience functions for extracting captured substrings from a subject string that is matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are also provided, to free the memory used for extracted strings.

       The function pcre_maketables() is used to build a set of character tables in the current locale for passing to pcre_compile(), pcre_exec(), or pcre_dfa_exec(). This is an optional facility that is provided for specialist use. Most commonly, no special tables are passed, in which case internal tables that are generated when PCRE is built are used.

       The function pcre_fullinfo() is used to find out information about a compiled pattern; pcre_info() is an obsolete version that returns only some of the available information, but is retained for backwards compatibility. The function pcre_version() returns a pointer to a string containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a reference count in a data block containing a compiled pattern. This is provided for the benefit of object-oriented applications.

       The global variables pcre_malloc and pcre_free initially contain the entry points of the standard malloc() and free() functions, respectively. PCRE calls the memory management functions via these variables, so a calling program can replace them if it wishes to intercept the calls. This should be done before calling any PCRE functions.

       The global variables pcre_stack_malloc and pcre_stack_free are also indirections to memory management functions.
       These special functions are used only when PCRE is compiled to use the heap for remembering data, instead of recursive function calls, when running the pcre_exec() function. See the pcrebuild documentation for details of how to do this. It is a non-standard way of building PCRE, for use in environments that have limited stacks. Because of the greater use of memory management, it runs more slowly. Separate functions are provided so that special-purpose external code can be used for this case. When used, these functions are always called in a stack-like manner (last obtained, first freed), and always for memory blocks of the same size. There is a discussion about PCRE's stack usage in the pcrestack documentation.

       The global variable pcre_callout initially contains NULL. It can be set by the caller to a "callout" function, which PCRE will then call at specified points during a matching operation. Details are given in the pcrecallout documentation.

NEWLINES

       PCRE supports four different conventions for indicating line breaks in strings: a single CR (carriage return) character, a single LF (linefeed) character, the two-character sequence CRLF, or any Unicode newline sequence. The Unicode newline sequences are the three just mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028),