📄 pcre.txt
字号:
and PS (paragraph separator, U+2029). Each of the first three conventions is used by at least one operating system as its standard newline sequence. When PCRE is built, a default can be specified. The default default is LF, which is the Unix stan- dard. When PCRE is run, the default can be overridden, either when a pattern is compiled, or when it is matched. In the PCRE documentation the word "newline" is used to mean "the char- acter or pair of characters that indicate a line break". The choice of newline convention affects the handling of the dot, circumflex, and dollar metacharacters, the handling of #-comments in /x mode, and, when CRLF is a recognized line ending sequence, the match position advance- ment for a non-anchored pattern. The choice of newline convention does not affect the interpretation of the \n or \r escape sequences.MULTITHREADING The PCRE functions can be used in multi-threading applications, with the proviso that the memory management functions pointed to by pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the callout function pointed to by pcre_callout, are shared by all threads. The compiled form of a regular expression is not altered during match- ing, so the same compiled pattern can safely be used by several threads at once.SAVING PRECOMPILED PATTERNS FOR LATER USE The compiled form of a regular expression can be saved and re-used at a later time, possibly by a different program, and even on a host other than the one on which it was compiled. Details are given in the pcreprecompile documentation.CHECKING BUILD-TIME OPTIONS int pcre_config(int what, void *where); The function pcre_config() makes it possible for a PCRE client to dis- cover which optional features have been compiled into the PCRE library. The pcrebuild documentation has more details about these optional fea- tures. The first argument for pcre_config() is an integer, specifying which information is required; the second argument is a pointer to a variable into which the information is placed. The following information is available: PCRE_CONFIG_UTF8 The output is an integer that is set to one if UTF-8 support is avail- able; otherwise it is set to zero. PCRE_CONFIG_UNICODE_PROPERTIES The output is an integer that is set to one if support for Unicode character properties is available; otherwise it is set to zero. PCRE_CONFIG_NEWLINE The output is an integer whose value specifies the default character sequence that is recognized as meaning "newline". The four values that are supported are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY. The default should normally be the standard sequence for your operating system. PCRE_CONFIG_LINK_SIZE The output is an integer that contains the number of bytes used for internal linkage in compiled regular expressions. The value is 2, 3, or 4. Larger values allow larger regular expressions to be compiled, at the expense of slower matching. The default value of 2 is sufficient for all but the most massive patterns, since it allows the compiled pattern to be up to 64K in size. PCRE_CONFIG_POSIX_MALLOC_THRESHOLD The output is an integer that contains the threshold above which the POSIX interface uses malloc() for output vectors. Further details are given in the pcreposix documentation. PCRE_CONFIG_MATCH_LIMIT The output is an integer that gives the default limit for the number of internal matching function calls in a pcre_exec() execution. Further details are given with pcre_exec() below. PCRE_CONFIG_MATCH_LIMIT_RECURSION The output is an integer that gives the default limit for the depth of recursion when calling the internal matching function in a pcre_exec() execution. Further details are given with pcre_exec() below. PCRE_CONFIG_STACKRECURSE The output is an integer that is set to one if internal recursion when running pcre_exec() is implemented by recursive function calls that use the stack to remember their state. This is the usual way that PCRE is compiled. The output is zero if PCRE was compiled to use blocks of data on the heap instead of recursive function calls. In this case, pcre_stack_malloc and pcre_stack_free are called to manage memory blocks on the heap, thus avoiding the use of the stack.COMPILING A PATTERN pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr); pcre *pcre_compile2(const char *pattern, int options, int *errorcodeptr, const char **errptr, int *erroffset, const unsigned char *tableptr); Either of the functions pcre_compile() or pcre_compile2() can be called to compile a pattern into an internal form. The only difference between the two interfaces is that pcre_compile2() has an additional argument, errorcodeptr, via which a numerical error code can be returned. The pattern is a C string terminated by a binary zero, and is passed in the pattern argument. A pointer to a single block of memory that is obtained via pcre_malloc is returned. This contains the compiled code and related data. The pcre type is defined for the returned block; this is a typedef for a structure whose contents are not externally defined. It is up to the caller to free the memory (via pcre_free) when it is no longer required. Although the compiled code of a PCRE regex is relocatable, that is, it does not depend on memory location, the complete pcre data block is not fully relocatable, because it may contain a copy of the tableptr argu- ment, which is an address (see below). The options argument contains various bit settings that affect the com- pilation. It should be zero if no options are required. The available options are described below. Some of them, in particular, those that are compatible with Perl, can also be set and unset from within the pattern (see the detailed description in the pcrepattern documenta- tion). For these options, the contents of the options argument speci- fies their initial settings at the start of compilation and execution. The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the time of matching as well as at compile time. If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, if compilation of a pattern fails, pcre_compile() returns NULL, and sets the variable pointed to by errptr to point to a textual error mes- sage. This is a static string that is part of the library. You must not try to free it. The offset from the start of the pattern to the charac- ter where the error was discovered is placed in the variable pointed to by erroffset, which must not be NULL. If it is, an immediate error is given. If pcre_compile2() is used instead of pcre_compile(), and the error- codeptr argument is not NULL, a non-zero error code number is returned via this argument in the event of an error. This is in addition to the textual error message. Error codes and messages are listed below. If the final argument, tableptr, is NULL, PCRE uses a default set of character tables that are built when PCRE is compiled, using the default C locale. Otherwise, tableptr must be an address that is the result of a call to pcre_maketables(). This value is stored with the compiled pattern, and used again by pcre_exec(), unless another table pointer is passed to it. For more discussion, see the section on locale support below. This code fragment shows a typical straightforward call to pcre_com- pile(): pcre *re; const char *error; int erroffset; re = pcre_compile( "^A.*Z", /* the pattern */ 0, /* default options */ &error, /* for error message */ &erroffset, /* for error offset */ NULL); /* use default character tables */ The following names for option bits are defined in the pcre.h header file: PCRE_ANCHORED If this bit is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl. PCRE_AUTO_CALLOUT If this bit is set, pcre_compile() automatically inserts callout items, all with number 255, before each pattern item. For discussion of the callout facility, see the pcrecallout documentation. PCRE_CASELESS If this bit is set, letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the concept of case for characters whose values are less than 128, so caseless matching is always possible. For characters with higher values, the concept of case is supported if PCRE is com- piled with Unicode property support, but not otherwise. If you want to use caseless matching for characters 128 and above, you must ensure that PCRE is compiled with Unicode property support as well as with UTF-8 support. PCRE_DOLLAR_ENDONLY If this bit is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before a newline at the end of the string (but not before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. There is no equivalent to this option in Perl, and no way to set it within a pattern. PCRE_DOTALL If this bit is set, a dot metacharater in the pattern matches all char- acters, including those that indicate newline. Without it, a dot does not match when the current position is at a newline. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches newline characters, independent of the setting of this option. PCRE_DUPNAMES If this bit is set, names used to identify capturing subpatterns need not be unique. This can be helpful for certain types of pattern when it is known that only one instance of the named subpattern can ever be matched. There are more details of named subpatterns below; see also the pcrepattern documentation. PCRE_EXTENDED If this bit is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. White- space does not include the VT character (code 11). In addition, charac- ters between an unescaped # outside a character class and the next new- line, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option set- ting. This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern. PCRE_EXTRA This option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use. When set, any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. (Perl can, however, be persuaded to give a warning for this.) There are at present no other features controlled by this option. It can also be set by a (?X) option setting within a pattern. PCRE_FIRSTLINE If this option is set, an unanchored pattern is required to match before or at the first newline in the subject string, though the matched text may continue over the newline.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -