         PCRE_MULTILINE

       By default, PCRE treats the subject string as consisting of a single
       line of characters (even if it actually contains newlines). The "start
       of line" metacharacter (^) matches only at the start of the string,
       while the "end of line" metacharacter ($) matches only at the end of
       the string, or before a terminating newline (unless
       PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.

       When PCRE_MULTILINE is set, the "start of line" and "end of line"
       constructs match immediately following or immediately before internal
       newlines in the subject string, respectively, as well as at the very
       start and end. This is equivalent to Perl's /m option, and it can be
       changed within a pattern by a (?m) option setting. If there are no
       newlines in a subject string, or no occurrences of ^ or $ in a
       pattern, setting PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANY

       These options override the default newline definition that was chosen
       when PCRE was built. Setting the first or the second specifies that a
       newline is indicated by a single character (CR or LF, respectively).
       Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
       two-character CRLF sequence. Setting PCRE_NEWLINE_ANY specifies that
       any Unicode newline sequence should be recognized. The Unicode newline
       sequences are the three just mentioned, plus the single characters VT
       (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line,
       U+0085), LS (line separator, U+2028), and PS (paragraph separator,
       U+2029). The last two are recognized only in UTF-8 mode.

       The newline setting in the options word uses three bits that are
       treated as a number, giving eight possibilities. Currently only five
       are used (default plus the four values above). This means that if you
       set more than one newline option, the combination may or may not be
       sensible.
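       The three-bit newline field can be pictured with a small sketch. The
       DEMO_ names, the bit values, and the function below are assumptions
       made for illustration only: the values are intended to mirror the
       PCRE_NEWLINE_* definitions in pcre.h and should be checked against the
       installed header, and the function is not part of the PCRE API.

```c
/* Hypothetical stand-ins for the PCRE_NEWLINE_* option bits. The values
   are an assumption; verify them against your own pcre.h. */
#define DEMO_NEWLINE_CR    0x00100000UL
#define DEMO_NEWLINE_LF    0x00200000UL
#define DEMO_NEWLINE_CRLF  0x00300000UL
#define DEMO_NEWLINE_ANY   0x00400000UL
#define DEMO_NEWLINE_MASK  0x00700000UL   /* the three-bit field */
#define DEMO_NEWLINE_SHIFT 20

/* Extract the newline field from an options word and treat it as a number.
   Returns 0 (default), 1 (CR), 2 (LF), 3 (CRLF) or 4 (ANY), or -1 when the
   combination of bits gives a number that this sketch treats as unused. */
int demo_newline_field(unsigned long options)
{
    unsigned long field = (options & DEMO_NEWLINE_MASK) >> DEMO_NEWLINE_SHIFT;

    if (field <= 4)
        return (int)field;
    return -1;   /* 5, 6 and 7 are not assigned in this sketch */
}
```

       The function returns -1 for the three field values that the sketch
       treats as unused, which corresponds to the case where combining
       newline options is not sensible.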
       For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
       PCRE_NEWLINE_CRLF, but other combinations yield unused numbers and
       cause an error.

       The only time that a line break is specially recognized when compiling
       a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a
       character class is encountered. This indicates a comment that lasts
       until after the next line break sequence. In other circumstances, line
       break sequences are treated as literal data, except that in
       PCRE_EXTENDED mode, both CR and LF are treated as whitespace
       characters and are therefore ignored.

       The newline option that is set at compile time becomes the default
       that is used for pcre_exec() and pcre_dfa_exec(), but it can be
       overridden.

         PCRE_NO_AUTO_CAPTURE

       If this option is set, it disables the use of numbered capturing
       parentheses in the pattern. Any opening parenthesis that is not
       followed by ? behaves as if it were followed by ?: but named
       parentheses can still be used for capturing (and they acquire numbers
       in the usual way). There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the quantifiers so that they
       are not greedy by default, but become greedy if followed by "?". It is
       not compatible with Perl. It can also be set by a (?U) option setting
       within the pattern.

         PCRE_UTF8

       This option causes PCRE to regard both the pattern and the subject as
       strings of UTF-8 characters instead of single-byte character strings.
       However, it is available only when PCRE is built to include UTF-8
       support. If not, the use of this option provokes an error. Details of
       how this option changes the behaviour of PCRE are given in the section
       on UTF-8 support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string
       is automatically checked. If an invalid UTF-8 sequence of bytes is
       found, pcre_compile() returns an error.
       If you already know that your pattern is valid, and you want to skip
       this check for performance reasons, you can set the
       PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an
       invalid UTF-8 string as a pattern is undefined. It may cause your
       program to crash. Note that this option can also be passed to
       pcre_exec() and pcre_dfa_exec(), to suppress the UTF-8 validity
       checking of subject strings.


COMPILATION ERROR CODES

       The following table lists the error codes that may be returned by
       pcre_compile2(), along with the error messages that may be returned by
       both compiling functions. As PCRE has developed, some error codes have
       fallen out of use. To avoid confusion, they have not been re-used.

          0  no error
          1  \ at end of pattern
          2  \c at end of pattern
          3  unrecognized character follows \
          4  numbers out of order in {} quantifier
          5  number too big in {} quantifier
          6  missing terminating ] for character class
          7  invalid escape sequence in character class
          8  range out of order in character class
          9  nothing to repeat
         10  [this code is not in use]
         11  internal error: unexpected repeat
         12  unrecognized character after (?
         13  POSIX named classes are supported only within a class
         14  missing )
         15  reference to non-existent subpattern
         16  erroffset passed as NULL
         17  unknown option bit(s) set
         18  missing ) after comment
         19  [this code is not in use]
         20  regular expression too large
         21  failed to get memory
         22  unmatched parentheses
         23  internal error: code overflow
         24  unrecognized character after (?<
         25  lookbehind assertion is not fixed length
         26  malformed number or name after (?(
         27  conditional group contains more than two branches
         28  assertion expected after (?(
         29  (?R or (?digits must be followed by )
         30  unknown POSIX class name
         31  POSIX collating elements are not supported
         32  this version of PCRE is not compiled with PCRE_UTF8 support
         33  [this code is not in use]
         34  character value in \x{...} sequence is too large
         35  invalid condition (?(0)
         36  \C not allowed in lookbehind assertion
         37  PCRE does not support \L, \l, \N, \U, or \u
         38  number after (?C is > 255
         39  closing ) for (?C expected
         40  recursive call could loop indefinitely
         41  unrecognized character after (?P
         42  syntax error in subpattern name (missing terminator)
         43  two named subpatterns have the same name
         44  invalid UTF-8 string
         45  support for \P, \p, and \X has not been compiled
         46  malformed \P or \p sequence
         47  unknown property name after \P or \p
         48  subpattern name is too long (maximum 32 characters)
         49  too many named subpatterns (maximum 10,000)
         50  repeated subpattern is too long
         51  octal value is greater than \377 (not in UTF-8 mode)
         52  internal error: overran compiling workspace
         53  internal error: previously-checked referenced subpattern not
             found
         54  DEFINE group contains more than one branch
         55  repeating a DEFINE group is not allowed
         56  inconsistent NEWLINE options


STUDYING A PATTERN

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If a compiled pattern is going to be used several times, it is worth
       spending more time analyzing it in order to speed up the time taken
       for matching.
       The function pcre_study() takes a pointer to a compiled pattern as its
       first argument. If studying the pattern produces additional
       information that will help speed up matching, pcre_study() returns a
       pointer to a pcre_extra block, in which the study_data field points to
       the results of the study.

       The returned value from pcre_study() can be passed directly to
       pcre_exec(). However, a pcre_extra block also contains other fields
       that can be set by the caller before the block is passed; these are
       described below in the section on matching a pattern.

       If studying the pattern does not produce any additional information,
       pcre_study() returns NULL. In that circumstance, if the calling
       program wants to pass any of the other fields to pcre_exec(), it must
       set up its own pcre_extra block.

       The second argument of pcre_study() contains option bits. At present,
       no options are defined, and this argument should always be zero.

       The third argument for pcre_study() is a pointer for an error message.
       If studying succeeds (even if no data is returned), the variable it
       points to is set to NULL. Otherwise it is set to point to a textual
       error message. This is a static string that is part of the library.
       You must not try to free it. You should test the error pointer for
       NULL after calling pcre_study(), to be sure that it has run
       successfully.

       This is a typical call to pcre_study():

         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */

       At present, studying a pattern is useful only for non-anchored
       patterns that do not have a single fixed starting character. A bitmap
       of possible starting bytes is created.


LOCALE SUPPORT

       PCRE handles caseless matching, and determines whether characters are
       letters, digits, or whatever, by reference to a set of tables, indexed
       by character value. When running in UTF-8 mode, this applies only to
       characters with codes less than 128.
       Higher-valued codes never match escapes such as \w or \d, but can be
       tested with \p if PCRE is built with Unicode character property
       support. The use of locales with Unicode is discouraged.

       An internal set of tables is created in the default C locale when PCRE
       is built. This is used when the final argument of pcre_compile() is
       NULL, and is sufficient for many applications. An alternative set of
       tables can, however, be supplied. These may be created in a different
       locale from the default. As more and more applications change to using
       Unicode, the need for this locale support is expected to die away.

       External tables are built by calling the pcre_maketables() function,
       which has no arguments, in the relevant locale. The result can then be
       passed to pcre_compile() or pcre_exec() as often as necessary. For
       example, to build and use tables that are appropriate for the French
       locale (where accented characters with values greater than 128 are
       treated as letters), the following code could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       When pcre_maketables() runs, the tables are built in memory that is
       obtained via pcre_malloc. It is the caller's responsibility to ensure
       that the memory containing the tables remains available for as long as
       it is needed.

       The pointer that is passed to pcre_compile() is saved with the
       compiled pattern, and the same tables are used via this pointer by
       pcre_study() and normally also by pcre_exec(). Thus, by default, for
       any single pattern, compilation, studying and matching all happen in
       the same locale, but different patterns can be compiled in different
       locales.

       It is possible to pass a table pointer or NULL (indicating the use of
       the internal tables) to pcre_exec(). Although not intended for this
       purpose, this facility could be used to match a pattern in a different
       locale from the one in which it was compiled.
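       To make the idea of a locale-dependent table indexed by character
       value more concrete, here is a minimal sketch that uses only the
       standard C library. It is a simplified stand-in and makes no claim
       about the actual layout of the tables built by pcre_maketables(); the
       DEMO_ and demo_ names are hypothetical.

```c
#include <ctype.h>

#define DEMO_TABLE_SIZE 256

/* Fill a table with one entry per byte value: 1 if the byte is a letter
   in the locale that is current when this function runs, 0 otherwise.
   The table is a snapshot; later locale changes do not affect it. */
void demo_build_letter_table(unsigned char table[DEMO_TABLE_SIZE])
{
    int c;

    for (c = 0; c < DEMO_TABLE_SIZE; c++)
        table[c] = isalpha(c) ? 1 : 0;
}

/* Classify a byte by table lookup instead of calling isalpha() again. */
int demo_is_letter(const unsigned char table[DEMO_TABLE_SIZE],
                   unsigned char c)
{
    return table[c];
}
```

       Because the table is filled once and then consulted by index, a table
       built after a call to setlocale() keeps reflecting that locale for as
       long as the caller holds on to it, which is the property that the
       text above relies on.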
       Passing table pointers at run time is discussed below in the section
       on matching a pattern.


INFORMATION ABOUT A PATTERN

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       The pcre_fullinfo() function returns information about a compiled
       pattern. It replaces the obsolete pcre_info() function, which is
       nevertheless retained for backwards compatibility (and is documented
       below).

       The first argument for pcre_fullinfo() is a pointer to the compiled
       pattern. The second argument is the result of pcre_study(), or NULL if
       the pattern was not studied. The third argument specifies which piece
       of information is required, and the fourth argument is a pointer to a
       variable to receive the data. The yield of the function is zero for
       success, or one of the following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The "magic number"