📄 pcre.txt
The third argument for pcre_study() is a pointer to an error message.
If studying succeeds (even if no data is returned), the variable it
points to is set to NULL. Otherwise it points to a textual error
message.

This is a typical call to pcre_study():

  pcre_extra *pe;
  pe = pcre_study(
    re,             /* result of pcre_compile() */
    0,              /* no options exist */
    &error);        /* set to NULL or points to a message */

At present, studying a pattern is useful only for non-anchored
patterns that do not have a single fixed starting character. A bitmap
of possible starting characters is created.

LOCALE SUPPORT

PCRE handles caseless matching, and determines whether characters are
letters, digits, or whatever, by reference to a set of tables. The
library contains a default set of tables which is created in the
default C locale when PCRE is compiled. This is used when the final
argument of pcre_compile() is NULL, and is sufficient for many
applications.

An alternative set of tables can, however, be supplied. Such tables
are built by calling the pcre_maketables() function, which has no
arguments, in the relevant locale. The result can then be passed to
pcre_compile() as often as necessary. For example, to build and use
tables that are appropriate for the French locale (where accented
characters with codes greater than 128 are treated as letters), the
following code could be used:

  setlocale(LC_CTYPE, "fr");
  tables = pcre_maketables();
  re = pcre_compile(..., tables);

The tables are built in memory that is obtained via pcre_malloc. The
pointer that is passed to pcre_compile is saved with the compiled
pattern, and the same tables are used via this pointer by pcre_study()
and pcre_exec(). Thus for any single pattern, compilation, studying
and matching all happen in the same locale, but different patterns can
be compiled in different locales.
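
To illustrate why the locale that is active when tables are built
matters, here is a minimal sketch using only the standard C library.
It is not PCRE's own code: build_letter_table() and is_letter() are
hypothetical helpers that snapshot the answers of isalpha() into a
256-entry table, in the same spirit as the tables described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: record, for each of the 256
   character codes, whether the locale active right now treats it as a
   letter. Later calls to setlocale() do not change the saved answers. */
void build_letter_table(unsigned char table[256])
{
    int c;

    assert(table != NULL);
    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)(isalpha(c) != 0);
}

/* Look up a character in a previously built table. */
int is_letter(const unsigned char table[256], unsigned char c)
{
    return table[c];
}
```

A table built at program start reflects the default C locale, in which
only unaccented letters are classed as letters. A table built after a
successful setlocale(LC_CTYPE, ...) call for a suitable single-byte
locale would be expected to classify accented characters differently,
while tables built earlier stay as they were.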
It is the caller's responsibility to ensure that the memory containing
the tables remains available for as long as it is needed.

INFORMATION ABOUT A PATTERN

The pcre_fullinfo() function returns information about a compiled
pattern. It replaces the obsolete pcre_info() function, which is
nevertheless retained for backwards compatibility (and is documented
below).

The first argument for pcre_fullinfo() is a pointer to the compiled
pattern. The second argument is the result of pcre_study(), or NULL if
the pattern was not studied. The third argument specifies which piece
of information is required, while the fourth argument is a pointer to
a variable to receive the data. The yield of the function is zero for
success, or one of the following negative numbers:

  PCRE_ERROR_NULL       the argument code was NULL
                        the argument where was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
  PCRE_ERROR_BADOPTION  the value of what was invalid

Here is a typical call of pcre_fullinfo(), to obtain the length of the
compiled pattern:

  int rc;
  size_t length;
  rc = pcre_fullinfo(
    re,               /* result of pcre_compile() */
    pe,               /* result of pcre_study(), or NULL */
    PCRE_INFO_SIZE,   /* what is required */
    &length);         /* where to put the data */

The possible values for the third argument are defined in pcre.h, and
are as follows:

  PCRE_INFO_OPTIONS

Return a copy of the options with which the pattern was compiled. The
fourth argument should point to an unsigned long int variable. These
option bits are those specified in the call to pcre_compile(),
modified by any top-level option settings within the pattern itself,
and with the PCRE_ANCHORED bit forcibly set if the form of the pattern
implies that it can match only at the start of a subject string.

  PCRE_INFO_SIZE

Return the size of the compiled pattern, that is, the value that was
passed as the argument to pcre_malloc() when PCRE was getting memory
in which to place the compiled data. The fourth argument should point
to a size_t variable.

  PCRE_INFO_CAPTURECOUNT

Return the number of capturing subpatterns in the pattern. The fourth
argument should point to an int variable.

  PCRE_INFO_BACKREFMAX

Return the number of the highest back reference in the pattern. The
fourth argument should point to an int variable. Zero is returned if
there are no back references.

  PCRE_INFO_FIRSTCHAR

Return information about the first character of any matched string,
for a non-anchored pattern. If there is a fixed first character, e.g.
from a pattern such as (cat|cow|coyote), it is returned in the integer
pointed to by where. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and every
    branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is
    not set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the start
of a subject string or after any "\n" within the string. Otherwise -2
is returned. For anchored patterns, -2 is returned.

  PCRE_INFO_FIRSTTABLE

If the pattern was studied, and this resulted in the construction of a
256-bit table indicating a fixed set of characters for the first
character in any matching string, a pointer to the table is returned.
Otherwise NULL is returned. The fourth argument should point to an
unsigned char * variable.

  PCRE_INFO_LASTLITERAL

For a non-anchored pattern, return the value of the rightmost literal
character which must exist in any matched string, other than at its
start. The fourth argument should point to an int variable. If there
is no such character, or if the pattern is anchored, -1 is returned.
For example, for the pattern /a\d+z\d+/ the returned value is 'z'.

The pcre_info() function is now obsolete because its interface is too
restrictive to return all the available data about a compiled pattern.
New programs should use pcre_fullinfo() instead.
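
As an aside on PCRE_INFO_FIRSTTABLE: a caller could use the 256-bit
table as a cheap pre-check before attempting a match. The sketch below
assumes the table is 32 bytes long, with the bit for character c held
in bit (c & 7) of byte (c / 8). This document does not state the
layout, so treat it, and the hypothetical helper names
can_start_match() and mark_start_char(), as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: report whether character c is allowed as the
   first character of a match according to a 256-bit table.
   ASSUMED layout: bit (c & 7) of byte (c >> 3). A NULL table means no
   information is available, so every character must be allowed. */
int can_start_match(const unsigned char *first_table, unsigned char c)
{
    if (first_table == NULL)
        return 1;
    return (first_table[c >> 3] >> (c & 7)) & 1;
}

/* Hypothetical helper for experiments: set the bit for character c in
   a caller-owned 32-byte table, using the same assumed layout. */
void mark_start_char(unsigned char *table, unsigned char c)
{
    assert(table != NULL);
    table[c >> 3] |= (unsigned char)(1u << (c & 7));
}
```

With a zeroed 32-byte table in which only 'c' has been marked,
can_start_match() accepts 'c' and rejects every other character.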
The yield of pcre_info() is the number of capturing subpatterns, or
one of the following negative numbers:

  PCRE_ERROR_NULL       the argument code was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found

If the optptr argument is not NULL, a copy of the options with which
the pattern was compiled is placed in the integer it points to (see
PCRE_INFO_OPTIONS above).

If the pattern is not anchored and the firstcharptr argument is not
NULL, it is used to pass back information about the first character of
any matched string (see PCRE_INFO_FIRSTCHAR above).

MATCHING A PATTERN

The function pcre_exec() is called to match a subject string against a
pre-compiled pattern, which is passed in the code argument. If the
pattern has been studied, the result of the study should be passed in
the extra argument. Otherwise this must be NULL.

Here is an example of a simple call to pcre_exec():

  int rc;
  int ovector[30];
  rc = pcre_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* we didn't study the pattern */
    "some string",  /* the subject string */
    11,             /* the length of the subject string */
    0,              /* start at offset 0 in the subject */
    0,              /* default options */
    ovector,        /* vector for substring information */
    30);            /* number of elements in the vector */

The PCRE_ANCHORED option can be passed in the options argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents,
it cannot be made unanchored at matching time.

There are also three further options that can be set only at matching
time:

  PCRE_NOTBOL

The first character of the string is not the beginning of a line, so
the circumflex metacharacter should not match before it. Setting this
without PCRE_MULTILINE (at compile time) causes circumflex never to
match.

  PCRE_NOTEOL

The end of the string is not the end of a line, so the dollar
metacharacter should not match it nor (except in multiline mode) a
newline immediately before it.
Setting this without PCRE_MULTILINE (at compile time) causes dollar
never to match.

  PCRE_NOTEMPTY

An empty string is not considered to be a valid match if this option
is set. If there are alternatives in the pattern, they are tried. If
all the alternatives match the empty string, the entire match fails.
For example, if the pattern

  a?b?

is applied to a string not beginning with "a" or "b", it matches the
empty string at the start of the subject. With PCRE_NOTEMPTY set, this
match is not valid, so PCRE searches further into the string for
occurrences of "a" or "b".

Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a
special case of a pattern match of the empty string within its split()
function, and when using the /g modifier. It is possible to emulate
Perl's behaviour after matching a null string by first trying the
match again at the same offset with PCRE_NOTEMPTY set, and then if
that fails by advancing the starting offset (see below) and trying an
ordinary match again.

The subject string is passed as a pointer in subject, a length in
length, and a starting offset in startoffset. Unlike the pattern
string, the subject may contain binary zero characters. When the
starting offset is zero, the search for a match starts at the
beginning of the subject, and this is by far the most common case.

A non-zero starting offset is useful when searching for another match
in the same subject by calling pcre_exec() again after a previous
success. Setting startoffset differs from just passing over a
shortened string and setting PCRE_NOTBOL in the case of a pattern that
begins with any kind of lookbehind. For example, consider the pattern

  \Biss\B

which finds occurrences of "iss" in the middle of words. (\B matches
only if the current position in the subject is not a word boundary.)
When applied to the string "Mississipi" the first call to pcre_exec()
finds the first occurrence.
If pcre_exec() is called again with just the remainder of the subject,
namely "issipi", it does not match, because \B is always false at the
start of the subject, which is deemed to be a word boundary. However,
if pcre_exec() is passed the entire string again, but with startoffset
set to 4, it finds the second occurrence of "iss" because it is able
to look behind the starting point to discover that it is preceded by a
letter.

If a non-zero starting offset is passed when the pattern is anchored,
one attempt to match at the given offset is tried. This can only
succeed if the pattern does not require the match to be at the start
of the subject.

In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by
parts of the pattern. Following the usage in Jeffrey Friedl's book,
this is called "capturing" in what follows, and the phrase "capturing
subpattern" is used for a fragment of a pattern that picks out a
substring. PCRE supports several other kinds of parenthesized subpat-