📄 pcre.txt
remains available for as long as it is needed.

INFORMATION ABOUT A PATTERN

The pcre_fullinfo() function returns information about a compiled
pattern. It replaces the obsolete pcre_info() function, which is
nevertheless retained for backwards compatibility (and is documented
below).

The first argument for pcre_fullinfo() is a pointer to the compiled
pattern. The second argument is the result of pcre_study(), or NULL if
the pattern was not studied. The third argument specifies which piece
of information is required, while the fourth argument is a pointer to
a variable to receive the data. The yield of the function is zero for
success, or one of the following negative numbers:

  PCRE_ERROR_NULL       the argument code was NULL
                        the argument where was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
  PCRE_ERROR_BADOPTION  the value of what was invalid

The possible values for the third argument are defined in pcre.h, and
are as follows:

  PCRE_INFO_OPTIONS

Return a copy of the options with which the pattern was compiled. The
fourth argument should point to an unsigned long int variable. These
option bits are those specified in the call to pcre_compile(),
modified by any top-level option settings within the pattern itself,
and with the PCRE_ANCHORED bit forcibly set if the form of the pattern
implies that it can match only at the start of a subject string.

  PCRE_INFO_SIZE

Return the size of the compiled pattern, that is, the value that was
passed as the argument to pcre_malloc() when PCRE was getting memory
in which to place the compiled data. The fourth argument should point
to a size_t variable.

  PCRE_INFO_CAPTURECOUNT

Return the number of capturing subpatterns in the pattern. The fourth
argument should point to an int variable.

  PCRE_INFO_BACKREFMAX

Return the number of the highest back reference in the pattern. The
fourth argument should point to an int variable. Zero is returned if
there are no back references.

  PCRE_INFO_FIRSTCHAR

Return information about the first character of any matched string,
for a non-anchored pattern. If there is a fixed first character, e.g.
from a pattern such as (cat|cow|coyote), it is returned in the integer
pointed to by where. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and every
    branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is
    not set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the start
of a subject string or after any "\n" within the string. Otherwise -2
is returned. For anchored patterns, -2 is returned.

  PCRE_INFO_FIRSTTABLE

If the pattern was studied, and this resulted in the construction of a
256-bit table indicating a fixed set of characters for the first
character in any matching string, a pointer to the table is returned.
Otherwise NULL is returned. The fourth argument should point to an
unsigned char * variable.

  PCRE_INFO_LASTLITERAL

For a non-anchored pattern, return the value of the rightmost literal
character which must exist in any matched string, other than at its
start. The fourth argument should point to an int variable. If there
is no such character, or if the pattern is anchored, -1 is returned.
For example, for the pattern /a\d+z\d+/ the returned value is 'z'.

The pcre_info() function is now obsolete because its interface is too
restrictive to return all the available data about a compiled pattern.
New programs should use pcre_fullinfo() instead. The yield of
pcre_info() is the number of capturing subpatterns, or one of the
following negative numbers:

  PCRE_ERROR_NULL       the argument code was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found

If the optptr argument is not NULL, a copy of the options with which
the pattern was compiled is placed in the integer it points to (see
PCRE_INFO_OPTIONS above).

If the pattern is not anchored and the firstcharptr argument is not
NULL, it is used to pass back information about the first character of
any matched string (see PCRE_INFO_FIRSTCHAR above).

MATCHING A PATTERN

The function pcre_exec() is called to match a subject string against a
pre-compiled pattern, which is passed in the code argument. If the
pattern has been studied, the result of the study should be passed in
the extra argument. Otherwise this must be NULL.

The PCRE_ANCHORED option can be passed in the options argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents,
it cannot be made unanchored at matching time.

There are also three further options that can be set only at matching
time:

  PCRE_NOTBOL

The first character of the string is not the beginning of a line, so
the circumflex metacharacter should not match before it. Setting this
without PCRE_MULTILINE (at compile time) causes circumflex never to
match.

  PCRE_NOTEOL

The end of the string is not the end of a line, so the dollar
metacharacter should not match it nor (except in multiline mode) a
newline immediately before it. Setting this without PCRE_MULTILINE (at
compile time) causes dollar never to match.

  PCRE_NOTEMPTY

An empty string is not considered to be a valid match if this option
is set. If there are alternatives in the pattern, they are tried. If
all the alternatives match the empty string, the entire match fails.
For example, if the pattern

  a?b?

is applied to a string not beginning with "a" or "b", it matches the
empty string at the start of the subject. With PCRE_NOTEMPTY set, this
match is not valid, so PCRE searches further into the string for
occurrences of "a" or "b".

Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a
special case of a pattern match of the empty string within its split()
function, and when using the /g modifier.
It is possible to emulate Perl's behaviour after matching a null
string by first trying the match again at the same offset with
PCRE_NOTEMPTY set, and then, if that fails, by advancing the starting
offset (see below) and trying an ordinary match again.

The subject string is passed as a pointer in subject, a length in
length, and a starting offset in startoffset. Unlike the pattern
string, it may contain binary zero characters. When the starting
offset is zero, the search for a match starts at the beginning of the
subject, and this is by far the most common case.

A non-zero starting offset is useful when searching for another match
in the same subject by calling pcre_exec() again after a previous
success. Setting startoffset differs from just passing over a
shortened string and setting PCRE_NOTBOL in the case of a pattern that
begins with any kind of lookbehind. For example, consider the pattern

  \Biss\B

which finds occurrences of "iss" in the middle of words. (\B matches
only if the current position in the subject is not a word boundary.)
When applied to the string "Mississippi" the first call to pcre_exec()
finds the first occurrence. If pcre_exec() is called again with just
the remainder of the subject, namely "issippi", it does not match,
because \B is always false at the start of the subject, which is
deemed to be a word boundary. However, if pcre_exec() is passed the
entire string again, but with startoffset set to 4, it finds the
second occurrence of "iss" because it is able to look behind the
starting point to discover that it is preceded by a letter.

If a non-zero starting offset is passed when the pattern is anchored,
one attempt to match at the given offset is tried. This can only
succeed if the pattern does not require the match to be at the start
of the subject.

In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by
parts of the pattern.
Following the usage in Jeffrey Friedl's book, this is called
"capturing" in what follows, and the phrase "capturing subpattern" is
used for a fragment of a pattern that picks out a substring. PCRE
supports several other kinds of parenthesized subpattern that do not
cause substrings to be captured.

Captured substrings are returned to the caller via a vector of integer
offsets whose address is passed in ovector. The number of elements in
the vector is passed in ovecsize. The first two-thirds of the vector
is used to pass back captured substrings, each substring using a pair
of integers. The remaining third of the vector is used as workspace by
pcre_exec() while matching capturing subpatterns, and is not available
for passing back information. The length passed in ovecsize should
always be a multiple of three. If it is not, it is rounded down.

When a match has been successful, information about captured
substrings is returned in pairs of integers, starting at the beginning
of ovector, and continuing up to two-thirds of its length at the most.
The first element of a pair is set to the offset of the first
character in a substring, and the second is set to the offset of the
first character after the end of a substring. The first pair,
ovector[0] and ovector[1], identifies the portion of the subject
string matched by the entire pattern. The next pair is used for the
first capturing subpattern, and so on. The value returned by
pcre_exec() is the number of pairs that have been set. If there are no
capturing subpatterns, the return value from a successful match is 1,
indicating that just the first pair of offsets has been set.

Some convenience functions are provided for extracting the captured
substrings as separate strings. These are described in the following
section.

It is possible for a capturing subpattern number n+1 to match some
part of the subject when subpattern n has not been used at all.
For example, if the string "abc" is matched against the pattern
(a|(z))(bc), subpatterns 1 and 3 are matched, but 2 is not. When this
happens, both offset values corresponding to the unused subpattern are
set to -1.

If a capturing subpattern is matched repeatedly, it is the last
portion of the string that it matched that gets returned.

If the vector is too small to hold all the captured substrings, it is
used as far as possible (up to two-thirds of its length), and the
function returns a value of zero. In particular, if the substring
offsets are not of interest, pcre_exec() may be called with ovector
passed as NULL and ovecsize as zero. However, if the pattern contains
back references and the ovector isn't big enough to remember the
related substrings, PCRE has to get additional memory for use during
matching. Thus it is usually advisable to supply an ovector.

Note that pcre_info() can be used to find out how many capturing
subpatterns there are in a compiled pattern. The smallest size for
ovector that will allow for n captured substrings in addition to the
offsets of the substring matched by the whole pattern is (n+1)*3.

If pcre_exec() fails, it returns a negative number. The following are
defined in the header file:

  PCRE_ERROR_NOMATCH        (-1)

The subject string did not match the pattern.

  PCRE_ERROR_NULL           (-2)

Either code or subject was passed as NULL, or ovector was NULL and
ovecsize was not zero.

  PCRE_ERROR_BADOPTION      (-3)

An unrecognized bit was set in the options argument.

  PCRE_ERROR_BADMAGIC       (-4)

PCRE stores a 4-byte "magic number" at the start of the compiled code,
to catch the case when it is passed a junk pointer. This is the error
it gives when the magic number isn't present.

  PCRE_ERROR_UNKNOWN_NODE   (-5)
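
The offset-vector layout described earlier in this section (pairs of
offsets, -1 for an unused subpattern, and a minimum size of (n+1)*3)
can be sketched in plain C as follows. This is only an illustration
under stated assumptions: ovector_size_for() and copy_capture() are
hypothetical helper names, not part of PCRE, and the code operates on
an offset vector of the kind pcre_exec() fills in rather than calling
PCRE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: smallest ovecsize that leaves room for
   n_captures capturing subpatterns plus the whole-match pair.
   Two thirds hold the pairs; one third is workspace. */
int ovector_size_for(int n_captures)
{
    assert(n_captures >= 0);
    return (n_captures + 1) * 3;
}

/* Hypothetical helper: copy the substring described by pair `index`
   out of `subject` into `buf`, adding a terminating zero.
   `rc` is the positive value returned by a successful match, that is,
   the number of pairs that have been set.
   Returns the substring length, or -1 if the pair is out of range,
   is unset (both offsets -1), or does not fit in `buf`. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int index, char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);

    if (index < 0 || index >= rc)
        return -1;                      /* pair was never set */

    int start = ovector[2 * index];     /* offset of first character */
    int end = ovector[2 * index + 1];   /* offset just past the end */
    if (start < 0 || end < start)
        return -1;                      /* unused subpattern */

    size_t len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;                      /* no room for text plus zero */

    /* memcpy is used because the subject may contain binary zeros. */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

For the "abc" and (a|(z))(bc) example above, the pairs would be
0,3 for the whole match, 0,1 for subpattern 1, -1,-1 for subpattern 2,
and 1,3 for subpattern 3. With such a vector, and assuming a return
value of 4, copy_capture() yields "a" for index 1 and "bc" for index
3, and returns -1 for index 2. A pattern with three capturing
subpatterns needs ovector_size_for(3), that is, 12 elements.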