pcre.txt
the results of the study.

The returned value from pcre_study() can be passed directly to pcre_exec(). However, a pcre_extra block also contains other fields that can be set by the caller before the block is passed; these are described below in the section on matching a pattern.

If studying the pattern does not produce any additional information, pcre_study() returns NULL. In that circumstance, if the calling program wants to pass any of the other fields to pcre_exec(), it must set up its own pcre_extra block.

The second argument of pcre_study() contains option bits. At present, no options are defined, and this argument should always be zero.

The third argument for pcre_study() is a pointer for an error message. If studying succeeds (even if no data is returned), the variable it points to is set to NULL. Otherwise it is set to point to a textual error message. This is a static string that is part of the library. You must not try to free it. You should test the error pointer for NULL after calling pcre_study(), to be sure that it has run successfully.

This is a typical call to pcre_study():

    const char *error;
    pcre_extra *pe;
    pe = pcre_study(
      re,             /* result of pcre_compile() */
      0,              /* no options exist */
      &error);        /* set to NULL or points to a message */

At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created.

LOCALE SUPPORT

PCRE handles caseless matching, and determines whether characters are letters, digits, or whatever, by reference to a set of tables, indexed by character value. When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support. The use of locales with Unicode is discouraged.

An internal set of tables is created in the default C locale when PCRE is built.
This is used when the final argument of pcre_compile() is NULL, and is sufficient for many applications. An alternative set of tables can, however, be supplied. These may be created in a different locale from the default. As more and more applications change to using Unicode, the need for this locale support is expected to die away.

External tables are built by calling the pcre_maketables() function, which has no arguments, in the relevant locale. The result can then be passed to pcre_compile() or pcre_exec() as often as necessary. For example, to build and use tables that are appropriate for the French locale (where accented characters with values greater than 128 are treated as letters), the following code could be used:

    setlocale(LC_CTYPE, "fr_FR");
    tables = pcre_maketables();
    re = pcre_compile(..., tables);

When pcre_maketables() runs, the tables are built in memory that is obtained via pcre_malloc. It is the caller's responsibility to ensure that the memory containing the tables remains available for as long as it is needed.

The pointer that is passed to pcre_compile() is saved with the compiled pattern, and the same tables are used via this pointer by pcre_study() and normally also by pcre_exec(). Thus, by default, for any single pattern, compilation, studying and matching all happen in the same locale, but different patterns can be compiled in different locales.

It is possible to pass a table pointer or NULL (indicating the use of the internal tables) to pcre_exec(). Although not intended for this purpose, this facility could be used to match a pattern in a different locale from the one in which it was compiled. Passing table pointers at run time is discussed below in the section on matching a pattern.

INFORMATION ABOUT A PATTERN

    int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
         int what, void *where);

The pcre_fullinfo() function returns information about a compiled pattern.
It replaces the obsolete pcre_info() function, which is nevertheless retained for backwards compatibility (and is documented below).

The first argument for pcre_fullinfo() is a pointer to the compiled pattern. The second argument is the result of pcre_study(), or NULL if the pattern was not studied. The third argument specifies which piece of information is required, and the fourth argument is a pointer to a variable to receive the data. The yield of the function is zero for success, or one of the following negative numbers:

    PCRE_ERROR_NULL       the argument code was NULL
                          the argument where was NULL
    PCRE_ERROR_BADMAGIC   the "magic number" was not found
    PCRE_ERROR_BADOPTION  the value of what was invalid

The "magic number" is placed at the start of each compiled pattern as a simple check against passing an arbitrary memory pointer. Here is a typical call of pcre_fullinfo(), to obtain the length of the compiled pattern:

    int rc;
    size_t length;        /* PCRE_INFO_SIZE expects a size_t variable */
    rc = pcre_fullinfo(
      re,               /* result of pcre_compile() */
      pe,               /* result of pcre_study(), or NULL */
      PCRE_INFO_SIZE,   /* what is required */
      &length);         /* where to put the data */

The possible values for the third argument are defined in pcre.h, and are as follows:

    PCRE_INFO_BACKREFMAX

Return the number of the highest back reference in the pattern. The fourth argument should point to an int variable. Zero is returned if there are no back references.

    PCRE_INFO_CAPTURECOUNT

Return the number of capturing subpatterns in the pattern. The fourth argument should point to an int variable.

    PCRE_INFO_DEFAULT_TABLES

Return a pointer to the internal default character tables within PCRE. The fourth argument should point to an unsigned char * variable. This information call is provided for internal use by the pcre_study() function. External callers can cause PCRE to use its internal tables by passing a NULL table pointer.

    PCRE_INFO_FIRSTBYTE

Return information about the first byte of any matched string, for a non-anchored pattern.
(This option used to be called PCRE_INFO_FIRSTCHAR; the old name is still recognized for backwards compatibility.)

If there is a fixed first byte, for example, from a pattern such as (cat|cow|coyote), it is returned in the integer pointed to by where. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the start of a subject string or after any newline within the string. Otherwise -2 is returned. For anchored patterns, -2 is returned.

    PCRE_INFO_FIRSTTABLE

If the pattern was studied, and this resulted in the construction of a 256-bit table indicating a fixed set of bytes for the first byte in any matching string, a pointer to the table is returned. Otherwise NULL is returned. The fourth argument should point to an unsigned char * variable.

    PCRE_INFO_LASTLITERAL

Return the value of the rightmost literal byte that must exist in any matched string, other than at its start, if such a byte has been recorded. The fourth argument should point to an int variable. If there is no such byte, -1 is returned. For anchored patterns, a last literal byte is recorded only if it follows something of variable length. For example, for the pattern /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value is -1.

    PCRE_INFO_NAMECOUNT
    PCRE_INFO_NAMEENTRYSIZE
    PCRE_INFO_NAMETABLE

PCRE supports the use of named as well as numbered capturing parentheses. The names are just an additional way of identifying the parentheses, which still acquire numbers. A convenience function called pcre_get_named_substring() is provided for extracting an individual captured substring by name.
It is also possible to extract the data directly, by first converting the name to a number in order to access the correct pointers in the output vector (described with pcre_exec() below). To do the conversion, you need to use the name-to-number map, which is described by these three values.

The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each entry; both of these return an int value. The entry size depends on the length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first entry of the table (a pointer to char). The first two bytes of each entry are the number of the capturing parenthesis, most significant byte first. The rest of the entry is the corresponding name, zero terminated. The names are in alphabetical order. For example, consider the following pattern (assume PCRE_EXTENDED is set, so white space - including newlines - is ignored):

    (?P<date> (?P<year>(\d\d)?\d\d) -
    (?P<month>\d\d) - (?P<day>\d\d) )

There are four named subpatterns, so the table has four entries, and each entry in the table is eight bytes long. The table is as follows, with non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:

    00 01 d  a  t  e  00 ??
    00 05 d  a  y  00 ?? ??
    00 04 m  o  n  t  h  00
    00 02 y  e  a  r  00 ??

When writing code to extract data from named subpatterns using the name-to-number map, remember that the length of each entry is likely to be different for each compiled pattern.

    PCRE_INFO_OPTIONS

Return a copy of the options with which the pattern was compiled. The fourth argument should point to an unsigned long int variable. These option bits are those specified in the call to pcre_compile(), modified by any top-level option settings within the pattern itself.
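
As an illustration of the name-to-number map described under PCRE_INFO_NAMETABLE above, the following is a minimal sketch of a lookup over such a table. It does not call PCRE at all: the helper name_to_number() and the array demo_table are hypothetical names introduced here, and the table contents are copied by hand from the four-entry example, with the undefined ?? bytes arbitrarily filled with zero. In a real program the table pointer, entry count and entry size would come from pcre_fullinfo().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): search a name table laid out as
   described above and return the capturing parenthesis number for 'name',
   or -1 if the name is not present. The three table arguments stand for the
   values that PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and
   PCRE_INFO_NAMEENTRYSIZE would yield. */
static int name_to_number(const unsigned char *table, int name_count,
                          int entry_size, const char *name)
{
    int i;

    assert(table != NULL && name != NULL && entry_size > 2);

    for (i = 0; i < name_count; i++) {
        /* Entries are fixed-size, so entry i starts at i * entry_size. */
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;

        /* Bytes 2 onwards hold the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0) {
            /* Bytes 0 and 1 hold the number, most significant byte first. */
            return (entry[0] << 8) | entry[1];
        }
    }
    return -1;
}

/* Hand-built copy of the example table: four entries of eight bytes each.
   Bytes shown as ?? in the text are undefined; zero is used here only so
   that the array has definite contents. */
static const unsigned char demo_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

With this table, name_to_number(demo_table, 4, 8, "month") yields 4, and a name that is not in the table yields -1. A linear scan is used for simplicity; because the names are stored in alphabetical order, a binary search would also be possible.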

A pattern is automatically anchored by PCRE if all of its top-level alternatives begin with one of the following:

    ^     unless PCRE_MULTILINE is set
    \A    always
    \G    always
    .*    if PCRE_DOTALL is set and there are no back
            references to the subpattern in which .* appears

For such patterns, the PCRE_ANCHORED bit is set in the options returned by pcre_fullinfo().

    PCRE_INFO_SIZE

Return the size of the compiled pattern, that is, the value that was passed as the argument to pcre_malloc() when PCRE was getting memory in which to place the compiled data. The fourth argument should point to a size_t variable.

    PCRE_INFO_STUDYSIZE

Return the size of the data block pointed to by the study_data field in a pcre_extra block. That is, it is the value that was passed to pcre_malloc() when PCRE was getting memory into which to place the data created by pcre_study(). The fourth argument should point to a size_t variable.

OBSOLETE INFO FUNCTION

    int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

The pcre_info() function is now obsolete because its interface is too restrictive to return all the available data about a compiled pattern. New programs should use pcre_fullinfo() instead. The yield of pcre_info() is the number of capturing subpatterns, or one of the following negative numbers:

    PCRE_ERROR_NULL       the argument code was NULL
    PCRE_ERROR_BADMAGIC   the "magic number" was not found

If the optptr argument is not NULL, a copy of the options with which the pattern was compiled is placed in the integer it points to (see PCRE_INFO_OPTIONS above).

If the pattern is not anchored and the firstcharptr argument is not NULL, it i