NAME
     pcre - Perl-compatible regular expressions.

SYNOPSIS
     #include <pcre.h>

     pcre *pcre_compile(const char *pattern, int options,
          const char **errptr, int *erroffset,
          const unsigned char *tableptr);

     pcre_extra *pcre_study(const pcre *code, int options,
          const char **errptr);

     int pcre_exec(const pcre *code, const pcre_extra *extra,
          const char *subject, int length, int startoffset,
          int options, int *ovector, int ovecsize);

     int pcre_copy_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber, char *buffer,
          int buffersize);

     int pcre_get_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber,
          const char **stringptr);

     int pcre_get_substring_list(const char *subject,
          int *ovector, int stringcount, const char ***listptr);

     const unsigned char *pcre_maketables(void);

     int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
          int what, void *where);

     int pcre_info(const pcre *code, int *optptr,
          int *firstcharptr);

     char *pcre_version(void);

     void *(*pcre_malloc)(size_t);

     void (*pcre_free)(void *);

DESCRIPTION
     The PCRE library is a set of functions that implement
     regular expression pattern matching using the same syntax
     and semantics as Perl 5, with just a few differences (see
     below). The current implementation corresponds to Perl
     5.005, with some additional features from the Perl
     development release.

     PCRE has its own native API, which is described in this
     document. There is also a set of wrapper functions that
     correspond to the POSIX regular expression API. These are
     described in the pcreposix documentation.

     The native API function prototypes are defined in the
     header file pcre.h, and on Unix systems the library itself
     is called libpcre.a, so it can be accessed by adding -lpcre
     to the command for linking an application which calls it.
     The header file defines the macros PCRE_MAJOR and
     PCRE_MINOR to contain the major and minor release numbers
     for the library. Applications can use these to include
     support for different releases.

     The functions pcre_compile(), pcre_study(), and pcre_exec()
     are used for compiling and matching regular expressions,
     while pcre_copy_substring(), pcre_get_substring(), and
     pcre_get_substring_list() are convenience functions for
     extracting captured substrings from a matched subject
     string. The function pcre_maketables() is used (optionally)
     to build a set of character tables in the current locale
     for passing to pcre_compile().

     The function pcre_fullinfo() is used to find out
     information about a compiled pattern; pcre_info() is an
     obsolete version which returns only some of the available
     information, but is retained for backwards compatibility.
     The function pcre_version() returns a pointer to a string
     containing the version of PCRE and its date of release.

     The global variables pcre_malloc and pcre_free initially
     contain the entry points of the standard malloc() and
     free() functions respectively. PCRE calls the memory
     management functions via these variables, so a calling
     program can replace them if it wishes to intercept the
     calls. This should be done before calling any PCRE
     functions.

MULTI-THREADING
     The PCRE functions can be used in multi-threading
     applications, with the proviso that the memory management
     functions pointed to by pcre_malloc and pcre_free are
     shared by all threads.

     The compiled form of a regular expression is not altered
     during matching, so the same compiled pattern can safely be
     used by several threads at once.

COMPILING A PATTERN
     The function pcre_compile() is called to compile a pattern
     into an internal form. The pattern is a C string terminated
     by a binary zero, and is passed in the argument pattern. A
     pointer to a single block of memory that is obtained via
     pcre_malloc is returned. This contains the compiled code
     and related data. The pcre type is defined for this for
     convenience, but in fact pcre is just a typedef for void,
     since the contents of the block are not externally defined.
     It is up to the caller to free the memory when it is no
     longer required.

     The size of a compiled pattern is roughly proportional to
     the length of the pattern string, except that each
     character class (other than those containing just a single
     character, negated or not) requires 33 bytes, and repeat
     quantifiers with a minimum greater than one or a bounded
     maximum cause the relevant portions of the compiled pattern
     to be replicated.

     The options argument contains independent bits that affect
     the compilation. It should be zero if no options are
     required. Some of the options, in particular those that
     are compatible with Perl, can also be set and unset from
     within the pattern (see the detailed description of regular
     expressions below). For these options, the contents of the
     options argument specify their initial settings at the
     start of compilation and execution. The PCRE_ANCHORED
     option can be set at the time of matching as well as at
     compile time.

     If errptr is NULL, pcre_compile() returns NULL immediately.
     Otherwise, if compilation of a pattern fails,
     pcre_compile() returns NULL, and sets the variable pointed
     to by errptr to point to a textual error message. The
     offset from the start of the pattern to the character where
     the error was discovered is placed in the variable pointed
     to by erroffset, which must not be NULL. If it is, an
     immediate error is given.

     If the final argument, tableptr, is NULL, PCRE uses a
     default set of character tables which are built when it is
     compiled, using the default C locale. Otherwise, tableptr
     must be the result of a call to pcre_maketables(). See the
     section on locale support below.

     The following option bits are defined in the header file:

       PCRE_ANCHORED

     If this bit is set, the pattern is forced to be "anchored",
     that is, it is constrained to match only at the start of
     the string which is being searched (the "subject string").
     This effect can also be achieved by appropriate constructs
     in the pattern itself, which is the only way to do it in
     Perl.

       PCRE_CASELESS

     If this bit is set, letters in the pattern match both upper
     and lower case letters. It is equivalent to Perl's /i
     option.

       PCRE_DOLLAR_ENDONLY

     If this bit is set, a dollar metacharacter in the pattern
     matches only at the end of the subject string. Without this
     option, a dollar also matches immediately before the final
     character if it is a newline (but not before any other
     newlines). The PCRE_DOLLAR_ENDONLY option is ignored if
     PCRE_MULTILINE is set. There is no equivalent to this
     option in Perl.

       PCRE_DOTALL

     If this bit is set, a dot metacharacter in the pattern
     matches all characters, including newlines. Without it,
     newlines are excluded. This option is equivalent to Perl's
     /s option. A negative class such as [^a] always matches a
     newline character, independent of the setting of this
     option.

       PCRE_EXTENDED

     If this bit is set, whitespace data characters in the
     pattern are totally ignored except when escaped or inside a
     character class, and characters between an unescaped #
     outside a character class and the next newline character,
     inclusive, are also ignored. This is equivalent to Perl's
     /x option, and makes it possible to include comments inside
     complicated patterns. Note, however, that this applies only
     to data characters. Whitespace characters may never appear
     within special character sequences in a pattern, for
     example within the sequence (?( which introduces a
     conditional subpattern.

       PCRE_EXTRA

     This option was invented in order to turn on additional
     functionality of PCRE that is incompatible with Perl, but
     it is currently of very little use. When set, any backslash
     in a pattern that is followed by a letter that has no
     special meaning causes an error, thus reserving these
     combinations for future expansion. By default, as in Perl,
     a backslash followed by a letter with no special meaning is
     treated as a literal.
     There are at present no other features controlled by this
     option. It can also be set by a (?X) option setting within
     a pattern.

       PCRE_MULTILINE

     By default, PCRE treats the subject string as consisting of
     a single "line" of characters (even if it actually contains
     several newlines). The "start of line" metacharacter (^)
     matches only at the start of the string, while the "end of
     line" metacharacter ($) matches only at the end of the
     string, or before a terminating newline (unless
     PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.

     When PCRE_MULTILINE is set, the "start of line" and "end of
     line" constructs match immediately following or immediately
     before any newline in the subject string, respectively, as
     well as at the very start and end. This is equivalent to
     Perl's /m option. If there are no "\n" characters in a
     subject string, or no occurrences of ^ or $ in a pattern,
     setting PCRE_MULTILINE has no effect.

       PCRE_UNGREEDY

     This option inverts the "greediness" of the quantifiers so
     that they are not greedy by default, but become greedy if
     followed by "?". It is not compatible with Perl. It can
     also be set by a (?U) option setting within the pattern.

STUDYING A PATTERN
     When a pattern is going to be used several times, it is
     worth spending more time analyzing it in order to speed up
     the time taken for matching. The function pcre_study()
     takes a pointer to a compiled pattern as its first
     argument, and returns a pointer to a pcre_extra block
     (another void typedef) containing additional information
     about the pattern; this can be passed to pcre_exec(). If no
     additional information is available, NULL is returned.

     The second argument contains option bits. At present, no
     options are defined for pcre_study(), and this argument
     should always be zero.

     The third argument for pcre_study() is a pointer to an
     error message. If studying succeeds (even if no data is
     returned), the variable it points to is set to NULL.
     Otherwise it points to a textual error message.

     At present, studying a pattern is useful only for
     non-anchored patterns that do not have a single fixed
     starting character. A bitmap of possible starting
     characters is created.

LOCALE SUPPORT
     PCRE handles caseless matching, and determines whether
     characters are letters, digits, or whatever, by reference
     to a set of tables. The library contains a default set of
     tables which is created in the default C locale when PCRE
     is compiled. This is used when the final argument of
     pcre_compile() is NULL, and is sufficient for many
     applications.

     An alternative set of tables can, however, be supplied.
     Such tables are built by calling the pcre_maketables()
     function, which has no arguments, in the relevant locale.
     The result can then be passed to pcre_compile() as often as
     necessary. For example, to build and use tables that are
     appropriate for the French locale (where accented
     characters with codes greater than 128 are treated as
     letters), the following code could be used:

       setlocale(LC_CTYPE, "fr");
       tables = pcre_maketables();
       re = pcre_compile(..., tables);

     The tables are built in memory that is obtained via
     pcre_malloc. The pointer that is passed to pcre_compile is
     saved with the compiled pattern, and the same tables are
     used via this pointer by pcre_study() and pcre_exec(). Thus
     for any single pattern, compilation, studying and matching
     all happen in the same locale, but different patterns can
     be compiled in different locales. It is the caller's
     responsibility to ensure that the memory containing the
     tables