NAME
pcre - Perl-compatible regular expressions.
SYNOPSIS
#include <pcre.h>
pcre *pcre_compile(const char *pattern, int options,
const char **errptr, int *erroffset,
const unsigned char *tableptr);
pcre_extra *pcre_study(const pcre *code, int options,
const char **errptr);
int pcre_exec(const pcre *code, const pcre_extra *extra,
const char *subject, int length, int startoffset,
int options, int *ovector, int ovecsize);
int pcre_copy_substring(const char *subject, int *ovector,
int stringcount, int stringnumber, char *buffer,
int buffersize);
int pcre_get_substring(const char *subject, int *ovector,
int stringcount, int stringnumber,
const char **stringptr);
int pcre_get_substring_list(const char *subject,
int *ovector, int stringcount, const char ***listptr);
void pcre_free_substring(const char *stringptr);
void pcre_free_substring_list(const char **stringptr);
const unsigned char *pcre_maketables(void);
int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
int what, void *where);
int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
char *pcre_version(void);
void *(*pcre_malloc)(size_t);
void (*pcre_free)(void *);
DESCRIPTION
The PCRE library is a set of functions that implement regu-
lar expression pattern matching using the same syntax and
semantics as Perl 5, with just a few differences (see
below). The current implementation corresponds to Perl
5.005, with some additional features from later versions.
This includes some experimental, incomplete support for
UTF-8 encoded strings. Details of exactly what is and what
is not supported are given below.
PCRE has its own native API, which is described in this
document. There is also a set of wrapper functions that
correspond to the POSIX regular expression API. These are
described in the pcreposix documentation.
The native API function prototypes are defined in the header
file pcre.h, and on Unix systems the library itself is
called libpcre.a, so it can be accessed by adding -lpcre to the
command for linking an application which calls it. The
header file defines the macros PCRE_MAJOR and PCRE_MINOR to
contain the major and minor release numbers for the library.
Applications can use these to include support for different
releases.
The functions pcre_compile(), pcre_study(), and pcre_exec()
are used for compiling and matching regular expressions. A
sample program that demonstrates the simplest way of using
them is given in the file pcredemo.c. The last section of
this man page describes how to run it.
The functions pcre_copy_substring(), pcre_get_substring(),
and pcre_get_substring_list() are convenience functions for
extracting captured substrings from a matched subject
string; pcre_free_substring() and pcre_free_substring_list()
are also provided, to free the memory used for extracted
strings.
The function pcre_maketables() is used (optionally) to build
a set of character tables in the current locale for passing
to pcre_compile().
The function pcre_fullinfo() is used to find out information
about a compiled pattern; pcre_info() is an obsolete version
which returns only some of the available information, but is
retained for backwards compatibility. The function
pcre_version() returns a pointer to a string containing the
version of PCRE and its date of release.
The global variables pcre_malloc and pcre_free initially
contain the entry points of the standard malloc() and free()
functions respectively. PCRE calls the memory management
functions via these variables, so a calling program can
replace them if it wishes to intercept the calls. This
should be done before calling any PCRE functions.
MULTI-THREADING
The PCRE functions can be used in multi-threading applica-
tions, with the proviso that the memory management functions
pointed to by pcre_malloc and pcre_free are shared by all
threads.
The compiled form of a regular expression is not altered
during matching, so the same compiled pattern can safely be
used by several threads at once.
COMPILING A PATTERN
The function pcre_compile() is called to compile a pattern
into an internal form. The pattern is a C string terminated
by a binary zero, and is passed in the argument pattern. A
pointer to a single block of memory that is obtained via
pcre_malloc is returned. This contains the compiled code and
related data. The pcre type is defined for the returned
block; this is a typedef for a structure whose contents are
not externally defined. It is up to the caller to free the
memory when it is no longer required.
Although the compiled code of a PCRE regex is relocatable,
that is, it does not depend on memory location, the complete
pcre data block is not fully relocatable, because it con-
tains a copy of the tableptr argument, which is an address
(see below).
The size of a compiled pattern is roughly proportional to
the length of the pattern string, except that each character
class (other than those containing just a single character,
negated or not) requires 33 bytes, and repeat quantifiers
with a minimum greater than one or a bounded maximum cause
the relevant portions of the compiled pattern to be repli-
cated.
The options argument contains independent bits that affect
the compilation. It should be zero if no options are
required. Some of the options, in particular, those that are
compatible with Perl, can also be set and unset from within
the pattern (see the detailed description of regular expres-
sions below). For these options, the contents of the options
argument specify their initial settings at the start of
compilation and execution. The PCRE_ANCHORED option can be
set at the time of matching as well as at compile time.
If errptr is NULL, pcre_compile() returns NULL immediately.
Otherwise, if compilation of a pattern fails, pcre_compile()
returns NULL, and sets the variable pointed to by errptr to
point to a textual error message. The offset from the start
of the pattern to the character where the error was
discovered is placed in the variable pointed to by
erroffset, which must not be NULL. If it is, an immediate
error is given.
If the final argument, tableptr, is NULL, PCRE uses a
default set of character tables which are built when it is
compiled, using the default C locale. Otherwise, tableptr
must be the result of a call to pcre_maketables(). See the
section on locale support below.
This code fragment shows a typical straightforward call to
pcre_compile():
  pcre *re;
  const char *error;
  int erroffset;
  re = pcre_compile(
    "^A.*Z",          /* the pattern */
    0,                /* default options */
    &error,           /* for error message */
    &erroffset,       /* for error offset */
    NULL);            /* use default character tables */
The following option bits are defined in the header file:
PCRE_ANCHORED
If this bit is set, the pattern is forced to be "anchored",
that is, it is constrained to match only at the start of the
string which is being searched (the "subject string"). This
effect can also be achieved by appropriate constructs in the
pattern itself, which is the only way to do it in Perl.
PCRE_CASELESS
If this bit is set, letters in the pattern match both upper
and lower case letters. It is equivalent to Perl's /i
option.
PCRE_DOLLAR_ENDONLY
If this bit is set, a dollar metacharacter in the pattern
matches only at the end of the subject string. Without this
option, a dollar also matches immediately before the final
character if it is a newline (but not before any other new-
lines). The PCRE_DOLLAR_ENDONLY option is ignored if
PCRE_MULTILINE is set. There is no equivalent to this option
in Perl.
PCRE_DOTALL
If this bit is set, a dot metacharacter in the pattern
matches all characters, including newlines. Without it, new-
lines are excluded. This option is equivalent to Perl's /s
option. A negative class such as [^a] always matches a new-
line character, independent of the setting of this option.
PCRE_EXTENDED
If this bit is set, whitespace data characters in the pat-
tern are totally ignored except when escaped or inside a
character class, and characters between an unescaped # out-
side a character class and the next newline character,
inclusive, are also ignored. This is equivalent to Perl's /x
option, and makes it possible to include comments inside
complicated patterns. Note, however, that this applies only
to data characters. Whitespace characters may never appear
within special character sequences in a pattern, for example
within the sequence (?( which introduces a conditional sub-
pattern.
PCRE_EXTRA
This option was invented in order to turn on additional
functionality of PCRE that is incompatible with Perl, but it
is currently of very little use. When set, any backslash in
a pattern that is followed by a letter that has no special
meaning causes an error, thus reserving these combinations
for future expansion. By default, as in Perl, a backslash
followed by a letter with no special meaning is treated as a
literal. There are at present no other features controlled
by this option. It can also be set by a (?X) option setting
within a pattern.
PCRE_MULTILINE
By default, PCRE treats the subject string as consisting of
a single "line" of characters (even if it actually contains
several newlines). The "start of line" metacharacter (^)
matches only at the start of the string, while the "end of
line" metacharacter ($) matches only at the end of the
string, or before a terminating newline (unless
PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.
When PCRE_MULTILINE is set, the "start of line" and "end
of line" constructs match immediately following or immedi-
ately before any newline in the subject string, respec-
tively, as well as at the very start and end. This is
equivalent to Perl's /m option. If there are no "\n" charac-
ters in a subject string, or no occurrences of ^ or $ in a
pattern, setting PCRE_MULTILINE has no effect.
PCRE_UNGREEDY
This option inverts the "greediness" of the quantifiers so
that they are not greedy by default, but become greedy if
followed by "?". It is not compatible with Perl. It can also
be set by a (?U) option setting within the pattern.
PCRE_UTF8
This option causes PCRE to regard both the pattern and the
subject as strings of UTF-8 characters instead of just byte
strings. However, it is available only if PCRE has been
built to include UTF-8 support. If not, the use of this
option provokes an error. Support for UTF-8 is new, experi-
mental, and incomplete. Details of exactly what it entails
are given below.
STUDYING A PATTERN
When a pattern is going to be used several times, it is
worth spending more time analyzing it in order to speed up
the time taken for matching. The function pcre_study() takes
a pointer to a compiled pattern as its first argument, and
returns a pointer to a pcre_extra block (another typedef for
a structure with hidden contents) containing additional