information about the pattern; this can be passed to
pcre_exec(). If no additional information is available, NULL
is returned.
The second argument contains option bits. At present, no
options are defined for pcre_study(), and this argument
should always be zero.
The third argument for pcre_study() is a pointer to an error
message. If studying succeeds (even if no data is returned),
the variable it points to is set to NULL. Otherwise it
points to a textual error message.
This is a typical call to pcre_study():
  const char *error;
  pcre_extra *pe;
  pe = pcre_study(
    re,             /* result of pcre_compile() */
    0,              /* no options exist */
    &error);        /* set to NULL or points to a message */
At present, studying a pattern is useful only for non-
anchored patterns that do not have a single fixed starting
character. A bitmap of possible starting characters is
created.
LOCALE SUPPORT
PCRE handles caseless matching, and determines whether char-
acters are letters, digits, or whatever, by reference to a
set of tables. The library contains a default set of tables
which is created in the default C locale when PCRE is com-
piled. This is used when the final argument of
pcre_compile() is NULL, and is sufficient for many applica-
tions.
An alternative set of tables can, however, be supplied. Such
tables are built by calling the pcre_maketables() function,
which has no arguments, in the relevant locale. The result
can then be passed to pcre_compile() as often as necessary.
For example, to build and use tables that are appropriate
for the French locale (where accented characters with codes
greater than 128 are treated as letters), the following code
could be used:
  setlocale(LC_CTYPE, "fr");
  tables = pcre_maketables();
  re = pcre_compile(..., tables);
The tables are built in memory that is obtained via
pcre_malloc. The pointer that is passed to pcre_compile is
saved with the compiled pattern, and the same tables are
used via this pointer by pcre_study() and pcre_exec(). Thus
for any single pattern, compilation, studying and matching
all happen in the same locale, but different patterns can be
compiled in different locales. It is the caller's responsi-
bility to ensure that the memory containing the tables
remains available for as long as it is needed.
INFORMATION ABOUT A PATTERN
The pcre_fullinfo() function returns information about a
compiled pattern. It replaces the obsolete pcre_info() function,
which is nevertheless retained for backwards compatibility (and
is documented below).
The first argument for pcre_fullinfo() is a pointer to the
compiled pattern. The second argument is the result of
pcre_study(), or NULL if the pattern was not studied. The
third argument specifies which piece of information is
required, while the fourth argument is a pointer to a vari-
able to receive the data. The yield of the function is zero
for success, or one of the following negative numbers:
  PCRE_ERROR_NULL       the argument code was NULL
                        the argument where was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
  PCRE_ERROR_BADOPTION  the value of what was invalid
Here is a typical call of pcre_fullinfo(), to obtain the
length of the compiled pattern:
  int rc;
  size_t length;
  rc = pcre_fullinfo(
    re,               /* result of pcre_compile() */
    pe,               /* result of pcre_study(), or NULL */
    PCRE_INFO_SIZE,   /* what is required */
    &length);         /* where to put the data */
The possible values for the third argument are defined in
pcre.h, and are as follows:
PCRE_INFO_OPTIONS
Return a copy of the options with which the pattern was com-
piled. The fourth argument should point to an unsigned long
int variable. These option bits are those specified in the
call to pcre_compile(), modified by any top-level option
settings within the pattern itself, and with the
PCRE_ANCHORED bit forcibly set if the form of the pattern
implies that it can match only at the start of a subject
string.
PCRE_INFO_SIZE
Return the size of the compiled pattern, that is, the value
that was passed as the argument to pcre_malloc() when PCRE
was getting memory in which to place the compiled data. The
fourth argument should point to a size_t variable.
PCRE_INFO_CAPTURECOUNT
Return the number of capturing subpatterns in the pattern.
The fourth argument should point to an int variable.
PCRE_INFO_BACKREFMAX
Return the number of the highest back reference in the pat-
tern. The fourth argument should point to an int variable.
Zero is returned if there are no back references.
PCRE_INFO_FIRSTCHAR
Return information about the first character of any matched
string, for a non-anchored pattern. If there is a fixed
first character, e.g. from a pattern such as
(cat|cow|coyote), it is returned in the integer pointed to
by where. Otherwise, if either
(a) the pattern was compiled with the PCRE_MULTILINE option,
and every branch starts with "^", or
(b) every branch of the pattern starts with ".*" and
PCRE_DOTALL is not set (if it were set, the pattern would be
anchored),
-1 is returned, indicating that the pattern matches only at
the start of a subject string or after any "\n" within the
string. Otherwise -2 is returned. For anchored patterns, -2
is returned.
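The three kinds of result can be told apart with a small helper. This is
only a sketch: the enum and the function name are illustrative and are
not part of PCRE; the numeric values are the ones described above.

```c
/* Illustrative names, not part of the PCRE API. */
enum first_char_kind {
    FIRST_FIXED,        /* value is the fixed first character */
    FIRST_LINE_START,   /* matches only at the start or after "\n" */
    FIRST_UNKNOWN       /* no usable information, or anchored pattern */
};

/* Interpret the integer stored by PCRE_INFO_FIRSTCHAR. */
enum first_char_kind classify_first_char(int value)
{
    if (value >= 0)
        return FIRST_FIXED;
    if (value == -1)
        return FIRST_LINE_START;
    return FIRST_UNKNOWN;       /* -2 */
}
```

A caller would pass the integer filled in by pcre_fullinfo() and, for
FIRST_FIXED, use the value itself as the character.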
PCRE_INFO_FIRSTTABLE
If the pattern was studied, and this resulted in the con-
struction of a 256-bit table indicating a fixed set of char-
acters for the first character in any matching string, a
pointer to the table is returned. Otherwise NULL is
returned. The fourth argument should point to an unsigned
char * variable.
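A 256-bit table occupies 32 bytes, one bit per character code. The
sketch below shows how such a table could be consulted to skip subject
positions that cannot start a match. It assumes that character c is held
in bit (c & 7) of byte (c / 8); that layout is an assumption here and is
not stated in the text above. The function names are illustrative.

```c
/*
 * Test whether character c is marked in a 256-bit table.
 * Assumed layout: bit (c & 7) of byte (c / 8).
 */
int table_has(const unsigned char *table, unsigned char c)
{
    return (table[c / 8] >> (c & 7)) & 1;
}

/*
 * Return the first offset at or after start whose character is marked
 * in the table, or -1 if no such position exists.
 */
int next_candidate(const unsigned char *table, const char *subject,
                   int length, int start)
{
    for (int i = start; i < length; i++) {
        if (table_has(table, (unsigned char)subject[i]))
            return i;
    }
    return -1;
}
```

If NULL is returned instead of a table, no such shortcut is available and
every position has to be tried.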
PCRE_INFO_LASTLITERAL
For a non-anchored pattern, return the value of the right-
most literal character which must exist in any matched
string, other than at its start. The fourth argument should
point to an int variable. If there is no such character, or
if the pattern is anchored, -1 is returned. For example, for
the pattern /a\d+z\d+/ the returned value is 'z'.
The pcre_info() function is now obsolete because its inter-
face is too restrictive to return all the available data
about a compiled pattern. New programs should use
pcre_fullinfo() instead. The yield of pcre_info() is the
number of capturing subpatterns, or one of the following
negative numbers:
  PCRE_ERROR_NULL       the argument code was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
If the optptr argument is not NULL, a copy of the options
with which the pattern was compiled is placed in the integer
it points to (see PCRE_INFO_OPTIONS above).
If the pattern is not anchored and the firstcharptr argument
is not NULL, it is used to pass back information about the
first character of any matched string (see
PCRE_INFO_FIRSTCHAR above).
MATCHING A PATTERN
The function pcre_exec() is called to match a subject string
against a pre-compiled pattern, which is passed in the code
argument. If the pattern has been studied, the result of the
study should be passed in the extra argument. Otherwise this
must be NULL.
Here is an example of a simple call to pcre_exec():
  int rc;
  int ovector[30];
  rc = pcre_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* we didn't study the pattern */
    "some string",  /* the subject string */
    11,             /* the length of the subject string */
    0,              /* start at offset 0 in the subject */
    0,              /* default options */
    ovector,        /* vector for substring information */
    30);            /* number of elements in the vector */
The PCRE_ANCHORED option can be passed in the options argu-
ment, whose unused bits must be zero. However, if a pattern
was compiled with PCRE_ANCHORED, or turned out to be
anchored by virtue of its contents, it cannot be made
unanchored at matching time.
There are also three further options that can be set only at
matching time:
PCRE_NOTBOL
The first character of the string is not the beginning of a
line, so the circumflex metacharacter should not match
before it. Setting this without PCRE_MULTILINE (at compile
time) causes circumflex never to match.
PCRE_NOTEOL
The end of the string is not the end of a line, so the dol-
lar metacharacter should not match it nor (except in multi-
line mode) a newline immediately before it. Setting this
without PCRE_MULTILINE (at compile time) causes dollar never
to match.
PCRE_NOTEMPTY
An empty string is not considered to be a valid match if
this option is set. If there are alternatives in the pat-
tern, they are tried. If all the alternatives match the
empty string, the entire match fails. For example, if the
pattern
  a?b?
is applied to a string not beginning with "a" or "b", it
matches the empty string at the start of the subject. With
PCRE_NOTEMPTY set, this match is not valid, so PCRE searches
further into the string for occurrences of "a" or "b".
Perl has no direct equivalent of PCRE_NOTEMPTY, but it does
make a special case of a pattern match of the empty string
within its split() function, and when using the /g modifier.
It is possible to emulate Perl's behaviour after matching a
null string by first trying the match again at the same
offset with PCRE_NOTEMPTY set, and then if that fails by
advancing the starting offset (see below) and trying an
ordinary match again.
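The retry scheme just described can be sketched as a loop. So that the
example runs without linking PCRE, a toy matcher hard-wired to the
pattern a?b? stands in for pcre_exec(), and TOY_NOTEMPTY stands in for
PCRE_NOTEMPTY; both names, and find_all(), are hypothetical. Only the
control flow is the point.

```c
/* Hypothetical stand-in for the PCRE_NOTEMPTY option bit. */
#define TOY_NOTEMPTY 1

/*
 * Hypothetical stand-in for pcre_exec(), hard-wired to the pattern a?b?.
 * Searches from startoffset; on success stores the match bounds and
 * returns 1, otherwise returns 0.
 */
static int toy_exec(const char *subject, int length, int startoffset,
                    int options, int *start, int *end)
{
    for (int i = startoffset; i <= length; i++) {
        int j = i;
        if (j < length && subject[j] == 'a') j++;
        if (j < length && subject[j] == 'b') j++;
        if (j > i || !(options & TOY_NOTEMPTY)) {
            *start = i;
            *end = j;
            return 1;
        }
    }
    return 0;
}

/*
 * Collect successive matches: after an empty match, retry at the same
 * offset with NOTEMPTY set; if that fails, advance one character and
 * try an ordinary match again. Returns the number of matches stored.
 */
int find_all(const char *subject, int length,
             int *starts, int *ends, int max)
{
    int count = 0, offset = 0, options = 0;

    while (offset <= length && count < max) {
        int s, e;
        if (!toy_exec(subject, length, offset, options, &s, &e)) {
            if (options == 0)
                break;          /* ordinary match failed: finished */
            options = 0;        /* NOTEMPTY retry failed: move on */
            offset++;
            continue;
        }
        starts[count] = s;
        ends[count] = e;
        count++;
        offset = e;             /* continue after this match */
        options = (s == e) ? TOY_NOTEMPTY : 0;
    }
    return count;
}
```

For the subject "cab" this loop records an empty match at offset 0, the
match "ab" from offset 1 to 3, and an empty match at offset 3. With the
real library, the same loop would call pcre_exec() and read the match
bounds from its output vector.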
The subject string is passed as a pointer in subject, a
length in length, and a starting offset in startoffset.
Unlike the pattern string, the subject may contain binary
zero characters. When the starting offset is zero, the
search for a match starts at the beginning of the subject,
and this is by far the most common case.
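Because the length is passed explicitly, a subject that contains binary
zeros should not be measured with strlen(), which stops at the first
zero. A minimal sketch follows; the variable and function names are
illustrative only.

```c
#include <string.h>

/* Five subject characters: 'a', 'b', a binary zero, 'c', 'd'. */
static const char subject_with_nul[] = "ab\0cd";

/* Length suitable for the length argument: the array size without
   the terminating zero that the compiler appends. */
int length_for_exec(void)
{
    return (int)(sizeof subject_with_nul - 1);
}

/* What strlen() reports: it stops at the embedded zero. */
int length_from_strlen(void)
{
    return (int)strlen(subject_with_nul);
}
```

Here length_for_exec() gives 5, while length_from_strlen() gives 2 and
would hide the last three characters from the match.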
A non-zero starting offset is useful when searching for
another match in the same subject by calling pcre_exec()
again after a previous success. Setting startoffset differs
from just passing over a shortened string and setting
PCRE_NOTBOL in the case of a pattern that begins with any
kind of lookbehind. For example, consider the pattern
  \Biss\B
which finds occurrences of "iss" in the middle of words. (\B
matches only if the current position in the subject is not a
word boundary.) When applied to the string "Mississippi" the
first call to pcre_exec() finds the first occurrence. If
pcre_exec() is called again with just the remainder of the
subject, namely "issippi", it does not match, because \B is
always false at the start of the subject, which is deemed to
be a word boundary. However, if pcre_exec() is passed the
entire string again, but with startoffset set to 4, it finds
the second occurrence of "iss" because it is able to look
behind the starting point to discover that it is preceded by