.TH PCRE 3
.SH NAME
pcre - Perl-compatible regular expressions.
.SH SYNOPSIS
.B #include <pcre.h>
.PP
.SM
.br
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR,
.ti +5n
.B const char **\fIerrptr\fR, int *\fIerroffset\fR,
.ti +5n
.B const unsigned char *\fItableptr\fR);
.PP
.br
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
.ti +5n
.B const char **\fIerrptr\fR);
.PP
.br
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
.ti +5n
.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
.PP
.br
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, int \fIstringnumber\fR, char *\fIbuffer\fR,
.ti +5n
.B int \fIbuffersize\fR);
.PP
.br
.B int pcre_get_substring(const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, int \fIstringnumber\fR,
.ti +5n
.B const char **\fIstringptr\fR);
.PP
.br
.B int pcre_get_substring_list(const char *\fIsubject\fR,
.ti +5n
.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
.PP
.br
.B void pcre_free_substring(const char *\fIstringptr\fR);
.PP
.br
.B void pcre_free_substring_list(const char **\fIstringptr\fR);
.PP
.br
.B const unsigned char *pcre_maketables(void);
.PP
.br
.B int pcre_fullinfo(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
.B int \fIwhat\fR, void *\fIwhere\fR);
.PP
.br
.B int pcre_info(const pcre *\fIcode\fR, int *\fIoptptr\fR, int
.B *\fIfirstcharptr\fR);
.PP
.br
.B char *pcre_version(void);
.PP
.br
.B void *(*pcre_malloc)(size_t);
.PP
.br
.B void (*pcre_free)(void *);



.SH DESCRIPTION
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5, with just a few
differences (see below). The current implementation corresponds to Perl 5.005,
with some additional features from later versions. This includes some
experimental, incomplete support for UTF-8 encoded strings. Details of exactly
what is and what is not supported are given below.

PCRE has its own native API, which is described in this document. There is also
a set of wrapper functions that correspond to the POSIX regular expression API.
These are described in the \fBpcreposix\fR documentation.

The native API function prototypes are defined in the header file \fBpcre.h\fR,
and on Unix systems the library itself is called \fBlibpcre.a\fR, so it can be
linked in by adding \fB-lpcre\fR to the command for linking an application which
calls it. The header file defines the macros PCRE_MAJOR and PCRE_MINOR to
contain the major and minor release numbers for the library. Applications can
use these to include support for different releases.

The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
are used for compiling and matching regular expressions. A sample program that
demonstrates the simplest way of using them is given in the file
\fIpcredemo.c\fR. The last section of this man page describes how to run it.

The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
\fBpcre_get_substring_list()\fR are convenience functions for extracting
captured substrings from a matched subject string; \fBpcre_free_substring()\fR
and \fBpcre_free_substring_list()\fR are also provided, to free the memory used
for extracted strings.

The function \fBpcre_maketables()\fR is used (optionally) to build a set of
character tables in the current locale for passing to \fBpcre_compile()\fR.

The function \fBpcre_fullinfo()\fR is used to find out information about a
compiled pattern; \fBpcre_info()\fR is an obsolete version which returns only
some of the available information, but is retained for backwards compatibility.
The function \fBpcre_version()\fR returns a pointer to a string containing the
version of PCRE and its date of release.

The global variables \fBpcre_malloc\fR and \fBpcre_free\fR initially contain
the entry points of the standard \fBmalloc()\fR and \fBfree()\fR functions
respectively. PCRE calls the memory management functions via these variables,
so a calling program can replace them if it wishes to intercept the calls. This
should be done before calling any PCRE functions.


.SH MULTI-THREADING
The PCRE functions can be used in multi-threading applications, with the
proviso that the memory management functions pointed to by \fBpcre_malloc\fR
and \fBpcre_free\fR are shared by all threads.

The compiled form of a regular expression is not altered during matching, so
the same compiled pattern can safely be used by several threads at once.


.SH COMPILING A PATTERN
The function \fBpcre_compile()\fR is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fR. A pointer to a single block of memory
that is obtained via \fBpcre_malloc\fR is returned. This contains the compiled
code and related data. The \fBpcre\fR type is defined for the returned block;
this is a typedef for a structure whose contents are not externally defined. It
is up to the caller to free the memory when it is no longer required.

Although the compiled code of a PCRE regex is relocatable, that is, it does not
depend on memory location, the complete \fBpcre\fR data block is not
fully relocatable, because it contains a copy of the \fItableptr\fR argument,
which is an address (see below).

The size of a compiled pattern is roughly proportional to the length of the
pattern string, except that each character class (other than those containing
just a single character, negated or not) requires 33 bytes, and repeat
quantifiers with a minimum greater than one or a bounded maximum cause the
relevant portions of the compiled pattern to be replicated.

The \fIoptions\fR argument contains independent bits that affect the
compilation. It should be zero if no options are required. Some of the options,
in particular, those that are compatible with Perl, can also be set and unset
from within the pattern (see the detailed description of regular expressions
below). For these options, the contents of the \fIoptions\fR argument specify
their initial settings at the start of compilation and execution. The
PCRE_ANCHORED option can be set at the time of matching as well as at compile
time.

If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual
error message. The offset from the start of the pattern to the character where
the error was discovered is placed in the variable pointed to by
\fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.

If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of
character tables which are built when it is compiled, using the default C
locale. Otherwise, \fItableptr\fR must be the result of a call to
\fBpcre_maketables()\fR. See the section on locale support below.

This code fragment shows a typical straightforward call to \fBpcre_compile()\fR:

  pcre *re;
  const char *error;
  int erroffset;
  re = pcre_compile(
    "^A.*Z",          /* the pattern */
    0,                /* default options */
    &error,           /* for error message */
    &erroffset,       /* for error offset */
    NULL);            /* use default character tables */

The following option bits are defined in the header file:

  PCRE_ANCHORED

If this bit is set, the pattern is forced to be "anchored", that is, it is
constrained to match only at the start of the string which is being searched
(the "subject string"). This effect can also be achieved by appropriate
constructs in the pattern itself, which is the only way to do it in Perl.

  PCRE_CASELESS

If this bit is set, letters in the pattern match both upper and lower case
letters. It is equivalent to Perl's /i option.

  PCRE_DOLLAR_ENDONLY

If this bit is set, a dollar metacharacter in the pattern matches only at the
end of the subject string. Without this option, a dollar also matches
immediately before the final character if it is a newline (but not before any
other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is
set. There is no equivalent to this option in Perl.

  PCRE_DOTALL

If this bit is set, a dot metacharacter in the pattern matches all characters,
including newlines. Without it, newlines are excluded. This option is
equivalent to Perl's /s option. A negative class such as [^a] always matches a
newline character, independent of the setting of this option.

  PCRE_EXTENDED

If this bit is set, whitespace data characters in the pattern are totally
ignored except when escaped or inside a character class, and characters between
an unescaped # outside a character class and the next newline character,
inclusive, are also ignored. This is equivalent to Perl's /x option, and makes
it possible to include comments inside complicated patterns. Note, however,
that this applies only to data characters. Whitespace characters may never
appear within special character sequences in a pattern, for example within the
sequence (?( which introduces a conditional subpattern.

  PCRE_EXTRA

This option was invented in order to turn on additional functionality of PCRE
that is incompatible with Perl, but it is currently of very little use. When
set, any backslash in a pattern that is followed by a letter that has no
special meaning causes an error, thus reserving these combinations for future
expansion. By default, as in Perl, a backslash followed by a letter with no
special meaning is treated as a literal. There are at present no other features
controlled by this option. It can also be set by a (?X) option setting within a
pattern.

  PCRE_MULTILINE

By default, PCRE treats the subject string as consisting of a single "line" of
characters (even if it actually contains several newlines). The "start of line"
metacharacter (^) matches only at the start of the string, while the "end of
line" metacharacter ($) matches only at the end of the string, or before a
terminating newline (unless PCRE_DOLLAR_ENDONLY is set). This is the same as
Perl.

When PCRE_MULTILINE is set, the "start of line" and "end of line" constructs
match immediately following or immediately before any newline in the subject
string, respectively, as well as at the very start and end. This is equivalent
to Perl's /m option. If there are no "\\n" characters in a subject string, or
no occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no
effect.

  PCRE_UNGREEDY

This option inverts the "greediness" of the quantifiers so that they are not
greedy by default, but become greedy if followed by "?". It is not compatible
with Perl. It can also be set by a (?U) option setting within the pattern.

  PCRE_UTF8

This option causes PCRE to regard both the pattern and the subject as strings
of UTF-8 characters instead of just byte strings. However, it is available only
if PCRE has been built to include UTF-8 support. If not, the use of this option
provokes an error. Support for UTF-8 is new, experimental, and incomplete.
Details of exactly what it entails are given below.


.SH STUDYING A PATTERN
When a pattern is going to be used several times, it is worth spending more
time analyzing it in order to reduce the time taken for matching. The
function \fBpcre_study()\fR takes a pointer to a compiled pattern as its first
argument, and returns a pointer to a \fBpcre_extra\fR block (another typedef
for a structure with hidden contents) containing additional information about
the pattern; this can be passed to \fBpcre_exec()\fR. If no additional
information is available, NULL is returned.

The second argument contains option bits. At present, no options are defined
for \fBpcre_study()\fR, and this argument should always be zero.
