<!-- This HTML file has been created by texi2html 1.27
from library.texinfo on 3 March 1994 -->
<TITLE>The GNU C Library - Pattern Matching</TITLE>
<P>Go to the <A HREF="library_8.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_8.html">previous</A>, <A HREF="library_10.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_10.html">next</A> section.<P>
<H1><A NAME="SEC91" HREF="library_toc.html#SEC91" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC91">Pattern Matching</A></H1>
<P>
The GNU C Library provides pattern matching facilities for two kinds
of patterns: regular expressions and file-name wildcards.
<P>
<H2><A NAME="SEC92" HREF="library_toc.html#SEC92" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC92">Wildcard Matching</A></H2>
<A NAME="IDX394"></A>
<P>
This section describes how to match a wildcard pattern against a
particular string. The result is a yes or no answer: does the
string fit the pattern or not? The symbols described here are all
declared in <TT>`fnmatch.h'</TT>.
<P>
<A NAME="IDX395"></A>
<U>Function:</U> int <B>fnmatch</B> <I>(const char *<VAR>pattern</VAR>, const char *<VAR>string</VAR>, int <VAR>flags</VAR>)</I><P>
This function tests whether the string <VAR>string</VAR> matches the pattern
<VAR>pattern</VAR>. It returns <CODE>0</CODE> if they do match; otherwise, it
returns the nonzero value <CODE>FNM_NOMATCH</CODE>. The arguments
<VAR>pattern</VAR> and <VAR>string</VAR> are both strings.
<P>
The argument <VAR>flags</VAR> is a combination of flag bits that alter the
details of matching. See below for a list of the defined flags.
<P>
In the GNU C Library, <CODE>fnmatch</CODE> cannot experience an "error"; it
always returns an answer for whether the match succeeds. However, other
implementations of <CODE>fnmatch</CODE> might sometimes report "errors".
They would do so by returning nonzero values that are not equal to
<CODE>FNM_NOMATCH</CODE>.
<P>
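The three kinds of return value described above can be told apart as in
the following minimal sketch. The helper name <CODE>wildcard_match</CODE> and
its return convention are assumptions made for illustration; they are not
part of the library.
<P>
```c
#include <fnmatch.h>

/* Hypothetical helper, not a library function.
   Returns 1 if STRING matches PATTERN, 0 if it does not, and -1 if
   fnmatch returned some other nonzero value (which the GNU C Library
   never does, but other implementations might).  */
int
wildcard_match (const char *pattern, const char *string, int flags)
{
  int result = fnmatch (pattern, string, flags);

  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```
<P>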
These are the available flags for the <VAR>flags</VAR> argument:
<P>
<DL COMPACT>
<DT><CODE>FNM_FILE_NAME</CODE>
<DD>Treat the <SAMP>`/'</SAMP> character specially, for matching file names. If
this flag is set, wildcard constructs in <VAR>pattern</VAR> cannot match
<SAMP>`/'</SAMP> in <VAR>string</VAR>. Thus, the only way to match <SAMP>`/'</SAMP> is with
an explicit <SAMP>`/'</SAMP> in <VAR>pattern</VAR>.
<P>
<DT><CODE>FNM_PATHNAME</CODE>
<DD>This is an alias for <CODE>FNM_FILE_NAME</CODE>; it comes from POSIX.2. We
don't recommend this name because we don't use the term "pathname" for
file names.
<P>
<DT><CODE>FNM_PERIOD</CODE>
<DD>Treat the <SAMP>`.'</SAMP> character specially if it appears at the beginning of
<VAR>string</VAR>. If this flag is set, wildcard constructs in <VAR>pattern</VAR>
cannot match <SAMP>`.'</SAMP> as the first character of <VAR>string</VAR>.
<P>
If you set both <CODE>FNM_PERIOD</CODE> and <CODE>FNM_FILE_NAME</CODE>, then the
special treatment applies to <SAMP>`.'</SAMP> following <SAMP>`/'</SAMP> as well as
to <SAMP>`.'</SAMP> at the beginning of <VAR>string</VAR>.
<P>
<DT><CODE>FNM_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns. Normally,
<SAMP>`\'</SAMP> quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question
mark in the pattern acts like an ordinary character.
<P>
If you use <CODE>FNM_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P>
<DT><CODE>FNM_LEADING_DIR</CODE>
<DD>Ignore a trailing sequence of characters starting with a <SAMP>`/'</SAMP> in
<VAR>string</VAR>; that is to say, test whether <VAR>string</VAR> starts with a
directory name that <VAR>pattern</VAR> matches.
<P>
If this flag is set, either <SAMP>`foo*'</SAMP> or <SAMP>`foobar'</SAMP> as a pattern
would match the string <SAMP>`foobar/frobozz'</SAMP>.
<P>
<DT><CODE>FNM_CASEFOLD</CODE>
<DD>Ignore case in comparing <VAR>string</VAR> to <VAR>pattern</VAR>.
</DL>
<P>
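A short sketch of how some of these flags change the answer. It uses
<CODE>FNM_PATHNAME</CODE>, the POSIX.2 spelling of <CODE>FNM_FILE_NAME</CODE>,
on the assumption that the GNU name may be declared only when GNU
extensions are enabled. The helper <CODE>matches</CODE> is hypothetical.
<P>
```c
#include <fnmatch.h>

/* Hypothetical helper: nonzero if STRING matches PATTERN under FLAGS.  */
int
matches (const char *pattern, const char *string, int flags)
{
  return fnmatch (pattern, string, flags) == 0;
}

/* With the rules described above:

   matches ("*.c", "src/main.c", 0)               nonzero; "*" may match "/"
   matches ("*.c", "src/main.c", FNM_PATHNAME)    zero
   matches ("*", ".profile", 0)                   nonzero
   matches ("*", ".profile", FNM_PERIOD)          zero
   matches (".*", ".profile", FNM_PERIOD)         nonzero; explicit "."
   matches ("*" "/" "*", "home/.profile",
            FNM_PATHNAME | FNM_PERIOD)            zero; "." follows "/"
   matches ("\\?", "?", 0)                        nonzero; "?" is quoted
   matches ("\\?", "a", 0)                        zero
   matches ("\\?", "\\a", FNM_NOESCAPE)           nonzero; "\" is ordinary  */
```
<P>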
<H2><A NAME="SEC93" HREF="library_toc.html#SEC93" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC93">Globbing</A></H2>
<A NAME="IDX396"></A>
<P>
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches. This is called
<DFN>globbing</DFN>.
<P>
You could do this using <CODE>fnmatch</CODE>, by reading the directory entries
one by one and testing each one with <CODE>fnmatch</CODE>. But that would be
slow (and complex, since you would have to handle subdirectories by
hand).
<P>
The library provides a function <CODE>glob</CODE> to make this particular use
of wildcards convenient. <CODE>glob</CODE> and the other symbols in this
section are declared in <TT>`glob.h'</TT>.
<P>
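For comparison, here is a rough sketch of the by-hand approach mentioned
above, limited to a single directory. It assumes the POSIX directory
functions <CODE>opendir</CODE>, <CODE>readdir</CODE> and <CODE>closedir</CODE>
from <TT>`dirent.h'</TT>; the helper name is hypothetical. It does not sort
the names or descend into subdirectories, which is part of what
<CODE>glob</CODE> does for you.
<P>
```c
#include <dirent.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of directory DIR whose names
   match PATTERN.  Returns -1 if DIR cannot be opened.  */
int
count_matching_entries (const char *dir, const char *pattern)
{
  DIR *stream = opendir (dir);
  struct dirent *entry;
  int count = 0;

  if (stream == NULL)
    return -1;

  while ((entry = readdir (stream)) != NULL)
    {
      /* FNM_PERIOD keeps wildcards from matching ".", ".." and other
         names that begin with a period.  */
      if (fnmatch (pattern, entry->d_name, FNM_PERIOD) == 0)
        ++count;
    }

  closedir (stream);
  return count;
}
```
<P>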
<H3><A NAME="SEC94" HREF="library_toc.html#SEC94" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC94">Calling <CODE>glob</CODE></A></H3>
<P>
The result of globbing is a vector of file names (strings). To return
this vector, <CODE>glob</CODE> uses a special data type, <CODE>glob_t</CODE>, which
is a structure. You pass <CODE>glob</CODE> the address of the structure, and
it fills in the structure's fields to tell you about the results.
<P>
<A NAME="IDX397"></A>
<U>Data Type:</U> <B>glob_t</B><P>
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size.
<P>
<DL COMPACT>
<DT><CODE>gl_pathc</CODE>
<DD>The number of elements in the vector.
<P>
<DT><CODE>gl_pathv</CODE>
<DD>The address of the vector. This field has type <CODE>char **</CODE>.
<P>
<DT><CODE>gl_offs</CODE>
<DD>The offset of the first real element of the vector, from its nominal
address in the <CODE>gl_pathv</CODE> field. Unlike the other fields, this
is always an input to <CODE>glob</CODE>, rather than an output from it.
<P>
If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The <CODE>glob</CODE> function fills them with
null pointers.)
<P>
The <CODE>gl_offs</CODE> field is meaningful only if you use the
<CODE>GLOB_DOOFFS</CODE> flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
</DL>
<P>
<A NAME="IDX398"></A>
<U>Function:</U> int <B>glob</B> <I>(const char *<VAR>pattern</VAR>, int <VAR>flags</VAR>, int (*<VAR>errfunc</VAR>) (const char *<VAR>filename</VAR>, int <VAR>error-code</VAR>), glob_t *<VAR>vector-ptr</VAR>)</I><P>
The function <CODE>glob</CODE> does globbing using the pattern <VAR>pattern</VAR>
in the current directory. It puts the result in a newly allocated
vector, and stores the size and address of this vector into
<CODE>*<VAR>vector-ptr</VAR></CODE>. The argument <VAR>flags</VAR> is a combination of
bit flags; see section <A HREF="library_9.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC95">Flags for Globbing</A>, for details of the flags.
<P>
The result of globbing is a sequence of file names. The function
<CODE>glob</CODE> allocates a string for each resulting word, then
allocates a vector of type <CODE>char **</CODE> to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the <DFN>word vector</DFN>.
<P>
To return this vector, <CODE>glob</CODE> stores both its address and its
length (number of elements, not counting the terminating null pointer)
into <CODE>*<VAR>vector-ptr</VAR></CODE>.
<P>
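A minimal sketch of a call that walks the word vector. The helper name
<CODE>print_matches</CODE> is hypothetical. The sketch also assumes the
POSIX function <CODE>globfree</CODE>, which this section does not describe,
to release the vector afterward.
<P>
```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: print each file name matching PATTERN, one per
   line.  Returns the number of matches, or -1 on an error other than
   "no match".  */
long
print_matches (const char *pattern)
{
  glob_t result = { 0 };
  long count = -1;
  size_t i;
  int rc = glob (pattern, 0, NULL, &result);

  if (rc == 0)
    {
      /* gl_pathc does not count the terminating null pointer.  */
      for (i = 0; i < result.gl_pathc; i++)
        puts (result.gl_pathv[i]);
      count = (long) result.gl_pathc;
    }
  else if (rc == GLOB_NOMATCH)
    count = 0;

  globfree (&result);   /* assumed cleanup call; see the lead-in */
  return count;
}
```
<P>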
Normally, <CODE>glob</CODE> sorts the file names alphabetically before
returning them. You can turn this off with the flag <CODE>GLOB_NOSORT</CODE>
if you want to get the information as fast as possible. Usually it's
a good idea to let <CODE>glob</CODE> sort them: if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.
<P>
If <CODE>glob</CODE> succeeds, it returns 0. Otherwise, it returns one
of these error codes:
<P>
<DL COMPACT>
<DT><CODE>GLOB_ABORTED</CODE>
<DD>There was an error opening a directory, and you used the flag
<CODE>GLOB_ERR</CODE> or your specified <VAR>errfunc</VAR> returned a nonzero
value.
<P>
<DT><CODE>GLOB_NOMATCH</CODE>
<DD>The pattern didn't match any existing files. If you use the
<CODE>GLOB_NOCHECK</CODE> flag, then you never get this error code, because
that flag tells <CODE>glob</CODE> to <EM>pretend</EM> that the pattern matched
at least one file.
<P>
<DT><CODE>GLOB_NOSPACE</CODE>
<DD>It was impossible to allocate memory to hold the result.
</DL>
<P>
In the event of an error, <CODE>glob</CODE> stores information in
<CODE>*<VAR>vector-ptr</VAR></CODE> about all the matches it has found so far.
<P>
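The return codes listed above might be handled as in this sketch. The
helper name <CODE>try_glob</CODE> and the messages it prints are
illustrative assumptions, and <CODE>globfree</CODE> is the POSIX cleanup
function, which is not described in this section.
<P>
```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: expand PATTERN with FLAGS, report the outcome on
   standard output, and return whatever glob returned.  */
int
try_glob (const char *pattern, int flags)
{
  glob_t result = { 0 };
  int rc = glob (pattern, flags, NULL, &result);

  switch (rc)
    {
    case 0:
      printf ("%s: %lu match(es)\n", pattern,
              (unsigned long) result.gl_pathc);
      break;
    case GLOB_NOMATCH:
      printf ("%s: no existing file matches\n", pattern);
      break;
    case GLOB_NOSPACE:
      printf ("%s: out of memory\n", pattern);
      break;
    case GLOB_ABORTED:
      printf ("%s: gave up after a directory error\n", pattern);
      break;
    default:
      printf ("%s: unexpected result %d\n", pattern, rc);
      break;
    }

  /* Even after an error the structure may describe partial results,
     so it is released in every case.  */
  globfree (&result);
  return rc;
}
```
<P>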
<H3><A NAME="SEC95" HREF="library_toc.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC95">Flags for Globbing</A></H3>
<P>
This section describes the flags that you can specify in the
<VAR>flags</VAR> argument to <CODE>glob</CODE>. Choose the flags you want,
and combine them with the C operator <CODE>|</CODE>.
<P>
<DL COMPACT>
<DT><CODE>GLOB_APPEND</CODE>
<DD>Append the words from this expansion to the vector of words produced by
previous calls to <CODE>glob</CODE>. This way you can effectively expand
several words as if they were concatenated with spaces between them.
<P>
In order for appending to work, you must not modify the contents of the
word vector structure between calls to <CODE>glob</CODE>. And, if you set
<CODE>GLOB_DOOFFS</CODE> in the first call to <CODE>glob</CODE>, you must also
set it when you append to the results.
<P>
<DT><CODE>GLOB_DOOFFS</CODE>
<DD>Leave blank slots at the beginning of the vector of words.
The <CODE>gl_offs</CODE> field says how many slots to leave.
The blank slots contain null pointers.
<P>
<DT><CODE>GLOB_ERR</CODE>
<DD>Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand <VAR>pattern</VAR>
fully. Such difficulties might include a directory in which you don't
have the requisite access. Normally, <CODE>glob</CODE> tries its best to keep
on going despite any errors, reading whatever directories it can.
<P>
You can exercise even more control than this by specifying an error-handler
function <VAR>errfunc</VAR> when you call <CODE>glob</CODE>. If <VAR>errfunc</VAR> is
not a null pointer, then <CODE>glob</CODE> doesn't give up right away when it can't read
a directory; instead, it calls <VAR>errfunc</VAR> with two arguments, like
this:
<P>
<PRE>
(*<VAR>errfunc</VAR>) (<VAR>filename</VAR>, <VAR>error-code</VAR>)
</PRE>
<P>
The argument <VAR>filename</VAR> is the name of the directory that
<CODE>glob</CODE> couldn't open or couldn't read, and <VAR>error-code</VAR> is the
<CODE>errno</CODE> value that was reported to <CODE>glob</CODE>.
<P>
If the error handler function returns nonzero, then <CODE>glob</CODE> gives up
right away. Otherwise, it continues.
<P>
<DT><CODE>GLOB_MARK</CODE>
<DD>If the pattern matches the name of a directory, append <SAMP>`/'</SAMP> to the
directory's name when returning it.
<P>
<DT><CODE>GLOB_NOCHECK</CODE>
<DD>If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched. (Normally, when the
pattern doesn't match anything, <CODE>glob</CODE> reports that there were no
matches by returning <CODE>GLOB_NOMATCH</CODE>.)
<P>
<DT><CODE>GLOB_NOSORT</CODE>
<DD>Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.) The only reason <EM>not</EM> to sort is to save time.
<P>
<DT><CODE>GLOB_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns. Normally,
<SAMP>`\'</SAMP> quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question
mark in the pattern acts like an ordinary character.
<P>
If you use <CODE>GLOB_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P>
<CODE>glob</CODE> does its work by calling the function <CODE>fnmatch</CODE>
repeatedly. It handles the flag <CODE>GLOB_NOESCAPE</CODE> by turning on the
<CODE>FNM_NOESCAPE</CODE> flag in calls to <CODE>fnmatch</CODE>.
</DL>
<P>
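The following sketch combines several of these flags: it reserves slots
with <CODE>GLOB_DOOFFS</CODE>, appends a second expansion with
<CODE>GLOB_APPEND</CODE>, and supplies an error handler that reports the
problem and lets <CODE>glob</CODE> continue. The names
<CODE>report_error</CODE> and <CODE>expand_two</CODE> are hypothetical, and
the choice of <CODE>GLOB_MARK</CODE> and <CODE>GLOB_NOCHECK</CODE> is only
for illustration.
<P>
```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error handler: say which directory failed, then return
   zero so that glob keeps going.  */
static int
report_error (const char *filename, int error_code)
{
  fprintf (stderr, "cannot read %s: %s\n", filename, strerror (error_code));
  return 0;
}

/* Hypothetical helper: expand FIRST and then SECOND into one word
   vector, leaving RESERVED null slots at the front.  Returns 0 on
   success or the error code from the call that failed.  */
int
expand_two (const char *first, const char *second, size_t reserved,
            glob_t *result)
{
  int flags = GLOB_DOOFFS | GLOB_MARK | GLOB_NOCHECK;
  int rc;

  result->gl_offs = reserved;   /* an input to glob, read because of
                                   GLOB_DOOFFS */
  rc = glob (first, flags, report_error, result);
  if (rc != 0)
    return rc;

  /* GLOB_DOOFFS was set in the first call, so it must be set again
     when appending.  */
  return glob (second, flags | GLOB_APPEND, report_error, result);
}
```
<P>
After a successful call, the matches start at
<CODE>result->gl_pathv[reserved]</CODE>, and a caller is free to store its
own pointers in the reserved slots before them.
<P>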
<H2><A NAME="SEC96" HREF="library_toc.html#SEC96" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC96">Regular Expression Matching</A></H2>
<P>
The GNU C Library supports two interfaces for matching regular
expressions. One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.
<P>
Both interfaces are declared in the header file <TT>`regex.h'</TT>.
If you define <CODE>_GNU_SOURCE</CODE>, then the GNU functions, structures
and constants are declared. Otherwise, only the POSIX names are
declared.
<P>
<H3><A NAME="SEC97" HREF="library_toc.html#SEC97" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC97">POSIX Regular Expression Compilation</A></H3>
<P>
Before you can actually match a regular expression, you must
<DFN>compile</DFN> it. This is not true compilation: it produces a special
data structure, not machine instructions. But it is like ordinary
compilation in that its purpose is to enable you to "execute" the
pattern fast. (See section <A HREF="library_9.html#SEC99" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC99">Matching a Compiled POSIX Regular Expression</A>, for how to use the
compiled regular expression for matching.)
<P>
There is a special data type for compiled regular expressions:
<P>
<A NAME="IDX399"></A>
<U>Data Type:</U> <B>regex_t</B><P>
This type of object holds a compiled regular expression.
It is actually a structure. It has just one field that your programs
should look at:
<P>
<DL COMPACT>
<DT><CODE>re_nsub</CODE>
<DD>This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
</DL>
<P>
There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
<P>
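As a small illustration of <CODE>re_nsub</CODE>, the sketch below reads the
field after compiling a pattern. It assumes the POSIX functions
<CODE>regcomp</CODE> and <CODE>regfree</CODE> and the
<CODE>REG_EXTENDED</CODE> flag, none of which have been introduced at this
point in the text; the helper name is hypothetical.
<P>
```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the number of parenthetical
   subexpressions in PATTERN, or -1 if it cannot be compiled.  */
long
count_subexpressions (const char *pattern)
{
  regex_t compiled;
  long count;

  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  count = (long) compiled.re_nsub;
  regfree (&compiled);
  return count;
}
```
<P>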
After you create a <CODE>regex_t</CODE> object, you can compile a regular