<!-- This HTML file has been created by texi2html 1.27
     from library.texinfo on 3 March 1994 -->

<TITLE>The GNU C Library - Pattern Matching</TITLE>
<P>Go to the <A HREF="library_8.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_8.html">previous</A>, <A HREF="library_10.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_10.html">next</A> section.<P>
<H1><A NAME="SEC91" HREF="library_toc.html#SEC91" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC91">Pattern Matching</A></H1>
<P>
The GNU C Library provides pattern matching facilities for two kinds
of patterns: regular expressions and file-name wildcards.
<P>
<H2><A NAME="SEC92" HREF="library_toc.html#SEC92" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC92">Wildcard Matching</A></H2>
<A NAME="IDX394"></A>
<P>
This section describes how to match a wildcard pattern against a
particular string.  The result is a yes or no answer: does the
string fit the pattern or not.  The symbols described here are all
declared in <TT>`fnmatch.h'</TT>.
<P>
<A NAME="IDX395"></A>
<U>Function:</U> int <B>fnmatch</B> <I>(const char *<VAR>pattern</VAR>, const char *<VAR>string</VAR>, int <VAR>flags</VAR>)</I><P>
This function tests whether the string <VAR>string</VAR> matches the pattern
<VAR>pattern</VAR>.  It returns <CODE>0</CODE> if they do match; otherwise, it
returns the nonzero value <CODE>FNM_NOMATCH</CODE>.  The arguments
<VAR>pattern</VAR> and <VAR>string</VAR> are both strings.
<P>
The argument <VAR>flags</VAR> is a combination of flag bits that alter the
details of matching.  See below for a list of the defined flags.
<P>
In the GNU C Library, <CODE>fnmatch</CODE> cannot experience an "error"; it
always returns an answer for whether the match succeeds.  However, other
implementations of <CODE>fnmatch</CODE> might sometimes report "errors".
They would do so by returning nonzero values that are not equal to
<CODE>FNM_NOMATCH</CODE>.
<P>
These are the available flags for the <VAR>flags</VAR> argument:
<P>
<DL COMPACT>
<DT><CODE>FNM_FILE_NAME</CODE>
<DD>Treat the <SAMP>`/'</SAMP> character specially, for matching file names.  If
this flag is set, wildcard constructs in <VAR>pattern</VAR> cannot match
<SAMP>`/'</SAMP> in <VAR>string</VAR>.  Thus, the only way to match <SAMP>`/'</SAMP> is with
an explicit <SAMP>`/'</SAMP> in <VAR>pattern</VAR>.
<P>
<DT><CODE>FNM_PATHNAME</CODE>
<DD>This is an alias for <CODE>FNM_FILE_NAME</CODE>; it comes from POSIX.2.  We
don't recommend this name because we don't use the term "pathname" for
file names.
<P>
<DT><CODE>FNM_PERIOD</CODE>
<DD>Treat the <SAMP>`.'</SAMP> character specially if it appears at the beginning of
<VAR>string</VAR>.  If this flag is set, wildcard constructs in <VAR>pattern</VAR>
cannot match <SAMP>`.'</SAMP> as the first character of <VAR>string</VAR>.
<P>
If you set both <CODE>FNM_PERIOD</CODE> and <CODE>FNM_FILE_NAME</CODE>, then the
special treatment applies to <SAMP>`.'</SAMP> following <SAMP>`/'</SAMP> as well as
to <SAMP>`.'</SAMP> at the beginning of <VAR>string</VAR>.
<P>
<DT><CODE>FNM_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns.  Normally,
<SAMP>`\'</SAMP> quotes the following character, turning off its special meaning
(if any) so that it matches only itself.  When quoting is enabled, the
pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question
mark in the pattern acts like an ordinary character.
<P>
If you use <CODE>FNM_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P>
<DT><CODE>FNM_LEADING_DIR</CODE>
<DD>Ignore a trailing sequence of characters starting with a <SAMP>`/'</SAMP> in
<VAR>string</VAR>; that is to say, test whether <VAR>string</VAR> starts with a
directory name that <VAR>pattern</VAR> matches.
<P>
If this flag is set, either <SAMP>`foo*'</SAMP> or <SAMP>`foobar'</SAMP> as a pattern
would match the string <SAMP>`foobar/frobozz'</SAMP>.
<P>
<DT><CODE>FNM_CASEFOLD</CODE>
<DD>Ignore case in comparing <VAR>string</VAR> to <VAR>pattern</VAR>.
</DL>
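<P>
The following is a minimal sketch of how the return value and the flags
described above fit together.  The helper name <CODE>wildcard_match</CODE> is
hypothetical, not part of the library.  It uses only the POSIX.2 flag names
(<CODE>FNM_PATHNAME</CODE>, <CODE>FNM_PERIOD</CODE>, <CODE>FNM_NOESCAPE</CODE>) so that
it does not depend on GNU extensions being enabled.

```c
#include <fnmatch.h>

/* Hypothetical helper: return 1 if NAME matches PATTERN under FLAGS,
   0 if it does not match, and -1 if fnmatch reports an error
   (a nonzero value other than FNM_NOMATCH, which other
   implementations might return).  */
int
wildcard_match (const char *pattern, const char *name, int flags)
{
  int result = fnmatch (pattern, name, flags);

  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```

With this helper, <CODE>wildcard_match ("*.c", "dir/foo.c", 0)</CODE> returns 1
because <SAMP>`*'</SAMP> may match <SAMP>`/'</SAMP>, while the same call with
<CODE>FNM_PATHNAME</CODE> returns 0.  Likewise,
<CODE>wildcard_match ("*", ".profile", FNM_PERIOD)</CODE> returns 0.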
<P>
<H2><A NAME="SEC93" HREF="library_toc.html#SEC93" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC93">Globbing</A></H2>
<A NAME="IDX396"></A>
<P>
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches.  This is called
<DFN>globbing</DFN>.
<P>
You could do this using <CODE>fnmatch</CODE>, by reading the directory entries
one by one and testing each one with <CODE>fnmatch</CODE>.  But that would be
slow (and complex, since you would have to handle subdirectories by
hand).
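<P>
As a rough illustration of that manual approach, the sketch below scans a
single directory with <CODE>opendir</CODE> and <CODE>readdir</CODE> (declared in
<TT>`dirent.h'</TT>) and tests each entry with <CODE>fnmatch</CODE>.  The function
name <CODE>list_matching</CODE> is hypothetical, and the sketch deliberately
handles only one directory level; patterns that contain <SAMP>`/'</SAMP> would
need additional code.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: print the entries of directory DIR whose names
   match PATTERN.  Return the number of matches, or -1 if DIR cannot
   be opened.  FNM_PERIOD keeps wildcards from matching a leading `.'.  */
int
list_matching (const char *dir, const char *pattern)
{
  DIR *stream = opendir (dir);
  struct dirent *entry;
  int count = 0;

  if (stream == NULL)
    return -1;

  while ((entry = readdir (stream)) != NULL)
    if (fnmatch (pattern, entry->d_name, FNM_PERIOD) == 0)
      {
        puts (entry->d_name);
        count++;
      }

  closedir (stream);
  return count;
}
```

Note that the names come back in directory order, not sorted.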
<P>
The library provides a function <CODE>glob</CODE> to make this particular use
of wildcards convenient.  <CODE>glob</CODE> and the other symbols in this
section are declared in <TT>`glob.h'</TT>.
<P>
<H3><A NAME="SEC94" HREF="library_toc.html#SEC94" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC94">Calling <CODE>glob</CODE></A></H3>
<P>
The result of globbing is a vector of file names (strings).  To return
this vector, <CODE>glob</CODE> uses a special data type, <CODE>glob_t</CODE>, which
is a structure.  You pass <CODE>glob</CODE> the address of the structure, and
it fills in the structure's fields to tell you about the results.
<P>
<A NAME="IDX397"></A>
<U>Data Type:</U> <B>glob_t</B><P>
This data type holds a pointer to a word vector.  More precisely, it
records both the address of the word vector and its size.
<P>
<DL COMPACT>
<DT><CODE>gl_pathc</CODE>
<DD>The number of elements in the vector.
<P>
<DT><CODE>gl_pathv</CODE>
<DD>The address of the vector.  This field has type <CODE>char **</CODE>.
<P>
<DT><CODE>gl_offs</CODE>
<DD>The offset of the first real element of the vector, from its nominal
address in the <CODE>gl_pathv</CODE> field.  Unlike the other fields, this
is always an input to <CODE>glob</CODE>, rather than an output from it.
<P>
If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty.  (The <CODE>glob</CODE> function fills them with
null pointers.)
<P>
The <CODE>gl_offs</CODE> field is meaningful only if you use the
<CODE>GLOB_DOOFFS</CODE> flag.  Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
</DL>
<P>
<A NAME="IDX398"></A>
<U>Function:</U> int <B>glob</B> <I>(const char *<VAR>pattern</VAR>, int <VAR>flags</VAR>, int (*<VAR>errfunc</VAR>) (const char *<VAR>filename</VAR>, int <VAR>error-code</VAR>), glob_t *<VAR>vector-ptr</VAR>)</I><P>
The function <CODE>glob</CODE> does globbing using the pattern <VAR>pattern</VAR>
in the current directory.  It puts the result in a newly allocated
vector, and stores the size and address of this vector into
<CODE>*<VAR>vector-ptr</VAR></CODE>.  The argument <VAR>flags</VAR> is a combination of
bit flags; see section <A HREF="library_9.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC95">Flags for Globbing</A>, for details of the flags.
<P>
The result of globbing is a sequence of file names.  The function
<CODE>glob</CODE> allocates a string for each resulting word, then
allocates a vector of type <CODE>char **</CODE> to store the addresses of
these strings.  The last element of the vector is a null pointer.
This vector is called the <DFN>word vector</DFN>.
<P>
To return this vector, <CODE>glob</CODE> stores both its address and its
length (number of elements, not counting the terminating null pointer)
into <CODE>*<VAR>vector-ptr</VAR></CODE>.
<P>
Normally, <CODE>glob</CODE> sorts the file names alphabetically before 
returning them.  You can turn this off with the flag <CODE>GLOB_NOSORT</CODE>
if you want to get the information as fast as possible.  Usually it's
a good idea to let <CODE>glob</CODE> sort them: if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.
<P>
If <CODE>glob</CODE> succeeds, it returns 0.  Otherwise, it returns one
of these error codes:
<P>
<DL COMPACT>
<DT><CODE>GLOB_ABORTED</CODE>
<DD>There was an error opening a directory, and you used the flag
<CODE>GLOB_ERR</CODE> or your specified <VAR>errfunc</VAR> returned a nonzero
value.
<P>
<DT><CODE>GLOB_NOMATCH</CODE>
<DD>The pattern didn't match any existing files.  If you use the
<CODE>GLOB_NOCHECK</CODE> flag, then you never get this error code, because
that flag tells <CODE>glob</CODE> to <EM>pretend</EM> that the pattern matched
at least one file.
<P>
<DT><CODE>GLOB_NOSPACE</CODE>
<DD>It was impossible to allocate memory to hold the result.
</DL>
<P>
In the event of an error, <CODE>glob</CODE> stores information in
<CODE>*<VAR>vector-ptr</VAR></CODE> about all the matches it has found so far.
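<P>
A basic call might look like the sketch below.  The function name
<CODE>print_matches</CODE> is hypothetical.  The sketch also relies on
<CODE>globfree</CODE>, the POSIX.2 function from <TT>`glob.h'</TT> that releases
the storage allocated by <CODE>glob</CODE>; it is not described in this section,
so treat its use here as an assumption about the rest of the interface.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: print every file name that matches PATTERN,
   one per line.  Return the number of matches (0 if nothing matched),
   or -1 if glob reported GLOB_ABORTED or GLOB_NOSPACE.  */
int
print_matches (const char *pattern)
{
  glob_t results;
  size_t i;
  int count;
  int status = glob (pattern, 0, NULL, &results);

  if (status != 0)
    {
      /* Even on failure the structure may hold partial results.  */
      globfree (&results);
      return status == GLOB_NOMATCH ? 0 : -1;
    }

  /* gl_pathc counts the names; gl_pathv[gl_pathc] is a null pointer.  */
  for (i = 0; i < results.gl_pathc; i++)
    puts (results.gl_pathv[i]);

  count = (int) results.gl_pathc;
  globfree (&results);
  return count;
}
```

Because no flags are given, the names are printed in sorted order.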
<P>
<H3><A NAME="SEC95" HREF="library_toc.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC95">Flags for Globbing</A></H3>
<P>
This section describes the flags that you can specify in the 
<VAR>flags</VAR> argument to <CODE>glob</CODE>.  Choose the flags you want,
and combine them with the C operator <CODE>|</CODE>.
<P>
<DL COMPACT>
<DT><CODE>GLOB_APPEND</CODE>
<DD>Append the words from this expansion to the vector of words produced by
previous calls to <CODE>glob</CODE>.  This way you can effectively expand
several words as if they were concatenated with spaces between them.
<P>
In order for appending to work, you must not modify the contents of the
word vector structure between calls to <CODE>glob</CODE>.  And, if you set
<CODE>GLOB_DOOFFS</CODE> in the first call to <CODE>glob</CODE>, you must also
set it when you append to the results.
<P>
<DT><CODE>GLOB_DOOFFS</CODE>
<DD>Leave blank slots at the beginning of the vector of words.
The <CODE>gl_offs</CODE> field says how many slots to leave.
The blank slots contain null pointers.
<P>
<DT><CODE>GLOB_ERR</CODE>
<DD>Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand <VAR>pattern</VAR>
fully.  Such difficulties might include a directory in which you don't
have the requisite access.  Normally, <CODE>glob</CODE> tries its best to keep
on going despite any errors, reading whatever directories it can.
<P>
You can exercise even more control than this by specifying an error-handler
function <VAR>errfunc</VAR> when you call <CODE>glob</CODE>.  If <VAR>errfunc</VAR> is
not a null pointer, then <CODE>glob</CODE> doesn't give up right away when it can't read
a directory; instead, it calls <VAR>errfunc</VAR> with two arguments, like
this:
<P>
<PRE>
(*<VAR>errfunc</VAR>) (<VAR>filename</VAR>, <VAR>error-code</VAR>)
</PRE>
<P>
The argument <VAR>filename</VAR> is the name of the directory that
<CODE>glob</CODE> couldn't open or couldn't read, and <VAR>error-code</VAR> is the
<CODE>errno</CODE> value that was reported to <CODE>glob</CODE>.
<P>
If the error handler function returns nonzero, then <CODE>glob</CODE> gives up
right away.  Otherwise, it continues.
<P>
<DT><CODE>GLOB_MARK</CODE>
<DD>If the pattern matches the name of a directory, append <SAMP>`/'</SAMP> to the
directory's name when returning it.
<P>
<DT><CODE>GLOB_NOCHECK</CODE>
<DD>If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched.  (Normally, when the
pattern doesn't match anything, <CODE>glob</CODE> returns
<CODE>GLOB_NOMATCH</CODE>.)
<P>
<DT><CODE>GLOB_NOSORT</CODE>
<DD>Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.)  The only reason <EM>not</EM> to sort is to save time.
<P>
<DT><CODE>GLOB_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns.  Normally,
<SAMP>`\'</SAMP> quotes the following character, turning off its special meaning
(if any) so that it matches only itself.  When quoting is enabled, the
pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question
mark in the pattern acts like an ordinary character.
<P>
If you use <CODE>GLOB_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P>
<CODE>glob</CODE> does its work by calling the function <CODE>fnmatch</CODE>
repeatedly.  It handles the flag <CODE>GLOB_NOESCAPE</CODE> by turning on the
<CODE>FNM_NOESCAPE</CODE> flag in calls to <CODE>fnmatch</CODE>.
</DL>
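<P>
The sketch below combines several of these flags.  It reserves one empty slot
at the front of the vector with <CODE>GLOB_DOOFFS</CODE>, appends a second
expansion with <CODE>GLOB_APPEND</CODE>, uses <CODE>GLOB_NOCHECK</CODE> so that an
unmatched pattern is kept as a word, and installs an error handler that
reports the problem and lets <CODE>glob</CODE> continue.  The names
<CODE>report_error</CODE> and <CODE>expand_two</CODE> are hypothetical, and the
caller is assumed to release the vector with the POSIX.2 function
<CODE>globfree</CODE>, which this section does not describe.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error handler: report the directory that could not be
   read, then return zero so that glob keeps going.  */
static int
report_error (const char *filename, int error_code)
{
  fprintf (stderr, "cannot read %s: %s\n", filename, strerror (error_code));
  return 0;
}

/* Hypothetical helper: expand FIRST and SECOND into one word vector in
   *RESULTS, leaving one null slot at the front.  Return 0 on success,
   otherwise the error code from glob.  The caller should pass *RESULTS
   to globfree in either case.  */
int
expand_two (const char *first, const char *second, glob_t *results)
{
  int status;

  /* gl_offs is an input; it must be set before the first call.  */
  results->gl_offs = 1;

  status = glob (first, GLOB_DOOFFS | GLOB_NOCHECK, report_error, results);
  if (status != 0)
    return status;

  /* GLOB_DOOFFS must be repeated when appending.  */
  return glob (second, GLOB_DOOFFS | GLOB_APPEND | GLOB_NOCHECK,
               report_error, results);
}
```

After a successful call, <CODE>gl_pathv[0]</CODE> is a null pointer that the
caller can overwrite (for example with a command name), and the expanded
words start at <CODE>gl_pathv[1]</CODE>.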
<P>
<H2><A NAME="SEC96" HREF="library_toc.html#SEC96" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC96">Regular Expression Matching</A></H2>
<P>
The GNU C Library supports two interfaces for matching regular
expressions.  One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.
<P>
Both interfaces are declared in the header file <TT>`regex.h'</TT>.
If you define <CODE>_GNU_SOURCE</CODE>, then the GNU functions, structures
and constants are declared.  Otherwise, only the POSIX names are
declared.
<P>
<H3><A NAME="SEC97" HREF="library_toc.html#SEC97" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC97">POSIX Regular Expression Compilation</A></H3>
<P>
Before you can actually match a regular expression, you must
<DFN>compile</DFN> it.  This is not true compilation: it produces a special
data structure, not machine instructions.  But it is like ordinary
compilation in that its purpose is to enable you to "execute" the
pattern fast.  (See section <A HREF="library_9.html#SEC99" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC99">Matching a Compiled POSIX Regular Expression</A>, for how to use the
compiled regular expression for matching.)
<P>
There is a special data type for compiled regular expressions:
<P>
<A NAME="IDX399"></A>
<U>Data Type:</U> <B>regex_t</B><P>
This type of object holds a compiled regular expression.
It is actually a structure.  It has just one field that your programs
should look at:
<P>
<DL COMPACT>
<DT><CODE>re_nsub</CODE>
<DD>This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
</DL>
<P>
There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
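<P>
As a small illustration of <CODE>re_nsub</CODE>, the sketch below compiles a
pattern and reads the field back.  The function name
<CODE>count_subexpressions</CODE> is hypothetical.  The calls to
<CODE>regcomp</CODE> and <CODE>regfree</CODE> and the flag <CODE>REG_EXTENDED</CODE>
are the POSIX.2 names from <TT>`regex.h'</TT>; they are not introduced in the
text above, so their use here is an assumption about the rest of the
interface.

```c
#include <regex.h>

/* Hypothetical helper: return the number of parenthetical
   subexpressions in PATTERN (treated as an extended regular
   expression), or -1 if PATTERN cannot be compiled.  */
long
count_subexpressions (const char *pattern)
{
  regex_t compiled;
  long count;

  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  /* re_nsub is the one field programs are meant to read.  */
  count = (long) compiled.re_nsub;

  regfree (&compiled);
  return count;
}
```

For the pattern <SAMP>`(a)(b(c))'</SAMP> this returns 3, since nested groups are
counted as well.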
<P>
After you create a <CODE>regex_t</CODE> object, you can compile a regular