<!-- This HTML file has been created by texi2html 1.27     from library.texinfo on 3 March 1994 -->
<TITLE>The GNU C Library - Pattern Matching</TITLE>
<P>Go to the <A HREF="library_8.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_8.html">previous</A>, <A HREF="library_10.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_10.html">next</A> section.
<P><H1><A NAME="SEC91" HREF="library_toc.html#SEC91" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC91">Pattern Matching</A></H1>
<P>The GNU C Library provides pattern matching facilities for two kinds of patterns: regular expressions and file-name wildcards.
<P><H2><A NAME="SEC92" HREF="library_toc.html#SEC92" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC92">Wildcard Matching</A></H2><A NAME="IDX394"></A>
<P>This section describes how to match a wildcard pattern against a particular string.  The result is a yes or no answer: does the string fit the pattern or not.  The symbols described here are all declared in <TT>`fnmatch.h'</TT>.
<P><A NAME="IDX395"></A><U>Function:</U> int <B>fnmatch</B> <I>(const char *<VAR>pattern</VAR>, const char *<VAR>string</VAR>, int <VAR>flags</VAR>)</I>
<P>This function tests whether the string <VAR>string</VAR> matches the pattern <VAR>pattern</VAR>.  It returns <CODE>0</CODE> if they do match; otherwise, it returns the nonzero value <CODE>FNM_NOMATCH</CODE>.  The arguments <VAR>pattern</VAR> and <VAR>string</VAR> are both strings.
<P>The argument <VAR>flags</VAR> is a combination of flag bits that alter the details of matching.  See below for a list of the defined flags.
<P>In the GNU C Library, <CODE>fnmatch</CODE> cannot experience an "error": it always returns an answer for whether the match succeeds.
However, other implementations of <CODE>fnmatch</CODE> might sometimes report "errors".  They would do so by returning nonzero values that are not equal to <CODE>FNM_NOMATCH</CODE>.
<P>These are the available flags for the <VAR>flags</VAR> argument:
<P><DL COMPACT>
<DT><CODE>FNM_FILE_NAME</CODE>
<DD>Treat the <SAMP>`/'</SAMP> character specially, for matching file names.  If this flag is set, wildcard constructs in <VAR>pattern</VAR> cannot match <SAMP>`/'</SAMP> in <VAR>string</VAR>.  Thus, the only way to match <SAMP>`/'</SAMP> is with an explicit <SAMP>`/'</SAMP> in <VAR>pattern</VAR>.
<P>
<DT><CODE>FNM_PATHNAME</CODE>
<DD>This is an alias for <CODE>FNM_FILE_NAME</CODE>; it comes from POSIX.2.  We don't recommend this name because we don't use the term "pathname" for file names.
<P>
<DT><CODE>FNM_PERIOD</CODE>
<DD>Treat the <SAMP>`.'</SAMP> character specially if it appears at the beginning of <VAR>string</VAR>.  If this flag is set, wildcard constructs in <VAR>pattern</VAR> cannot match <SAMP>`.'</SAMP> as the first character of <VAR>string</VAR>.
<P>If you set both <CODE>FNM_PERIOD</CODE> and <CODE>FNM_FILE_NAME</CODE>, then the special treatment applies to <SAMP>`.'</SAMP> following <SAMP>`/'</SAMP> as well as to <SAMP>`.'</SAMP> at the beginning of <VAR>string</VAR>.
<P>
<DT><CODE>FNM_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns.  Normally, <SAMP>`\'</SAMP> quotes the following character, turning off its special meaning (if any) so that it matches only itself.
When quoting is enabled, the pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question mark in the pattern acts like an ordinary character.
<P>If you use <CODE>FNM_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P>
<DT><CODE>FNM_LEADING_DIR</CODE>
<DD>Ignore a trailing sequence of characters starting with a <SAMP>`/'</SAMP> in <VAR>string</VAR>; that is to say, test whether <VAR>string</VAR> starts with a directory name that <VAR>pattern</VAR> matches.
<P>If this flag is set, either <SAMP>`foo*'</SAMP> or <SAMP>`foobar'</SAMP> as a pattern would match the string <SAMP>`foobar/frobozz'</SAMP>.
<P>
<DT><CODE>FNM_CASEFOLD</CODE>
<DD>Ignore case in comparing <VAR>string</VAR> to <VAR>pattern</VAR>.
</DL>
<P><H2><A NAME="SEC93" HREF="library_toc.html#SEC93" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC93">Globbing</A></H2><A NAME="IDX396"></A>
<P>The archetypal use of wildcards is for matching against the files in a directory, and making a list of all the matches.  This is called <DFN>globbing</DFN>.
<P>You could do this using <CODE>fnmatch</CODE>, by reading the directory entries one by one and testing each one with <CODE>fnmatch</CODE>.  But that would be slow (and complex, since you would have to handle subdirectories by hand).
<P>The library provides a function <CODE>glob</CODE> to make this particular use of wildcards convenient.  <CODE>glob</CODE> and the other symbols in this section are declared in <TT>`glob.h'</TT>.
<P><H3><A NAME="SEC94" HREF="library_toc.html#SEC94" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC94">Calling <CODE>glob</CODE></A></H3>
<P>The result of globbing is a vector of file names (strings).  To return this vector, <CODE>glob</CODE> uses a special data type, <CODE>glob_t</CODE>, which is a structure.
You pass <CODE>glob</CODE> the address of the structure, and it fills in the structure's fields to tell you about the results.
<P><A NAME="IDX397"></A><U>Data Type:</U> <B>glob_t</B>
<P>This data type holds a pointer to a word vector.  More precisely, it records both the address of the word vector and its size.
<P><DL COMPACT>
<DT><CODE>gl_pathc</CODE>
<DD>The number of elements in the vector.
<P>
<DT><CODE>gl_pathv</CODE>
<DD>The address of the vector.  This field has type <CODE>char **</CODE>.
<P>
<DT><CODE>gl_offs</CODE>
<DD>The offset of the first real element of the vector, from its nominal address in the <CODE>gl_pathv</CODE> field.  Unlike the other fields, this is always an input to <CODE>glob</CODE>, rather than an output from it.
<P>If you use a nonzero offset, then that many elements at the beginning of the vector are left empty.  (The <CODE>glob</CODE> function fills them with null pointers.)
<P>The <CODE>gl_offs</CODE> field is meaningful only if you use the <CODE>GLOB_DOOFFS</CODE> flag.  Otherwise, the offset is always zero regardless of what is in this field, and the first real element comes at the beginning of the vector.
</DL>
<P><A NAME="IDX398"></A><U>Function:</U> int <B>glob</B> <I>(const char *<VAR>pattern</VAR>, int <VAR>flags</VAR>, int (*<VAR>errfunc</VAR>) (const char *<VAR>filename</VAR>, int <VAR>error-code</VAR>), glob_t *<VAR>vector-ptr</VAR>)</I>
<P>The function <CODE>glob</CODE> does globbing using the pattern <VAR>pattern</VAR> in the current directory.  It puts the result in a newly allocated vector, and stores the size and address of this vector into <CODE>*<VAR>vector-ptr</VAR></CODE>.  The argument <VAR>flags</VAR> is a combination of bit flags; see section <A HREF="library_9.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC95">Flags for Globbing</A>, for details of the flags.
<P>The result of globbing is a sequence of file names.
The function <CODE>glob</CODE> allocates a string for each resulting word, then allocates a vector of type <CODE>char **</CODE> to store the addresses of these strings.  The last element of the vector is a null pointer.  This vector is called the <DFN>word vector</DFN>.
<P>To return this vector, <CODE>glob</CODE> stores both its address and its length (number of elements, not counting the terminating null pointer) into <CODE>*<VAR>vector-ptr</VAR></CODE>.
<P>Normally, <CODE>glob</CODE> sorts the file names alphabetically before returning them.  You can turn this off with the flag <CODE>GLOB_NOSORT</CODE> if you want to get the information as fast as possible.  Usually it's a good idea to let <CODE>glob</CODE> sort them: if you process the files in alphabetical order, the users will have a feel for the rate of progress that your application is making.
<P>If <CODE>glob</CODE> succeeds, it returns 0.  Otherwise, it returns one of these error codes:
<P><DL COMPACT>
<DT><CODE>GLOB_ABORTED</CODE>
<DD>There was an error opening a directory, and you used the flag <CODE>GLOB_ERR</CODE> or your specified <VAR>errfunc</VAR> returned a nonzero value.
<P>
<DT><CODE>GLOB_NOMATCH</CODE>
<DD>The pattern didn't match any existing files.  If you use the <CODE>GLOB_NOCHECK</CODE> flag, then you never get this error code, because that flag tells <CODE>glob</CODE> to <EM>pretend</EM> that the pattern matched at least one file.
<P>
<DT><CODE>GLOB_NOSPACE</CODE>
<DD>It was impossible to allocate memory to hold the result.
</DL>
<P>In the event of an error, <CODE>glob</CODE> stores information in <CODE>*<VAR>vector-ptr</VAR></CODE> about all the matches it has found so far.
<P><H3><A NAME="SEC95" HREF="library_toc.html#SEC95" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC95">Flags for Globbing</A></H3>
<P>This section describes the flags that you can specify in the <VAR>flags</VAR> argument to <CODE>glob</CODE>.
Choose the flags you want, and combine them with the C operator <CODE>|</CODE>.
<P><DL COMPACT>
<DT><CODE>GLOB_APPEND</CODE>
<DD>Append the words from this expansion to the vector of words produced by previous calls to <CODE>glob</CODE>.  This way you can effectively expand several words as if they were concatenated with spaces between them.
<P>In order for appending to work, you must not modify the contents of the word vector structure between calls to <CODE>glob</CODE>.  And, if you set <CODE>GLOB_DOOFFS</CODE> in the first call to <CODE>glob</CODE>, you must also set it when you append to the results.
<P>
<DT><CODE>GLOB_DOOFFS</CODE>
<DD>Leave blank slots at the beginning of the vector of words.  The <CODE>gl_offs</CODE> field says how many slots to leave.  The blank slots contain null pointers.
<P>
<DT><CODE>GLOB_ERR</CODE>
<DD>Give up right away and report an error if there is any difficulty reading the directories that must be read in order to expand <VAR>pattern</VAR> fully.  Such difficulties might include a directory in which you don't have the requisite access.  Normally, <CODE>glob</CODE> tries its best to keep on going despite any errors, reading whatever directories it can.
<P>You can exercise even more control than this by specifying an error-handler function <VAR>errfunc</VAR> when you call <CODE>glob</CODE>.  If <VAR>errfunc</VAR> is nonzero, then <CODE>glob</CODE> doesn't give up right away when it can't read a directory; instead, it calls <VAR>errfunc</VAR> with two arguments, like this:
<P><PRE>(*<VAR>errfunc</VAR>) (<VAR>filename</VAR>, <VAR>error-code</VAR>)</PRE>
<P>The argument <VAR>filename</VAR> is the name of the directory that <CODE>glob</CODE> couldn't open or couldn't read, and <VAR>error-code</VAR> is the <CODE>errno</CODE> value that was reported to <CODE>glob</CODE>.
<P>If the error handler function returns nonzero, then <CODE>glob</CODE> gives up right away.
Otherwise, it continues.
<P>
<DT><CODE>GLOB_MARK</CODE>
<DD>If the pattern matches the name of a directory, append <SAMP>`/'</SAMP> to the directory's name when returning it.
<P>
<DT><CODE>GLOB_NOCHECK</CODE>
<DD>If the pattern doesn't match any file names, return the pattern itself as if it were a file name that had been matched.  (Normally, when the pattern doesn't match anything, <CODE>glob</CODE> returns that there were no matches.)
<P>
<DT><CODE>GLOB_NOSORT</CODE>
<DD>Don't sort the file names; return them in no particular order.  (In practice, the order will depend on the order of the entries in the directory.)  The only reason <EM>not</EM> to sort is to save time.
<P>
<DT><CODE>GLOB_NOESCAPE</CODE>
<DD>Don't treat the <SAMP>`\'</SAMP> character specially in patterns.  Normally, <SAMP>`\'</SAMP> quotes the following character, turning off its special meaning (if any) so that it matches only itself.  When quoting is enabled, the pattern <SAMP>`\?'</SAMP> matches only the string <SAMP>`?'</SAMP>, because the question mark in the pattern acts like an ordinary character.
<P>If you use <CODE>GLOB_NOESCAPE</CODE>, then <SAMP>`\'</SAMP> is an ordinary character.
<P><CODE>glob</CODE> does its work by calling the function <CODE>fnmatch</CODE> repeatedly.  It handles the flag <CODE>GLOB_NOESCAPE</CODE> by turning on the <CODE>FNM_NOESCAPE</CODE> flag in calls to <CODE>fnmatch</CODE>.
</DL>
<P><H2><A NAME="SEC96" HREF="library_toc.html#SEC96" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC96">Regular Expression Matching</A></H2>
<P>The GNU C Library supports two interfaces for matching regular expressions.  One is the standard POSIX.2 interface, and the other is what the GNU system has had for many years.
<P>Both interfaces are declared in the header file <TT>`regex.h'</TT>.  If you define <CODE>_GNU_SOURCE</CODE>, then the GNU functions, structures and constants are declared.
Otherwise, only the POSIX names are declared.
<P><H3><A NAME="SEC97" HREF="library_toc.html#SEC97" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC97">POSIX Regular Expression Compilation</A></H3>
<P>Before you can actually match a regular expression, you must <DFN>compile</DFN> it.  This is not true compilation: it produces a special data structure, not machine instructions.  But it is like ordinary compilation in that its purpose is to enable you to "execute" the pattern fast.  (See section <A HREF="library_9.html#SEC99" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_9.html#SEC99">Matching a Compiled POSIX Regular Expression</A>, for how to use the compiled regular expression for matching.)
<P>There is a special data type for compiled regular expressions:
<P><A NAME="IDX399"></A><U>Data Type:</U> <B>regex_t</B>
<P>This type of object holds a compiled regular expression.  It is actually a structure.  It has just one field that your programs should look at:
<P><DL COMPACT>
<DT><CODE>re_nsub</CODE>
<DD>This field holds the number of parenthetical subexpressions in the regular expression that was compiled.
</DL>
<P>There are several other fields, but we don't describe them here, because only the functions in the library should use them.
<P>After you create a <CODE>regex_t</CODE> object, you can compile a regular
