matches, e.g., @samp{foo} and, e.g., the first three characters of
@samp{foo\nbar}.

Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of @samp{^}'s; see the previous section.  (That is,
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
``after'' becomes ``before''.)

@node GNU Operators, GNU Emacs Operators, Common Operators, Top
@chapter GNU Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).

@menu
* Word Operators::
* Buffer Operators::
@end menu

@node Word Operators, Buffer Operators,  , GNU Operators
@section Word Operators

The operators in this section require Regex to recognize parts of words.
Regex uses a syntax table to determine whether or not a character is
part of a word, i.e., whether or not it is @dfn{word-constituent}.

@menu
* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W
@end menu

@node Non-Emacs Syntax Tables, Match-word-boundary Operator,  , Word Operators
@subsection Non-Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  For an 8-bit character set, therefore, a syntax table
has 256 elements; @sc{ascii} itself occupies only the first 128 of them.
Regex always uses a @code{char *} variable @code{re_syntax_table} as its
syntax table.
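As a rough illustration of that shape, here is a minimal C sketch of a
256-entry table that marks letters, digits, and @samp{_} as
word-constituent, in the spirit of the first case described below.  It
is not Regex's own code: the names @code{syntax_table_sketch},
@code{SWORD_SKETCH}, @code{init_syntax_table_sketch}, and
@code{is_word_constituent} are hypothetical, and the marker value 1
merely stands in for @code{Sword}, whose actual value this manual does
not give.

@verbatim
#include <ctype.h>
#include <string.h>

/* Assumption: 8-bit characters, so one entry per possible byte value. */
#define SYNTAX_TABLE_SIZE_SKETCH 256

/* Hypothetical stand-in for Regex's Sword marker. */
#define SWORD_SKETCH 1

/* Hypothetical stand-in for re_syntax_table. */
static char syntax_table_sketch[SYNTAX_TABLE_SIZE_SKETCH];

/* Clear every entry, then mark letters, digits and '_' (as classified
   by the current C locale) as word-constituent.  */
static void init_syntax_table_sketch(void)
{
    memset(syntax_table_sketch, 0, sizeof syntax_table_sketch);
    for (int c = 0; c < SYNTAX_TABLE_SIZE_SKETCH; c++)
        if (isalnum(c) || c == '_')
            syntax_table_sketch[c] = SWORD_SKETCH;
}

/* Look a character up by its unsigned value.  Returns nonzero if the
   table marks it as word-constituent.  */
static int is_word_constituent(unsigned char c)
{
    return syntax_table_sketch[c] == SWORD_SKETCH;
}
@end verbatim

Indexing with an @code{unsigned char} keeps every lookup within the 256
entries, even for bytes above 127 on platforms where plain @code{char}
is signed.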
In some cases, Regex initializes this variable; in others, it expects
you to initialize it.

@itemize @bullet
@item
If Regex is compiled with the preprocessor symbols @code{emacs} and
@code{SYNTAX_TABLE} both undefined, then Regex allocates
@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, digit, or
@samp{_}, or to zero if it's not.

@item
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
defined, then Regex expects you to define a @code{char *} variable
@code{re_syntax_table} to be a valid syntax table.

@item
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
the preprocessor symbol @code{emacs} defined.
@end itemize

@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
@subsection The Match-word-boundary Operator (@code{\b})

@cindex @samp{\b}
@cindex word boundaries, matching
This operator (represented by @samp{\b}) matches the empty string at
either the beginning or the end of a word.  For example, @samp{\brat\b}
matches the separate word @samp{rat}.

@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
@subsection The Match-within-word Operator (@code{\B})

@cindex @samp{\B}
This operator (represented by @samp{\B}) matches the empty string within
a word.
For example, @samp{c\Brat\Be} matches @samp{crate}, but
@samp{dirty \Brat} doesn't match @samp{dirty rat}.

@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
@subsection The Match-beginning-of-word Operator (@code{\<})

@cindex @samp{\<}
This operator (represented by @samp{\<}) matches the empty string at the
beginning of a word.

@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
@subsection The Match-end-of-word Operator (@code{\>})

@cindex @samp{\>}
This operator (represented by @samp{\>}) matches the empty string at the
end of a word.

@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
@subsection The Match-word-constituent Operator (@code{\w})

@cindex @samp{\w}
This operator (represented by @samp{\w}) matches any word-constituent
character.

@node Match-non-word-constituent Operator,  , Match-word-constituent Operator, Word Operators
@subsection The Match-non-word-constituent Operator (@code{\W})

@cindex @samp{\W}
This operator (represented by @samp{\W}) matches any character that is
not word-constituent.

@node Buffer Operators,  , Word Operators, GNU Operators
@section Buffer Operators

Following are operators which work on buffers.  In Emacs, a @dfn{buffer}
is, naturally, an Emacs buffer.
For other programs, Regex considers the entire string to be matched as
the buffer.

@menu
* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'
@end menu

@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator,  , Buffer Operators
@subsection The Match-beginning-of-buffer Operator (@code{\`})

@cindex @samp{\`}
This operator (represented by @samp{\`}) matches the empty string at the
beginning of the buffer.

@node Match-end-of-buffer Operator,  , Match-beginning-of-buffer Operator, Buffer Operators
@subsection The Match-end-of-buffer Operator (@code{\'})

@cindex @samp{\'}
This operator (represented by @samp{\'}) matches the empty string at the
end of the buffer.

@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
@chapter GNU Emacs Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
and that you can use only when Regex is compiled with the preprocessor
symbol @code{emacs} defined.

@menu
* Syntactic Class Operators::
@end menu

@node Syntactic Class Operators,  ,  , GNU Emacs Operators
@section Syntactic Class Operators

The operators in this section require Regex to recognize the syntactic
classes of characters.  Regex uses a syntax table to determine this.

@menu
* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS
@end menu

@node Emacs Syntax Tables, Match-syntactic-class Operator,  , Syntactic Class Operators
@subsection Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  For an 8-bit character set, therefore, a syntax table
has 256 elements; @sc{ascii} itself occupies only the first 128 of them.

If Regex is compiled with the preprocessor symbol @code{emacs} defined,
then Regex expects you to define and initialize the variable
@code{re_syntax_table} to be an Emacs syntax table.  Emacs' syntax
tables are more complicated than Regex's own
(@pxref{Non-Emacs Syntax Tables}).
@xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
for a description of Emacs' syntax tables.

@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})

@cindex @samp{\s}
This operator matches any character whose syntactic class is represented
by a specified character.  @samp{\s@var{class}} represents this
operator, where @var{class} is the character representing the syntactic
class you want.  For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\sw} matches any word-constituent
character.

@node Match-not-syntactic-class Operator,  , Match-syntactic-class Operator, Syntactic Class Operators
@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})

@cindex @samp{\S}
This operator is similar to the match-syntactic-class operator except
that it matches any character whose syntactic class is @emph{not}
represented by the specified character.  @samp{\S@var{class}} represents
this operator.  For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\Sw} matches any character that is
not word-constituent.

@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
@chapter What Gets Matched?

Regex usually matches strings according to the ``leftmost longest''
rule; that is, it chooses the longest of the leftmost matches.
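The basic rule can be observed through the standard @sc{posix}
interface, whose @code{regexec} is also specified to return the longest
match starting at the leftmost possible position.  The following
minimal C sketch is an illustration under that assumption: it calls
@code{regcomp}, @code{regexec}, and @code{regfree} from whatever
@sc{posix} library your system provides (not necessarily this Regex),
and the helper name @code{first_match_span} is hypothetical.

@verbatim
#include <regex.h>

/* Hypothetical helper: compile PATTERN as a POSIX extended regular
   expression and store the byte offsets of the match regexec reports
   in SUBJECT.  Returns 0 on a match, REG_NOMATCH if there is none, or
   the regcomp error code if PATTERN does not compile.  */
static int first_match_span(const char *pattern, const char *subject,
                            regoff_t *start, regoff_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;              /* nothing was allocated on failure */
    rc = regexec(&re, subject, 1, m, 0);
    if (rc == 0) {
        *start = m[0].rm_so;    /* offset of the first matched byte */
        *end = m[0].rm_eo;      /* offset just past the match */
    }
    regfree(&re);
    return rc;
}
@end verbatim

Called as @code{first_match_span ("ab|abcd", "xabcd", &so, &eo)}, a
conforming matcher should report offsets 1 and 5: both alternatives can
start at offset 1, the leftmost possible position, and the longer
@samp{abcd} wins even though @samp{ab} is listed first.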
The leftmost-longest rule does not mean that, for a regular expression
containing subexpressions, Regex simply chooses the longest match for
each subexpression, left to right; the overall match must also be the
longest possible one.  For example, @samp{(ac*)(c*d[ac]*)\1} matches
@samp{acdacaaa}, not @samp{acdac}, as it would if it chose the longest
match for the first subexpression.  In the longer match, the first
subexpression matches only @samp{a} and @samp{\1} matches the final
@samp{a}; if the first subexpression took the longer @samp{ac}, then
@samp{\1} would have to match @samp{ac} as well, and the match would
end after @samp{acdac}.

@node Programming with Regex, Copying, What Gets Matched?, Top
@chapter Programming with Regex

Here we describe how you use the Regex data structures and functions in
C programs.  Regex has three interfaces: one designed for @sc{gnu}, one
compatible with @sc{posix}, and one compatible with Berkeley @sc{unix}.

@menu
* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::
@end menu

@node GNU Regex Functions, POSIX Regex Functions,  , Programming with Regex
@section GNU Regex Functions

If you're writing code that doesn't need to be compatible with either
@sc{posix} or Berkeley @sc{unix}, you can use these functions.  They
provide more options than the other interfaces.

@menu
* GNU Pattern Buffers::         The re_pattern_buffer type.
* GNU Regular Expression Compiling::  re_compile_pattern ()
* GNU Matching::                re_match ()
* GNU Searching::               re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::     re_compile_fastmap ()
* GNU Translate Tables::        The `translate' field.
* Using Registers::             The re_registers type and related fns.
* Freeing GNU Pattern Buffers::  regfree ()
@end menu

@node GNU Pattern Buffers, GNU Regular Expression Compiling,  , GNU Regex Functions
@subsection GNU Pattern Buffers

@cindex pattern buffer, definition of
@tindex re_pattern_buffer @r{definition}
@tindex struct re_pattern_buffer @r{definition}
To compile, match, or search for a given regular expression, you must
supply a pattern buffer.
A @dfn{pattern buffer} holds one compiled regular
expression.@footnote{Regular expressions are also referred to as
``patterns,'' hence the name ``pattern buffer.''}

You can have several different pattern buffers simultaneously, each
holding a compiled pattern for a different regular expression.
@file{regex.h} defines the pattern buffer @code{struct} as follows:

@example
        /* Space that holds the compiled pattern.  It is declared as
           `unsigned char *' because its elements are
           sometimes used as array indexes.  */
  unsigned char *buffer;

        /* Number of bytes to which `buffer' points.  */
  unsigned long allocated;

        /* Number of bytes actually used in `buffer'.  */
  unsigned long used;

        /* Syntax setting with which the pattern was compiled.  */
  reg_syntax_t syntax;

        /* Pointer to a fastmap, if any, otherwise zero.  re_search uses
           the fastmap, if there is one, to skip over impossible
           starting points for matches.  */
  char *fastmap;

        /* Either a translate table to apply to all characters before
           comparing them, or zero for no translation.  The translation
           is applied to a pattern when it is compiled and to a string
           when it is matched.  */
  char *translate;

        /* Number of subexpressions found by the compiler.  */
  size_t re_nsub;

        /* Zero if this pattern cannot match the empty string, one else.
           Well, in truth it's used only in `re_search_2', to see
           whether or not we should use the fastmap, so we don't set
           this absolutely perfectly; see `re_compile_fastmap' (the
           `duplicate' case).  */
  unsigned can_be_null : 1;

        /* If REGS_UNALLOCATED, allocate space in the `regs' structure
             for `max (RE_NREGS, re_nsub + 1)' groups.
           If REGS_REALLOCATE, reallocate space if necessary.
           If REGS_FIXED, use what's there.
           */
#define REGS_UNALLOCATED 0
#define REGS_REALLOCATE 1
#define REGS_FIXED 2
  unsigned regs_allocated : 2;

        /* Set to zero when `regex_compile' compiles a pattern; set to one
           by `re_compile_fastmap' if it updates the fastmap.  */
  unsigned fastmap_accurate : 1;

        /* If set, `re_match_2' does not return information about
           subexpressions.  */
  unsigned no_sub : 1;

        /* If set, a beginning-of-line anchor doesn't match at the
           beginning of the string.  */
  unsigned not_bol : 1;

        /* Similarly for an end-of-line anchor.  */
  unsigned not_eol : 1;

        /* If true, an anchor at a newline matches.  */
  unsigned newline_anchor : 1;
@end example

@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
@subsection GNU Regular Expression Compiling

In @sc{gnu}, you can both match and search for a given regular
expression.  To do either, you must first compile it in a pattern buffer
(@pxref{GNU Pattern Buffers}).

@cindex syntax initialization
@vindex re_syntax_options @r{initialization}
Regular expressions match according to the syntax with which they were
compiled; with @sc{gnu}, you indicate what syntax you want by setting
the variable @code{re_syntax_options} (declared in @file{regex.h} and
defined in