matches, e.g., @samp{foo} and, e.g., the first three characters of
@samp{foo\nbar}.
Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of @samp{^}'s; see the previous section.  (That is,
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
``after'' becomes ``before''.)

@node GNU Operators, GNU Emacs Operators, Common Operators, Top
@chapter GNU Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).

@menu
* Word Operators::
* Buffer Operators::
@end menu

@node Word Operators, Buffer Operators, , GNU Operators
@section Word Operators

The operators in this section require Regex to recognize parts of words.
Regex uses a syntax table to determine whether or not a character is
part of a word, i.e., whether or not it is @dfn{word-constituent}.

@menu
* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W
@end menu

@node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
@subsection Non-Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  Since a character occupies one 8-bit byte, a syntax
table therefore has 256 elements, even though @sc{ascii} itself defines
only 128 characters.  Regex always uses a @code{char *} variable
@code{re_syntax_table} as its syntax table.
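
A hypothetical sketch may help picture such a table.  It assumes a
program built against Regex with @code{SYNTAX_TABLE} defined (the second
case in the list below), so that the program itself must provide
@code{re_syntax_table`.  It fills the table the way the first case
describes: @code{Sword} for letters, digits, and @samp{_}, and zero
elsewhere.  The value 1 for @code{Sword} and the helper names
@code{init_syntax_table} and @code{is_word_constituent} are assumptions
made for illustration.  They are not part of Regex's documented
interface.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: the manual says only that Regex defines Sword;
   the value 1 is chosen here purely for illustration. */
#define Sword 1

/* One entry per possible value of an 8-bit unsigned char. */
#define SYNTAX_TABLE_SIZE 256

static char syntax_storage[SYNTAX_TABLE_SIZE];

/* The char * variable Regex reads when compiled with SYNTAX_TABLE
   defined (and emacs undefined). */
char *re_syntax_table = syntax_storage;

/* Hypothetical helper: mark letters, digits and '_' as word-constituent
   (Sword) and every other character as 0.  isalnum is evaluated in the
   current locale; in the default "C" locale only ASCII letters and
   digits qualify. */
void init_syntax_table(void)
{
    for (size_t c = 0; c < SYNTAX_TABLE_SIZE; c++)
        re_syntax_table[c] = (isalnum((int) c) || c == '_') ? Sword : 0;
}

/* Hypothetical helper: the per-character lookup the word operators
   depend on. */
int is_word_constituent(unsigned char c)
{
    return re_syntax_table[c] == Sword;
}
```
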
In some cases, Regex initializes @code{re_syntax_table} itself; in
others, it expects you to initialize it.

@itemize @bullet
@item
If Regex is compiled with the preprocessor symbols @code{emacs} and
@code{SYNTAX_TABLE} both undefined, then Regex allocates
@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, digit, or
@samp{_}, or to zero if it's not.

@item
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
defined, then Regex expects you to define a @code{char *} variable
@code{re_syntax_table} to be a valid syntax table.

@item
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
the preprocessor symbol @code{emacs} defined.
@end itemize

@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
@subsection The Match-word-boundary Operator (@code{\b})

@cindex @samp{\b}
@cindex word boundaries, matching
This operator (represented by @samp{\b}) matches the empty string at
either the beginning or the end of a word.  For example, @samp{\brat\b}
matches the separate word @samp{rat}.

@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
@subsection The Match-within-word Operator (@code{\B})

@cindex @samp{\B}
This operator (represented by @samp{\B}) matches the empty string within
a word.
For example, @samp{c\Brat\Be} matches @samp{crate}, but
@samp{dirty \Brat} doesn't match @samp{dirty rat}.

@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
@subsection The Match-beginning-of-word Operator (@code{\<})

@cindex @samp{\<}
This operator (represented by @samp{\<}) matches the empty string at the
beginning of a word.

@node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
@subsection The Match-end-of-word Operator (@code{\>})

@cindex @samp{\>}
This operator (represented by @samp{\>}) matches the empty string at the
end of a word.

@node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
@subsection The Match-word-constituent Operator (@code{\w})

@cindex @samp{\w}
This operator (represented by @samp{\w}) matches any word-constituent
character.

@node Match-non-word-constituent Operator, , Match-word-constituent Operator, Word Operators
@subsection The Match-non-word-constituent Operator (@code{\W})

@cindex @samp{\W}
This operator (represented by @samp{\W}) matches any character that is
not word-constituent.

@node Buffer Operators, , Word Operators, GNU Operators
@section Buffer Operators

Following are operators which work on buffers.  In Emacs, a @dfn{buffer}
is, naturally, an Emacs buffer.
For other programs, Regex considers the entire string to be matched as
the buffer.

@menu
* Match-beginning-of-buffer Operator::  \`
* Match-end-of-buffer Operator::        \'
@end menu

@node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator, , Buffer Operators
@subsection The Match-beginning-of-buffer Operator (@code{\`})

@cindex @samp{\`}
This operator (represented by @samp{\`}) matches the empty string at the
beginning of the buffer.

@node Match-end-of-buffer Operator, , Match-beginning-of-buffer Operator, Buffer Operators
@subsection The Match-end-of-buffer Operator (@code{\'})

@cindex @samp{\'}
This operator (represented by @samp{\'}) matches the empty string at the
end of the buffer.

@node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
@chapter GNU Emacs Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
that you can use only when Regex is compiled with the preprocessor
symbol @code{emacs} defined.

@menu
* Syntactic Class Operators::
@end menu

@node Syntactic Class Operators, , , GNU Emacs Operators
@section Syntactic Class Operators

The operators in this section require Regex to recognize the syntactic
classes of characters.  Regex uses a syntax table to determine this.

@menu
* Emacs Syntax Tables::
* Match-syntactic-class Operator::      \sCLASS
* Match-not-syntactic-class Operator::  \SCLASS
@end menu

@node Emacs Syntax Tables, Match-syntactic-class Operator, , Syntactic Class Operators
@subsection Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  Since a character occupies one 8-bit byte, a syntax
table therefore has 256 elements, even though @sc{ascii} itself defines
only 128 characters.

If Regex is compiled with the preprocessor symbol @code{emacs} defined,
then Regex expects you to define and initialize the variable
@code{re_syntax_table} to be an Emacs syntax table.  Emacs' syntax
tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax Tables}).
@xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
for a description of Emacs' syntax tables.

@node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})

@cindex @samp{\s}
This operator matches any character whose syntactic class is represented
by a specified character.  @samp{\s@var{class}} represents this operator,
where @var{class} is the character representing the syntactic class you
want.  For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\sw} matches any word-constituent
character.

@node Match-not-syntactic-class Operator, , Match-syntactic-class Operator, Syntactic Class Operators
@subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})

@cindex @samp{\S}
This operator is similar to the match-syntactic-class operator except
that it matches any character whose syntactic class is @emph{not}
represented by the specified character.  @samp{\S@var{class}} represents
this operator.  For example, @samp{w} represents the syntactic class of
word-constituent characters, so @samp{\Sw} matches any character that is
not word-constituent.

@node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
@chapter What Gets Matched?

Regex usually matches strings according to the ``leftmost longest''
rule; that is, it chooses the longest of the leftmost matches.  This
does not mean that, for a regular expression containing subexpressions,
it simply chooses the longest match for each subexpression, left to
right; the overall match must also be the longest possible one.

For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
@samp{acdac}, as it would if it were to choose the longest match for the
first subexpression.  (Here the first subexpression matches only
@samp{a}, which lets @samp{\1} match the final @samp{a}; giving it the
longer @samp{ac} instead would leave a match of only @samp{acdac}.)

@node Programming with Regex, Copying, What Gets Matched?, Top
@chapter Programming with Regex

Here we describe how you use the Regex data structures and functions in
C programs.
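
The word and buffer operators from the earlier chapters, and the
leftmost-longest rule, can be observed from a C program.  The following
illustrative sketch rests on a few assumptions.  It uses the @sc{posix}
functions @code{regcomp} and @code{regexec} as provided by the GNU C
Library, whose regex code descends from GNU Regex.  As far as we know,
glibc accepts @samp{\b}, @samp{\B}, @samp{\<}, @samp{\>}, @samp{\w},
@samp{\W}, @samp{\`}, @samp{\'} and the back-reference @samp{\1} in
extended-syntax patterns by default, and it treats alphanumerics and
@samp{_} as word-constituent rather than consulting a
@code{re_syntax_table}.  Other C libraries may reject these operators.
The helper @code{search_span} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: search SUBJECT for PATTERN compiled with the
   POSIX extended syntax (REG_EXTENDED).  On a match, store the span of
   the overall match in *START and *END (END is one past the last
   matched character) and return 1.  Return 0 if there is no match
   and -1 if PATTERN fails to compile. */
int search_span(const char *pattern, const char *subject,
                int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];            /* m[0] describes the whole match */
    int result = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* nothing to free after a failed compile */

    if (regexec(&re, subject, 1, m, 0) == 0) {
        *start = (int) m[0].rm_so;
        *end = (int) m[0].rm_eo;
        result = 1;
    }

    regfree(&re);               /* release the compiled pattern */
    return result;
}
```

For example, @code{search_span ("\\brat\\b", "a rat ran", &s, &e)}
should return 1, with @code{s} equal to 2 and @code{e} equal to 5.
Under the leftmost-longest rule, searching @samp{acdacaaa} for
@samp{(ac*)(c*d[ac]*)\1} should report the whole eight-character string
rather than only its first five characters.
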
Regex has three interfaces: one designed for @sc{gnu}, one compatible
with @sc{posix}, and one compatible with Berkeley @sc{unix}.

@menu
* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::
@end menu

@node GNU Regex Functions, POSIX Regex Functions, , Programming with Regex
@section GNU Regex Functions

If you're writing code that doesn't need to be compatible with either
@sc{posix} or Berkeley @sc{unix}, you can use these functions.  They
provide more options than the other interfaces.

@menu
* GNU Pattern Buffers::                 The re_pattern_buffer type.
* GNU Regular Expression Compiling::    re_compile_pattern ()
* GNU Matching::                        re_match ()
* GNU Searching::                       re_search ()
* Matching/Searching with Split Data::  re_match_2 (), re_search_2 ()
* Searching with Fastmaps::             re_compile_fastmap ()
* GNU Translate Tables::                The `translate' field.
* Using Registers::                     The re_registers type and related fns.
* Freeing GNU Pattern Buffers::         regfree ()
@end menu

@node GNU Pattern Buffers, GNU Regular Expression Compiling, , GNU Regex Functions
@subsection GNU Pattern Buffers

@cindex pattern buffer, definition of
@tindex re_pattern_buffer @r{definition}
@tindex struct re_pattern_buffer @r{definition}
To compile, match, or search for a given regular expression, you must
supply a pattern buffer.  A @dfn{pattern buffer} holds one compiled
regular expression.@footnote{Regular expressions are also referred to as
``patterns,'' hence the name ``pattern buffer.''}

You can have several different pattern buffers simultaneously, each
holding a compiled pattern for a different regular expression.
@file{regex.h} defines the pattern buffer @code{struct} as follows:

@example
  /* Space that holds the compiled pattern.  It is declared as
     `unsigned char *' because its elements are
     sometimes used as array indexes.  */
  unsigned char *buffer;

  /* Number of bytes to which `buffer' points.  */
  unsigned long allocated;

  /* Number of bytes actually used in `buffer'.  */
  unsigned long used;

  /* Syntax setting with which the pattern was compiled.  */
  reg_syntax_t syntax;

  /* Pointer to a fastmap, if any, otherwise zero.  re_search uses
     the fastmap, if there is one, to skip over impossible
     starting points for matches.  */
  char *fastmap;

  /* Either a translate table to apply to all characters before
     comparing them, or zero for no translation.  The translation
     is applied to a pattern when it is compiled and to a string
     when it is matched.  */
  char *translate;

  /* Number of subexpressions found by the compiler.  */
  size_t re_nsub;

  /* Zero if this pattern cannot match the empty string, one else.
     Well, in truth it's used only in `re_search_2', to see
     whether or not we should use the fastmap, so we don't set
     this absolutely perfectly; see `re_compile_fastmap' (the
     `duplicate' case).  */
  unsigned can_be_null : 1;

  /* If REGS_UNALLOCATED, allocate space in the `regs' structure
       for `max (RE_NREGS, re_nsub + 1)' groups.
     If REGS_REALLOCATE, reallocate space if necessary.
     If REGS_FIXED, use what's there.  */
#define REGS_UNALLOCATED 0
#define REGS_REALLOCATE 1
#define REGS_FIXED 2
  unsigned regs_allocated : 2;

  /* Set to zero when `regex_compile' compiles a pattern; set to one
     by `re_compile_fastmap' if it updates the fastmap.  */
  unsigned fastmap_accurate : 1;

  /* If set, `re_match_2' does not return information about
     subexpressions.  */
  unsigned no_sub : 1;

  /* If set, a beginning-of-line anchor doesn't match at the
     beginning of the string.  */
  unsigned not_bol : 1;

  /* Similarly for an end-of-line anchor.  */
  unsigned not_eol : 1;

  /* If true, an anchor at a newline matches.  */
  unsigned newline_anchor : 1;
@end example

@node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
@subsection GNU Regular Expression Compiling

In @sc{gnu}, you can both match and search for a given regular
expression.
To do either, you must first compile it in a pattern buffer
(@pxref{GNU Pattern Buffers}).

@cindex syntax initialization
@vindex re_syntax_options @r{initialization}
Regular expressions match according to the syntax with which they were
compiled; with @sc{gnu}, you indicate what syntax you want by setting
the variable @code{re_syntax_options} (declared in @file{regex.h} and
defined in