@cnindex RE_CONTEXT_INDEP_OPS
@item RE_CONTEXT_INDEP_OPS
If this bit is set, then certain characters are special anywhere outside
a list; if this bit isn't set, then those characters are special only in
some contexts and are ordinary elsewhere.  Specifically, if this bit
isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator.  The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression.  Specifically, the
regular expression is invalid if it has:

@itemize @bullet

@item
a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

@item
an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
open-group operator.

@end itemize

If this bit isn't set, then you can put the characters representing the
repetition and alternation characters anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this
bit isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator.  This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator.
This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table


@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
@cindex Emacs
@sc{posix} Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@cindex Egrep
Egrep---in
addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@example
#define RE_SYNTAX_EMACS 0

#define RE_SYNTAX_AWK                                                   \
  (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL                       \
   | RE_NO_BK_PARENS            | RE_NO_BK_REFS                         \
   | RE_NO_BK_VBAR              | RE_NO_EMPTY_RANGES                    \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

#define RE_SYNTAX_POSIX_AWK                                             \
  (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)

#define RE_SYNTAX_GREP                                                  \
  (RE_BK_PLUS_QM              | RE_CHAR_CLASSES                         \
   | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS                            \
   | RE_NEWLINE_ALT)

#define RE_SYNTAX_EGREP                                                 \
  (RE_CHAR_CLASSES        | RE_CONTEXT_INDEP_ANCHORS                    \
   | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE                    \
   | RE_NEWLINE_ALT       | RE_NO_BK_PARENS                             \
   | RE_NO_BK_VBAR)

#define RE_SYNTAX_POSIX_EGREP                                           \
  (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)

/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff.  */
#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC

#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC

/* Syntax bits common to both basic and extended POSIX regex syntax.  */
#define _RE_SYNTAX_POSIX_COMMON                                         \
  (RE_CHAR_CLASSES | RE_DOT_NEWLINE      | RE_DOT_NOT_NULL              \
   | RE_INTERVALS  | RE_NO_EMPTY_RANGES)

#define RE_SYNTAX_POSIX_BASIC                                           \
  (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)

/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
   RE_LIMITED_OPS, i.e., \? \+ \| are not recognized.  Actually, this
   isn't minimal, since other operators, such as \`, aren't disabled.  */
#define RE_SYNTAX_POSIX_MINIMAL_BASIC                                   \
  (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)

#define RE_SYNTAX_POSIX_EXTENDED                                        \
  (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS                   \
   | RE_CONTEXT_INDEP_OPS  | RE_NO_BK_BRACES                            \
   | RE_NO_BK_PARENS       | RE_NO_BK_VBAR                              \
   | RE_UNMATCHED_RIGHT_PAREN_ORD)

/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
   replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added.  */
#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED                                \
  (_RE_SYNTAX_POSIX_COMMON  | RE_CONTEXT_INDEP_ANCHORS                  \
   | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES                           \
   | RE_NO_BK_PARENS        | RE_NO_BK_REFS                             \
   | RE_NO_BK_VBAR          | RE_UNMATCHED_RIGHT_PAREN_ORD)
@end example

@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element.  It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways.  First, a single character can map into two or more collating
elements.  For example, the German
@tex
`\ss'
@end tex
@ifinfo
``es-zet''
@end ifinfo
collates as the collating element @samp{s} followed by another collating
element @samp{s}.  Second, two or more characters can map into one
collating element.
For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character,  , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

@cindex \
The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}).  It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set.  For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary.  For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}).  In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by @samp{*}) matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate.  It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set.
See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}.  Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@item
@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@item
@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@item
@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@item
@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@item
@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

@item
If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

@end itemize

@item
In all other cases, Regex ignores @samp{\}.  For example,
@samp{\n} matches @samp{n}.

@end enumerate

@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators.  In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these.  Most operators have more than one
representation as characters.  @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}.  For example, either @samp{(} or @samp{\(}
represents the open-group operator.
Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}.  Why is
this so?  Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

@menu
* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? @{@}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $
@end menu

@node Match-self Operator, Match-any-character Operator,  , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself.  All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator.  For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}.  In
particular, it does @emph{not} match the string @samp{ff}.

@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

@cindex @samp{.}
This operator matches any single printing or nonprinting character
except it won't match a:

@table @asis
@item newline
if the syntax bit @code{RE_DOT_NEWLINE} isn't set.
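
As a rough illustration of the newline case, the sketch below compiles
one pattern under a caller-supplied syntax and reports whether it
matches.  It assumes the GNU interface as shipped in the GNU C Library
(@code{re_set_syntax}, @code{re_compile_pattern} and @code{re_search},
made visible by defining @code{_GNU_SOURCE}); the helper name
@code{matches_with_syntax} is hypothetical and is not part of Regex.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc; exposes the GNU regex interface */
#endif
#include <regex.h>
#include <string.h>

/* Hypothetical helper, not part of Regex.  Returns 1 if PATTERN,
   compiled under SYNTAX, matches somewhere in TEXT, 0 if it does not,
   and -1 if compilation or searching fails.  */
int
matches_with_syntax (reg_syntax_t syntax, const char *pattern,
                     const char *text)
{
  struct re_pattern_buffer buf;
  reg_syntax_t old_syntax;
  const char *err;
  regoff_t len = (regoff_t) strlen (text);
  regoff_t pos = -2;

  /* A zeroed pattern buffer lets Regex allocate everything itself.  */
  memset (&buf, 0, sizeof buf);

  /* The syntax is a global setting consulted at compile time, so set
     it only around the compilation and then restore the old value.  */
  old_syntax = re_set_syntax (syntax);
  err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old_syntax);

  /* re_compile_pattern returns NULL on success.  re_search returns the
     match position, -1 for no match, or -2 for an internal error.  */
  if (err == NULL)
    pos = re_search (&buf, text, len, 0, len, NULL);

  regfree (&buf);

  if (pos == -2)
    return -1;
  return pos >= 0;
}
```

Under these assumptions, calling the helper with the pattern
@samp{a.b} and a text made of @samp{a}, a newline and @samp{b} should
return 0 for @code{RE_SYNTAX_EMACS}, which has no bits set, and 1 for
@code{RE_SYNTAX_EMACS | RE_DOT_NEWLINE}.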