📄 xregex.texi

isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator.  The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression.  Specifically, the
regular expression is invalid if it has:

@itemize @bullet

@item
a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

@item
an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
open-group operator.

@end itemize

If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation
operator; if this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator.  This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator.  This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and the regular expression has no matching open-group
operator, then Regex considers what would otherwise be a close-group
operator (based on how @code{RE_NO_BK_PARENS} is set) to match @samp{)}.

@end table


@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
@cindex Emacs
@sc{posix} Awk,
@cindex POSIX Awk
traditional Awk,
@cindex Awk
Grep,
@cindex Grep
@cindex Egrep
Egrep---in addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes---taken directly from @file{regex.h}---are:

@example
[[[ syntaxes ]]]
@end example


@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element.  It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways.  First, a single character can map into two or more collating
elements.  For example, the German
@tex
`\ss'
@end tex
@ifinfo
``es-zet''
@end ifinfo
collates as the collating element @samp{s} followed by another collating
element @samp{s}.  Second, two or more characters can map into one
collating element.  For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.


@node The Backslash Character,  , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

@cindex \
The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}).
It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

@enumerate
@item
It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set.  For example, @samp{[\]}
would match @samp{\}.

@item
It quotes (makes ordinary, if it's special) the next character when you
use it either:

@itemize @bullet
@item
outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary.  For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}).  In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, the match-zero-or-more
operator (represented by @samp{*}) matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate.  It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

@item
inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

@end itemize

@item
It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set.  See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}.
Also:

@itemize @bullet
@item
@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@item
@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@item
@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@item
@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@item
@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@item
@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@item
@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

@item
If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

@end itemize

@item
In all other cases, Regex ignores @samp{\}.  For example,
@samp{\n} matches @samp{n}.

@end enumerate


@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators.  In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these.  Most operators have more than one
representation as characters.  @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}.  For example, either @samp{(} or @samp{\(}
represents the open-group operator.  Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}.  Why is
this so?
Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

@menu
* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? @{@}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $
@end menu


@node Match-self Operator, Match-any-character Operator,  , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself.  All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator.  For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}.  In
particular, it does @emph{not} match the string @samp{ff}.


@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

@cindex @samp{.}

This operator matches any single printing or nonprinting character
except it won't match a:

@table @asis
@item newline
if the syntax bit @code{RE_DOT_NEWLINE} isn't set.

@item null
if the syntax bit @code{RE_DOT_NOT_NULL} is set.

@end table

The @samp{.} (period) character represents this operator.  For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.


@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}.  The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest.
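
The match-self, match-any-character, and concatenation operators
described above can be tried out with a small C function.  The following
is only a minimal sketch, not part of Regex itself: it assumes the
@sc{posix} @code{regcomp} and @code{regexec} interface declared in
@file{regex.h}, compiled with the basic syntax, and
@code{matches_whole} is a hypothetical helper name introduced purely for
this illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a Regex
   function).  Returns 1 if PATTERN matches all of TEXT, 0 if it does
   not, and -1 if PATTERN fails to compile.  PATTERN is compiled with
   the POSIX basic syntax (cflags of 0).  */
int
matches_whole (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int status;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;

  /* Ask for one match register so we learn where the match lies.  */
  status = regexec (&re, text, 1, &m, 0);
  regfree (&re);

  if (status != 0)
    return 0;

  /* The match must start at offset 0 and end at the end of TEXT.  */
  return m.rm_so == 0 && (size_t) m.rm_eo == strlen (text);
}
```

Under these assumptions, @code{matches_whole ("a.b", "axb")} returns 1,
while @code{matches_whole ("f", "ff")} returns 0 because the match covers
only the first @samp{f}.
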
For example, @samp{xy} (two match-self operators) matches @samp{xy}.


@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

@menu
* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           @{@}
@end menu


@node Match-zero-or-more Operator, Match-one-or-more Operator,  , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

@cindex @samp{*}

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator.  For example, @samp{o*}
matches any string made up of zero or more @samp{o}s.  Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}.  So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it.  This is the
case when it:

@itemize @bullet
@item
is first in a regular expression, or

@item
follows a match-beginning-of-line, open-group, or alternation
operator.

@end itemize

@noindent
Three different things can happen in these cases:

@enumerate
@item
If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

@item
If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).
