@item null
if the syntax bit @code{RE_DOT_NOT_NULL} is set.

@end table

The @samp{.} (period) character represents this operator. For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.

@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}. The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest. For
example, @samp{xy} (two match-self operators) matches @samp{xy}.

@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

@menu
* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           @{@}
@end menu

@node Match-zero-or-more Operator, Match-one-or-more Operator, , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

@cindex @samp{*}

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator. For example, @samp{o*}
matches any string made up of zero or more @samp{o}s. Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it.
This is the case when it:

@itemize @bullet
@item
is first in a regular expression, or

@item
follows a match-beginning-of-line, open-group, or alternation
operator.

@end itemize

@noindent
Three different things can happen in these cases:

@enumerate
@item
If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

@item
If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).

@item
Otherwise, @samp{*} is ordinary.

@end enumerate

@cindex backtracking
The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
either match the entire pattern or be certain that it cannot get a
match. For example, when matching @samp{ca*ar} against @samp{caaar},
the matcher first matches all three @samp{a}s of the string with the
@samp{a*} of the regular expression. However, it cannot then match the
final @samp{ar} of the regular expression against the final @samp{r} of
the string. So it backtracks, discarding the match of the last @samp{a}
in the string. It can then match the remaining @samp{ar}.


@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})

@cindex @samp{+}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator.
Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
set, then @samp{+} represents this operator; if it is, then @samp{\+}
does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression at least once;
@pxref{Match-zero-or-more Operator}, for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{+} represents the match-one-or-more
operator, then @samp{ca+r} matches, e.g., @samp{car} and
@samp{caaaar}, but not @samp{cr}.

@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})
@cindex @samp{?}

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
if it is, then @samp{\?} does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression once or not at all;
@pxref{Match-zero-or-more Operator}, to see what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{?} represents the match-zero-or-one
operator, then @samp{ca?r} matches both @samp{car} and @samp{cr}, but
nothing else.

@node Interval Operators, , Match-zero-or-one Operator, Repetition Operators
@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})

@cindex interval expression
@cindex @samp{@{}
@cindex @samp{@}}
@cindex @samp{\@{}
@cindex @samp{\@}}

If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
@dfn{interval expressions}.
They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, supposing that @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators, then:

@table @code
@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.

@end table

The interval expression (but not necessarily the regular expression that
contains it) is invalid if:

@itemize @bullet
@item
@var{min} is greater than @var{max}, or

@item
any of @var{count}, @var{min}, or @var{max} is outside the range
zero to @code{RE_DUP_MAX} (a symbol that @file{regex.h}
defines).

@end itemize

If the interval expression is invalid and the syntax bit
@code{RE_NO_BK_BRACES} is set, then Regex considers all the
characters in the would-be interval to be ordinary. If that bit
isn't set, then the regular expression is invalid.

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
If that bit isn't set, then Regex considers all the characters (other
than backslashes, which it ignores) in the would-be interval to be
ordinary.


@node Alternation Operator, List Operators, Repetition Operators, Common Operators
@section The Alternation Operator (@code{|} or @code{\|})

@kindex |
@kindex \|
@cindex alternation operator
@cindex or operator

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator.
Otherwise, if the syntax bit
@code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
otherwise, @samp{\|} does.

Alternatives match one of a choice of regular expressions:
if you put the character(s) representing the alternation operator between
any two regular expressions @var{a} and @var{b}, the result matches
the union of the strings that @var{a} and @var{b} match. For
example, supposing that @samp{|} is the alternation operator, then
@samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or
@samp{quux}.

@ignore
@c Nobody needs to disallow empty alternatives any more.
If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
expressions @var{a} or @var{b} is empty, the
regular expression is invalid. More precisely, if this syntax bit is
set, then the alternation operator can't:

@itemize @bullet
@item
be first or last in a regular expression;

@item
follow either another alternation operator or an open-group operator
(@pxref{Grouping Operators}); or

@item
precede a close-group operator.

@end itemize

@noindent
For example, supposing @samp{(} and @samp{)} represent the open and
close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.
@end ignore

The alternation operator operates on the @emph{largest} possible
surrounding regular expressions. (Put another way, it has the lowest
precedence of any regular expression operator.)
Thus, the only way you can
delimit its arguments is to use grouping. For example, if @samp{(} and
@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
would match either @samp{fooar} or @samp{fobar}. (@samp{foo|bar} would
match @samp{foo} or @samp{bar}.)

@cindex backtracking
The matcher usually tries all combinations of alternatives so as to
match the longest possible string.
For example, when matching
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
take, say, the first (``depth-first'') combination it could match, since
then it would be content to match just @samp{fooqbar}.

@comment xx something about leftmost-longest


@node List Operators, Grouping Operators, Alternation Operator, Common Operators
@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

@cindex matching list
@cindex @samp{[}
@cindex @samp{]}
@cindex @samp{^}
@cindex @samp{-}
@cindex @samp{\}
@cindex @samp{[^}
@cindex nonmatching list
@cindex matching newline
@cindex bracket expression

@dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
more items. An @dfn{item} is a character,
@ignore
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
@end ignore
a character class expression, or a range expression. The syntax bits
affect which kinds of items you can put in a list. We explain the last
two items in subsections below. Empty lists are invalid.

A @dfn{matching list} matches a single character represented by one of
the list items. You form a matching list by enclosing one or more items
within an @dfn{open-matching-list operator} (represented by @samp{[})
and a @dfn{close-list operator} (represented by @samp{]}).

For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
@samp{[ad]*} matches the empty string and any string composed of just
@samp{a}s and @samp{d}s in any order. Regex considers invalid a regular
expression with a @samp{[} but no matching
@samp{]}.

@dfn{Nonmatching lists} are similar to matching lists except that they
match a single character @emph{not} represented by one of the list
items. You use an @dfn{open-nonmatching-list operator} (represented by
@samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
the first character in the list.
If you put a @samp{^} character first
in (what you think is) a matching list, you'll turn it into a
nonmatching list.}) instead of an open-matching-list operator to start a
nonmatching list.

For example, @samp{[^ab]} matches any character except @samp{a} or
@samp{b}.

If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
Pattern Buffers}) is set, then nonmatching lists do not match a newline.

Most characters lose any special meaning inside a list. The special
characters inside a list follow.

@table @samp
@item ]
ends the list if it's not the first list item. So, if you want to make
the @samp{]} character a list item, you must put it first.

@item \
quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is
set.

@ignore
Put these in if they get implemented.

@item [.
represents the open-collating-symbol operator (@pxref{Collating Symbol
Operators}).

@item .]
represents the close-collating-symbol operator.

@item [=
represents the open-equivalence-class operator (@pxref{Equivalence Class
Operators}).

@item =]
represents the close-equivalence-class operator.

@end ignore

@item [:
represents the open-character-class operator (@pxref{Character Class
Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
follows is a valid character class expression.

@item :]
represents the close-character-class operator if the syntax bit
@code{RE_CHAR_CLASSES} is set and what precedes it is an
open-character-class operator followed by a valid character class name.

@item -
represents the range operator (@pxref{Range Operator}) if it's
not first or last in a list or the ending point of a range.

@end table

@noindent
All other characters are ordinary. For example, @samp{[.*]} matches
@samp{.} and @samp{*}.
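
The list examples above can be checked with a short C program. What
follows is only a sketch, and it rests on assumptions this section does
not state: it goes through the POSIX interface (@code{regcomp} and
@code{regexec} from @file{regex.h}) with @code{REG_EXTENDED} instead of
setting syntax bits directly, and the helper name @code{whole_match} is
hypothetical. The helper wraps each pattern in @samp{^(} @dots{}
@samp{)$} so that a match must cover the entire string.

@verbatim
```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not part of Regex.  Return 1 if PATTERN (POSIX
   extended syntax) matches all of TEXT, 0 if it does not, and -1 if
   the pattern is too long for the buffer or fails to compile.  */
static int
whole_match (const char *pattern, const char *text)
{
  char anchored[256];
  regex_t re;
  int n, rc;

  /* Anchor the pattern so that partial matches do not count.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return -1;

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```
@end verbatim

With this helper, @code{whole_match ("[ab]", "a")} and
@code{whole_match ("[.*]", "*")} return 1, @code{whole_match ("[^ab]", "a")}
returns 0, and @code{whole_match ("[ab", "a")} returns @code{-1} because
the list is never closed. Putting @samp{]} first, as in
@code{whole_match ("[]a]", "]")}, makes it an ordinary list item, so the
call returns 1.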