@menu
* Character Class Operators::   [:class:]
* Range Operator::              start-end
@end menu

@ignore
(If collating symbols and equivalence class expressions get implemented,
then add this.)

node Collating Symbol Operators
subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})

If the syntax bit @code{XX} is set, then you can represent
collating symbols inside lists. You form a @dfn{collating symbol} by
putting a collating element between an @dfn{open-collating-symbol
operator} and a @dfn{close-collating-symbol operator}. @samp{[.}
represents the open-collating-symbol operator and @samp{.]} represents
the close-collating-symbol operator. For example, if @samp{ll} is a
collating element, then @samp{[[.ll.]]} would match @samp{ll}.

node Equivalence Class Operators
subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
@cindex equivalence class expression in regex
@cindex @samp{[=} in regex
@cindex @samp{=]} in regex

If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
expressions inside lists. An @dfn{equivalence class expression} is a set
of collating elements which all belong to the same equivalence class.
You form an equivalence class expression by putting a collating
element between an @dfn{open-equivalence-class operator} and a
@dfn{close-equivalence-class operator}. @samp{[=} represents the
open-equivalence-class operator and @samp{=]} represents the
close-equivalence-class operator. For example, if @samp{a} and @samp{A}
formed an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
would match both @samp{a} and @samp{A}.
If the collating element in an
equivalence class expression isn't part of an equivalence class, then
the matcher considers the equivalence class expression to be a collating
symbol.

@end ignore

@node Character Class Operators, Range Operator, , List Operators
@subsection Character Class Operators (@code{[:} @dots{} @code{:]})

@cindex character classes
@cindex @samp{[:} in regex
@cindex @samp{:]} in regex

If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
recognizes character class expressions inside lists. A @dfn{character
class expression} matches one character from a given class. You form a
character class expression by putting a character class name between an
@dfn{open-character-class operator} (represented by @samp{[:}) and a
@dfn{close-character-class operator} (represented by @samp{:]}). The
character class names and their meanings are:

@table @code

@item alnum
letters and digits

@item alpha
letters

@item blank
system-dependent; for @sc{gnu}, a space or tab

@item cntrl
control characters (in the @sc{ascii} encoding, code 0177 and codes
less than 040)

@item digit
digits

@item graph
same as @code{print} except omits space

@item lower
lowercase letters

@item print
printable characters (in the @sc{ascii} encoding, space through
tilde---codes 040 through 0176)

@item punct
printable characters that are neither space nor alphanumeric

@item space
space, horizontal tab, carriage return, newline, vertical tab, and form feed

@item upper
uppercase letters

@item xdigit
hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}

@end table

@noindent
These correspond to the definitions in the C library's @file{<ctype.h>}
facility. For example, @samp{[:alpha:]} corresponds to the standard
facility @code{isalpha}.
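As a minimal sketch, assuming a POSIX system (POSIX requires character classes inside bracket expressions, and glibc's @code{regcomp} supports them), the following C helper compiles a basic regular expression with @code{regcomp} and reports whether it matches a string. The helper name @code{bre_matches} is hypothetical; it is not part of Regex.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (a POSIX basic regular
   expression) matches somewhere in SUBJECT, 0 if it does not, and
   -1 if PATTERN fails to compile.  */
static int bre_matches(const char *pattern, const char *subject)
{
    regex_t re;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets.  */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With it, @code{bre_matches("^[[:alpha:]]$", "q")} returns 1 and @code{bre_matches("^[[:alpha:]]$", "7")} returns 0; likewise @samp{^[[:xdigit:]]*$} accepts @samp{c0ffee} but rejects @samp{coffee}.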

Regex recognizes character class expressions
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
@samp{[:alpha:]} is itself an ordinary list that matches any one of the
characters @samp{:}, @samp{a}, @samp{l}, @samp{p}, and @samp{h}.

@node Range Operator, , Character Class Operators, List Operators
@subsection The Range Operator (@code{-})

Regex recognizes @dfn{range expressions} inside a list. They represent
those characters
that fall between two elements in the current collating sequence. You
form a range expression by putting a @dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
and equivalence class expressions. The starting point of the range and
the ending point of the range don't have to be the same kind of item,
e.g., the starting point could be a collating element and the ending
point could be an equivalence class expression. If a range's ending
point is an equivalence class, then all the collating elements in that
class will be in the range.
@end ignore
characters.@footnote{You can't use a character class for the starting
or ending point of a range, since a character class is not a single
character.} @samp{-} represents the range operator. For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f}
inclusively.

If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid. For example, the regular
expression @samp{[z-a]} would be invalid.
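Range expressions can be tried out the same way. This is a sketch assuming glibc, whose @code{regcomp} turns on @code{RE_NO_EMPTY_RANGES} for its POSIX syntaxes, so a reversed range such as @samp{[z-a]} is rejected at compile time. Other C libraries may report a different error code, so the hypothetical helper @code{range_matches} only distinguishes a compile failure from a match or non-match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression and test it against SUBJECT.  Returns 1 on a match,
   0 on no match, and -1 if regcomp rejects the pattern (for
   example, a range whose ending point collates before its start).  */
static int range_matches(const char *pattern, const char *subject)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In the C locale, @code{range_matches("^[a-f]$", "c")} returns 1, @code{range_matches("^[a-f]$", "g")} returns 0, and on glibc @code{range_matches("[z-a]", "m")} returns -1.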

If @code{RE_NO_EMPTY_RANGES} isn't set, then
Regex considers such a range to be empty.

Since @samp{-} represents the range operator, if you want to make a
@samp{-} character itself
a list item, you must do one of the following:

@itemize @bullet
@item
Put the @samp{-} either first or last in the list.

@item
Include a range whose starting point collates strictly lower than
@samp{-} and whose ending point collates equal or higher. Unless a
range is the first item in a list, a @samp{-} can't be its starting
point, but @emph{can} be its ending point. That is because Regex
considers @samp{-} to be the range operator unless it is preceded by
another @samp{-}. For example, in the @sc{ascii} encoding, @samp{)},
@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
contiguous characters in the collating sequence. You might think that
@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}. Rather, it
has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
it matches, e.g., @samp{,}, not @samp{.}.

@item
Put a range whose starting point is @samp{-} first in the list.
@end itemize

For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
English, in @sc{ascii}).

@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})

@kindex (
@kindex )
@kindex \(
@kindex \)
@cindex grouping
@cindex subexpressions
@cindex parenthesizing

A @dfn{group}, also known as a @dfn{subexpression}, consists of an
@dfn{open-group operator}, any number of other operators, and a
@dfn{close-group operator}.
Regex treats this sequence as a unit, just
as mathematics and programming languages treat a parenthesized
expression as a unit.

Therefore, using @dfn{groups}, you can:

@itemize @bullet
@item
delimit the argument(s) to an alternation operator (@pxref{Alternation Operator})
or a repetition operator (@pxref{Repetition Operators}).

@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:

@itemize @bullet
@item
use the back-reference operator (@pxref{Back-reference Operator}).

@item
use registers (@pxref{Using Registers}).
@end itemize
@end itemize

If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
the open-group operator and @samp{)} represents the
close-group operator; otherwise, @samp{\(} and @samp{\)} do.

If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match @samp{)}.

@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@dfn{\}@var{digit})

@cindex back references

If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
back references. A back reference matches a specified preceding group.
The back reference operator is represented by @samp{\@var{digit}}
anywhere after the end of a regular expression's @w{@var{digit}-th}
group (@pxref{Grouping Operators}).

@var{digit} must be between @samp{1} and @samp{9}. The matcher assigns
numbers 1 through 9 to the first nine groups it encounters.
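To see group numbering and back references in practice, here is a sketch that uses the POSIX @code{regcomp}/@code{regexec} interface rather than the GNU pattern-buffer interface. In POSIX basic syntax the group operators are @samp{\(} and @samp{\)}. @code{regexec} fills a @code{regmatch_t} array whose element @var{n} holds the start and end offsets of group @var{n} (element 0 is the whole match), and it sets both offsets to -1 for a group that did not participate. The helper @code{group_span} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match the POSIX basic regular expression
   PATTERN against SUBJECT.  On a match, store the start and end
   offsets of group GROUP (0 = whole match, 1..9 = groups numbered
   by the position of their open-group operators) in *START and
   *END and return 1.  Return 0 on no match, on a compile error,
   or if GROUP is out of range.  */
static int group_span(const char *pattern, const char *subject,
                      size_t group, long *start, long *end)
{
    regex_t re;
    regmatch_t m[10];

    if (group >= 10 || regcomp(&re, pattern, 0) != 0)
        return 0;

    int rc = regexec(&re, subject, 10, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    /* rm_so and rm_eo are -1 for a group that did not participate.  */
    *start = (long) m[group].rm_so;
    *end = (long) m[group].rm_eo;
    return 1;
}
```

For instance, matching @samp{\(x\(y\)\)\(z\)} against @samp{xyz} reports group 1 at offsets 0 to 2, group 2 at 1 to 2, and group 3 at 2 to 3. Matching @samp{\(a\)\1} succeeds on @samp{aa} but not on @samp{ab}, and in @samp{\(a\)*b} matched against @samp{b}, group 1 does not participate, so both of its offsets are -1.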
By usingone of @samp{\1} through @samp{\9} after the corresponding group'sclose-group operator, you can match a substring identical to theone that the group does.Back references match according to the following (in all examples below,@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}the open-interval and @samp{@}} the close-interval operator):@itemize @bullet@itemIf the group matches a substring, the back reference matches anidentical substring. For example, @samp{(a)\1} matches @samp{aa} and@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}. Likewise,@samp{(.*)\1} matches any (newline-free if the syntax bit@code{RE_DOT_NEWLINE} isn't set) string that is composed of twoidentical halves; the @samp{(.*)} matches the first half and the@samp{\1} matches the second half.@itemIf the group matches more than once (as it might if followedby, e.g., a repetition operator), then the back reference matches thesubstring the group @emph{last} matched. For example,@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (theouter one) matches @samp{aab} and @w{group 2} (the inner one) matches@samp{aa}. Then @w{group 1} matches @samp{ab} and @w{group 2} matches@samp{a}. So, @samp{\1} matches @samp{ab} and @samp{\2} matches@samp{a}.@itemIf the group doesn't participate in a match, i.e., it is part of analternative not taken or a repetition operator allows zero repetitionsof it, then the back reference makes the whole match fail. For example,@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}and @samp{two-and-four}, but not @samp{one-and-four} or@samp{two-and-three}. For example, if the pattern matches@samp{one-and-}, then its @w{group 2} matches the empty string and its@w{group 3} doesn't participate in the match. 
So, if it then matches
@samp{four}, then when it tries to back reference @w{group 3}---which it
will attempt to do because @samp{\3} follows the @samp{four}---the match
will fail because @w{group 3} didn't participate in the match.
@end itemize

You can use a back reference as an argument to a repetition operator. For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s. Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.

If there is no preceding @w{@var{digit}-th} subexpression, the regular
expression is invalid.

@node Anchoring Operators, , Back-reference Operator, Common Operators
@section Anchoring Operators

@cindex anchoring
@cindex regexp anchoring

These operators can constrain a pattern to match only at the beginning or
end of the entire string or at the beginning or end of a line.

@menu
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $
@end menu

@node Match-beginning-of-line Operator, Match-end-of-line Operator, , Anchoring Operators
@subsection The Match-beginning-of-line Operator (@code{^})

@kindex ^
@cindex beginning-of-line operator
@cindex anchors

This operator can match the empty string either at the beginning of the
string or after a newline character. Thus, it is said to @dfn{anchor}
the pattern to the beginning of a line.

In the following cases, @samp{^} represents this operator. (Otherwise,
@samp{^} is ordinary.)

@itemize @bullet

@item
It (the @samp{^}) is first in the pattern, as in @samp{^foo}.

@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
@item
The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
a bracket expression.

@cindex open-group operator and @samp{^}
@cindex alternation operator and @samp{^}
@item
It follows an open-group or alternation operator, as in @samp{a\(^b\)}
and @samp{a\|^b}.
@xref{Grouping Operators}, and @ref{Alternation Operator}.
@end itemize

These rules imply that some valid patterns containing @samp{^} cannot be
matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
is set.

@vindex not_bol @r{field in pattern buffer}
If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU Pattern Buffers}),
then @samp{^} fails to match at the beginning of the
string. @xref{POSIX Matching}, for when you might find this useful.

@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field is not set in the pattern buffer, then
@samp{^} fails to match after a newline. This is useful when you do not
regard the string to be matched as broken into lines.

@node Match-end-of-line Operator, , Match-beginning-of-line Operator, Anchoring Operators
@subsection The Match-end-of-line Operator (@code{$})

@kindex $
@cindex end-of-line operator
@cindex anchors

This operator can match the empty string either at the end of
the string or before a newline character in the string. Thus, it is
said to @dfn{anchor} the pattern to the end of a line.

It is always represented by @samp{$}. For example, @samp{foo$} usually