@c xregex.texi
same as @code{print} except omits space

@item lower
lowercase letters

@item print
printable characters (in the @sc{ascii} encoding, space through
tilde---codes 040 through 0176)

@item punct
neither control nor alphanumeric characters

@item space
space, carriage return, newline, vertical tab, and form feed

@item upper
uppercase letters

@item xdigit
hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}

@end table

@noindent
These correspond to the definitions in the C library's @file{<ctype.h>}
facility.  For example, @samp{[:alpha:]} corresponds to the standard
facility @code{isalpha}.  Regex recognizes character class expressions
only inside of lists; so @samp{[[:alpha:]]} matches any letter, but
@samp{[:alpha:]} outside of a bracket expression and not followed by a
repetition operator matches just itself.

@node Range Operator,  , Character Class Operators, List Operators
@subsection The Range Operator (@code{-})

Regex recognizes @dfn{range expressions} inside a list.  They represent
those characters that fall between two elements in the current
collating sequence.  You form a range expression by putting a
@dfn{range operator} between two
@ignore
(If these get implemented, then substitute this for ``characters.'')
of any of the following: characters, collating elements, collating symbols,
and equivalence class expressions.  The starting point of the range and
the ending point of the range don't have to be the same kind of item,
e.g., the starting point could be a collating element and the ending
point could be an equivalence class expression.  If a range's ending
point is an equivalence class, then all the collating elements in that
class will be in the range.
@end ignore
characters.@footnote{You can't use a character class for the starting
or ending point of a range, since a character class is not a single
character.} @samp{-} represents the range operator.
For example,
@samp{a-f} within a list represents all the characters from @samp{a}
through @samp{f} inclusively.

If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
ending point collates less than its starting point, the range (and the
regular expression containing it) is invalid.  For example, the regular
expression @samp{[z-a]} would be invalid.  If this bit isn't set, then
Regex considers such a range to be empty.

Since @samp{-} represents the range operator, if you want to make a
@samp{-} character itself a list item, you must do one of the following:

@itemize @bullet
@item
Put the @samp{-} either first or last in the list.

@item
Include a range whose starting point collates strictly lower than
@samp{-} and whose ending point collates equal or higher.  Unless a
range is the first item in a list, a @samp{-} can't be its starting
point, but @emph{can} be its ending point.  That is because Regex
considers @samp{-} to be the range operator unless it is preceded by
another @samp{-}.  For example, in the @sc{ascii} encoding, @samp{)},
@samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
contiguous characters in the collating sequence.  You might think that
@samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}.  Rather, it
has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
it matches, e.g., @samp{,}, not @samp{.}.

@item
Put a range whose starting point is @samp{-} first in the list.

@end itemize

For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
English, in @sc{ascii}).

@node Grouping Operators, Back-reference Operator, List Operators, Common Operators
@section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})

@kindex (
@kindex )
@kindex \(
@kindex \)
@cindex grouping
@cindex subexpressions
@cindex parenthesizing

A @dfn{group}, also known as a @dfn{subexpression}, consists of an
@dfn{open-group operator}, any number of other operators, and a
@dfn{close-group operator}.
Regex treats this sequence as a unit, just
as mathematics and programming languages treat a parenthesized
expression as a unit.

Therefore, using @dfn{groups}, you can:

@itemize @bullet
@item
delimit the argument(s) to an alternation operator (@pxref{Alternation
Operator}) or a repetition operator (@pxref{Repetition
Operators}).

@item
keep track of the indices of the substring that matched a given group.
@xref{Using Registers}, for a precise explanation.
This lets you:

@itemize @bullet
@item
use the back-reference operator (@pxref{Back-reference Operator}).

@item
use registers (@pxref{Using Registers}).

@end itemize
@end itemize

If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
the open-group operator and @samp{)} represents the
close-group operator; otherwise, @samp{\(} and @samp{\)} do.

If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
close-group operator has no matching open-group operator, then Regex
considers it to match @samp{)}.

@node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
@section The Back-reference Operator (@dfn{\}@var{digit})

@cindex back references

If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
back references.  A back reference matches a specified preceding group.
The back reference operator is represented by @samp{\@var{digit}}
anywhere after the end of a regular expression's @w{@var{digit}-th}
group (@pxref{Grouping Operators}).

@var{digit} must be between @samp{1} and @samp{9}.  The matcher assigns
numbers 1 through 9 to the first nine groups it encounters.
By using
one of @samp{\1} through @samp{\9} after the corresponding group's
close-group operator, you can match a substring identical to the
one that the group does.

Back references match according to the following (in all examples below,
@samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
the open-interval and @samp{@}} the close-interval operator):

@itemize @bullet
@item
If the group matches a substring, the back reference matches an
identical substring.  For example, @samp{(a)\1} matches @samp{aa} and
@samp{(bana)na\1bo\1} matches @samp{bananabanabobana}.  Likewise,
@samp{(.*)\1} matches any (newline-free if the syntax bit
@code{RE_DOT_NEWLINE} isn't set) string that is composed of two
identical halves; the @samp{(.*)} matches the first half and the
@samp{\1} matches the second half.

@item
If the group matches more than once (as it might if followed
by, e.g., a repetition operator), then the back reference matches the
substring the group @emph{last} matched.  For example,
@samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
@samp{aa}.  Then @w{group 1} matches @samp{ab} and @w{group 2} matches
@samp{a}.  So, @samp{\1} matches @samp{ab} and @samp{\2} matches
@samp{a}.

@item
If the group doesn't participate in a match, i.e., it is part of an
alternative not taken or a repetition operator allows zero repetitions
of it, then the back reference makes the whole match fail.  For example,
@samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
and @samp{two-and-four}, but not @samp{one-and-four} or
@samp{two-and-three}.  For example, if the pattern matches
@samp{one-and-}, then its @w{group 2} matches the empty string and its
@w{group 3} doesn't participate in the match.
So, if it then matches
@samp{four}, then when it tries to back reference @w{group 3}---which it
will attempt to do because @samp{\3} follows the @samp{four}---the match
will fail because @w{group 3} didn't participate in the match.

@end itemize

You can use a back reference as an argument to a repetition operator.  For
example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
@samp{b}s.  Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.

If there is no preceding @w{@var{digit}-th} subexpression, the regular
expression is invalid.

@node Anchoring Operators,  , Back-reference Operator, Common Operators
@section Anchoring Operators

@cindex anchoring
@cindex regexp anchoring

These operators can constrain a pattern to match only at the beginning or
end of the entire string or at the beginning or end of a line.

@menu
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $
@end menu

@node Match-beginning-of-line Operator, Match-end-of-line Operator,  , Anchoring Operators
@subsection The Match-beginning-of-line Operator (@code{^})

@kindex ^
@cindex beginning-of-line operator
@cindex anchors

This operator can match the empty string either at the beginning of the
string or after a newline character.  Thus, it is said to @dfn{anchor}
the pattern to the beginning of a line.

In the cases following, @samp{^} represents this operator.  (Otherwise,
@samp{^} is ordinary.)

@itemize @bullet

@item
It (the @samp{^}) is first in the pattern, as in @samp{^foo}.

@cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
@item
The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
a bracket expression.

@cindex open-group operator and @samp{^}
@cindex alternation operator and @samp{^}
@item
It follows an open-group or alternation operator, as in @samp{a\(^b\)}
and @samp{a\|^b}.
@xref{Grouping Operators}, and @ref{Alternation
Operator}.

@end itemize

These rules imply that some valid patterns containing @samp{^} cannot be
matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS}
is set.

@vindex not_bol @r{field in pattern buffer}
If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
Pattern Buffers}), then @samp{^} fails to match at the beginning of the
string.  @xref{POSIX Matching}, for when you might find this useful.

@vindex newline_anchor @r{field in pattern buffer}
If the @code{newline_anchor} field isn't set in the pattern buffer, then
@samp{^} fails to match after a newline.  This is useful when you do not
regard the string to be matched as broken into lines.

@node Match-end-of-line Operator,  , Match-beginning-of-line Operator, Anchoring Operators
@subsection The Match-end-of-line Operator (@code{$})

@kindex $
@cindex end-of-line operator
@cindex anchors

This operator can match the empty string either at the end of
the string or before a newline character in the string.  Thus, it is
said to @dfn{anchor} the pattern to the end of a line.

It is always represented by @samp{$}.  For example, @samp{foo$} usually
matches, e.g., @samp{foo} and, e.g., the first three characters of
@samp{foo\nbar}.

Its interaction with the syntax bits and pattern buffer fields is
exactly the dual of @samp{^}'s; see the previous section.
(That is,
``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
``after'' becomes ``before''.)

@node GNU Operators, GNU Emacs Operators, Common Operators, Top
@chapter GNU Operators

Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).

@menu
* Word Operators::
* Buffer Operators::
@end menu

@node Word Operators, Buffer Operators,  , GNU Operators
@section Word Operators

The operators in this section require Regex to recognize parts of words.
Regex uses a syntax table to determine whether or not a character is
part of a word, i.e., whether or not it is @dfn{word-constituent}.

@menu
* Non-Emacs Syntax Tables::
* Match-word-boundary Operator::        \b
* Match-within-word Operator::          \B
* Match-beginning-of-word Operator::    \<
* Match-end-of-word Operator::          \>
* Match-word-constituent Operator::     \w
* Match-non-word-constituent Operator:: \W
@end menu

@node Non-Emacs Syntax Tables, Match-word-boundary Operator,  , Word Operators
@subsection Non-Emacs Syntax Tables

A @dfn{syntax table} is an array indexed by the characters in your
character set.  In the @sc{ascii} encoding, therefore, a syntax table
has 256 elements.  Regex always uses a @code{char *} variable
@code{re_syntax_table} as its syntax table.
In some cases, it
initializes this variable and in others it expects you to initialize it.

@itemize @bullet
@item
If Regex is compiled with the preprocessor symbols @code{emacs} and
@code{SYNTAX_TABLE} both undefined, then Regex allocates
@code{re_syntax_table} and initializes an element @var{i} either to
@code{Sword} (which it defines) if @var{i} is a letter, number, or
@samp{_}, or to zero if it's not.

@item
If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
defined, then Regex expects you to define a @code{char *} variable
@code{re_syntax_table} to be a valid syntax table.

@item
@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
the preprocessor symbol @code{emacs} defined.

@end itemize

@node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
@subsection The Match-word-boundary Operator (@code{\b})

@cindex @samp{\b}
@cindex word boundaries, matching

This operator (represented by @samp{\b}) matches the empty string at
either the beginning or the end of a word.  For example, @samp{\brat\b}
matches the separate word @samp{rat}.

@node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
@subsection The Match-within-word Operator (@code{\B})

@cindex @samp{\B}

This operator (represented by @samp{\B}) matches the empty string within
a word.  For example, @samp{c\Brat\Be} matches @samp{crate}, but
@samp{dirty \Brat} doesn't match @samp{dirty rat}.

@node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
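
The @samp{\b} and @samp{\B} examples from the two preceding subsections
can be tried out with a short sketch.  It does not call Regex itself:
Python's @code{re} module is used purely as a stand-in, on the assumption
that its @samp{\b} and @samp{\B} agree with Regex's default syntax table
(letters, digits, and @samp{_} are word-constituent) for these
@sc{ascii} strings.  The helper @code{matches} and the sample strings
other than those quoted above are illustrative names only.

```python
import re

# Stand-in only: this uses Python's re module, not the GNU Regex library.
# Assumption: for these ASCII inputs, re's \b and \B behave like Regex's
# operators under the default syntax table (letters, digits, '_').


def matches(pattern, text):
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


# \b matches the empty string at the beginning or end of a word.
rat_word = matches(r"\brat\b", "a rat ran")    # 'rat' as a separate word
rat_inside = matches(r"\brat\b", "crate")      # 'rat' embedded in a word

# \B matches the empty string within a word.
crate = matches(r"c\Brat\Be", "crate")         # both positions are inside 'crate'
dirty_rat = matches(r"dirty \Brat", "dirty rat")  # position before 'r' is a word start

print(rat_word, rat_inside, crate, dirty_rat)
```

Under that assumption, the separate word and @samp{crate} cases match,
while the embedded @samp{rat} and @samp{dirty rat} cases do not.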
