'\"
'\" Copyright (c) 1998 Sun Microsystems, Inc.
'\" Copyright (c) 1999 Scriptics Corporation
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
'\" RCS: @(#) $Id: re_syntax.n,v 1.3 1999/07/14 19:09:36 jpeek Exp $
'\"
.so man.macros
.TH re_syntax n "8.1" Tcl "Tcl Built-In Commands"
.BS
.SH NAME
re_syntax \- Syntax of Tcl regular expressions.
.BE
.SH DESCRIPTION
.PP
A \fIregular expression\fR describes strings of characters.
It's a pattern that matches certain strings and doesn't match others.
.SH "DIFFERENT FLAVORS OF REs"
Regular expressions (``RE''s), as defined by POSIX, come in two
flavors: \fIextended\fR REs (``EREs'') and \fIbasic\fR REs (``BREs'').
EREs are roughly those of the traditional \fIegrep\fR, while BREs are
roughly those of the traditional \fIed\fR.  This implementation adds
a third flavor, \fIadvanced\fR REs (``AREs''), basically EREs with
some significant extensions.
.PP
This manual page primarily describes AREs.  BREs mostly exist for
backward compatibility in some old programs; they will be discussed at
the end.  POSIX EREs are almost an exact subset of AREs.  Features of
AREs that are not present in EREs will be indicated.
.SH "REGULAR EXPRESSION SYNTAX"
.PP
Tcl regular expressions are implemented using the package written by
Henry Spencer, based on the 1003.2 spec and some (not quite all) of
the Perl5 extensions (thanks, Henry!).
Much of the description of
regular expressions below is copied verbatim from his manual entry.
.PP
An ARE is one or more \fIbranches\fR,
separated by `\fB|\fR',
matching anything that matches any of the branches.
.PP
A branch is zero or more \fIconstraints\fR or \fIquantified atoms\fR,
concatenated.
It matches a match for the first, followed by a match for the second, etc.;
an empty branch matches the empty string.
.PP
A quantified atom is an \fIatom\fR possibly followed
by a single \fIquantifier\fR.
Without a quantifier, it matches a match for the atom.
The quantifiers,
and what a so-quantified atom matches, are:
.RS 2
.TP 6
\fB*\fR
a sequence of 0 or more matches of the atom
.TP
\fB+\fR
a sequence of 1 or more matches of the atom
.TP
\fB?\fR
a sequence of 0 or 1 matches of the atom
.TP
\fB{\fIm\fB}\fR
a sequence of exactly \fIm\fR matches of the atom
.TP
\fB{\fIm\fB,}\fR
a sequence of \fIm\fR or more matches of the atom
.TP
\fB{\fIm\fB,\fIn\fB}\fR
a sequence of \fIm\fR through \fIn\fR (inclusive) matches of the atom;
\fIm\fR may not exceed \fIn\fR
.TP
\fB*?  +?  ??  {\fIm\fB}?  {\fIm\fB,}?  {\fIm\fB,\fIn\fB}?\fR
\fInon-greedy\fR quantifiers,
which match the same possibilities,
but prefer the smallest number rather than the largest number
of matches (see MATCHING)
.RE
.PP
The forms using
\fB{\fR and \fB}\fR
are known as \fIbound\fRs.
The numbers
\fIm\fR and \fIn\fR are unsigned decimal integers
with permissible values from 0 to 255 inclusive.
.PP
An atom is one of:
.RS 2
.TP 6
\fB(\fIre\fB)\fR
(where \fIre\fR is any regular expression)
matches a match for
\fIre\fR, with the match noted for possible reporting
.TP
\fB(?:\fIre\fB)\fR
as previous,
but does no reporting
(a ``non-capturing'' set of parentheses)
.TP
\fB()\fR
matches an empty string,
noted for possible reporting
.TP
\fB(?:)\fR
matches an empty string,
without reporting
.TP
\fB[\fIchars\fB]\fR
a \fIbracket expression\fR,
matching any one of the \fIchars\fR (see BRACKET EXPRESSIONS for more detail)
.TP
 \fB.\fR
matches any single character
.TP
\fB\e\fIk\fR
(where \fIk\fR is a non-alphanumeric character)
matches that character taken as an ordinary character,
e.g.
\e\e matches a backslash character
.TP
\fB\e\fIc\fR
where \fIc\fR is alphanumeric
(possibly followed by other characters),
an \fIescape\fR (AREs only),
see ESCAPES below
.TP
\fB{\fR
when followed by a character other than a digit,
matches the left-brace character `\fB{\fR';
when followed by a digit, it is the beginning of a
\fIbound\fR (see above)
.TP
\fIx\fR
where \fIx\fR is
a single character with no other significance, matches that character.
.RE
.PP
A \fIconstraint\fR matches an empty string when specific conditions
are met.
A constraint may not be followed by a quantifier.
The simple constraints are as follows; some more constraints are
described later, under ESCAPES.
.RS 2
.TP 8
\fB^\fR
matches at the beginning of a line
.TP
\fB$\fR
matches at the end of a line
.TP
\fB(?=\fIre\fB)\fR
\fIpositive lookahead\fR (AREs only), matches at any point
where a substring matching \fIre\fR begins
.TP
\fB(?!\fIre\fB)\fR
\fInegative lookahead\fR (AREs only), matches at any point
where no substring matching \fIre\fR begins
.RE
.PP
The lookahead constraints may not contain back references (see later),
and all parentheses within them are considered non-capturing.
.PP
An RE may not end with `\fB\e\fR'.
.SH "BRACKET EXPRESSIONS"
A \fIbracket expression\fR is a list of characters enclosed in `\fB[\|]\fR'.
It normally matches any single character from the list (but see below).
If the list begins with `\fB^\fR',
it matches any single character
(but see below) \fInot\fR from the rest of the list.
.PP
If two characters in the list are separated by `\fB\-\fR',
this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence,
e.g.
\fB[0\-9]\fR
in ASCII matches any decimal digit.
Two ranges may not share an
endpoint, so e.g.
\fBa\-c\-e\fR
is illegal.
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.PP
To include a literal
\fB]\fR
or
\fB\-\fR
in the list,
the simplest method is to
enclose it in
\fB[.\fR and \fB.]\fR
to make it a collating element (see
below).
Alternatively,
make it the first character
(following a possible `\fB^\fR'),
or (AREs only) precede it with `\fB\e\fR'.
Alternatively, for `\fB\-\fR',
make it the last character,
or the second endpoint of a range.
To use a literal
\fB\-\fR
as the first endpoint of a range,
make it a collating element
or (AREs only) precede it with `\fB\e\fR'.
With the exception of these, some combinations using
\fB[\fR
(see next
paragraphs), and escapes,
all other special characters lose their
special significance within a bracket expression.
.PP
Within a bracket expression, a collating element (a character,
a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in
\fB[.\fR and \fB.]\fR
stands for the
sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression in a locale that has
multi-character collating elements
can thus match more than one character.
.VS 8.2
So (insidiously), a bracket expression that starts with \fB^\fR
can match multi-character collating elements even if none of them
appear in the bracket expression!
(\fINote:\fR Tcl currently has no multi-character collating elements.
This information is only for illustration.)
.PP
For example, assume the collating sequence includes a \fBch\fR
multi-character collating element.
Then the RE \fB[[.ch.]]*c\fR (zero or more \fBch\fP's followed by \fBc\fP)
matches the first five characters of `\fBchchcc\fR'.
Also, the RE \fB[^c]b\fR matches all of `\fBchb\fR'
(because \fB[^c]\fR matches the multi-character \fBch\fR).
.VE 8.2
.PP
Within a bracket expression, a collating element enclosed in
\fB[=\fR
and
\fB=]\fR
is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were `\fB[.\fR'\&
and `\fB.]\fR'.)
For example, if
\fBo\fR
and
\fB\o'o^'\fR
are the members of
an equivalence class,
then `\fB[[=o=]]\fR', `\fB[[=\o'o^'=]]\fR',
and `\fB[o\o'o^']\fR'\&
are all synonymous.
An equivalence class may not be an endpoint
of a range.
.VS 8.2
(\fINote:\fR Tcl currently implements only the Unicode locale.
It doesn't define any equivalence classes.
The examples above are just illustrations.)
.VE 8.2
.PP
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
in
\fB[:\fR
and
\fB:]\fR
stands for the list of all characters
(not all collating elements!)
belonging to that
class.
Standard character classes are:
.PP
.RS
.ne 5
.nf
.ta 3c
\fBalpha\fR	A letter.
\fBupper\fR	An upper-case letter.
\fBlower\fR	A lower-case letter.
\fBdigit\fR	A decimal digit.
\fBxdigit\fR	A hexadecimal digit.
\fBalnum\fR	An alphanumeric (letter or digit).
\fBprint\fR	An alphanumeric (same as alnum).
\fBblank\fR	A space or tab character.
\fBspace\fR	A character producing white space in displayed text.
\fBpunct\fR	A punctuation character.
\fBgraph\fR	A character with a visible representation.
\fBcntrl\fR	A control character.
.fi
.RE
.PP
A locale may provide others.
.VS 8.2
(Note that the current Tcl implementation has only one locale:
the Unicode locale.)
.VE 8.2
A character class may not be used as an endpoint of a range.
.PP
There are two special cases of bracket expressions:
the bracket expressions
\fB[[:<:]]\fR
and
\fB[[:>:]]\fR
are constraints, matching empty strings at
the beginning and end of a word respectively.
'\" note, discussion of escapes below references this definition of word
A word is defined as a sequence of
word characters
that is neither preceded nor followed by
word characters.
A word character is an
\fIalnum\fR
character
or an underscore
(\fB_\fR).
These special bracket expressions are deprecated;
users of AREs should use constraint escapes instead (see below).
.SH ESCAPES
Escapes (AREs only), which begin with a
\fB\e\fR
followed by an alphanumeric character,
come in several varieties:
character entry, class shorthands, constraint escapes, and back references.
A
\fB\e\fR
followed by an alphanumeric character but not constituting
a valid escape is illegal in AREs.
In EREs, there are no escapes:
outside a bracket expression,
a
\fB\e\fR
followed by an alphanumeric character merely stands for that
character as an ordinary character,
and inside a bracket expression,
\fB\e\fR
is an ordinary character.
(The latter is the one actual incompatibility between EREs and AREs.)
.PP
Character-entry escapes (AREs only) exist to make it easier to specify
non-printing and otherwise inconvenient characters in REs:
.RS 2
.TP 5
\fB\ea\fR
alert (bell) character, as in C
.TP
\fB\eb\fR
backspace, as in C
.TP
\fB\eB\fR
synonym for
\fB\e\fR
to help reduce backslash doubling in some
applications where there are multiple levels of backslash processing
.TP
\fB\ec\fIX\fR
(where \fIX\fR is any character) the character whose
low-order 5 bits are the same as those of
\fIX\fR,
and whose other bits are all zero
.TP
\fB\ee\fR
the character whose collating-sequence name
is `\fBESC\fR',
or failing that, the character with octal value 033
.TP
\fB\ef\fR
formfeed, as in
C
.TP
\fB\en\fR
newline, as in C
.TP
\fB\er\fR
carriage return, as in C
.TP
\fB\et\fR
horizontal tab, as in C
.TP
\fB\eu\fIwxyz\fR
(where
\fIwxyz\fR
is exactly four hexadecimal digits)
the Unicode character
\fBU+\fIwxyz\fR
in the local byte ordering
.TP
\fB\eU\fIstuvwxyz\fR
(where
\fIstuvwxyz\fR
is exactly eight hexadecimal digits)
reserved for a somewhat-hypothetical Unicode extension to 32 bits
.TP
\fB\ev\fR
vertical tab, as in C
.TP
\fB\ex\fIhhh\fR
(where
\fIhhh\fR
is any sequence of hexadecimal digits)
the character whose hexadecimal value is
\fB0x\fIhhh\fR
(a single character no matter how many hexadecimal digits are used).
.TP
\fB\e0\fR
the character whose value is
\fB0\fR
.TP
\fB\e\fIxy\fR
(where
\fIxy\fR
is exactly two octal digits,
and is not a
\fIback reference\fR (see below))
the character whose octal value is
\fB0\fIxy\fR
.TP
\fB\e\fIxyz\fR
(where
\fIxyz\fR
is exactly three octal digits,
and is not a
back reference (see below))
the character whose octal value is
\fB0\fIxyz\fR
.RE
.PP
Hexadecimal digits are `\fB0\fR'-`\fB9\fR', `\fBa\fR'-`\fBf\fR',
and `\fBA\fR'-`\fBF\fR'.
Octal digits are `\fB0\fR'-`\fB7\fR'.
.PP
The character-entry escapes are always taken as ordinary characters.
For example,
\fB\e135\fR
is
\fB]\fR
in ASCII,
but
\fB\e135\fR
does not terminate a bracket expression.
Beware, however, that some applications (e.g., C compilers) interpret
such sequences themselves before the regular-expression package
gets to see them, which may require doubling (quadrupling, etc.) the `\fB\e\fR'.
.PP
Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used
character classes:
.RS 2
.TP 10
\fB\ed\fR
\fB[[:digit:]]\fR
.TP
\fB\es\fR
\fB[[:space:]]\fR
.TP
\fB\ew\fR
\fB[[:alnum:]_]\fR
(note underscore)
.TP
\fB\eD\fR
\fB[^[:digit:]]\fR
.TP