<HTML><HEAD> <TITLE>BBS 水木清华站: Digest Area</TITLE></HEAD><BODY><CENTER><H1>BBS 水木清华站: Digest Area</H1></CENTER>From: cybergene (基因~也许以后~~), Board: Linux <BR>Subject: New Regular Expression Features in Tcl 8.1 <BR>Posting site: BBS 水木清华站 (Thu Dec 14 15:56:19 2000) <BR> <BR> <BR>New Regular Expression Features in Tcl 8.1 <BR> <BR>Tcl 8.1 now handles advanced regular expressions (REs). Previous regular <BR> expression handling is almost unchanged, except that the clumsy handling <BR>of escapes like \n has been much improved, and a few escapes that were <BR>previously legal (but useless) now won't work. <BR> <BR>Note that a few advanced features aren't useful yet but are ready for <BR>future Tcl releases. That's because Tcl 8.1 (apart from the regular <BR>expression engine) implements only the Unicode locale (where all <BR>characters sort in Unicode order, there are no multi-character collating <BR> elements and no equivalence classes). <BR> <BR>This document has an overview of the new regular expression features. <BR>For exact semantics and more details, see the new re_syntax(n) reference <BR> page. (The re_syntax(n) page was split from the 8.1 regexp(n) reference <BR> page, which used to cover RE syntax for all Tcl commands.) This howto <BR>document covers: <BR> <BR>1. Regular Expression Overview <BR> <BR>What are Regular Expressions? <BR>Regular Expressions in Tcl 8.0 and Before <BR>Overview of regexp and regsub <BR>Backslash Processing <BR>2. Regular Expressions in Tcl 8.1 <BR> <BR>Non-Greedy Quantifiers <BR>Backslash Escapes (\xxx) <BR>Bounds ({}) <BR>Character Classes ([: :]) <BR>Collating Elements ([. .]) <BR>Equivalence Classes ([= =]) <BR>Noncapturing Subpatterns ((?:re)) <BR>Lookahead Assertions ((?=re) and (?!re)) <BR>Switches <BR>Options ((?xyz)), Directors (***) <BR>3. Summary: Regular Expression changes in Tcl 8.1 <BR> <BR>Part 1. 
Regular Expression Overview <BR>This Part describes regular expressions (REs), explains REs from Tcl 8.0 <BR> and before, and describes the Tcl regexp and regsub commands. Part <BR>Two describes the new Tcl 8.1 REs. <BR> <BR>What are Regular Expressions? <BR>A regular expression, or RE, describes strings of characters (words or <BR>phrases or any arbitrary text). It's a pattern that matches certain <BR>strings and doesn't match others. For example, you could write an RE <BR>to tell you if a string contains a URL (World Wide Web Uniform <BR>Resource Locator, such as <A HREF="http://somehost/somefile.html">http://somehost/somefile.html</A>). Regular <BR>expressions can be either broad and general or focused and precise. <BR>A regular expression uses metacharacters (characters that assume special <BR> meaning for matching other characters) such as *, [], $ and the period (.). For <BR>example, the RE [Hh]ello!* would match Hello and hello and Hello! (and <BR>hello!!!!!). The RE [Hh](ello|i)!* would match Hello and Hi and Hi! (and <BR> so on). A backslash (\) disables the special meaning of the following <BR>character, so you could match the string [Hello] with the RE \[Hello\]. <BR> <BR> <BR>Regular Expressions in Tcl 8.0 and Before <BR>Regular expressions in Tcl 8.0 and before had the following <BR>metacharacters: <BR>. Match any single character (e.g., m.d matches mad, <BR>mod, m3d, etc.) <BR>[] Bracket expression: Match any one of the enclosed characters (e.g., <BR>[a-z0-9_] matches a lowercase ASCII letter, a digit, or an underscore) <BR>^ Start-of-string anchor: Match only at the start of a string (e.g., ^hi <BR> matches hi and his but not this) <BR>$ End-of-string anchor: Match only at the end of a string (e.g., hi$ <BR>matches hi and chi but not this) <BR>* Zero-or-more quantifier: makes the previous part of the RE match <BR>zero or more times (e.g., M.*D matches MD, MAD, MooD, M.D, etc.) <BR>? 
Zero-or-one quantifier: makes the previous part of the RE match zero <BR>or one time (e.g., hi!? matches hi or hi!) <BR>+ One-or-more quantifier: makes the previous part of the RE match one or <BR> more times (e.g., hi!+ matches hi! or hi!! or hi!!! or ...) <BR>| Alternation (vertical bar): Match just one alternative (e.g., <BR>this|that matches this or that) <BR>() Subpattern: Group part of the RE. Many uses, such as: <BR>Makes a quantifier apply to a group of text (e.g., ([0-9A-F][0-9A-F])+ <BR>matches groups of two hexadecimal digits: A9 or AB03 or 8A6E00, but <BR>not A or A2C). <BR>Sets limits for alternation (e.g., "Eat (this|that)!" matches "Eat this!" <BR> or "Eat that!"). <BR>Used for subpattern matching in the regexp and regsub commands. <BR> <BR>\ Escape: Disables the special meaning of the following metacharacter (e.g., a\.* <BR>matches a or a. or a.. and so on). Note that \ also has special meaning to <BR> the Tcl interpreter (and to applications, such as C compilers). <BR> <BR> <BR>The syntax above is supported in Tcl 8.1. Tcl 8.1 also supports advanced <BR> regular expressions (AREs). These powerful expressions are introduced <BR>in more detail in Part Two. Briefly, though, AREs support <BR>backreferences, lookahead, non-greedy matching, many escapes, features <BR>that are useful for internationalization (handling collation elements, <BR>equivalence classes and character classes), and much more. <BR> <BR>The Tcl 8.1 regular expression engine almost always interprets 8.0-style <BR> REs correctly. In the few cases that it doesn't, and when the problem <BR>is too difficult to fix, the 8.1 engine has an option to select 8.0 <BR>("ERE") interpretation. <BR> <BR>Overview of regexp and regsub <BR>The Tcl commands regexp and regsub use regular expressions: <BR>regexp compares a string to an RE. It returns a value of 1 if the RE <BR>matches part or all of the string or 0 if there's no match. 
Optionally, <BR> it stores the matched part of the string in a variable (and also can <BR>store subparts of a string in multiple variables). For example, to <BR>compare the string in $line against the RE [Hh]ello!*, you would write: <BR> <BR>regexp {[Hh]ello!*} $line match <BR>If part or all of the line variable matches the RE, regexp stores the <BR>matching part in the match variable and returns a value of 1. <BR> <BR>regsub substitutes part of a string that matches an RE. For instance, <BR>the following command edits the string from $in_line to replace each <BR>run of space or tab characters with a single space character; the edited line <BR>is stored in the out_line variable: <BR>regsub -all {[ \t]+} $in_line { } out_line <BR>Please also read the following section about backslash processing. <BR> <BR>Backslash Processing <BR>If you've used Tcl, you probably recognize the \t in the previous <BR>example as a character-entry escape that stands for a tab character. <BR>We actually used the 8.1 syntax above; the example wouldn't have <BR>worked under 8.0! In Tcl 8.0 and before, you had to surround the regular <BR> expression with double quotes so the Tcl backslash processor could <BR>convert the \t to a literal tab character. The square brackets had to be <BR> hidden from the backslash processor by adding backslashes before them, <BR> which made code harder to read and possibly more error-prone. Here's <BR>the previous example rewritten for Tcl 8.0 and before: <BR> <BR>regsub -all "\[ \t\]+" $in_line { } out_line <BR>For more about the simplified 8.1 syntax, see the section Backslash <BR>
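To make the regexp and regsub descriptions above concrete, here is a minimal sketch that can be pasted into tclsh (8.1 or later). The sample strings and the variable names whole, rest, count and out_line_80 are made up for illustration; only the commands and patterns come from the text. <BR>

```tcl
# Hypothetical sample input; "line" and "match" follow the names used in the text.
set line "Well, Hello!!! there"

# regexp returns 1 on a match and stores the matched text in "match".
if {[regexp {[Hh]ello!*} $line match]} {
    puts "matched: $match"
}

# Parentheses capture a subpattern into an extra variable ("rest" here).
regexp {[Hh](ello|i)!*} "Oh, Hi!" whole rest
puts "whole=$whole rest=$rest"

# regsub -all replaces every run of spaces or tabs with one space and
# returns the number of substitutions it made.
set in_line "a \t b\t\tc"
set count [regsub -all {[ \t]+} $in_line { } out_line]
puts "$count substitutions: $out_line"

# The double-quoted 8.0-style form should give the same result as the
# braced 8.1-style form; string compare returns 0 when the strings are equal.
regsub -all "\[ \t\]+" $in_line { } out_line_80
puts [string compare $out_line $out_line_80]
```

In the braced form the regular expression engine itself interprets \t, while in the double-quoted form the Tcl parser converts it to a tab before the engine sees the pattern, which is why the brackets need backslashes there. <BR>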