<HTML><TITLE>Regexp and Regsub: Use Parentheses to Build more Complicated Patterns</TITLE><BODY BGCOLOR="#FFF0E0" VLINK="#0FBD0F" TEXT="#101000" LINK="#0F0FDD">
<A NAME="top"><H1>Use Parentheses to Build more Complicated Patterns</H1></A>
<P> Now we change the rules in a way that lets more complicated
regular expressions be written:
<DL><DD> <CITE> A quasichar may be replaced with an entire pattern if that
pattern is placed inside parentheses and the resulting overall pattern does
not apply a repeater to a pattern that can match an empty string.
<P> </CITE> </DL>
<P> In the previous section, we built regular expressions from quasichars,
anchors, repeaters, and branches. The rules we gave for those regular
expressions did not really require that quasichars only match single
characters. That just made the rules easier to explain. All that mattered
was that a quasichar could be tested to see if it matches a substring
beginning at a definite place. A pattern, too, can be tested to see if it
matches a substring beginning at a definite place. So, there is no reason not
to let quasichars be patterns.
<P> Therefore, we do let quasichars be patterns, but we insist that such
quasichar patterns be surrounded with parentheses to keep things unambiguous.
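<P> A quick way to see why the parentheses matter is to compare a repeater applied to a single quasichar with one applied to a parenthesized pattern. The sketch below runs <TT>regexp</TT> on made-up test strings; each comment gives the value printed.
<PRE>
# A repeater applies only to the quasichar just before it, so here
# the * repeats the letter g, not the word dog.
puts [regexp {^dog*$} doggg]     ;# 1
puts [regexp {^dog*$} dogdog]    ;# 0
# Putting the word in parentheses makes the whole word the quasichar.
puts [regexp {^(dog)*$} doggg]   ;# 0
puts [regexp {^(dog)*$} dogdog]  ;# 1
</PRE>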
<P> Explaining why a quasichar pattern that can match an empty string cannot have
a repeater after it is more difficult. After all, the theory says that
the <TT>*</TT> repeater is idempotent, which should mean that <TT>a**</TT> is the same as <TT>a*</TT>.
Why then should the practice forbid <TT>a**</TT> or <TT>(a*)*</TT>? I have not looked
at the code to see why, but I suppose it has something to do with avoiding infinite
recursion or an infinite loop. Whatever the reason, theory and practice
differ here. However, the divergence is not very consequential.
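<P> Whether a particular pattern of this kind is refused depends on the regexp engine in the Tcl you are running (later Tcl releases replaced the engine), so a cautious script can test a pattern under <TT>catch</TT> before relying on it. In the sketch below, <TT>acceptsPattern</TT> is a hypothetical helper, not part of Tcl. The sketch also shows the usual way around the restriction: rewrite the pattern so that no repeater is applied to something that can match an empty string. For example, <TT>(a|b)*</TT> matches exactly the same strings as <TT>(a*b*)*</TT>.
<PRE>
# Hypothetical helper: returns 1 if this interpreter's regexp engine
# compiles Pattern, 0 if compiling it raises an error.
proc acceptsPattern {Pattern} {
    # Matching against the empty string forces the pattern to be
    # compiled; catch returns nonzero if regexp raised an error.
    return [expr {![catch {regexp -- $Pattern ""}]}]
}

# (a*b*)* repeats a pattern that can match the empty string, so some
# versions of regexp refuse it.  (a|b)* matches exactly the same
# strings without that problem.
foreach P {{(a*b*)*} {(a|b)*}} {
    puts "$P accepted: [acceptsPattern $P]"
}
puts [regexp {^(a|b)*$} abba]   ;# 1
</PRE>
<P> On an interpreter that refuses <TT>(a*b*)*</TT>, the loop reports 0 for it; on one that accepts it, both patterns report 1. Either way, the rewritten pattern is the portable choice.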
<P> Now for an example. Consider this,
<PRE>
x*
</PRE>
which matches zero or more copies of the letter <TT>x</TT> and this,
<PRE>
cat|dog
</PRE>
which matches "cat" or "dog." If we replace the quasichar <TT>x</TT> with
the second pattern, enclosed in parentheses, we get
<PRE>
(cat|dog)*
</PRE>
which matches zero or more consecutive substrings, each of which is "cat"
or "dog."
<P> To be even more concrete,
<PRE>
regexp "(cat|dog)*" catdogcatbert Match
</PRE>
will return true and set <TT>Match</TT> to <TT>catdogcat</TT>.
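<P> The example above can be tried directly. The extra strings in this sketch are invented to show two more consequences of the <TT>*</TT>: anchoring with <TT>^</TT> and <TT>$</TT> forces the repetition to account for the whole string, and because zero copies are allowed, the pattern matches even a string that does not begin with "cat" or "dog", with the empty string as the match.
<PRE>
# Unanchored: the match starts as early as possible (here at the
# first character) and takes as many repetitions as it can.
puts [regexp {(cat|dog)*} catdogcatbert Match]  ;# 1
puts $Match                                     ;# catdogcat

# Anchored at both ends, the entire string must be cats and dogs.
puts [regexp {^(cat|dog)*$} catdogcatbert]      ;# 0
puts [regexp {^(cat|dog)*$} catdogcat]          ;# 1

# Zero copies also count, so a string that does not begin with
# "cat" or "dog" still matches, with the empty string as the match.
puts [regexp {(cat|dog)*} bertcat Empty]        ;# 1
puts [string length $Empty]                     ;# 0
</PRE>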
<P> <P><A NAME="7.5a">
<STRONG>Exercise 7.5a</STRONG> </A><DL><DD>
<P> Which of the following will return true? Of those that
do, what is assigned to the variable <TT>Match</TT>? Of those that do not, why?
<PRE>
set NoLetter_ {[^A-Za-z]}
set OkChar_ {[a-z@\.]}
regexp "(cat | dog)*bert" catdogbert Match
regexp "($NoLetter_+|nil) + ($NoLetter_+|nil)" "Answer: 2.6 + nillem" Match
regexp -nocase "^(From:|To:) *$OkChar_+$" \
"From: jazimmer@acm.org\n" \
Match
</PRE> <P>
<A HREF="7.9.html#Sol7.5a" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.5a">Solution</A></DL>
<P> Here is a short example of the power of parentheses. Recall that the Tcl
pattern matcher interprets <TT>^</TT> as matching the empty string just before the first
character of the string you are trying to match. In other words, <TT>^</TT> is
not just a control character the way <TT>(</TT> is. Instead, <TT>^</TT> is seen as
matching something. Now, consider the following,
<PRE>
set LineBrk_ "\n"
regexp "(^|$LineBrk_)To:" $Str Match
</PRE>
This will match the first occurrence of "To:" that is immediately
preceded by the start of the given string or a break between lines.
In other words, it matches the first occurrence of "To:" at the
beginning of a line.
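<P> To try this, <TT>Str</TT> needs a value; the three-line message below is invented for the purpose. It shows that when "To:" follows a line break, the line break becomes part of <TT>Match</TT>, and that <TT>string trimleft</TT> is one way to remove it.
<PRE>
set LineBrk_ "\n"
# Invented sample text: "To:" appears mid-line on line 2
# and at the start of line 3.
set Str "From: jaz\nSubject: To: whom it may concern\nTo: bert\n"
regexp "(^|$LineBrk_)To:" $Str Match
# The match includes the line break in front of line 3.
puts [string length $Match]               ;# 4, i.e. "\nTo:"
puts [string trimleft $Match $LineBrk_]   ;# To:

# When the string itself begins with "To:", the ^ branch matches
# and no line break is captured.
regexp "(^|$LineBrk_)To:" "To: bert" Match
puts $Match                               ;# To:
</PRE>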
<P> <P><A NAME="7.5b">
<STRONG>Exercise 7.5b</STRONG> </A><DL><DD>
Finish implementing this procedure,
<PRE>
proc getSummary String { ... }
</PRE>
<TT>String</TT> is viewed as a sequence of lines. Lines are separated with the
<TT>\n</TT> character. There may be any number of lines. The last line may, or
may not, end with a <TT>\n</TT>.
<P> The purpose of <TT>getSummary</TT> is to return the complete line that begins
with the word "Summary" – not including any <TT>\n</TT>. "Summary" may be
indented. If the word "Summary" begins more than one line, then the first
one is returned. If the word "Summary" begins no lines, then the empty
string is returned.
<P> To discover that "Summary" begins a line, you have to make sure the "S"
(after any indentation) is the very first letter of the string or follows an end-of-line character.
This may get an unwanted <TT>\n</TT> into your match. You can get rid of
it with a <TT>string</TT> action. (There is another way to accomplish this
match, which is described in the next
section. Use it if you like.) <P>
<A HREF="7.9.html#Sol7.5b" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.5b">Solution</A></DL>
<!-- Linkbar -->
<P><CENTER><FONT SIZE=2><NOBR>
<STRONG>From</STRONG>
<A HREF="http://www.mapfree.com/sbf/tcl/book/home.html" tppabs="http://www.mapfree.com/sbf/tcl/book/home.html">Tcl/Tk For Programmers</A><WBR>
<STRONG>Previous</STRONG>
<A HREF="7.4.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.4.html">section</A><WBR>
<STRONG>Next</STRONG>
<A HREF="7.6.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.6.html">section</A><WBR>
<STRONG>All</STRONG>
<A HREF="7.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.html">sections</A><WBR>
<STRONG>Author</STRONG>
<A HREF="http://www.mapfree.com/mp/jaz/home.html" tppabs="http://www.mapfree.com/mp/jaz/home.html">J. A. Zimmer</A><WBR>
<STRONG>Copyright</STRONG>
<A HREF="copyright.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/copyright.html">Notice</A><WBR>
<P>
<I>Jun 17, 1998</I>
</NOBR></FONT></CENTER></BODY></HTML>