<HTML><TITLE>Regexp and Regsub: Solutions to Exercises</TITLE><BODY BGCOLOR="#FFF0E0" VLINK="#0FBD0F" TEXT="#101000" LINK="#0F0FDD">
<A NAME="top"><H1>Solutions to Exercises</H1></A>
<P> <A NAME="Sol7.2a">
<STRONG> Solution To Exercise 7.2a</STRONG> </A> <P>
The <TT>\t</TT> will not be substituted with the nonprintable tab character unless
you are running Tcl version 8.1 or later. In earlier versions, the regular expression
evaluator sees it as the letter "t". Fix the problem this way:
<PRE>
set Space_ "\[ \t]"
</PRE>
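The backslash before the left square bracket prevents the Tcl interpreter from
doing command substitution, and because the pattern is in double quotes rather
than braces, the Tcl interpreter itself turns <TT>\t</TT> into a tab before the
regular expression evaluator sees it.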
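<P> As a rough check that uses nothing beyond the built-in <TT>string</TT> and
<TT>regexp</TT> commands, this sketch shows that the double-quoted assignment
stores a real tab, while a braced one would store a backslash and a "t":
<PRE>
set Space_ "\[ \t]"
puts [string length $Space_]    ;# 4: left bracket, space, tab, right bracket
puts [regexp $Space_ "a\tb"]    ;# 1: the tab between a and b is matched
puts [string length {[ \t]}]    ;# 5: braces keep the backslash and the t
</PRE>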
<P> <A NAME="Sol7.3a">
<STRONG> Solution To Exercise 7.3a</STRONG> </A> <P>
The first part, <TT>$Pre1_</TT>, is not interpreted by the Tcl interpreter during
preassignment. The backslashes tell the regular-expression translator that
there are no special symbols. This part matches <TT>[0-9]</TT> exactly.
<P> The second part, <TT>$Pre2_</TT>, is interpreted by the Tcl interpreter during
preassignment. The regular-expression translator sees <TT>[0-9]</TT>, which
matches any single digit.
<P> So, the whole regular expression pattern would match <TT>[0-9]3</TT>, but not
<TT>3[0-9]</TT>.
<P> The second preassignment to <TT>Pre2_</TT> would cause an error because <TT>0-9</TT>
is not a command name and so command substitution fails.
<P> By the way, the preassignment to <TT>Pre1_</TT> could have been written this way:
<PRE>
set Pre1_ {\[0-9]}
</PRE>
because the hyphen and the right square bracket are not considered to be
special symbols by the regular-expression translator unless they follow a left
square bracket.
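<P> Here is a minimal sketch of the whole pattern at work. The assignment to
<TT>Pre2_</TT> is an assumption reconstructed from the description above, since
the exercise itself is not repeated here; <TT>Pre1_</TT> uses the braced form
just shown.
<PRE>
set Pre1_ {\[0-9]}    ;# reaches regexp with its backslash, so it is literal text
set Pre2_ "\[0-9]"    ;# assumed form: reaches regexp as a range, any single digit
puts [regexp $Pre1_$Pre2_ {[0-9]3}]    ;# 1
puts [regexp $Pre1_$Pre2_ {3[0-9]}]    ;# 0
</PRE>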
<P> <A NAME="Sol7.3b">
<STRONG> Solution To Exercise 7.3b</STRONG> </A> <P>
<PRE>
regexp "^Tcl$" $Name
</PRE>
Variable substitution is not attempted when a symbol other than a letter,
number, or underscore follows a dollar sign. This rule is consistent
with what you have had to learn about safe variable names.
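<P> A short illustration, with <TT>Name</TT> set by hand for the purpose. In both
calls the dollar sign is followed by a double quote, so it reaches
<TT>regexp</TT> untouched:
<PRE>
set Name Tcl
puts [regexp "^Tcl$" $Name]       ;# 1
puts [regexp "^Tcl$" "Tcl/Tk"]    ;# 0: the pattern is anchored at both ends
</PRE>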
<P> <A NAME="Sol7.3c">
<STRONG> Solution To Exercise 7.3c</STRONG> </A> <P>
<PRE>
set NoDot_ {[^\.]}
</PRE>
As it happens, the backslash is not necessary. Within square brackets, the
only special symbols that are recognized are <TT>^</TT>, <TT>-</TT>, and <TT>]</TT>. I
prefer to ignore this rule and do the backslash substitutions for
nonalphameric characters. (The word "nonalphameric" is important here.
Indeed, with version 8.1, a backslash before a letter is either a request for a
special backslash substitution, such as <TT>\t</TT> or <TT>\n</TT>, or an error.) If
you want to take advantage of the rule, you should know that it even has a
counterpart in glob pattern matching that I did not mention there.
<P> <A NAME="Sol7.3d">
<STRONG> Solution To Exercise 7.3d</STRONG> </A> <P>
<PRE>
% regexp -indices "\[a-z]ab" abab Match
1
% set Match
1 3
% regexp -indices t$ catbert Match
1
% set Match
6 6
</PRE>
<P> <A NAME="Sol7.4a">
<STRONG> Solution To Exercise 7.4a</STRONG> </A> <P>
<PRE>
regexp -indices $Space_$Quote_ {  "} Match
<CITE>Matches and</CITE> Match <CITE>is</CITE> 1 2
regexp $Digit_.$Digit_ 201 Match <CITE>Matches and</CITE> Match <CITE>is</CITE> 201
regexp $NoDot_*$Dot_ "Interesting. But not relevant." Match
<CITE>Matches and</CITE> Match <CITE>is</CITE> Interesting.
regexp ".*" "" Match <CITE>Matches and</CITE> Match <CITE>is the empty string.</CITE>
</PRE>
<P> <A NAME="Sol7.4b">
<STRONG> Solution To Exercise 7.4b</STRONG> </A> <P>
<PRE>
regexp catbert|cat catbert Match <CITE>Matches and</CITE> Match <CITE>is</CITE> catbert
regexp cat|catbert catbert Match <CITE>Matches and</CITE>
Match <CITE>is</CITE> cat in version 8.0 and earlier
Match <CITE>is</CITE> catbert in version 8.1 and later
regexp c?t|at catbert Match <CITE>Matches and</CITE> Match <CITE>is</CITE> at
regexp $NoLowerCase_*at|catbert Catbert Match
<CITE>Matches and</CITE> Match <CITE>is</CITE> Cat
regexp $NoLowerCase_*bert|bert Catbert Match
<CITE>Matches and</CITE> Match <CITE>is</CITE> bert
</PRE>
In the last one it is the leftmost branch that is used. Remember that the
<TT>*</TT> repeater lets a quasichar match an empty string, that an imaginary empty
string exists in front of each character in a string, and that when two
matches are the same length the one from the leftmost branch prevails in all
versions of Tcl.
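<P> Two of these cases can be watched in a short sketch. The definition of
<TT>NoLowerCase_</TT> is an assumption (any character that is not a lowercase
letter), and the first result shown is the one for version 8.1 and later.
<PRE>
set NoLowerCase_ {[^a-z]}    ;# assumed definition
regexp cat|catbert catbert Match
puts $Match    ;# catbert in version 8.1 and later, cat in 8.0 and earlier
regexp -indices "$NoLowerCase_*bert|bert" Catbert Where
puts $Where    ;# 3 6: the match starts at the b, with nothing in front of bert
</PRE>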
<P> <A NAME="Sol7.4c">
<STRONG> Solution To Exercise 7.4c</STRONG> </A> <P>
<PRE>
set CarriageRet_ "\n"
set NoCarriageRet_ "\[^\n]"
regexp "^$NoCarriageRet_*$CarriageRet_" $Str Match
</PRE>
<P> <A NAME="Sol7.5a">
<STRONG> Solution To Exercise 7.5a</STRONG> </A> <P>
<P> This,
<PRE>
regexp "(cat | dog)*bert" catdogbert Match
</PRE>
returns 1, but the "<TT>(cat | dog)*</TT>" part had to match an empty string. The
spaces around the <TT>|</TT> belong to the two branches, and there is no space
before the "<TT>dog</TT>" in "<TT>catdogbert</TT>"; <TT>Match</TT> is
"<TT>bert</TT>".
<P> This,
<PRE>
regexp "($NoLetter_+|nil) + ($NoLetter_+|nil)" "Answer: 2.6 +nillem" Match
</PRE>
returns 0. The <TT>+</TT> in the pattern does not match the "+" in the string because
it is a repeater that applies to the space before it.
The match you may have thought you were getting happens with this version:
<PRE>
set Plus_ {\+}
regexp "($NoLetter_+|nil) $Plus_ ($NoLetter_+|nil)" "Answer: 2.6 + nillem"
</PRE>
<P> This,
<PRE>
regexp -nocase "^(From:|To:) *$OkChar_+$" \
"From: jazimmer@acm.org\n" \
Match
</PRE>
returns 0. Here it is the <TT>\n</TT> that causes the trouble. The <TT>$</TT> in the
pattern does not match in front of it because that spot is the end of a line, not
the end of the string. This string, "<TT>From: jazimmer@acm.org</TT>", would match just fine.
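<P> Here is a sketch of two ways around the trailing newline. <TT>OkChar_</TT> is
not defined in this solution, so the definition below is an assumption, as is
the helper name <TT>Pat_</TT>; the <TT>-line</TT> switch needs version 8.1 or later.
<PRE>
set OkChar_ "\[^ \t\n]"    ;# assumed: anything but a blank, tab, or newline
set Pat_ "^(From:|To:) *$OkChar_+$"
set Str "From: jazimmer@acm.org\n"
puts [regexp -nocase $Pat_ $Str]                            ;# 0
puts [regexp -nocase $Pat_ [string trimright $Str "\n"]]    ;# 1
puts [regexp -nocase -line $Pat_ $Str]                      ;# 1
</PRE>
Trimming the string works in every version; <TT>-line</TT> instead changes what
<TT>^</TT> and <TT>$</TT> mean, so that they also match at line boundaries.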
<P> <A NAME="Sol7.5b">
<STRONG> Solution To Exercise 7.5b</STRONG> </A> <P>
<PRE>
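proc getSummary String {
    # A line starts at the beginning of the string or just after a newline.
    set Beginning_ "(^|\n)"
    set Space_ "\[ \t]"
    set InLine_ "\[^\n]"
    if {[regexp "$Beginning_$Space_*Summary$InLine_*" $String Line]} {
        # Strip the leading newline and any surrounding blanks or tabs.
        return [string trim $Line "\n \t"]
    } else {
        return ""
    }
}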
</PRE>
<P> Here is the way it is done using parentheses to extract subpatterns as
described above in
<A HREF="7.6.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.6.html">Use Parentheses to Extract Subpatterns</A>.
<PRE>
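proc getSummary String {
    set Beginning_ "(^|\n)"
    set Space_ "\[ \t]"
    set InLine_ "\[^\n]"
    # Junk1 receives the whole match and Junk2 the line beginning;
    # Summary receives the parenthesized part that starts at "Summary".
    if {[regexp "$Beginning_$Space_*(Summary$InLine_*)" $String \
            Junk1 Junk2 Summary]} {
        return $Summary
    } else {
        return ""
    }
}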
</PRE>
<P> <A NAME="Sol7.6a">
<STRONG> Solution To Exercise 7.6a</STRONG> </A> <P>
<PRE>
set Space_ "\[ \t]"
set Labl_ "\[^ \t]+"
set Int_ {[0-9]*}
regexp "$Space_*($Labl_)$Space_+($Int_)$Space_+($Int_)" $Line \
Junk Label Before After
</PRE>
<P> <A NAME="Sol7.7a">
<STRONG> Solution To Exercise 7.7a</STRONG> </A> <P>
<PRE>
regsub -all & $Str && Str
</PRE>
<P> <A NAME="Sol7.7b">
<STRONG> Solution To Exercise 7.7b</STRONG> </A> <P>
<PRE>
set ToLft_ "^|\[^a-zA-Z]"
set ToRght_ "\[^a-zA-Z]|$"
regsub -all ($ToLft_)cat($ToRght_) $Str \\1dog\\2 Str
regsub -all ($ToLft_)cat(s?)($ToRght_) $Str \\1dog\\2\\3 Str
</PRE>
<!-- Linkbar -->
<P><CENTER><FONT SIZE=2><NOBR>
<STRONG>From</STRONG>
<A HREF="http://www.mapfree.com/sbf/tcl/book/home.html" tppabs="http://www.mapfree.com/sbf/tcl/book/home.html">Tcl/Tk For Programmers</A><WBR>
<STRONG>Previous</STRONG>
<A HREF="7.8.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.8.html">section</A><WBR>
<STRONG>All</STRONG>
<A HREF="7.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.html">sections</A><WBR>
<STRONG>Author</STRONG>
<A HREF="http://www.mapfree.com/mp/jaz/home.html" tppabs="http://www.mapfree.com/mp/jaz/home.html">J. A. Zimmer</A><WBR>
<STRONG>Copyright</STRONG>
<A HREF="copyright.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/copyright.html">Notice</A><WBR>
<P>
<I>Jun 17, 1998</I>
</NOBR></FONT></CENTER></BODY></HTML>