<HTML><TITLE>Regexp and Regsub: Repetitions and Branches</TITLE><BODY BGCOLOR="#FFF0E0" VLINK="#0FBD0F" TEXT="#101000" LINK="#0F0FDD">
<A NAME="top"><H1>Repetitions and Branches</H1></A>
<P> I find the concept of a <CITE><A NAME="G7.4quasichar">quasichar</A></CITE> to be useful in explaining
regular-expression pattern matching. A quasichar is a part of the pattern
that matches a single ASCII character, for example, <TT>A</TT>, <TT>[A-Z]</TT>, <TT>\01</TT>,
and so forth. Here are descriptions of the special symbols that are directly related
to quasichars.
<P><CENTER><TABLE BORDER><TR><TD><DL>
<P> <DT><STRONG><PRE>*</PRE></STRONG><DD> When following a quasichar, <TT>*</TT> causes any number (including zero) of
copies of that quasichar to be used in the match. Preference is to the
longest match.
<P> So, the glob pattern <TT>*</TT> is the same as either of these regular
expressions: <TT>^.*$</TT> or <TT>.*</TT> (Why?)
<P> <DT><STRONG><PRE>+</PRE></STRONG><DD> When following a quasichar, <TT>+</TT> causes one or more copies of that
quasichar to be used in the match. Preference is to the longest match.
<P> So, the pattern <TT>[a-z]+</TT> will match the entire string "cat" because
this pattern
<PRE>
[a-z][a-z][a-z]
</PRE>
matches "cat" and no longer sequence of <TT>[a-z]</TT> could make a match. The
same pattern will match the first two characters of the string "hi!" and
would not match any substring of "1234."
<P> <DT><STRONG><PRE>?</PRE></STRONG><DD> When following a quasichar, <TT>?</TT> causes zero or one copies of that
quasichar to be used in the match. Preference is to the longest match.
</DL></TD></TR></TABLE></CENTER></P>
I call these special symbols <CITE><A NAME="G7.4repeaters">repeaters</A></CITE> for obvious reasons.
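<P> Here is a minimal sketch of the three repeaters at work. The variable names
<TT>Star</TT>, <TT>Plus</TT>, <TT>Opt</TT>, <TT>Found</TT>, and <TT>None</TT> are mine, chosen only for illustration.
<PRE>
# Each pattern is braced so that Tcl passes it to regexp unchanged.
regexp {[a-z]*} cat Star                ;# Star is "cat"
regexp {[a-z]+} hi! Plus                ;# Plus is "hi"
regexp {[a-z]?} cat Opt                 ;# Opt is "c"
set Found [regexp {[a-z]+} 1234 None]   ;# Found is 0; None is not set
puts [list $Star $Plus $Opt $Found]
</PRE>
The last command finds nothing because <TT>+</TT> demands at least one lowercase letter
and "1234" has none, so <TT>regexp</TT> returns 0 and leaves <TT>None</TT> untouched.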
<P> Repeaters introduce the possibility that more than one matching substring
might begin at the same position in the string. Ties are broken as with glob
pattern matching:
<DL><DD> <CITE>If two matching substrings begin at the same
position in the string, the longer is chosen.</CITE>
</DL>
That is what "preference is
to the longest match" means in the descriptions above.
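<P> A short sketch may make the tie-breaking rule concrete. It also shows that the
rule only decides between matches that begin at the same position; an earlier
position always wins over a longer match. The variable names <TT>Longest</TT> and
<TT>Earliest</TT> are mine.
<PRE>
# Several substrings matching [0-9]+ begin at the "1": "1", "12", and "123".
# The longest of them is chosen.
regexp {[0-9]+} ab123cd Longest     ;# Longest is "123"

# b* can match the empty string in front of the "a", so the match is
# found there and the later, longer "bbb" is never considered.
regexp {b*} abbb Earliest           ;# Earliest is the empty string
puts [list $Longest [string length $Earliest]]
</PRE>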
<P> Regular expressions are more than just a sequence of quasichars with possible
repeaters; they can be several sequences of quasichars with possible
repeaters. The special symbol <TT>|</TT> is used to separate these sequences,
which are called <CITE><A NAME="G7.4branch">branch</A>es</CITE>. Each branch defines a different
possible match.
<P> Now for a major difference between Tcl's version 8.1 and everything that came
before. (Note that version 8.1 is experimental at the time of writing.)
<P><DL>
<P> <DT>For versions 8.0 and earlier<DD>
<DL><DD>
<P> <CITE>When more than one branch defines a matching substring at a given
position within a string, the leftmost branch will be used – even if the
match defined by another branch would choose a longer substring.</CITE>
</DL>
<P> <DT>For versions 8.1 and later<DD>
<DL><DD>
<P> <CITE>When more than one branch defines a matching substring at a given
position within a string, the longest of these substrings will be used. If two
branches tie for the longest, then the one to the left will be used.</CITE>
</DL>
</DL></P>
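<P> If you are not sure which rule your interpreter follows, one way to find out is
to probe it. The sketch below does that; <TT>BranchRule</TT> is a hypothetical helper
of my own, not a Tcl command.
<PRE>
# Both branches of a|ab match at the start of "ab": the left branch
# gives "a" and the right branch gives the longer "ab".
proc BranchRule {} {
    regexp {a|ab} ab Match
    if {[string compare $Match ab] == 0} {
        return longest
    }
    return leftmost
}
puts "Tcl [info tclversion] prefers the [BranchRule] branch"
</PRE>
By the rules above, this should report <TT>leftmost</TT> for versions 8.0 and earlier
and <TT>longest</TT> for versions 8.1 and later.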
<P> Some examples will help. They depend on this preassignment:
<PRE>
set BC_ {[bBcC]}
</PRE>
<P> This,
<PRE>
regexp a|$BC_ cat Match
</PRE>
matches <TT>Match</TT> with "c" because that is the first substring of <TT>cat</TT>
which can be matched.
<P> This,
<PRE>
regexp $BC_?|$BC_* bbbb Match
</PRE>
matches <TT>Match</TT> with "b" in versions 8.0 and earlier and with
"bbbb" in versions 8.1 and later.
<P> This,
<PRE>
regexp ^$BC_* able Match
</PRE>
matches <TT>Match</TT> with the empty string. The <TT>*</TT> repeater enables a match
with the empty string. The empty string at the front of "able" is the
first possible match.
<P> This,
<PRE>
regexp ^able|^$BC_* able Match
</PRE>
matches <TT>Match</TT> with "able." Both branches match at the first character of "able." In
versions 8.0 and earlier the leftmost branch takes precedence; in versions 8.1 and
later "able" is chosen because it is the longer match. It might
seem that a match to the empty string at the beginning of "able" would come
first. It does not.
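<P> To see where each of these matches falls, the four examples can be rerun with
the <TT>-indices</TT> switch. This is only a sketch; <TT>MatchInfo</TT> is a hypothetical
helper of my own that returns the matched text together with its indices, or an
empty list when there is no match.
<PRE>
set BC_ {[bBcC]}

proc MatchInfo {Pattern String} {
    if {![regexp $Pattern $String Match]} {
        return {}
    }
    # Run the same match again, this time asking for positions.
    regexp -indices $Pattern $String Where
    return [list $Match $Where]
}

# The patterns are quoted so that $BC_ is substituted before regexp sees them.
puts [MatchInfo "a|$BC_"        cat]
puts [MatchInfo "$BC_?|$BC_*"   bbbb]
puts [MatchInfo "^$BC_*"        able]
puts [MatchInfo "^able|^$BC_*"  able]
</PRE>
Only the second line of output depends on the Tcl version, as described above.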
<P>
<P> <P><A NAME="7.4a">
<STRONG>Exercise 7.4a</STRONG> </A><DL><DD>
Which of the following <TT>regexp</TT>s will return
true? Of those that do, what is assigned to the variable <TT>Match</TT>? Of those
that do not, why?
<PRE>
set Digit_ {[0-9]}
set Space_ "\[ \t]"
set Dot_ {\.}
set NoDot_ {[^\.]}
set Quote_ {"}
regexp -indices $Space_$Quote_ { "} Match
regexp $Digit_.$Digit_ 201 Match
regexp $NoDot_*$Dot_ "Interesting. But not relevant." Match
regexp ".*" "" Match
</PRE> <P>
<A HREF="7.9.html#Sol7.4a" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.4a">Solution</A></DL>
<P> <P><A NAME="7.4b">
<STRONG>Exercise 7.4b</STRONG> </A><DL><DD>
Which of the following will return true? Of those
that do, what is assigned to the variable <TT>Match</TT>? Of those that do not,
why?
<PRE>
regexp catbert|cat catbert Match
regexp cat|catbert catbert Match
regexp c?t|at catbert Match
set NoLowerCase_ {[^a-z]}
regexp $NoLowerCase_*at|atbert Catbert Match
regexp $NoLowerCase_*bert|bert Catbert Match
</PRE>
<P>
<A HREF="7.9.html#Sol7.4b" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.4b">Solution</A></DL>
<P> <P><A NAME="7.4c">
<STRONG>Exercise 7.4c</STRONG> </A><DL><DD>
Write a <TT>regexp</TT> command that matches
everything in a string <TT>Str</TT> up to, and including, the first end of
line.
<P> Test your answer with these strings: "<TT>Hi There\nBig Boy\n</TT>,"
"<TT>\nSecond
Line</TT>," and "First Line." You should obtain, respectively, the string
"<TT>Hi
There</TT>" followed by a new line, a new line without anything before or after it,
and no match. <P>
<A HREF="7.9.html#Sol7.4c" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.4c">Solution</A></DL>
<!-- Linkbar -->
<P><CENTER><FONT SIZE=2><NOBR>
<STRONG>From</STRONG>
<A HREF="http://www.mapfree.com/sbf/tcl/book/home.html" tppabs="http://www.mapfree.com/sbf/tcl/book/home.html">Tcl/Tk For Programmers</A><WBR>
<STRONG>Previous</STRONG>
<A HREF="7.3.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.3.html">section</A><WBR>
<STRONG>Next</STRONG>
<A HREF="7.5.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.5.html">section</A><WBR>
<STRONG>All</STRONG>
<A HREF="7.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.html">sections</A><WBR>
<STRONG>Author</STRONG>
<A HREF="http://www.mapfree.com/mp/jaz/home.html" tppabs="http://www.mapfree.com/mp/jaz/home.html">J. A. Zimmer</A><WBR>
<STRONG>Copyright</STRONG>
<A HREF="copyright.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/copyright.html">Notice</A><WBR>
<P>
<I>Jun 17, 1998</I>
</NOBR></FONT></CENTER></BODY></HTML>