<HTML><TITLE>Regexp and Regsub: Repetitions and Branches</TITLE><BODY BGCOLOR="#FFF0E0" VLINK="#0FBD0F" TEXT="#101000" LINK="#0F0FDD">
<A NAME="top"><H1>Repetitions and Branches</H1></A>


<P> I find the concept of a <CITE><A NAME="G7.4quasichar">quasichar</A></CITE> to be useful in explaining
regular-expression pattern matching.  A quasichar is a part of the pattern
that matches a single ASCII character, for example, <TT>A</TT>, <TT>[A-Z]</TT>, <TT>\01</TT>,
and so forth.  Here are descriptions of the special symbols that are directly related
to quasichars.

<P><CENTER><TABLE BORDER><TR><TD><DL>

<P> <DT><STRONG><PRE>*</PRE></STRONG><DD> When following a quasichar, <TT>*</TT> causes any number (including zero) of
copies of that quasichar to be used in the match.  Preference is to the
longest match.

<P>  So, the glob pattern <TT>*</TT> is the same as either of these regular
expressions: <TT>^.*$</TT> or <TT>.*</TT>  (Why?)

<P> <DT><STRONG><PRE>+</PRE></STRONG><DD> When following a quasichar, <TT>+</TT> causes one or more copies of that
quasichar to be used in the match.  Preference is to the longest match.

<P> So, the pattern <TT>[a-z]+</TT> will match the entire string "cat" because
this pattern

<PRE>
[a-z][a-z][a-z]
</PRE>

matches "cat" and no longer sequence of <TT>[a-z]</TT> could make a match.   The
same pattern will match the first two characters of the string "hi!" and
would not match any substring of "1234".

<P> <DT><STRONG><PRE>?</PRE></STRONG><DD>  When following a quasichar, <TT>?</TT> causes zero or one copies of that
quasichar to be used in the match.  Preference is to the longest match.

</DL></TD></TR></TABLE></CENTER></P>

I call these special symbols <CITE><A NAME="G7.4repeaters">repeaters</A></CITE> for obvious reasons.
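<P> A minimal sketch of the three repeaters at work, assuming a standard <TT>tclsh</TT>.  The braces around each pattern are an addition here, used so that Tcl hands the pattern to <TT>regexp</TT> unchanged; the comments give the results the descriptions above lead one to expect.

<PRE>
regexp {[a-z]+} cat Match     ;# returns 1; Match is "cat"
regexp {[a-z]+} hi! Match     ;# returns 1; Match is "hi"
regexp {[a-z]+} 1234 Match    ;# returns 0; Match is left unchanged
regexp {[a-z]*} 1234 Match    ;# returns 1; Match is the empty string
regexp {[a-z]?} cat Match     ;# returns 1; Match is "c"
</PRE>

Note that <TT>*</TT> and <TT>?</TT> succeed even where <TT>+</TT> fails, because both accept zero copies of the quasichar.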

<P>  Repeaters introduce the possibility that more than one matching substring
might begin at the same position in the string.  Ties are broken as with glob
pattern matching:

<DL><DD> <CITE>If two matching substrings begin at the same
position in the string, the longer is chosen.</CITE>  
</DL>

That is what "preference is
to the longest match" means in the descriptions above.
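<P> The tie-breaking rule can be observed with the <TT>-indices</TT> option of <TT>regexp</TT>, which stores the first and last index of the match instead of its text.  This is only a sketch; the test strings and the variable name <TT>Span</TT> are illustrative choices, not taken from the text.

<PRE>
# Several matches begin at index 0; the longest one is chosen.
regexp -indices {b*} bbbx Span    ;# Span is "0 2"

# The rule only compares matches that begin at the same position.
# Here the empty string at index 0 is found before the run of b.
regexp -indices {b*} abbb Span    ;# Span is "0 -1", an empty match
</PRE>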

<P> Regular expressions are more than just a sequence of quasichars with possible
repeaters; they can be several such sequences.  The special symbol <TT>|</TT>
separates these sequences, which are called
<CITE><A NAME="G7.4branch">branch</A>es</CITE>.  Each branch defines a different
possible match.
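<P> As a small illustration of branches, the following sketch uses a two-branch pattern.  The pattern and test strings are made up for this example.

<PRE>
regexp {cat|dog} "hot dog" Match    ;# Match is "dog"; only the second branch matches
regexp {cat|dog} catalog Match      ;# Match is "cat"; only the first branch matches
regexp {cat|dog} bird Match         ;# returns 0; neither branch matches
</PRE>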

<P> Now for a major difference between Tcl's version 8.1 and everything that came
before. (Note that version 8.1 is experimental at the time of writing.)

<P><DL>
<P> <DT>For versions 8.0 and earlier<DD>
<DL><DD>
<P>  <CITE>When more than one branch defines a matching substring at a given
position within a string, the leftmost branch will be used, even if the
match defined by another branch would choose a longer substring.</CITE>
</DL>

<P> <DT>For versions 8.1 and later<DD>
<DL><DD>
<P>  <CITE>When more than one branch defines a matching substring at a given
position within a string, the longer will be used.  If two are of the longest
length then the one to the left will be used.</CITE>
</DL>
</DL></P>

<P> Some examples will help.  They depend on this preassignment:

<PRE>
set BC_ {[bBcC]}
</PRE>

<P> This,

<PRE>
regexp a|$BC_  cat Match
</PRE>

sets <TT>Match</TT> to "c" because that is the first substring of <TT>cat</TT>
that can be matched.
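<P> It can help to look at the pattern that <TT>regexp</TT> actually receives once Tcl has substituted the variable.  The sketch below prints it and then checks where the match falls; the string <TT>tack</TT> and the variable <TT>Span</TT> are illustrative additions.

<PRE>
set BC_ {[bBcC]}
puts a|$BC_                        ;# prints a|[bBcC]
regexp -indices a|$BC_ cat Span    ;# Span is "0 0", the c at index 0
regexp a|$BC_ tack Match           ;# Match is "a"; index 1 comes before the c at index 2
</PRE>

In both cases the match that starts earliest in the string is used, regardless of which branch supplies it.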

<P>  This,

<PRE>
regexp $BC_?|$BC_* bbbb Match
</PRE>

sets <TT>Match</TT> to "b" in versions 8.0 and earlier and to
"bbbb" in versions 8.1 and later.  The leftmost branch can match at most one
character, while the second branch can match all four.

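<P> The same command can serve as a probe for which rule the running interpreter follows.  This is a sketch under the assumption that the two rules quoted above are the only possibilities; the messages are invented for the example.

<PRE>
set BC_ {[bBcC]}
regexp $BC_?|$BC_* bbbb Match
if {[string length $Match] == 4} {
    puts "longest match wins (the rule for 8.1 and later)"
} else {
    puts "leftmost branch wins (the rule for 8.0 and earlier)"
}
</PRE>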
<P> This,

<PRE>
regexp ^$BC_* able Match
</PRE>

sets <TT>Match</TT> to the empty string.  The <TT>*</TT> repeater permits a match
of zero characters, and the empty string at the front of "able" is the
first possible match.

<P> This,

<PRE>
regexp ^able|^$BC_* able Match
</PRE>

sets <TT>Match</TT> to "able".  The leftmost branch takes precedence here
because both patterns match at the first character of "able".  It might
seem that a match to the empty string at the beginning of "able" would come
first.  It does not.  In versions 8.1 and later the longest-match rule gives
the same result.
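<P> Swapping the two branches shows where the versions part company.  The sketch below states what the rules quoted earlier predict; only the results for 8.1 and later are given in the comments.

<PRE>
set BC_ {[bBcC]}
regexp ^able|^$BC_* able Match    ;# Match is "able"
regexp ^$BC_*|^able able Match    ;# Match is still "able" in 8.1 and later
</PRE>

By the rule for versions 8.0 and earlier, the second command would instead be expected to choose the leftmost branch and so match the empty string.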
<P> 
<P> <P><A NAME="7.4a">
<STRONG>Exercise 7.4a</STRONG> </A><DL><DD>
  Which of the following <TT>regexp</TT> commands will return
true?  Of those that do, what is assigned to the variable <TT>Match</TT>?  Of those
that do not, why?

<PRE>
set Digit_ {[0-9]}
set Space_ "\[ \t]"
set Dot_ {\.}
set NoDot_ {[^\.]}
set Quote_ {"}
regexp -indices $Space_$Quote_ {  "} Match
regexp $Digit_.$Digit_ 201 Match
regexp $NoDot_*$Dot_ "Interesting. But not relevant." Match
regexp ".*" "" Match
</PRE> <P>
<A HREF="7.9.html#Sol7.4a">Solution</A></DL>


<P> <P><A NAME="7.4b">
<STRONG>Exercise 7.4b</STRONG> </A><DL><DD>
  Which of the following will return true?  Of those
that do, what is assigned to the variable <TT>Match</TT>?  Of those that do not,
why?

<PRE>
regexp catbert|cat catbert Match
regexp cat|catbert catbert Match
regexp c?t|at catbert Match
set NoLowerCase_ {[^a-z]}
regexp $NoLowerCase_*at|atbert Catbert Match
regexp $NoLowerCase_*bert|bert Catbert Match
</PRE>
<P>
<A HREF="7.9.html#Sol7.4b">Solution</A></DL>


<P> <P><A NAME="7.4c">
<STRONG>Exercise 7.4c</STRONG> </A><DL><DD>
  Write a <TT>regexp</TT> command that matches
everything in a string <TT>Str</TT> up to, and including, the first end of
line.

<P> Test your answer with these strings: "<TT>Hi There\nBig Boy\n</TT>",
"<TT>\nSecond Line</TT>", and "First Line".  You should obtain, respectively, the string
"<TT>Hi There</TT>" followed by a new line, a new line without anything before or after it,
and no match.  <P>
<A HREF="7.9.html#Sol7.4c">Solution</A></DL>



<!-- Linkbar -->
<P><CENTER><FONT SIZE=2><NOBR>
<STRONG>From</STRONG>
<A HREF="http://www.mapfree.com/sbf/tcl/book/home.html">Tcl/Tk For Programmers</A><WBR>
<STRONG>Previous</STRONG>
<A HREF="7.3.html">section</A><WBR>
<STRONG>Next</STRONG>
<A HREF="7.5.html">section</A><WBR>
<STRONG>All</STRONG>
<A HREF="7.html">sections</A><WBR>
<STRONG>Author</STRONG>
<A HREF="http://www.mapfree.com/mp/jaz/home.html">J. A. Zimmer</A><WBR>
<STRONG>Copyright</STRONG>
<A HREF="copyright.html">Notice</A><WBR>
<P>
<I>Jun 17, 1998</I>
 </NOBR></FONT></CENTER></BODY></HTML>

