<HTML><TITLE>Regexp and Regsub: Regular-Expression versus Glob Patterns</TITLE><BODY BGCOLOR="#FFF0E0" VLINK="#0FBD0F" TEXT="#101000" LINK="#0F0FDD">
<A NAME="top"><H1>Regular-Expression versus Glob Patterns</H1></A>
<P> Regular-expression pattern matching is similar to glob pattern matching
in these ways:
<UL>
<P> <P><LI> Both accept exact matches between the pattern string and the given string.
<P> However, regular-expression matching also accepts exact matches between the
pattern string and a substring of the given string. Substrings, of course,
contain consecutive characters. <CITE>When more than one substring matches, a
substring that begins earlier has precedence.</CITE> Rules for breaking other ties
are explained later.
<P> <P><LI> Both accept those inexact matches that are accounted for by special
characters in the pattern string.
<P> <P><LI> Both will match square brackets against a single character – the point
of the brackets being to define a set of acceptable characters. For example,
<TT>[a-z]</TT> will match a single lowercase letter and <TT>[:;]</TT> will match
either a colon or a semicolon. (See
<A HREF="NotHere.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/NotHere.html">Glob Patterns</A>.)
<P> However, regular-expression matching provides another way of specifying the
set of characters that are acceptable matches. If the character immediately
after the <TT>[</TT> is a <TT>^</TT>, then the match will be with <CITE>any</CITE> ASCII
character that is <CITE>not</CITE> listed. For example, <TT>[^&amp;]</TT> matches any
character except the ampersand and <TT>[^a-zA-Z]</TT> matches any character except
a letter. (Some glob pattern matchers accept this kind of pattern too,
but this feature is not universal in the glob world.)
<P> Another small difference is that the strange acceptance of <TT>[z-a]</TT> by
Tcl's glob pattern matcher is not carried over to regular-expression pattern
matching.
<P> <P><LI> Both use backslash substitution, called backslash quoting, to permit
special characters to appear without their special meanings. For example,
<TT>\[</TT> in both means the left square bracket itself and is not a special way
of matching a single character.
</UL>
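<P> The similarities and differences listed above can be tried side by side.
What follows is only a minimal sketch, assuming a Tcl interpreter that provides
both <TT>string match</TT> and <TT>regexp</TT>. The names <TT>NotLetter_</TT> and
<TT>Bracket_</TT> are made up for this illustration; the expected results appear
in the comments.
<PRE>
# Glob matching must account for the whole string;
# regular-expression matching also accepts a matching substring.
puts [string match {[a-z][A-Z]} AbCdEf]   ;# 0
puts [regexp {[a-z][A-Z]} AbCdEf]         ;# 1

# A set that begins with ^ matches any character that is not listed.
set NotLetter_ {[^a-zA-Z]}
puts [regexp $NotLetter_ abc]             ;# 0
puts [regexp $NotLetter_ ab3c]            ;# 1

# Backslash quoting removes the special meaning of [ in both matchers.
set Bracket_ {\[}
puts [string match $Bracket_ {[}]         ;# 1
puts [regexp $Bracket_ {a[b}]             ;# 1
</PRE>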
<P>
<P> The special characters for regular-expression pattern matching are
a little different from those for glob pattern matching. Here they are:
<PRE>
?*[]-^.+$|()
</PRE>
<P> You have seen how the square brackets and hyphen are used. These
are the only globlike symbols in the list. Others which appear to
be globlike have different meanings in the regular expression world.
<P><A NAME="7.3a">
<STRONG>Exercise 7.3a</STRONG> </A><DL><DD>
What will the regular expression <TT>"$Pre1_$Pre2_"</TT> match
after these preassignments?
<PRE>
set Pre1_ {\[0\-9\]}
set Pre2_ "\[0-9]"
</PRE>
Suppose the second assignment is
<PRE>
set Pre2_ "[0-9]"
</PRE>
What happens?
<P>
<A HREF="7.9.html#Sol7.3a" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.3a">Solution</A></DL>
<P> Tcl's regular-expression pattern matcher is invoked with the
<TT><A NAME=#Cregexp>regexp</A></TT> command. Here are two forms in which <TT>regexp</TT> can be
used. The next section explains a form in which more arguments are used.
Switches are described later in this section.
<P><CENTER><TABLE BORDER><TR><TD><DL>
<P> <DT><STRONG><PRE>regexp <CITE>?SWITCHES? PATTERN STRING</CITE></PRE></STRONG><DD> This command returns 1 (true) or
0 (false) depending on whether <CITE>PATTERN</CITE> matches <CITE>STRING</CITE> or a substring of it.
<P> <DT><STRONG><PRE>regexp <CITE>?SWITCHES? PATTERN STRING VARIABLE_NAME</CITE></PRE></STRONG><DD> This form is like
the one above except that, when there is a match, the matched substring is
assigned to <CITE>VARIABLE_NAME</CITE>.
</DL></TD></TR></TABLE></CENTER></P>
<P> Also see the use of <TT>regexp</TT> with parentheses below in
<A HREF="7.5.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.5.html">Use Parentheses to Build more Complicated Patterns</A> and
<A HREF="7.6.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.6.html">Use Parentheses to Extract Subpatterns</A>.
<P> You can try this command on some glob patterns that use square
brackets in a way acceptable for regular expressions:
<PRE>
% regexp {[a-z][A-Z]} aX Match
1
% set Match
aX
% regexp {[a-z][A-Z]} AbCdEf Match
1
% set Match
bC
</PRE>
However, this violates my convention for writing regular expressions and
so I would write it this way:
<PRE>
% set Letters_ {[a-z][A-Z]}
[a-z][A-Z]
% regexp $Letters_ aX Match
1
% set Match
aX
% regexp $Letters_ AbCdEf Match
1
% set Match
bC
</PRE>
<P> The second example shows one of the differences between regular expressions
and globs: regular expressions will match substrings. When, as in this
case, more than one substring could match, the one which begins first
is chosen.
<P> If you want to force a match with a whole string, it is possible. Two of
the special symbols help.
<P> One of these is <TT>^</TT>. Although <TT>^</TT> has the meaning described above
when it follows the special symbol <TT>[</TT>, the meaning is different when
<TT>^</TT> appears at the <CITE>beginning of a pattern.</CITE> At the beginning of a
pattern, it matches an imaginary empty substring that appears just before the
beginning of the string to be matched. Here are some examples.
<PRE>
% set SmallLetter_ {[a-z]}
[a-z]
% regexp "^$SmallLetter_" AbCdEf Match
0
% regexp "^$SmallLetter_" ab Match
1
% set Match
a
% regexp "^" ab Match
1
% set Match
</PRE>
The first <TT>regexp</TT> returns 0 because the empty string before <TT>AbCdEf</TT>
is followed by the letter <TT>A</TT>, which does not match <TT>[a-z]</TT>. The last
<TT>regexp</TT> command returns 1 and assigns the empty string to <TT>Match</TT>:
the empty string found just before the <TT>ab</TT> in the string being matched.
<P> <STRONG>Remark</STRONG> <DL><DD> This interpretation of what <TT>^</TT> matches, and the
fact that "^" matches "ab", are not universal truths in the world of
regular-expression pattern matching. For example, my Perl interpreter
does not agree with the last of these examples.
</DL>
<P> Another special symbol that helps force a pattern to match a whole
string is <TT>$</TT>. At the end of a pattern, this matches an imaginary
empty string that appears immediately after the string to be matched.
<P> When the symbols <TT>^</TT> and <TT>$</TT> are used as just described, they are
called <CITE><A NAME=#G7.3anchor>anchor</A>s</CITE> because they have the effect of anchoring the
matching substring at the beginning or end of the given string.
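<P> Used together, the two anchors force a pattern to account for the whole
string. The following is a small sketch rather than a definitive recipe; the
name <TT>CapSmall_</TT> is made up for this illustration, and the <TT>$</TT> anchor is
written as <TT>\$</TT> inside double quotes so that Tcl does not treat it as the
start of a variable reference.
<PRE>
set Letters_ {[a-z][A-Z]}

# Without anchors, a matching substring is enough.
puts [regexp $Letters_ AbCdEf]            ;# 1

# With both anchors, the whole string must match.
puts [regexp "^$Letters_\$" AbCdEf]       ;# 0
puts [regexp "^$Letters_\$" aX]           ;# 1

# With only the $ anchor, the match must end where the string ends.
set CapSmall_ {[A-Z][a-z]}
regexp $CapSmall_ AbCdEf Match
puts $Match                               ;# Ab
regexp "$CapSmall_\$" AbCdEf Match
puts $Match                               ;# Ef
</PRE>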
<P> <P><A NAME="7.3b">
<STRONG>Exercise 7.3b</STRONG> </A><DL><DD>
Rewrite the following with <TT>regexp</TT>.
<PRE>
string match {Tcl} $Name
</PRE>
<P>
<A HREF="7.9.html#Sol7.3b" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.3b">Solution</A></DL>
<P> Another special symbol is the period <TT>.</TT> which matches any single
character. This is analogous to the use of <TT>?</TT> in glob pattern
matching.
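<P> As a rough illustration of the analogy, the sketch below compares the glob
<TT>?</TT> with the regular-expression period. The names <TT>ThreeChars_</TT> and
<TT>LiteralDot_</TT> are made up for this example. The anchors are there only so
that the whole string must match, as it must with <TT>string match</TT>.
<PRE>
# The glob ? and the regular-expression period each match one character.
set ThreeChars_ {^a.c$}
puts [string match {a?c} abc]       ;# 1
puts [regexp $ThreeChars_ abc]      ;# 1
puts [regexp $ThreeChars_ a.c]      ;# 1
puts [regexp $ThreeChars_ ac]       ;# 0

# Backslash quoting turns the period back into an ordinary character.
set LiteralDot_ {^a\.c$}
puts [regexp $LiteralDot_ abc]      ;# 0
puts [regexp $LiteralDot_ a.c]      ;# 1
</PRE>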
<P> <P><A NAME="7.3c">
<STRONG>Exercise 7.3c</STRONG> </A><DL><DD>
Preassign a subpattern, <TT>NoDot_</TT>, that matches
any character that is not a period. <P>
<A HREF="7.9.html#Sol7.3c" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.3c">Solution</A></DL>
<P> To finish up this section, here are the switches for <TT>regexp</TT>:
<P><CENTER><TABLE BORDER><TR><TD><DL>
<P>
<P> <DT><STRONG><PRE>-nocase</PRE></STRONG><DD> This causes letters in <CITE>STRING</CITE> to be converted to
lowercase before matching begins. The change affects only a copy of
<CITE>STRING</CITE> used in matching; <CITE>STRING</CITE> itself is unchanged. The
effect is that lowercase letters in your pattern will match letters
of either case in <CITE>STRING</CITE>.
<P> <DT><STRONG><PRE>-indices</PRE></STRONG><DD> This works with the second form of <TT>regexp</TT>. Its
effect is to cause a two-number list to be assigned to
<CITE>VARIABLE_NAME</CITE> – the first number is the first index in <CITE>STRING</CITE> of
the matching substring and the second number is the last index in <CITE>STRING</CITE>
of the matching substring.
</DL></TD></TR></TABLE></CENTER></P>
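<P> The two switches can be tried as follows. This is a minimal sketch: the
sample strings are made up for this illustration, and the pattern given to
<TT>-nocase</TT> is deliberately all lowercase, which is the case described above.
<PRE>
# -nocase: a lowercase pattern matches letters of either case.
puts [regexp tcl "I like TCL"]                   ;# 0
puts [regexp -nocase tcl "I like TCL" Match]     ;# 1
puts $Match                                      ;# TCL

# -indices: Match receives the first and last index of the match.
set Letters_ {[a-z][A-Z]}
puts [regexp -indices $Letters_ AbCdEf Match]    ;# 1
puts $Match                                      ;# 1 2

# The indices can be handed to string range to recover the substring.
puts [string range AbCdEf [lindex $Match 0] [lindex $Match 1]]   ;# bC
</PRE>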
<P> <P><A NAME="7.3d">
<STRONG>Exercise 7.3d</STRONG> </A><DL><DD>
Fill in the question marks.
<PRE>
% regexp -indices "\[a-z]ab" abab Match
1
% set Match
?
% regexp -indices t$ catbert Match
1
% set Match
?
</PRE>
<P>
<A HREF="7.9.html#Sol7.3d" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.9.html#Sol7.3d">Solution</A></DL>
<!-- Linkbar -->
<P><CENTER><FONT SIZE=2><NOBR>
<STRONG>From</STRONG>
<A HREF="http://www.mapfree.com/sbf/tcl/book/home.html" tppabs="http://www.mapfree.com/sbf/tcl/book/home.html">Tcl/Tk For Programmers</A><WBR>
<STRONG>Previous</STRONG>
<A HREF="7.2.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.2.html">section</A><WBR>
<STRONG>Next</STRONG>
<A HREF="7.4.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.4.html">section</A><WBR>
<STRONG>All</STRONG>
<A HREF="7.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/7.html">sections</A><WBR>
<STRONG>Author</STRONG>
<A HREF="http://www.mapfree.com/mp/jaz/home.html" tppabs="http://www.mapfree.com/mp/jaz/home.html">J. A. Zimmer</A><WBR>
<STRONG>Copyright</STRONG>
<A HREF="copyright.html" tppabs="http://www.mapfree.com/sbf/tcl/book/select/Html/copyright.html">Notice</A><WBR>
<P>
<I>Jun 17, 1998</I>
</NOBR></FONT></CENTER></BODY></HTML>