<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="Author" CONTENT="Guy Gascoigne - Piggford">
   <TITLE>Regex - An alternative Regular Expression Class</TITLE>
</HEAD>
<body background="../fancyhome/back.gif" tppabs="http://www.codeguru.com/fancyhome/back.gif" bgcolor="#FFFFFF" link="#B50029" vlink="#8E2323" alink="#FF0000" bgproperties="fixed">


<CENTER>
<H3>
<FONT COLOR="#A0A099">An alternative Regular Expression Class</FONT></H3></CENTER>

<CENTER>
<H3>

<HR></H3></CENTER>

<p>This article was contributed by <a href="mailto:guy@wyrdrune.com">Guy Gascoigne -
Piggford</a></p>

<p>This is another regular expression library. Like Zafir's, it is based on the work of
Henry Spencer. I started using this a long time ago and called my class <tt>Regexp</tt>
(rather than <tt>CRegExp</tt>). I actually prefer Zafir's name, but I have too much code
using mine to want to change it, so for now my class is called <tt>Regexp</tt>
(change it if you like).</p>

<p>So why put up another version, I hear you ask? Well, the two classes started from the same
base code and then evolved to solve different problems. <tt>CRegExp</tt> is geared to search
and replace operations, whereas <tt>Regexp</tt> was written to simplify tokenisation. I
wanted a class that could be given a 'program' and, from that, return specific substrings
from its input. Regular expressions may not be the fastest way to parse input (though
with careful anchoring they can be made to fail quickly when they are not going to match),
but once you have a working library they do allow fairly rapid coding. On the whole
this is good enough: worry about making it faster once you have it working and actually
know that your optimization effort will be noticed.</p>

<p>For example:</p>

<pre><tt>Regexp re( &quot;^[\t ]*(.*)[\t ]*\\((.*)\\)&quot; );
CString str( &quot;wyrdrune.com!kelly (Kelly)\n&quot; );
CString name, addr;

if ( re.Match( str ) &amp;&amp; re.SubStrings() == 2 )
{
	name = re[2];	// field 2: the text inside the parentheses
	addr = re[1];	// field 1: everything before them
}</tt></pre>

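<p>This gives:</p>

<p><tt>name == &quot;Kelly&quot;</tt> and <tt>addr == &quot;wyrdrune.com!kelly &quot;</tt></p>

<p>Note the trailing space in <tt>addr</tt>: field 1 is greedy, so it also takes the space in front of the opening parenthesis.</p>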
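<p>To check this result without MFC, here is a minimal sketch of the same tokenisation using standard C++ <tt>std::regex</tt> (ECMAScript grammar) in place of the <tt>Regexp</tt> class. <tt>splitAddress</tt> is a hypothetical helper, not part of the library. Like Spencer's code, <tt>std::regex</tt> makes <tt>*</tt> greedy, so it reproduces the trailing space in field 1.</p>

```cpp
#include <regex>
#include <string>

// Sketch only: the article's tokenising example redone with std::regex
// (ECMAScript grammar) instead of the Regexp class.
// splitAddress is a hypothetical helper, not part of the library.
bool splitAddress(const std::string& line, std::string& name, std::string& addr)
{
    // Same pattern as the article: optional leading blanks, field 1,
    // optional blanks, then field 2 inside literal parentheses.
    static const std::regex re("^[\t ]*(.*)[\t ]*\\((.*)\\)");
    std::smatch m;
    // regex_search, not regex_match: the pattern stops at ')' and does
    // not have to consume the rest of the line (e.g. a trailing newline).
    if (!std::regex_search(line, m, re))
        return false;
    addr = m.str(1);  // field 1: greedy, so it keeps any blanks before '('
    name = m.str(2);  // field 2: the text between the parentheses
    return true;
}
```

<p>Called on the example input, this should leave <tt>name</tt> as "Kelly" and <tt>addr</tt> as "wyrdrune.com!kelly " (with the space).</p>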

<p>If you decompose the regular expression you get:</p>

<table BORDER="1" CELLSPACING="1" CELLPADDING="7" WIDTH="100%">
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>^</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Beginning of line anchor.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>[\t ]*</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Any amount (that is zero or more characters) of tabs or
    spaces.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>(.*)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Field 1: A tagged expression matching any string of
    characters; this will be the longest string that still allows the rest of the
    pattern to match, so it includes any white space just before the opening parenthesis.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>[\t ]*</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Any amount of tabs or spaces.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>\\(</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">An escaped open parenthesis. The double backslash is a C/C++
    convention: backslash is the escape character in string literals, so it must be doubled for
    a literal backslash to be passed through to the regular expression code. If the user were
    typing this sort of thing into your program they would enter only one backslash. We escape
    the parenthesis so that it is not interpreted as a regular expression special character.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>(.*)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Field 2: A tagged expression matching any string of
    characters.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>\\)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">An escaped closing parenthesis.</td>
  </tr>
</table>
<p>BTW: the phrase <i>tagged regular expression</i> refers to any part of the
regular expression that, because it is surrounded by parentheses, is accessible as a
separate substring after a match has been made.&nbsp; See <a
href="#Regular Expression Syntax">here</a> for more information about regular expression
syntax.</p>

<p>In English, we are looking for two fields. The first will be all characters from the
start of the line up to the second field (leading white space is skipped, but any white space
just before the opening parenthesis stays in the field), and the second will be all characters
within the parentheses following the first field.</p>
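<p>If you want field 1 without the trailing white space, one option (an illustration, not something from the original class) is to require its last character to be neither a tab nor a space, so that the following <tt>[\t ]*</tt> absorbs the blanks. Sketched again with <tt>std::regex</tt>; <tt>splitAddressTrimmed</tt> is a hypothetical name:</p>

```cpp
#include <regex>
#include <string>

// Hypothetical variant of the article's pattern, sketched with std::regex:
// field 1 must end in a character that is neither a tab nor a space, so
// blanks before '(' are left to the following [\t ]* instead.
bool splitAddressTrimmed(const std::string& line, std::string& name, std::string& addr)
{
    static const std::regex re("^[\t ]*(.*[^\t ])[\t ]*\\((.*)\\)");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return false;
    addr = m.str(1);  // field 1 without trailing blanks
    name = m.str(2);  // field 2: the text between the parentheses
    return true;
}
```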
<p><font FACE="Arial" SIZE="4"><b><a name="The Class">The Class</a></b></font></p>

<p>The library itself comes as two source files, Regexp.cpp and Regexp.h. The header
defines the <tt>Regexp</tt> class with the following members:</p>

<p><strong><tt>Regexp::NSUBEXP </tt></strong></p>

<blockquote>
  <p>A constant defining how many subexpressions the library supports (usually 10).
  Attempting to use a regular expression with more than this number generates an error.</p>
</blockquote>
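<p>To check a pattern against such a limit before using it, <tt>std::regex::mark_count()</tt> reports how many parenthesized subexpressions a compiled <tt>std::regex</tt> contains. The sketch below assumes, as in Henry Spencer's original code, that one of the <tt>NSUBEXP</tt> slots is taken by the whole match; <tt>fitsSubexpLimit</tt> is a hypothetical helper, not part of <tt>Regexp</tt>.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does pattern fit a fixed sub-expression limit such
// as Regexp::NSUBEXP? Assumes (as in Henry Spencer's original code) that
// one of the nsubexp slots holds the whole match, leaving nsubexp - 1
// for parenthesized groups. Throws std::regex_error if pattern is invalid.
bool fitsSubexpLimit(const std::string& pattern, unsigned nsubexp = 10)
{
    const std::regex re(pattern);
    return re.mark_count() + 1 <= nsubexp;  // +1 for the whole match
}
```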

<p><strong><tt>Regexp::Regexp() </tt></strong></p>

<blockquote>
  <p>A boring default constructor; the object must be initialized by assignment before
  anything useful can be done with it.</p>
</blockquote>

<p><strong><tt>Regexp::Regexp( TCHAR * exp, BOOL iCase = 0 ) </tt></strong></p>

<blockquote>
  <p><tt>exp</tt> : </p>
</blockquote>

<blockquote>
  <p>The regular expression itself, the format of which is defined later. The success or
  failure of the compilation can be discovered by using either <tt>GetErrorString()</tt> or <tt>CompiledOK()</tt>.</p>
</blockquote>

<blockquote>
  <p><tt>iCase</tt>:</p>
</blockquote>

<blockquote>
  <p>If <tt>TRUE</tt> the regular expression is compiled so that differences in case are
  ignored when matching.</p>
</blockquote>
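<p>The same compile, check, and ignore-case ideas can be sketched with <tt>std::regex</tt>, which reports a bad pattern by throwing <tt>std::regex_error</tt> rather than through <tt>CompiledOK()</tt> and <tt>GetErrorString()</tt>. <tt>compilePattern</tt> is a hypothetical wrapper, and the error text it returns is implementation-specific:</p>

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper mirroring the compile-then-check style of
// Regexp( exp, iCase ) + CompiledOK()/GetErrorString(), using std::regex.
bool compilePattern(const std::string& exp, bool iCase,
                    std::regex& out, std::string& error)
{
    std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
    if (iCase)
        flags = flags | std::regex_constants::icase;  // ignore case when matching
    try {
        out.assign(exp, flags);  // compiles; throws std::regex_error if invalid
        error.clear();
        return true;
    } catch (const std::regex_error& e) {
        error = e.what();        // implementation-specific message
        return false;
    }
}
```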

<p><strong><tt>Regexp::Regexp( const Regexp &amp;r ) </tt></strong></p>

<blockquote>
  <p>Construct a new regular expression taking the compiled form from another <tt>Regexp</tt>.
  </p>
</blockquote>

<p><strong><tt>const Regexp &amp; Regexp::operator=( const Regexp &amp; r );</tt></strong></p>

<blockquote>
  <p>Assign <tt>Regexp r</tt> to the current object.</p>
</blockquote>

<p><strong><tt>bool Regexp::Match( const TCHAR * s ); </tt></strong></p>

<blockquote>
  <p>Examine the <tt>TCHAR</tt> array <tt>s</tt> with this regular expression, returning true if
  there is a match. A match updates the state of this <tt>Regexp</tt> object so that the
  substrings of the match can be obtained. The 0th substring is the part of <tt>s</tt> that
  matched the whole regular expression. The others are the substrings that matched
  parenthesized expressions within the regular expression, with parenthesized expressions
  numbered in left-to-right order of their opening parentheses. If a parenthesized
  expression does not participate in the match at all, its length is 0. It is an error if
  this <tt>Regexp</tt> has not been successfully initialized.</p>
</blockquote>
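<p>These numbering rules can be tried out with <tt>std::regex</tt>, which follows the same conventions: group 0 is the whole match, the rest are numbered by opening parenthesis, and a group that takes no part in the match comes back empty. This is an analogy, not the <tt>Regexp</tt> class; <tt>subMatches</tt> is a hypothetical helper:</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Sketch with std::regex, not the Regexp class. subMatches is a
// hypothetical helper returning every sub-match of the first match of
// pattern in s: entry 0 is the whole match, entries 1..n are the
// parenthesized groups numbered by their opening parentheses, left to
// right. A group that took no part in the match comes back as "".
std::vector<std::string> subMatches(const std::string& s, const std::string& pattern)
{
    std::vector<std::string> parts;
    std::smatch m;
    if (std::regex_search(s, m, std::regex(pattern))) {
        for (std::size_t i = 0; i < m.size(); ++i)
            parts.push_back(m.str(i));  // unmatched sub_match yields ""
    }
    return parts;  // empty if there was no match
}
```

<p>For the pattern <tt>((a)(b))|(c)</tt> applied to "ab", this yields "ab", "ab", "a", "b" and an empty fifth entry, because group 4 belongs to the alternative that did not match.</p>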

<p><strong><tt>int Regexp::SubStrings() const; </tt></strong></p>

<blockquote>
  <p>Return the number of substrings found after a successful <tt>Match()</tt>.</p>
</blockquote>

<p><strong><tt>const CString Regexp::operator[]( unsigned int i ) const; </tt></strong></p>

<blockquote>
  <p>Return the <tt>i</tt>th matched substring after a successful <tt>Match()</tt>.</p>
</blockquote>

<p><strong><tt>int Regexp::SubStart( unsigned int i ) const; </tt></strong></p>

<blockquote>
  <p>Return the starting offset of the <tt>i</tt>th matched substring from the beginning of
  the <tt>TCHAR </tt>array used in <tt>Match()</tt>.</p>
</blockquote>

<p><strong><tt>int Regexp::SubLength( unsigned int i ) const; </tt></strong></p>

<blockquote>
  <p>Return the length of the <tt>i</tt>th matched substring.</p>
</blockquote>
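<p>For readers more familiar with the standard library, the accessors above map roughly onto <tt>std::smatch</tt> as sketched below. This is an analogy, not the <tt>Regexp</tt> API. One difference: judging by the earlier example, where <tt>SubStrings()</tt> is compared with 2 for a pattern with two tagged expressions, it does not count the whole match, whereas <tt>std::smatch::size()</tt> does. <tt>SubInfo</tt> and <tt>describeSub</tt> are hypothetical names.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Rough std::smatch counterparts of the accessors above (an analogy
// only, not the Regexp API):
//   SubStrings()  ~ m.size() - 1  (smatch also counts the whole match)
//   operator[](i) ~ m.str(i)
//   SubStart(i)   ~ m.position(i) (offset from the start of the input)
//   SubLength(i)  ~ m.length(i)
// SubInfo and describeSub are hypothetical names.
struct SubInfo {
    std::string text;
    long start = 0;
    long length = 0;
};

// Fill info for sub-match i of the first match of re in input.
bool describeSub(const std::string& input, const std::regex& re,
                 std::size_t i, SubInfo& info)
{
    std::smatch m;
    if (!std::regex_search(input, m, re) || i >= m.size())
        return false;
    info.text = m.str(i);
    info.start = static_cast<long>(m.position(i));
    info.length = static_cast<long>(m.length(i));
    return true;
}
```

<p>With the article's example pattern and input, <tt>describeSub</tt> for <tt>i = 2</tt> should report "Kelly" starting at offset 20 with length 5.</p>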

<blockquote>
  <p>Using the same example Regexp as before:</p>
  <pre><tt>Regexp re( &quot;^[\t ]*(.*)[\t ]*\\((.*)\\)&quot; );
CString str( &quot;wyrdrune.com!kelly (Kelly)\n&quot; );
if ( re.Match( str ) &amp;&amp; re.SubStrings() == 2 )