<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="Author" CONTENT="Guy Gascoigne - Piggford">
   <TITLE>Regex - An alternative Regular Expression Class</TITLE>
</HEAD>
<body background="../fancyhome/back.gif" bgcolor="#FFFFFF" link="#B50029" vlink="#8E2323" alink="#FF0000" bgproperties="fixed">
<table WIDTH="100%"><tr WIDTH="100%"><td align=center><!--#exec cgi="/cgi/ads.cgi"--><td></tr></table>
<CENTER><H3><FONT COLOR="#AOAO99">An alternative Regular Expression Class</FONT></H3></CENTER>
<CENTER><H3><HR></H3></CENTER>
<p>This article was contributed by <a href="mailto:guy@wyrdrune.com">Guy Gascoigne - Piggford</a>.</p>
<p>This is another regular expression library. Like Zafir's, it is based upon the work of Henry Spencer. I started using this a long time ago and called my class <tt>Regexp</tt> (rather than <tt>CRegExp</tt>). Actually I prefer Zafir's name, but I have too much code using the other name to want to change it, so right now my class is called <tt>Regexp</tt> (change it if you like).</p>
<p>So why put up another version, I hear you ask? Well, the two classes took the same base code and then developed to solve different problems. <tt>CRegExp</tt> is geared to search and replace operations, whereas <tt>Regexp</tt> was written to simplify tokenisation. I wanted a class that could be given a 'program' and, from that, return specific substrings from its input. Regular expressions may not be the fastest way to parse input (though with careful anchoring they can be made to fail quickly if they are going to fail), but once you have a working library they do allow for fairly rapid coding.
On the whole this is good enough; worry about making it faster once you have it working and actually know that your optimization effort won't go unnoticed.</p>
<p>For example:</p>
<pre><tt>Regexp re( &quot;^[\t ]*(.*)[\t ]*\\((.*)\\)&quot; );
CString str( &quot;wyrdrune.com!kelly (Kelly)\n&quot; );
CString name, addr;
if ( re.Match( str ) &amp;&amp; re.SubStrings() == 2 )
{
	name = re[2];
	addr = re[1];
}</tt></pre>
<p>This will give:</p>
<p><tt>name == &quot;Kelly&quot; and addr == &quot;wyrdrune.com!kelly&quot;</tt></p>
<p>If you decompose the regular expression you get:</p>
<table BORDER="1" CELLSPACING="1" CELLPADDING="7" WIDTH="100%">
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>^</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Beginning of line anchor.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>[\t ]*</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Any number (that is, zero or more) of tabs or spaces.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>(.*)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Field 1: A tagged expression matching any string of characters; this will be the longest string that still allows the rest of the pattern to match.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>[\t ]*</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Any number of tabs or spaces.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>\\(</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">An escaped open parenthesis. The doubled backslash is a C/C++ convention: the backslash is the escape character in string literals, and we want a literal backslash to be passed through to the regular expression code. If the user were typing this sort of thing into your program they would only enter one backslash.
We escape the parenthesis so that it doesn't get interpreted as a regular expression special character.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>(.*)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">Field 2: A tagged expression matching any string of characters.</td>
  </tr>
  <tr>
    <td WIDTH="77" VALIGN="TOP"><font face="Courier"><tt>\\)</tt></font></td>
    <td WIDTH="509" VALIGN="TOP">An escaped closing parenthesis.</td>
  </tr>
</table>
<p>BTW: the phrase <i>tagged regular expression</i> refers to any part of the regular expression that, because it was surrounded by parentheses, is accessible after a match has been made as a separate substring.&nbsp; See <a href="#Regular Expression Syntax">here</a> for more information about regular expression syntax.</p>
<p>In English, we are looking for two fields. The first will be all characters from the start of the line through to the second field (without any surrounding white space), and the second will be all characters within parentheses following the first field.</p>
<p><font FACE="Arial" SIZE="4"><b><a name="The Class">The Class</a></b></font></p>
<p>The library itself comes as two source files, Regexp.cpp and Regexp.h. The header defines the <tt>Regexp</tt> class with the following members:</p>
<p><strong><tt>Regexp::NSUBEXP </tt></strong></p>
<blockquote>
  <p>A constant defining how many subexpressions the library will support (usually 10). Attempting to use a regular expression with more than this number will generate an error.</p>
</blockquote>
<p><strong><tt>Regexp::Regexp() </tt></strong></p>
<blockquote>
  <p>A boring constructor; an object built this way must be initialized by assignment before anything useful can be done with it.</p>
</blockquote>
<p><strong><tt>Regexp::Regexp( TCHAR * exp, BOOL iCase = 0 ) </tt></strong></p>
<blockquote>
  <p><tt>exp</tt>:</p>
</blockquote>
<blockquote>
  <p>The regular expression itself, the format of which is defined later.
The success or failure of the compilation can be discovered by using either <tt>GetErrorString()</tt> or <tt>CompiledOK()</tt>.</p>
</blockquote>
<blockquote>
  <p><tt>iCase</tt>:</p>
</blockquote>
<blockquote>
  <p>If <tt>TRUE</tt>, the regular expression is compiled so that differences in case are ignored when matching.</p>
</blockquote>
<p><strong><tt>Regexp::Regexp( const Regexp &amp;r ) </tt></strong></p>
<blockquote>
  <p>Construct a new regular expression, taking the compiled form from another <tt>Regexp</tt>.</p>
</blockquote>
<p><strong><tt>const Regexp &amp; Regexp::operator=( const Regexp &amp; r );</tt></strong></p>
<blockquote>
  <p>Assign <tt>Regexp r</tt> to the current object.</p>
</blockquote>
<p><strong><tt>bool Regexp::Match( const TCHAR * s ); </tt></strong></p>
<blockquote>
  <p>Examine the <tt>TCHAR</tt> array <tt>s</tt> with this regular expression, returning true if there is a match. This match updates the state of this <tt>Regexp</tt> object so that the substrings of the match can be obtained. The 0th substring is the substring of <tt>s</tt> that matched the whole regular expression. The others are those substrings that matched parenthesized expressions within the regular expression, with parenthesized expressions numbered in left-to-right order of their opening parentheses. If a parenthesized expression does not participate in the match at all, its length is 0.
It is an error if this <tt>Regexp</tt> has not been successfully initialized.</p>
</blockquote>
<p><strong><tt>int Regexp::SubStrings() const; </tt></strong></p>
<blockquote>
  <p>Return the number of substrings found after a successful <tt>Match()</tt>.</p>
</blockquote>
<p><strong><tt>const CString Regexp::operator[]( unsigned int i ) const; </tt></strong></p>
<blockquote>
  <p>Return the <tt>i</tt>th matched substring after a successful <tt>Match()</tt>.</p>
</blockquote>
<p><strong><tt>int Regexp::SubStart( unsigned int i ) const; </tt></strong></p>
<blockquote>
  <p>Return the starting offset of the <tt>i</tt>th matched substring from the beginning of the <tt>TCHAR</tt> array used in <tt>Match()</tt>.</p>
</blockquote>
<p><strong><tt>int Regexp::SubLength( unsigned int i ) const; </tt></strong></p>
<blockquote>
  <p>Return the length of the <tt>i</tt>th matched substring.</p>
</blockquote>
<blockquote>
  <p>Using the same example Regexp as before:</p>
  <pre><tt>Regexp re( &quot;^[\t ]*(.*)[\t ]*\\((.*)\\)&quot; );
CString str( &quot;wyrdrune.com!kelly (Kelly)\n&quot; );
if ( re.Match( str ) &amp;&amp; re.SubStrings() == 2 )