⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 chap34.htm

📁 介绍算法的书籍
💻 HTM
📖 第 1 页 / 共 5 页
字号:
<pre>6              <B>then</B> <I>s</I> <img src="../images/arrlt12.gif"> <I>i</I> - <I>m</I></sub></sup></pre><P>
<pre>7                   print &quot;Pattern occurs with shift&quot; <I>s</I></sub></sup></pre><P>
The simple loop structure of <FONT FACE="Courier New" SIZE=2>FINITE</FONT>-<FONT FACE="Courier New" SIZE=2>AUTOMATON</FONT>-<FONT FACE="Courier New" SIZE=2>MATCHER</FONT> implies that its running time on a text string of length <I>n</I> is <I>O</I>(<I>n</I>). This running time, however, does not include the time required to compute the transition function <img src="../images/delta12.gif"><I></I>. We address this problem later, after proving that the procedure <FONT FACE="Courier New" SIZE=2>FINITE</FONT>-<FONT FACE="Courier New" SIZE=2>AUTOMATON</FONT>-M<FONT FACE="Courier New" SIZE=2>ATCHER </FONT>operates correctly.<P>
Consider the operation of the automaton on an input text <I>T</I>[1 . . <I>n</I>]. We shall prove that the automaton is in state <img src="../images/sum14.gif"><I></I>(<I>T<SUB>j</I></SUB>) after scanning character <I>T</I>[<I>i</I>]. Since <img src="../images/sum14.gif"><I></I>(<I>T<SUB>i</I></SUB>) = <I>m</I> if and only if <img src="866_b.gif">, the machine is in the accepting state <I>m</I> if and only if the pattern <I>P</I> has just been scanned. To prove this result, we make use of the following two lemmas about the suffix function <img src="../images/sum14.gif"><I>.</I><P>
<a name="09d0_1bf1">Lemma 34.2<a name="09d0_1bf1"><P>
<a name="09d0_1beb">For any string <I>x</I> and character <I>a</I><B>,</B> we have <img src="../images/sum14.gif"><I>(</I>xa<I>)</I> <I><img src="../images/lteq12.gif"> </I><img src="../images/sum14.gif"><I>(</I>x<I>) </I>+ <I>1</I>.<P>
<I><B>Proof     </I></B>Referring to Figure 34.7, let <I>r</I> = <img src="../images/sum14.gif"><I></I>(<I>xa</I>). If <I>r</I> = 0, then the conclusion <I>r</I> <img src="../images/lteq12.gif"> <img src="../images/sum14.gif"><I></I>(<I>x</I>) + 1 is trivially satisfied, by the nonnegativity of <img src="../images/sum14.gif"><I></I>(<I>x</I>). So assume that <I>r</I> &gt; 0. Now, <img src="866_c.gif">, by the definition of <img src="../images/sum14.gif"><I></I>. Thus, <img src="866_d.gif">, by dropping the <I>a</I> from the end of <I>P<SUB>r</I></SUB> and from the end of <I>xa</I>. Therefore, <I>r</I> - 1 <img src="../images/lteq12.gif"> <img src="../images/sum14.gif"><I></I>(<I>x</I>), since <img src="../images/sum14.gif"><I></I>(<I>x</I>) is largest <I>k</I> such that <img src="866_e.gif">.      <P>
<a name="09d0_1bf2">Lemma 34.3<a name="09d0_1bf2"><P>
<a name="09d0_1bec">For any string <I>x</I> and character <I>a</I>, if <I>q</I> = <img src="../images/sum14.gif"><I></I>(<I>x</I>), then <img src="../images/sum14.gif"><I></I>(<I>xa</I>) = <img src="../images/sum14.gif"><I></I>(<I>P<SUB>q</SUB>a</I>).<P>
<I><B>Proof</I></B>     From the definition of <img src="../images/sum14.gif">, we have <img src="866_f.gif">. As Figure 34.8 shows, <img src="866_g.gif">. If we let <I>r</I> = <img src="../images/sum14.gif"><I></I>(<I>xa</I>), then <I>r</I> <img src="../images/lteq12.gif"> <I>q</I> + 1 by Lemma 34.2. Since <img src="867_b.gif">, and <img src="../images/sglvrt.gif"><I>Pr</I><img src="../images/sglvrt.gif"> <img src="../images/lteq12.gif"> <img src="../images/sglvrt.gif"><I>Pqa</I><img src="../images/sglvrt.gif">, Lemma 34.1 implies that <img src="867_c.gif">. Therefore, <I>r </I><img src="../images/lteq12.gif"> <I><img src="../images/sum14.gif"></I><I></I>(<I>P<SUB>q</SUB>a)</I>, that is, <img src="../images/sum14.gif"><I></I>(<I>xa</I>) <img src="../images/lteq12.gif"> <img src="../images/sum14.gif"><I></I>(<I>P<SUB>q</SUB>a</I>). But we also have <img src="../images/sum14.gif"><I></I>(<I>P<SUB>q</SUB>a</I>) <img src="../images/lteq12.gif"> <img src="../images/sum14.gif"><I></I>(<I>xa</I>), since <img src="867_d.gif">. Thus, <img src="../images/sum14.gif"><I></I>(<I>xa</I>) = <img src="../images/sum14.gif"><I></I>(<I>P<SUB>q</SUB>a</I>).      <P>
<img src="867_a.gif"><P>
<h4><a name="09d0_1bf3">Figure 34.8 An illustration for the proof of Lemma 34.3. The figure shows that r = <img src="../images/sum14.gif">(P<SUB>q</SUB>a), where q = <img src="../images/sum14.gif">(x) and r = <img src="../images/sum14.gif">(xa).<a name="09d0_1bf3"></sub></sup></h4><P>
We are now ready to prove our main theorem characterizing the behavior of a string-matching automaton on a given input text. As noted above, this theorem shows that the automaton is merely keeping track, at each step, of the longest prefix of the pattern that is a suffix of what has been read so far.<P>
<a name="09d0_1bf4">Theorem 34.4<a name="09d0_1bf4"><P>
If <img src="867_e.gif"><I> </I>is the final-state function of a string-matching automaton for a given pattern <I>P</I> and <I>T</I>[1 . . <I>n</I>] is an input text for the automaton, then<P>
<img src="867_f.gif"><P>
<I><B>Proof     </I></B>The proof is by induction on <I>i</I>. For <I>i</I> = 0, the theorem is trivially true, since <I>T</I><SUB>0</SUB> = <img src="../images/epsilon.gif"><I></I>. Thus, <img src="867_g.gif">.<P>
Now, we assume that <img src="867_h.gif"> and prove that <img src="867_i.gif">. Let <I>q</I> denote <img src="867_j.gif">, and let <I>a</I> denote <I>T</I>[<I>i</I> + 1]. Then,<P>
<img src="867_k.gif"><P>
By induction, the theorem is proved.      <P>
By Theorem 34.4, if the machine enters state <I>q</I> on line 4, then <I>q</I> is the largest value such that <img src="867_l.gif">. Thus, we have <I>q = m</I> on line 5 if and only if an occurrence of the pattern <I>P</I> has just been scanned. We conclude that <FONT FACE="Courier New" SIZE=2>FINITE</FONT>-<FONT FACE="Courier New" SIZE=2>AUTOMATON</FONT>-<FONT FACE="Courier New" SIZE=2>MATCHER</FONT> operates correctly.<P>
<P>







<h2>Computing the transition function</h2><P>
<a name="09d1_1bed">The following procedure computes the transition function <img src="../images/delta12.gif"> <I>from a given pattern </I>P<I>[1 . . </I>m<I>].</I><P>
<pre><a name="09d1_1bee">COMPUTE-TRANSITION-FUNCTION(<I>P</I>, <img src="../images/sum14.gif">)</sub></sup></pre><P>
<pre>1 <I>m</I> <img src="../images/arrlt12.gif"> <I>length</I>[<I>P</I>]</sub></sup></pre><P>
<pre>2 <B>for</B> <I>q</I> <img src="../images/arrlt12.gif"> 0 <B>to</B> <I>m</I></sub></sup></pre><P>
<pre>3     <B>do for</B> each character <I>a</I> <img src="../images/memof12.gif"> *</sub></sup></pre><P>
<pre>4            <B>do</B> <I>k</I> <img src="../images/arrlt12.gif"> min (<I>m</I> + 1, <I>q</I> + 2)</sub></sup></pre><P>
<pre>5               <B>repeat</B> <I>k</I> <img src="../images/arrlt12.gif"> <I>k</I> - 1</sub></sup></pre><P>
<img src="868_a.gif"><P>
<pre>7             <img src="../images/delta12.gif"><I>(</I>q, a<I>) <img src="../images/arrlt12.gif"> </I>k</sub></sup></pre><P>
<pre>8 <B>return</B> <img src="../images/delta12.gif"></sub></sup></pre><P>
This procedure computes <img src="../images/delta12.gif"><I>(</I>q, a<I>) in a straightforward manner according to its definition. The nested loops beginning on lines 2 and 3 consider all states </I>q<I> and characters </I>a<I>, and lines 4-7 set <img src="../images/delta12.gif"></I>(<I>q, a</I>) to be the largest <I>k</I> such that <img src="868_b.gif">. The code starts with the largest conceivable value of <I>k</I>, which is min(<I>m, q</I> + 1), and decreases <I>k</I> until <img src="868_c.gif">.<P>
The running time of <FONT FACE="Courier New" SIZE=2>COMPUTE</FONT>-<FONT FACE="Courier New" SIZE=2>TRANSITION</FONT>-<FONT FACE="Courier New" SIZE=2>FUNCTION</FONT> is <I>O</I>(<I>m</I><SUP>3</SUP><img src="../images/sglvrt.gif"><img src="../images/sum14.gif"><img src="../images/sglvrt.gif">), because the outer loops contribute a factor of <I>m </I><img src="../images/sglvrt.gif"><img src="../images/sum14.gif"><img src="../images/sglvrt.gif"> , the inner <B>repeat</B> loop can run at most <I>m</I> + 1 times, and the test <img src="868_d.gif"> on line 6 can require comparing up to <I>m</I> characters. Much faster procedures exist; the time required to compute <img src="../images/delta12.gif"><I> </I>from <I>P</I> can be improved to <I>O</I>(<I>m</I><img src="../images/sglvrt.gif"><img src="../images/sum14.gif"><img src="../images/sglvrt.gif">) by utilizing some cleverly computed information about the pattern <I>P</I> (see Exercise 34.4-6). With this improved procedure for computing <img src="../images/delta12.gif">, the total running time to find all occurrences of a length-<I>m</I> pattern in a length-<I>n</I> text over an alphabet <img src="../images/sum14.gif"> is <I>O(n</I> + <I>m</I><img src="../images/sglvrt.gif"><img src="../images/sum14.gif"><img src="../images/sglvrt.gif">).<P>
<P>







<h2><a name="09d2_1bf2">Exercises<a name="09d2_1bf2"></h2><P>
<a name="09d2_1bf3">34.3-1<a name="09d2_1bf3"><P>
Construct the string-matching automaton for the pattern <I>P</I> = <FONT FACE="Courier New" SIZE=2>aabab</FONT> and illustrate its operation on the text string <I>T</I> = <FONT FACE="Courier New" SIZE=2>aaababaabaababaab</FONT>.<P>
<a name="09d2_1bf4">34.3-2<a name="09d2_1bf4"><P>
Draw a state-transition diagram for a string-matching automaton for the pattern <FONT FACE="Courier New" SIZE=2>ababbabbababbababbabb</FONT> over the alphabet <img src="../images/sum14.gif"> = {<FONT FACE="Courier New" SIZE=2>a</FONT>, <FONT FACE="Courier New" SIZE=2>b</FONT>}.<P>
<a name="09d2_1bf5">34.3-3<a name="09d2_1bf5"><P>
<a name="09d2_1bef">We call a pattern <I>P</I> <I><B>nonoverlappable</I></B> if <img src="868_e.gif"> implies <I>k</I> = 0 or <I>k</I> = <I>q</I>. Describe the state-transition diagram of the string-matching automaton for a nonoverlappable pattern.<P>
<a name="09d2_1bf6">34.3-4<a name="09d2_1bf6"><P>
<a name="09d2_1bf0">Given two patterns <I>P</I> and <I>P</I>', describe how to construct a finite automaton that determines all occurrences of <I>either</I> pattern. Try to minimize the number of states in your automaton.<P>
<a name="09d2_1bf7">34.3-5<a name="09d2_1bf7"><P>
<a name="09d2_1bf1">Given a pattern <I>P</I> containing gap characters (see Exercise 34.1-5), show how to build a finite automaton that can find an occurrence of <I>P</I> in a text <I>T </I>in <I>O</I>(<I>n</I>) time, where <I>n</I> = <img src="../images/sglvrt.gif"><I>T</I><img src="../images/sglvrt.gif">.<P>
<P>


<P>


⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -