📄 chap32.htm
字号:
<pre>5 <I>w<SUB>m </I></SUB><img src="../images/arrlt12.gif"> e<SUP>2</SUP><img src="../images/piuc.gif"><SUP><I>i</I>/<I>m</I></sub></sup></pre><P>
<pre>6 <I>w </I><img src="../images/arrlt12.gif"> 1</sub></sup></pre><P>
<pre>7<B> for</B> <I>j</I> <img src="../images/arrlt12.gif"> 0 <B>to</B> <I>m</I>/2 - 1</sub></sup></pre><P>
<pre>8<B> do for</B> <I>k</I> <img src="../images/arrlt12.gif"> <I>j</I> <B>to</B> <I>n</I> - 1 <B>by</B> <I>m</I></sub></sup></pre><P>
<pre>9<B> do</B> <I>t</I> <img src="../images/arrlt12.gif"> <I>w</I> <I>A</I>[<I>k</I> + <I>m</I>/2]</sub></sup></pre><P>
<pre>10 <I>u</I> <img src="../images/arrlt12.gif"> <I>A</I>[<I>k</I>]</sub></sup></pre><P>
<pre>11 <I>A</I>[<I>k</I>] <img src="../images/arrlt12.gif"> <I>u</I> + <I>t</I></sub></sup></pre><P>
<pre>12 <I>A</I>[<I>k</I> + <I>m</I>/2] <img src="../images/arrlt12.gif"> <I>u</I> - <I>t</I></sub></sup></pre><P>
<pre>13 <I>w </I><img src="../images/arrlt12.gif"> <I>ww<SUB>m</I></sub></sup></pre><P>
<pre>14<B> return</B> <I>A</I></sub></sup></pre><P>
How does B<FONT FACE="Courier New" SIZE=2>IT-</FONT>R<FONT FACE="Courier New" SIZE=2>EVERSE-</FONT><FONT FACE="Courier New" SIZE=2>COPY</FONT> get the elements of the input vector <I>a</I> into the desired order in the array <I>A</I>? The order in which the leaves appear in Figure 32.4 is "bit-reverse binary." That is, if we let rev(<I>k</I>) be the lg <I>n</I>-bit integer formed by reversing the bits of the binary representation of <I>k</I>, then we want to place vector element <I>a<SUB>k</I></SUB> in array position <I>A</I>[rev(<I>k</I>)]. In Figure 32.4, for example, the leaves appear in the order 0, 4, 2, 6, 1, 5, 3, 7; this sequence in binary is 000, 100, 010, 110, 001, 101, 011, 111, and in bit- reverse binary we get the sequence 000, 001, 010, 011, 100, 101, 110, 111. To see that we want bit-reverse binary order in general, we note that at the top level of the tree, indices whose low-order bit is 0 are placed in the left subtree and indices whose low-order bit is 1 are placed in the right subtree. Stripping off the low-order bit at each level, we continue this process down the tree, until we get the bit-reverse binary order at the leaves.<P>
Since the function rev (<I>k</I>) is easily computed, the B<FONT FACE="Courier New" SIZE=2>IT-</FONT>R<FONT FACE="Courier New" SIZE=2>EVERSE-</FONT>C<FONT FACE="Courier New" SIZE=2>OPY </FONT>procedure can be written as follows.<P>
<pre><a name="0998_1b23">BIT-REVERSE-COPY(<I>a, A</I>)</sub></sup></pre><P>
<pre>1 <I>n</I> <img src="../images/arrlt12.gif"> <I>length</I>[<I>a</I>]</sub></sup></pre><P>
<pre>2<B> for </B><I>k</I> <img src="../images/arrlt12.gif"> 0 <B>to</B> <I>n</I> - 1</sub></sup></pre><P>
<pre>3<B> do </B><I>A</I>[rev(<I>k</I>)] <img src="../images/arrlt12.gif"> <I>a<SUB>k</I></sub></sup></pre><P>
The iterative FFT implementation runs in time <img src="../images/bound.gif">(<I>n </I>lg <I>n). </I>The call to B<FONT FACE="Courier New" SIZE=2>IT-</FONT>R<FONT FACE="Courier New" SIZE=2>EVERSE-</FONT><FONT FACE="Courier New" SIZE=2>COPY</FONT>(<I>a,A</I>) certainly runs in <I>O(n </I>lg <I>n</I>) time, since we iterate <I>n</I> times and can reverse an integer between 0 and <I>n</I> - 1, with lg <I>n</I> bits, in <I>O(</I>lg <I>n</I>) time. (In practice, we usually know the initial value of <I>n</I> in advance, so we would probably code a table mapping <I>k</I> to rev(<I>k</I>), making B<FONT FACE="Courier New" SIZE=2>IT-</FONT>R<FONT FACE="Courier New" SIZE=2>EVERSE-</FONT><FONT FACE="Courier New" SIZE=2>COPY</FONT> run in <img src="../images/bound.gif">(<I>n</I>) time with a low hidden constant. Alternatively, we could use the clever amortized reverse binary counter scheme described in Problem 18-1.) To complete the proof that I<FONT FACE="Courier New" SIZE=2>TERATIVE-</FONT>FFT runs in time <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>), we show that <I><FONT FACE="Courier New" SIZE=2>L(n</I></FONT>), the number of times the body of the innermost loop (lines 9-12) is executed, is <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>). We have<P>
<img src="795_a.gif"><P>
<P>
<h2>A parallel FFT circuit</h2><P>
<a name="0999_1b24"><a name="0999_1b25">We can exploit many of the properties that allowed us to implement an efficient iterative FFT algorithm to produce an efficient parallel algorithm for the FFT. (See Chapter 29 for a description of the combinational-circuit model.) The combinational circuit P<FONT FACE="Courier New" SIZE=2>ARALLEL-</FONT>FFT that computes the FFT on <I>n</I> inputs is shown in Figure 32.5 for <I>n</I> = 8. The circuit begins with a bi-reverse permutation of the inputs, followed by lg <I>n</I> stages, each stage consisting of <I>n</I>/2 butterflies executed in parallel. The depth of the circuit is therefore <img src="../images/bound.gif">(lg <I>n</I>).<P>
The leftmost part of the circuit P<FONT FACE="Courier New" SIZE=2>ARALLEL-</FONT>FFT performs the bit-reverse permutation, and the remainder mimics the iterative FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT> procedure. We take advantage of the fact that each iteration of the outermost <B>for</B> loop performs <I>n</I>/2 independent butterfly operations that can be performed in parallel. The value of s in each iteration within FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT> corresponds to a stage of butterflies shown in Figure 32.5. Within stage s, for <I>s</I> = 1, 2, . . . , lg <I>n</I>, there are <I>n</I>/2<I><SUP>s</I></SUP> groups of butterflies (corresponding to each value of <I>k</I> in FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT>), with 2<I><SUP>s</I>-1</SUP> butterflies per group (corresponding to each value of <I>j</I> in FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT>). The butterflies shown in Figure 32.5 correspond to the butterfly operations of the innermost loop (lines 8-11 of FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT>). Note also that the values of <I>w</I> used in the butterflies correspond to those used in FFT-<FONT FACE="Courier New" SIZE=2>BASE</FONT>: in stage <I>s</I>, we use <img src="796_b.gif">, where <I>m = 2<SUP>s</SUP>.</I><P>
<img src="796_a.gif"><P>
<h4><a name="0999_1b26">Figure 32.5 A combinational circuit <FONT FACE="Courier New" SIZE=2>PARALLEL-FFT</FONT> that computes the FFT, here shown on n = 8 inputs. The stages of butterflies are labeled to correspond to iterations of the outermost loop of the FFT-BASE procedure. An FFT on n inputs can be computed in <img src="../images/bound.gif">(1g n) depth with <img src="../images/bound.gif">(n lg n) combinational elements.<a name="0999_1b26"></sub></sup></h4><P>
<P>
<h2><a name="099a_0001">Exercises<a name="099a_0001"></h2><P>
<a name="099a_0002">32.3-1<a name="099a_0002"><P>
Show how I<FONT FACE="Courier New" SIZE=2>TERATIVE-</FONT>FFT computes the DFT of the input vector (0, 2, 3, -1, 4, 5, 7, 9).<P>
<a name="099a_0003">32.3-2<a name="099a_0003"><P>
Show how to implement an FFT algorithm with the bit-reversal permutation occurring at the end, rather than at the beginning, of the computation. (<I>Hint</I>: Consider the inverse DFT.)<P>
<a name="099a_0004">32.3-3<a name="099a_0004"><P>
To compute DFT<I><SUB>n</I></SUB>, how many addition, subtraction, and multiplication elements, and how many wires, are needed in the <FONT FACE="Courier New" SIZE=2>PARALLEL</FONT>-FFT circuit described in this section? (Assume that only one wire is needed to carry a number from one place to another.)<P>
<a name="099a_0005">32.3-4<a name="099a_0005"><P>
Suppose that the adders in the FFT circuit sometimes fail in such a manner that they always produce a zero output, independent of their inputs. Suppose that exactly one adder has failed, but that you don't know which one. Describe how you can identify the failed adder by supplying inputs to the overall FFT circuit and observing the outputs. Try to make your proeedure efficient.<P>
<P>
<P>
<h1><a name="099b_1b2f">Problems<a name="099b_1b2f"></h1><P>
<a name="099b_1b30">32-1 Divide-and-conquer multiplication<a name="099b_1b30"><P>
<a name="099b_1b26"><a name="099b_1b27"><a name="099b_1b28"><I><B>a.</I></B> Show how to multiply two linear polynomials <I>ax</I> + <I>b</I> and <I>cx</I> + <I>d</I> using only three multiplications. (<I>Hint</I>: One of the multiplications is (<I>a</I> + <I>b</I>) <img src="../images/dot10.gif"> (<I>c</I> + <I>d</I>).)<P>
<I><B>b.</I></B> Give two divide-and-conquer algorithms for multiplying two polynomials of degree-bound <I>n</I> that run in time <img src="../images/bound.gif">(<I>n</I><SUP>1<I>g</I> 3</SUP>). The first algorithm should divide the input polynomial coefficients into a high half and a low half, and the second algorithm should divide them according to whether their index is odd or even.<P>
<I><B>c.</I></B> Show that two <I>n</I>-bit integers can be multiplied in <I>O</I>(<I>n</I><SUP>1<I>g</I> 3</SUP>) steps, where each step operates on at most a constant number of 1-bit values.<P>
<a name="099b_1b31">32-2 Toeplitz matrices<a name="099b_1b31"><P>
<a name="099b_1b29"><a name="099b_1b2a">A <I><B>Toeplitz matrix</I></B> is an <I>n</I> X <I>n</I> matrix <I>A</I> = (<I>a<SUB>ij</I></SUB>) such that <I>a<SUB>ij</I></SUB> = <I>a<SUB>i</I>-1, <I>j</I>-1</SUB> for <I>i</I> = 2, 3, . . . , <I>n</I> and <I>j</I> = 2, 3, . . . , <I>n</I>.<P>
<I><B>a. </I></B>Is the sum of two Toeplitz matrices necessarily Toeplitz? What about the product?<P>
<I><B>b.</I></B> Describe how to represent a Toeplitz matrix so that two <I>n</I> X <I>n</I> Toeplitz matrices ean be added in <I>O</I>(<I>n</I>) time<I>.</I><P>
<I><B>c. </I></B>Give an<I> O</I>(<I>n </I>lg <I>n</I>)-time algorithm for multiplying an <I>n</I> X <I>n</I> Toeplitz matrix by a vector of length <I>n.</I> Use your representation from part (b).<P>
<I><B>d.</I></B> Give an efficient algorithm for multiplying two <I>n</I> X <I>n</I> Toeplitz matrices. Analyze its running time.<P>
<a name="099b_1b32">32-3 Evaluating all derivatives of a polynomial at a point<a name="099b_1b32"><P>
<a name="099b_1b2b">Given a polynomial <I>A</I>(<I>x</I>) of degree-bound <I>n</I>, its <I>t</I>th derivative is defined by<P>
<img src="798_a.gif"><P>
From the coefficient representation (<I>a</I><SUB>0</SUB>, <I>a</I><SUB>1</SUB>, . . . , <I>a<SUB>n</I>-1</SUB>) of <I>A</I>(<I>x</I>) and a given point <I>x</I><SUB>0</SUB>, we wish to determine <I>A</I><SUP>(<I>t</I>)</SUP> (<I>x</I><SUB>0</SUB>) for <I>t</I> = 0, 1, . . . , <I>n - </I>1<I>.</I><P>
<I><B>a.</I></B> Given coefficients <I>b</I><SUB>0</SUB>, <I>b</I><SUB>1</SUB>, . . . , <I>b<SUB>n</I>-1</SUB> such that<P>
<img src="798_b.gif"><P>
show how to compute <I>A</I><SUP>(<I>t</I>) </SUP>(<I>x</I><SUB>0</SUB>), for <I>t</I> = 0, 1, . . . , <I>n</I> - 1, in <I>O</I>(<I>n</I>) time.<P>
<I><B>b. </I></B>Explain how to find <I>b</I><SUB>0</SUB>, <I>b</I><SUB>1</SUB>, . . . , <I>b<SUB>n</I>-1</SUB> in <I>O</I>(<I>n </I>lg <I>n</I>) time, given <img src="798_c.gif"> for <I>k</I> = 0, 1, . . . , <I>n</I> - 1.<P>
<I><B>c.</I></B> Prove that<P>
<img src="798_d.gif"><P>
where <I>f</I>(<I>j</I>) = <I>a<SUB>j</I></SUB> <img src="../images/dot10.gif"> <I>j</I>! and<P>
<img src="798_e.gif"><P>
<I><B>d.</I></B> Explain how to evaluate <img src="798_f.gif"> for <I>k</I> = 0, 1, . . . , <I>n</I> - 1 in <I>O</I>(<I>n</I> lg <I>n</I>) time. Conclude that all nontrivial derivatives of <I>A</I>(<I>x</I>) can be evaluated at <I>x</I><SUB>0</SUB> in <I>O</I>(<I>n</I> lg <I>n</I>) time.<P>
<a name="099b_1b33">32-4 Polynomial evaluation at multiple points<a name="099b_1b33"><P>
<a name="099b_1b2c"><a name="099b_1b2d">We have observed that the problem of evaluating a polynomial of degree-bound <I>n</I> - 1 at a single point can be solved in <I>O</I>(<I>n</I>) time using Horner's rule. We have also discovered that such a polynomial can be evaluated at all <I>n</I> complex roots of unity in <I>O</I>(<I>n</I> lg <I>n</I>) time using the FFT. We shall now show how to evaluate a polynomial of degree-bound <I>n</I> at <I>n</I> arbitrary points in <I>O</I>(<I>n</I> lg<SUP>2</SUP> <I>n</I>) time.<P>
To do so, we shall use the fact that we can compute the polynomial remainder when one such polynomial is divided by another in <I>O</I>(<I>n </I>lg <I>n</I>) time, a result that we assume without proof. For example, the remainder of 3<I>x</I><SUP>3</SUP> + <I>x</I><SUP>2</SUP> - 3<I>x</I> + 1 when divided by <I>x</I><SUP>2</SUP> + <I>x</I> + 2 is<P>
<pre>(3<I>x</I><SUP>3</SUP> + <I>x</I><SUP>2</SUP> - 3<I>x</I> + 1) mod (<I>x</I><SUP>2</SUP> + <I>x</I> + 2) = 5<I>x</I> - 3.</sub></sup></pre><P>
Given the coefficient representation of a polynomial <img src="799_a.gif"> and <I>n</I> points <I>x</I><SUB>0</SUB>, <I>x</I><SUB>1</SUB>, . . . , <I>x</I><SUB>n-1</SUB>, we wish to compute the <I>n</I> values <I>A</I>(<I>x</I><SUB>0</SUB>), <I>A</I>(<I>x</I><SUB>1</SUB>), <I>. . . , A</I>(<I>x<SUB>n-</I>1</SUB>). For 0 <img src="../images/lteq12.gif"> <I>i</I> <img src="../images/lteq12.gif"> <I>j</I> <img src="../images/lteq12.gif"> <I>n</I> - 1, define the polynomials <img src="799_b.gif"> and <I>Q<SUB>ij</I></SUB>(<I>x</I>) = <I>A</I>(<I>x</I>) mod <I>p<SUB>ij</I></SUB>(<I>x</I>). Note that <I>Q<SUB>ij</I></SUB>(<I>x</I>) has degree-bound at most <I>j - i</I>.<P>
<I><B>a.</I></B> Prove that <I>A</I>(<I>x</I>) mod (<I>x - z</I>) = <I>A</I>(<I>z</I>) for any point <I>z</I>.<P>
<I><B>b.</I></B> Prove that <I>Q<SUB>kk</I></SUB>(<I>x</I>) = <I>A</I>(<I>x<SUB>k</I></SUB>) and that <I>Q</I><SUB>0<I>,n</I>-1</SUB>(<I>x</I>) = <I>A</I>(<I>x</I>).<P>
<I><B>c.</I></B> Prove that for <I>i</I> <img src="../images/lteq12.gif"> <I>k</I> <img src="../images/lteq12.gif"> <I>j</I>, we have <I>Q<SUB>ik</I></SUB>(<I>x</I>) = <I>Q<SUB>ij</I></SUB>(<I>x</I>) mod <I>P<SUB>ik</I></SUB>(<I>x</I>) and <I>Q<SUB>kj</I></SUB>(<I>x</I>) = <I>Q<SUB>ij</I>(<I>x</I>) mod <I>Pkj</I></SUB> (<I>x</I>).<P>
<I><B>d.</I></B> Give an <I>O</I>(<I>n</I> lg<SUP>2</SUP> <I>n</I>)-time algorithm to evaluate <I>A</I>(<I>x<SUB>0</I></SUB>), <I>A</I>(<I>x</I><SUB>1</SUB>), . . . , <I>A</I>(<I>x<SUB>n-</I>1</SUB>).<P>
<a name="099b_1b34">32-5 FFT using modular arithmetic<a name="099b_1b34"><P>
<a name="099b_1b2e">As defined, the Discrete Fourier Transform requires the use of complex numbers, which can result in a loss of precision due to round-off errors. For some problems, the answer is known to contain only integers, and it is desirable to utilize a variant of the FFT based on modular ari
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -