📄 chap29.htm
字号:
<pre>[<I>i</I>,<I>j</I>] = <I>x<SUB>i</SUB> </I><img src="../images/circx.gif"> <I>x<SUB>i</I>+1</SUB> <img src="../images/circx.gif"> <img src="../images/circx.gif"><img src="../images/circx.gif"><img src="../images/circx.gif"> <img src="../images/circx.gif"> <I>x<SUB>j</I></SUB>.</sub></sup></pre><P>
Thus, for <I>i</I> = 0, 1, . . . , <I>n,</I> we have [<I>i, i</I>] <I>= x<SUB>i</SUB>,</I> since the composition of just one carry status <I>x<SUB>i</SUB> </I>is itself. For <I>i, j,</I> and <I>k</I> satisfying 0 <img src="../images/lteq12.gif"> <I>i</I> < <I>j</I> <img src="../images/lteq12.gif"> <I>k</I> <img src="../images/lteq12.gif"> <I>n</I>, we also have the identity<P>
<pre>[<I>i</I>,<I>k</I>] = [<I>i</I>,<I>j</I> - 1] <img src="../images/circx.gif"> [<I>j</I>,<I>k</I>],</sub></sup></pre><P>
<h4><a name="0937_1988">(29.5)<a name="0937_1988"></sub></sup></h4><P>
since the carry-status operator is associative. The goal of a prefix computation, in terms of this notation, is to compute <I>y<SUB>i</I></SUB> = [0, <I>i</I>] for <I>i= </I>0, 1<I>, . . . , n.</I><P>
The only combinational element used in the parallel prefix circuit is a circuit that computes the <img src="../images/circx.gif"> operator. Figure 29.8 shows how pairs of <img src="../images/circx.gif"> elements are organized to form the internal nodes of a complete binary tree, and Figure 29.9 illustrates the parallel prefix circuit for <I>n</I> = 8. Note that the wires in the circuit follow the structure of a tree, but the circuit itself is not a tree, although it is purely combinational. The inputs <I>x</I><SUB>1</SUB>, <I>x</I><SUB>2</SUB>, . . . , <I>x<SUB>n </I></SUB>are supplied at the leaves, and the input <I>x<SUB>0</I></SUB> is provided at the root. The outputs <I>y</I><SUB>0</SUB><I>, y</I><SUB>1</SUB><I>, . . . , y<SUB>n - </I>1</SUB> are produced at leaves, and the output <I>y<SUB>n</SUB> </I>is produced at the root. (For ease in understanding the prefix computation, variable indices increase from left to right in Figures 29.8 and 29.9, rather than from right to left as in other figures of this section.)<P>
The two <img src="../images/circx.gif"> elements in each node typically operate at different times and have different depths in the circuit. As shown in Figure 29.8, if the subtree rooted at a given node spans some range <I>x<SUB>i</I></SUB>, <I>x<SUB>i</I>+1</SUB>, . . . , <I>x<SUB>k </I></SUB>of inputs, its left subtree spans the range <I>x<SUB>i</I></SUB>, <I>x<SUB>i</I>+1</SUB>, . . . , <I>x<SUB>j </I>- 1</SUB>, and its right subtree spans the range <I>x<SUB>j</I></SUB>, <I>x<SUB>j</I>+1</SUB>, . . . , <I>x<SUB>k</I></SUB>, then the node must produce for its parent the product [<I>i, k</I>] of all inputs spanned by its subtree. Since we can assume inductively that the node's left and right children produce the products [<I>i, j - </I>1] and [<I>j, k</I>], the node simply uses one of its two elements to compute [<I>i, k</I>]<I> </I><img src="../images/arrlt12.gif"> <I>[</I>i, j - <I>1] <img src="../images/circx.gif"> </I>[<I>j, k</I>].<P>
Some time after this upward phase of computation, the node receives from its parent the product [0, <I>i</I> - 1] of all inputs that come before the leftmost input <I>x<SUB>i</I></SUB> that it spans. The node now likewise computes values for its children. The leftmost input spanned by the node's left child is also <I>x<SUB>i</I></SUB>, and so it passes the value [0, <I>i</I> - 1] to the left child unchanged. The leftmost input spanned by its right child is <I>x<SUB>j</I></SUB>, and so it must produce [0, <I>j</I> - 1]. Since the node receives the value [0, <I>i</I> - 1] from its parent and the value [<I>i</I>,<I> j</I> - 1] from its left child, it simply computes [0, <I>j</I> - 1] <img src="../images/arrlt12.gif"> [0, <I>i</I> - 1] <img src="../images/circx.gif"> [<I>i</I>, k] and sends this value to the right child.<P>
<img src="666_a.gif"><P>
<h4><a name="0937_1989">Figure 29.8 The organization of a parallel prefix circuit. The node shown is the root of a subtree whose leaves input the values x<SUB>i</SUB> to x<SUB>k</SUB>. The node's left subtree spans inputs x<SUB>i</SUB> to x<SUB>j - 1</SUB>, and its right subtree spans inputs x<SUB>j</SUB> to x<SUB>k</SUB>. The node consists of two <img src="../images/circx.gif"> elements, which operate at different times during the operation of the circuit. One element computes [i, k] <img src="../images/arrlt12.gif"> [i, j - 1] <img src="../images/circx.gif"> [j, k], and the other element computes [0, j - 1] <img src="../images/arrlt12.gif"> [0, i - 1] <img src="../images/circx.gif"> [i, j - 1]. The values computed are shown on the wires.<a name="0937_1989"></sub></sup></h4><P>
Figure 29.9 shows the resulting circuit, including the boundary case that arises at the root. The value <I>x</I><SUB>0</SUB> = [0, 0] is provided as input at the root, and one more <img src="../images/circx.gif"> element is used to compute (in general) the value <I>y<SUB>n</SUB> = </I>[0<I>, n</I>] = [0, 0] <img src="../images/circx.gif"> [1, <I>n</I>].<P>
If <I>n</I> is an exact power of 2, then the parallel prefix circuit uses 2<I>n</I> - 1 <img src="../images/circx.gif"> elements. It takes only <I>O</I>(lg<I> n</I>) time to compute all <I>n</I> + 1 prefixes, since the computation proceeds up the tree and then back down. Exercise 29.2-5 studies the depth of the circuit in more detail.<P>
<P>
<h3>Completing the carry-lookahead adder</h3><P>
<a name="0938_1988">Now that we have a parallel prefix circuit, we can complete the description of the carry-lookahead adder. Figure 29.10 shows the construction. An <I>n</I>-bit <I><B>carry-lookahead adder</I></B> consists of <I>n</I> + 1 <I><B>KPG boxes</I></B>, each of <img src="../images/bound.gif">(1) size, and a parallel prefix circuit with inputs <I>x</I><SUB>0</SUB><I>, x</I><SUB>1</SUB><I>, . . . , x<SUB>n</SUB> (x</I><SUB>0</SUB> is hardwired to <FONT FACE="Courier New" SIZE=2>k</FONT>) and outputs <I>y</I><SUB>0</SUB><I>, y</I><SUB>1</SUB><I>, . . . , y<SUB>n</I></SUB>. KPG box <I>KPG<B><SUB>i</I></B></SUB> takes external inputs <I>a<SUB>i </I></SUB>and <I>b<SUB>i</I></SUB> and produces sum bit <I>s<SUB>i</I></SUB>. (Input bits <I>a<SUB>n</I></SUB> and <I>b<SUB>n</I></SUB> are hardwired to 0.) Given <I>a<SUB>i-</I>1</SUB> and <I>b<SUB>i-</I>1</SUB>, box<I> KPG<SUB>i</SUB><B><SUB>-</I></B>1</SUB> computes <I>x<SUB>i</SUB> </I><img src="../images/memof12.gif"><I> </I>{<FONT FACE="Courier New" SIZE=2>k,p,</FONT>g} according to equation (29.3) and sends this value as the external input <I>x<SUB>i</SUB> </I>of the parallel prefix circuit. (The value of <I>x<SUB>n</I>+1</SUB> is ignored.) Computing all the <I>x<SUB>i</I></SUB> takes <img src="../images/bound.gif">(1) time. After a delay of <I>O</I>(lg <I>n</I>), the parallel prefix circuit produces <I>y</I><SUB>0</SUB>, <I>y</I><SUB>1</SUB>, . . . , <I>y</I><SUB>n.</SUB> By Lemma 29.1, <I>y</I><SUB>i</SUB> is either <FONT FACE="Courier New" SIZE=2>k</FONT> or g; it cannot be <FONT FACE="Courier New" SIZE=2>p</FONT>. Each value <I>y<SUB>i</I></SUB> indicates the carry-in to full adder <I>FA<SUB>i</SUB> </I>in the ripple-carry adder: <I>y<SUB>i</SUB> </I>= <FONT FACE="Courier New" SIZE=2>k</FONT> implies <I>c<SUB>i</SUB> </I>= 0, and <I>y<SUB>i</SUB> </I>= g implies <I>c<SUB>i</SUB> </I>= 1. Thus, the value of <I>y<SUB>i</SUB> </I>is fed into <I>KPG<SUB>i</SUB> </I>to indicate the carry-in <I>c<SUB>i</I></SUB>, and the sum bit <I>s<SUB>i</SUB> </I>= parity(<I>a<SUB>i</I></SUB>, <I>b<SUB>i</I></SUB>, <I>c<SUB>i</I></SUB>) is produced in constant time. Thus, the carry-lookahead adder operates in <I>O</I>(lg <I>n</I>) time and has <img src="../images/bound.gif">(<I>n</I>) size.<P>
<img src="667_a.gif"><P>
<h4><a name="0938_1989">Figure 29.9 A parallel prefix circuit for n = 8. (a) The overall structure of the circuit, and the values carried on each wire. (b) The same circuit with values corresponding to Figures 29.3 and 29.7.<a name="0938_1989"></sub></sup></h4><P>
<img src="668_a.gif"><P>
<h4><a name="0938_198a">Figure 29.10 The construction of an n-bit carry-lookahead adder, shown here for n = 8. It consists of n + 1 KPG boxes KPG<SUB>i</SUB> <FONT FACE="Times New Roman" SIZE=2>for i = 0, 1, . . . , n. Each box KPG<SUB>i </SUB><FONT FACE="Times New Roman" SIZE=2>takes external inputs a<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2> and b<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2> (where a<SUB>n</SUB><FONT FACE="Times New Roman" SIZE=2> and b<SUB>n</SUB><FONT FACE="Times New Roman" SIZE=2> are hardwired to 0, as indicated by the diamond) and computes carry status x<SUB>i+1</SUB><FONT FACE="Times New Roman" SIZE=2>. These values are fed into the parallel prefix circuit, which returns the results y<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2> of the prefix computation. Each box KPG<SUB>i</SUB> now takes y<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2> as input, interprets it as the carry-in bit c<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2>, and then outputs the sum bit s<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2> = parity (a<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2>, b<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2>, c<SUB>i</SUB><FONT FACE="Times New Roman" SIZE=2>). Sample values corresponding to those shown in Figures 29.3 and 29.9 are shown.<a name="0938_198a"></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></FONT></sub></sup></h4><P>
<P>
<P>
<h2><a name="0939_198b">29.2.3 Carry-save addition<a name="0939_198b"></h2><P>
<a name="0939_1989"><a name="0939_198a">A carry-lookahead adder can add two <I>n</I>-bit numbers in <I>O</I>(lg<I> n</I>) time. Perhaps surprisingly, adding three <I>n</I>-bit numbers takes only a constant additional amount of time. The trick is to reduce the problem of adding three numbers to the problem of adding just two numbers.<P>
Given three <I>n</I>-bit numbers <I>x</I> = <img src="../images/lftwdchv.gif"><I>x<SUB>n-</I>1</SUB>, <I>x<SUB>n</I>-2</SUB>, <I>. . . . , x</I><SUB>0</SUB><img src="../images/wdrtchv.gif">, <I>y </I>= <img src="../images/lftwdchv.gif"><I>y<SUB>n-</I>1</SUB>, <I>y<SUB>n-</I>2</SUB>, <I>. . . , y</I><SUB>0</SUB><img src="../images/wdrtchv.gif">, and <I>z</I> = <img src="../images/lftwdchv.gif"><I>z<SUB>n-</I>1</SUB>,<I>z<SUB>n-</I>2</SUB>, <I>. . . , z</I><SUB>0</SUB><img src="../images/wdrtchv.gif">, an <I>n-</I>bit <I><B>carry-save adder</I></B> produces an <I>n</I>-bit number <I>u</I> = <img src="../images/lftwdchv.gif"><I>u<SUB>n-</I>1</SUB>, <I>u<SUB>n-</I>2</SUB>, <I>. . . , u</I><SUB>0</SUB><img src="../images/wdrtchv.gif"> and an (<I>n + </I>1)-bit number <I>v</I> = <img src="../images/lftwdchv.gif"><I>v<SUB>n</I></SUB>, <I>v<SUB>n-</I>1</SUB>, . . . , <I>v</I><SUB>0</SUB><img src="../images/wdrtchv.gif"><SUB> </SUB>such that<P>
<pre><I>u </I>+ <I>v</I> = <I>x </I>+ <I>y </I>+ <I>z</I>.</sub></sup></pre><P>
As shown in Figure 29.11 (a), it does this by computing<P>
<pre><I>u<SUB>i</I></SUB> = parity (<I>x<SUB>i</I></SUB>, <I>y<SUB>i</I></SUB>, z<I><SUB>i</I></SUB>),</sub></sup></pre><P>
<pre><I>v<SUB>i + </I>1</SUB> = majority (<I>x<SUB>i</I></SUB>, <I>y<SUB>i</I></SUB>, <I>z<SUB>i</I></SUB>),</sub></sup></pre><P>
for<I> i</I> = 0, 1, . . . , <I>n</I> - 1. Bit <I>v</I><SUB>0</SUB> always equals 0.<P>
The <I>n</I>-bit carry-save adder shown in Figure 29.11(b) consists of <I>n</I> full adders <I>FA</I><SUB>0</SUB>, <I>FA</I><SUB>1</SUB><I>, . . . , FA<SUB>n </I>- 1</SUB>. For <I>i</I> = 0, 1, . . . , <I>n</I> - 1, full adder <I>FA<SUB>i</I></SUB> takes inputs <I>x<SUB>i</I></SUB>, <I>y<SUB>i</I></SUB>, and z<I><SUB>i</I></SUB>. The sum-bit output of <I>FA<SUB>i </I></SUB>is taken as<I> u<SUB>i</I></SUB>, and the carry-out of <I>FA<SUB>i</I></SUB> is taken as <I>v<SUB>i+</I>1</SUB>. Bit <I>v</I><SUB>0</SUB> is hardwired to 0.<P>
Since the computations of all 2<I>n + </I>1 output bits are independent, they can be performed in parallel. Thus, a carry-save adder operates in <img src="../images/bound.gif">(1) time and has <img src="../images/bound.gif">(<I>n</I>) size. To sum three <I>n</I>-bit numbers, therefore, we need only perform a carry-save addition, taking <img src="../images/bound.gif">(1) time, and then perform a carry-lookahead addition, taking <I>O</I>(lg<I> n</I>) time. Although this method is not asymptotically better than the method of using two carry-lookahead additions, it is much faster in practice. Moreover, we shall see in Section 29.3 that carry-save addition is central to fast algorithms for multiplication.<P>
<img src="669_a.gif"><P>
<h4><a name="0939_198c">Figure 29.11 (a) Carry-save addition. Given three n-bit numbers x, y, and z, we produce an n-bit number u and an (n + 1)-bit number v such that x + y + z = u + v. The ith pair of shaded bits are a function of x<SUB>i</SUB>, y<SUB>i</SUB>, and z<SUB>i</SUB>. (b) An 8-bit carry-save adder. Each full adder FA<SUB>i</SUB> takes inputs x<SUB>i</SUB>, y<SUB>i</SUB>, and z<SUB>i</SUB> and produces sum bit u<SUB>i</SUB> and carry-out bit v<SUB>i + 1</SUB>. Bit v<SUB>0</SUB> is hardwired to 0.<a name="0939_198c"></sub></sup></h4><P>
<P>
<h2><a name="093a_1991">Exercises<a name="093a_1991"></h2><P>
<a name="093a_1992">29.2-1<a name="093a_1992"><P>
Let <I>a</I> = <img src="../images/lftwdchv.gif">01111111<img src="../images/wdrtchv.gif">, <I>b</I> = <img src="../images/lftwdchv.gif">00000001<img src="../images/wdrtchv.gif">, and <I>n</I> = 8. Show the sum and carry bits output by full adders when ripple-carry addition is performed on these two sequences. Show the carry statuses <I>x</I><SUB>0</SUB>, <I>x</I><SUB>1</SUB>, . . . , <I>x</I><SUB>8</SUB> corresponding to <I>a</I> and <I>b</I>, label each wire of the parallel prefix circuit of Figure 29.9 with the value it has given these <I>x<SUB>i</I></SUB> inputs, and show the resulting outputs <I>y</I><SUB>0</SUB>, <I>y</I><SUB>1</SUB>, . . . , <I>y</I><SUB>8</SUB>.<P>
<a name="093a_1993">29.2-2<a name="093a_1993"><P>
Prove that the carry-status operator <img src="../images/circx.gif"> given by Figure 29.5 is associative.<P>
<img src="670_a.gif"><P>
<h4><a name="093a_1994">Figure 29.12 A parallel prefix circuit for use in Exercise 29.2-6.<a name="093a_1994"></sub></sup></h4><P>
<a name="093a_1995">29.2-3<a name="093a_1995"><P>
<a name="093a_198b">Show by example how to construct an <I>O</I>(lg<I> n</I>)-time parallel prefix circuit for values of <I>n</I> that are not exact powers of 2 by drawing a parallel prefix circuit for <I>n</I> = 11. Characterize the performance of parallel prefix circuits built in the shape of arbitrary binary trees.<P>
<a name="093a_1996">29.2-4<a name="093a_1996"><P>
Show the gate-level construction of the box <I>KPG<SUB>i</I></SUB>. Assume that each output <I>x<SUB>i</I></SUB> is represented by <img src="../images/lftwdchv.gif">00<img src="../images/wdrtchv.gif"> if <I>x<SUB>i</I></SUB> = <FONT FACE="Courier New" SIZE=2>k</FONT>, by <img src="../images/lftwdchv.gif">11<img src="../images/wdrtchv.gif"> if <I>x<SUB>i</I></SUB> = g, and by <img src="../images/lftwdchv.gif">01<img src="../images/wdrtchv.gif"> or <img src="../images/lftwdchv.gif">10<img src="../images/wdrtchv.gif"> if <I>x<SUB>i</I></SUB> = <FONT FACE="Courier New" SIZE=2>p</FONT>. Assume also that each input <I>y<SUB>i</SUB> </I>is represented by 0 if <I>y<SUB>i</SUB> </I>= <FONT FACE="Courier New" SIZE=2>k</FONT> and by 1 if <I>y<SUB>i</SUB> </I>= g.<P>
<a name="093a_1997">29.2-5<a name="093a_1997"><P>
<a name="093a_198c"><a name="093a_198d">Label each wire in the parallel prefix circuit of Figure 29.9(a) with its depth. A <I><B>critical path</I></B><I> </I>in a circuit is a path with the largest number of combinational elements on any path from inputs to outputs. Identify the critical path in Figure 29.9(a), and show that its length is <I>O</I>(lg<I> n</I>). Show that some node has <img src="../images/circx.gif"> elements that operate <img src="../images/bound.gif">(lg <I>n</I>) time apart. Is there a node whose <img src="../images/circx.gif"> elements operate simultaneously?<P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -