📄 ch02.16.htm
字号:
to maximize the performance of the multiplier. In <B>tree-based multipliers</B>
there are two ways to do thisworking forward and working backward.</P>
<P><P CLASS="Body"><A NAME="pgfId=180485"></A>In a <B>Wallace-tree multiplier</B>
we work forward from the multiplier inputs, compressing the number of signals
to be added at each stage [Wallace, 1960]. We can view an FA as a <B>3:2
compressor</B> or <B>(3, 2) counter</B> it counts the number of '1's on
the inputs. Thus, for example, an input of '101' (two '1's) results in an
output '10' (2). A half adder is a <B>(2, 2) counter</B> . To form P<SUB CLASS="Subscript">
5</SUB> in Figure 2.29 we must add 6 summands (S<SUB CLASS="Subscript">
05</SUB> , S<SUB CLASS="Subscript"> 14</SUB> , S<SUB CLASS="Subscript">
23</SUB> , S<SUB CLASS="Subscript"> 32</SUB> , S<SUB CLASS="Subscript">
41</SUB> , and S<SUB CLASS="Subscript"> 50</SUB> ) and 4 carries from the
P<SUB CLASS="Subscript"> 4</SUB> column. We add these in stages 17, compressing
from 6:3:2:2:3:1:1. Notice that we wait until stage 5 to add the last carry
from column P<SUB CLASS="Subscript"> 4</SUB> <SPAN CLASS="White"> </SPAN>,
and this means we expand (rather than compress) the number of signals (from
2 to 3) between stages 3 and 5. The maximum delay through the CSA array
of Figure 2.29 is 6 adder delays. To this we must add the delay of
the 4-bit (9 inputs) CPA (stage 7). There are 26 adders (6 half adders)
plus the 4 adders in the CPA.</P>
<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">
<TR>
<TD><P CLASS="TableFigure"><A NAME="pgfId=192041"></A><IMG SRC="CH02-78.gif"
ALIGN="BASELINE" WIDTH="446" HEIGHT="302" NATURALSIZEFLAG="3"> </TD></TR>
<TR>
<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=192047"></A>FIGURE 2.29 A
6-bit Wallace-tree multiplier. The carry-save adder (CSA) requires 26 adders
(cells 126, six are half adders). The final carry-propagate adder (CPA)
consists of 4 adder cells (2730). The delay of the CSA is 6 adders. The
delay of the CPA is 4 adders.</TD></TR>
</TABLE>
<P CLASS="Body"><A NAME="pgfId=180768"></A>In a <B>Dadda multiplier</B>
(Figure 2.30) we work backward from the final product [Dadda, 1965].
Each stage has a maximum of 2, 3, 4, 6, 9, 13, 19, .<SPAN CLASS="White"> </SPAN>.<SPAN CLASS="White"> </SPAN>.<SPAN CLASS="White"> </SPAN>outputs
(each successive stage is 3/2 times largerrounded down to an integer). Thus,
for example, in Figure 2.28(d) we require 3 stages (with 3 adder delaysplus
the delay of a 10-bit output CPA) for a 6-bit Dadda multiplier. There are
19 adders (4 half adders) in the CSA plus the 10 adders (2 half adders)
in the CPA. A Dadda multiplier is usually faster and smaller than a Wallace-tree
multiplier.</P>
<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">
<TR>
<TD><P CLASS="TableFigure"><A NAME="pgfId=192071"></A><IMG SRC="CH02-79.gif"
ALIGN="BASELINE" WIDTH="451" HEIGHT="230" NATURALSIZEFLAG="3"> </TD></TR>
<TR>
<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=192077"></A>FIGURE 2.30 The
6-bit Dadda multiplier. The carry-save adder (CSA) requires 20 adders (cells
120, four are half adders). The carry-propagate adder (CPA, cells 2130)
is a ripple-carry adder (RCA). The CSA is smaller (20 versus 26 adders),
faster (3 adder delays versus 6 adder delays), and more regular than the
Wallace-tree CSA of Figure 2.29. The overall speed of this implementation
is approximately the same as the Wallace-tree multiplier of Figure 2.29;
however, the speed may be increased by substituting a faster CPA.</TD></TR>
</TABLE>
<P CLASS="Body"><A NAME="pgfId=181284"></A>In general, the number of stages
and thus delay (in units of an FA delayexcluding the CPA) for an <SPAN CLASS="EquationVariables">
n</SPAN> -bit tree-based multiplier using (3, 2) counters is</P>
<P><P CLASS="EqnNmbrdAlign"><A NAME="pgfId=180773"></A> log<SUB CLASS="Subscript">
1.5</SUB> <SPAN CLASS="White"> </SPAN><SPAN CLASS="EquationVariables">
n</SPAN> <SPAN CLASS="White"> </SPAN>=<SPAN CLASS="White"> </SPAN>log<SUB CLASS="Subscript">
10</SUB> <SPAN CLASS="White"> </SPAN><SPAN CLASS="EquationVariables">
n</SPAN> /log<SUB CLASS="Subscript"> 10</SUB> <SPAN CLASS="White"> </SPAN>1.5<SPAN CLASS="White"> </SPAN>=<SPAN CLASS="White"> </SPAN>log<SUB CLASS="Subscript">
10</SUB> <SPAN CLASS="White"> </SPAN><SPAN CLASS="EquationVariables">
n</SPAN> /0.176.(2.64)</P>
<P><P CLASS="Body"><A NAME="pgfId=191766"></A>Figure 2.31(a) shows
how the partial-product array is constructed in a conventional 4-bit multiplier.
The <B>FerrariStefanelli multiplier</B> (Figure 2.31b) "nests"
multipliersthe 2-bit submultipliers reduce the number of partial products
[Ferrari and Stefanelli, 1969].</P>
<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">
<TR>
<TD><P CLASS="TableFigure"><A NAME="pgfId=191775"></A><IMG SRC="CH02-80.gif"
ALIGN="BASELINE" WIDTH="445" HEIGHT="180" NATURALSIZEFLAG="3"> </TD></TR>
<TR>
<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=191781"></A>FIGURE 2.31 FerrariStefanelli
multiplier. (a) A conventional 4-bit array multiplier using AND gates
to calculate the summands with (2, 2) and (3, 2) counters to sum the partial
products. (b) A 4-bit FerrariStefanelli multiplier using 2-bit submultipliers
to construct the partial product array. (c) A circuit implementation
for an inverting 2-bit submultiplier.</TD></TR>
</TABLE>
<P CLASS="Body"><A NAME="pgfId=180774"></A>There are several issues in deciding
between parallel multiplier architectures:</P>
<OL>
<LI><A NAME="pgfId=180775"></A>Since it is easier to fold triangles rather
than trapezoids into squares, a Wallace-tree multiplier is more suited
to full-custom layout, but is slightly larger, than a Dadda multiplierboth
are less regular than an array multiplier. For cell-based ASICs, a Dadda
multiplier is smaller than a Wallace-tree multiplier.
<LI><A NAME="pgfId=180776"></A>The overall multiplier speed does depend
on the size and architecture of the final CPA, but this may be optimized
independently of the CSA array. This means a Dadda multiplier is always
at least as fast as the Wallace-tree version.
<LI><A NAME="pgfId=180777"></A>The low-order bits of any parallel multiplier
settle first and can be added in the CPA before the remaining bits settle.
This allows multiplication and the final addition to be overlapped in time.
<LI><A NAME="pgfId=180968"></A>Any of the parallel multiplier architectures
may be pipelined. We may also use a <B>variably pipelined</B> approach
that tailors the register locations to the size of the multiplier.
<LI><A NAME="pgfId=195933"></A>Using (4, 2), (5, 3), (7, 3), or (15, 4)
counters increases the stage compression and permits the size of the stages
to be tuned. Some ASIC cell libraries contain a (7, 3) countera <B>2-bit
full-adder</B> . A (15, 4) counter is a 3-bit full adder. There is
a trade-off in using these counters between the speed and size of the logic
cells and the delay as well as area of the interconnect.
<LI><A NAME="pgfId=181081"></A>Power dissipation is reduced by the tree-based
structures. The simplified carry-save logic produces fewer signal transitions
and the tree structures produce fewer glitches than a chain.
<LI><A NAME="pgfId=195908"></A>None of the multiplier structures we have
discussed take into account the possibility of staggered arrival times
for different bits of the multiplicand or the multiplier. Optimization
then requires a logic-synthesis tool.
</OL>
<P><HR ALIGN=LEFT></P>
<P><A HREF="CH02.12.htm">Chapter start</A> <A
HREF="CH02.15.htm">Previous page</A> <A HREF="CH02.17.htm">Next page</A>
</BODY>
<!--#include file="Copyright.html"--><!--#include file="footer.html"-->
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -