⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 ch02.16.htm

📁 介绍asci设计的一本书
💻 HTM
📖 第 1 页 / 共 3 页
字号:
to maximize the performance of the multiplier. In <B>tree-based multipliers</B>

there are two ways to do thisworking forward and working backward.</P>



<P><P CLASS="Body"><A NAME="pgfId=180485"></A>In a <B>Wallace-tree multiplier</B>

we work forward from the multiplier inputs, compressing the number of signals

to be added at each stage [Wallace, 1960]. We can view an FA as a <B>3:2

compressor</B> or <B>(3, 2) counter</B> it counts the number of '1's on

the inputs. Thus, for example, an input of '101' (two '1's) results in an

output '10' (2). A half adder is a <B>(2, 2) counter</B> . To form P<SUB CLASS="Subscript">

5</SUB> in Figure&nbsp;2.29 we must add 6 summands (S<SUB CLASS="Subscript">

05</SUB> , S<SUB CLASS="Subscript"> 14</SUB> , S<SUB CLASS="Subscript">

23</SUB> , S<SUB CLASS="Subscript"> 32</SUB> , S<SUB CLASS="Subscript">

41</SUB> , and S<SUB CLASS="Subscript"> 50</SUB> ) and 4 carries from the

P<SUB CLASS="Subscript"> 4</SUB> column. We add these in stages 17, compressing

from 6:3:2:2:3:1:1. Notice that we wait until stage 5 to add the last carry

from column P<SUB CLASS="Subscript"> 4</SUB> <SPAN CLASS="White">&nbsp;</SPAN>,

and this means we expand (rather than compress) the number of signals (from

2 to 3) between stages 3 and 5. The maximum delay through the CSA array

of Figure&nbsp;2.29 is 6 adder delays. To this we must add the delay of

the 4-bit (9 inputs) CPA (stage 7). There are 26 adders (6 half adders)

plus the 4 adders in the CPA.</P>



<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">

<TR>

<TD><P CLASS="TableFigure"><A NAME="pgfId=192041"></A><IMG SRC="CH02-78.gif"

ALIGN="BASELINE" WIDTH="446" HEIGHT="302" NATURALSIZEFLAG="3"> &nbsp;</TD></TR>

<TR>

<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=192047"></A>FIGURE&nbsp;2.29&nbsp;&nbsp;A

6-bit Wallace-tree multiplier. The carry-save adder (CSA) requires 26 adders

(cells 126, six are half adders). The final carry-propagate adder (CPA)

consists of 4 adder cells (2730). The delay of the CSA is 6 adders. The

delay of the CPA is 4 adders.</TD></TR>

</TABLE>

<P CLASS="Body"><A NAME="pgfId=180768"></A>In a <B>Dadda multiplier</B>

(Figure&nbsp;2.30) we work backward from the final product [Dadda, 1965].

Each stage has a maximum of 2, 3, 4, 6, 9, 13, 19, .<SPAN CLASS="White">&nbsp;</SPAN>.<SPAN CLASS="White">&nbsp;</SPAN>.<SPAN CLASS="White">&nbsp;</SPAN>outputs

(each successive stage is 3/2 times largerrounded down to an integer). Thus,

for example, in Figure&nbsp;2.28(d) we require 3 stages (with 3 adder delaysplus

the delay of a 10-bit output CPA) for a 6-bit Dadda multiplier. There are

19 adders (4 half adders) in the CSA plus the 10 adders (2 half adders)

in the CPA. A Dadda multiplier is usually faster and smaller than a Wallace-tree

multiplier.</P>



<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">

<TR>

<TD><P CLASS="TableFigure"><A NAME="pgfId=192071"></A><IMG SRC="CH02-79.gif"

ALIGN="BASELINE" WIDTH="451" HEIGHT="230" NATURALSIZEFLAG="3"> &nbsp;</TD></TR>

<TR>

<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=192077"></A>FIGURE&nbsp;2.30&nbsp;&nbsp;The

6-bit Dadda multiplier. The carry-save adder (CSA) requires 20 adders (cells

120, four are half adders). The carry-propagate adder (CPA, cells 2130)

is a ripple-carry adder (RCA). The CSA is smaller (20 versus 26 adders),

faster (3 adder delays versus 6 adder delays), and more regular than the

Wallace-tree CSA of Figure&nbsp;2.29. The overall speed of this implementation

is approximately the same as the Wallace-tree multiplier of Figure&nbsp;2.29;

however, the speed may be increased by substituting a faster CPA.</TD></TR>

</TABLE>

<P CLASS="Body"><A NAME="pgfId=181284"></A>In general, the number of stages

and thus delay (in units of an FA delayexcluding the CPA) for an <SPAN CLASS="EquationVariables">

n</SPAN> -bit tree-based multiplier using (3, 2) counters is</P>



<P><P CLASS="EqnNmbrdAlign"><A NAME="pgfId=180773"></A>&nbsp;&nbsp;log<SUB CLASS="Subscript">

1.5</SUB> <SPAN CLASS="White">&nbsp;</SPAN><SPAN CLASS="EquationVariables">

n</SPAN> <SPAN CLASS="White">&nbsp;</SPAN>=<SPAN CLASS="White">&nbsp;</SPAN>log<SUB CLASS="Subscript">

10</SUB> <SPAN CLASS="White">&nbsp;</SPAN><SPAN CLASS="EquationVariables">

n</SPAN> /log<SUB CLASS="Subscript"> 10</SUB> <SPAN CLASS="White">&nbsp;</SPAN>1.5<SPAN CLASS="White">&nbsp;</SPAN>=<SPAN CLASS="White">&nbsp;</SPAN>log<SUB CLASS="Subscript">

10</SUB> <SPAN CLASS="White">&nbsp;</SPAN><SPAN CLASS="EquationVariables">

n</SPAN> /0.176.(2.64)</P>



<P><P CLASS="Body"><A NAME="pgfId=191766"></A>Figure&nbsp;2.31(a) shows

how the partial-product array is constructed in a conventional 4-bit multiplier.

The <B>FerrariStefanelli multiplier</B> (Figure&nbsp;2.31b) &quot;nests&quot;

multipliersthe 2-bit submultipliers reduce the number of partial products

[Ferrari and Stefanelli, 1969].</P>



<P><TABLE BORDER="0" CELLSPACING="2" CELLPADDING="0">

<TR>

<TD><P CLASS="TableFigure"><A NAME="pgfId=191775"></A><IMG SRC="CH02-80.gif"

ALIGN="BASELINE" WIDTH="445" HEIGHT="180" NATURALSIZEFLAG="3"> &nbsp;</TD></TR>

<TR>

<TD><P CLASS="TableFigureTitle"><A NAME="pgfId=191781"></A>FIGURE&nbsp;2.31&nbsp;&nbsp;FerrariStefanelli

multiplier. (a)&nbsp;A conventional 4-bit array multiplier using AND gates

to calculate the summands with (2, 2) and (3, 2) counters to sum the partial

products. (b)&nbsp;A 4-bit FerrariStefanelli multiplier using 2-bit submultipliers

to construct the partial product array. (c)&nbsp;A circuit implementation

for an inverting 2-bit submultiplier.</TD></TR>

</TABLE>

<P CLASS="Body"><A NAME="pgfId=180774"></A>There are several issues in deciding

between parallel multiplier architectures:</P>



<OL>

  <LI><A NAME="pgfId=180775"></A>Since it is easier to fold triangles rather

  than trapezoids into squares, a Wallace-tree multiplier is more suited

  to full-custom layout, but is slightly larger, than a Dadda multiplierboth

  are less regular than an array multiplier. For cell-based ASICs, a Dadda

  multiplier is smaller than a Wallace-tree multiplier.

  <LI><A NAME="pgfId=180776"></A>The overall multiplier speed does depend

  on the size and architecture of the final CPA, but this may be optimized

  independently of the CSA array. This means a Dadda multiplier is always

  at least as fast as the Wallace-tree version.

  <LI><A NAME="pgfId=180777"></A>The low-order bits of any parallel multiplier

  settle first and can be added in the CPA before the remaining bits settle.

  This allows multiplication and the final addition to be overlapped in time.

  <LI><A NAME="pgfId=180968"></A>Any of the parallel multiplier architectures

  may be pipelined. We may also use a <B>variably pipelined</B> approach

  that tailors the register locations to the size of the multiplier.

  <LI><A NAME="pgfId=195933"></A>Using (4, 2), (5, 3), (7, 3), or (15, 4)

  counters increases the stage compression and permits the size of the stages

  to be tuned. Some ASIC cell libraries contain a (7, 3) countera <B>2-bit

  full-adder</B> . A (15,&nbsp;4) counter is a 3-bit full adder. There is

  a trade-off in using these counters between the speed and size of the logic

  cells and the delay as well as area of the interconnect.

  <LI><A NAME="pgfId=181081"></A>Power dissipation is reduced by the tree-based

  structures. The simplified carry-save logic produces fewer signal transitions

  and the tree structures produce fewer glitches than a chain.

  <LI><A NAME="pgfId=195908"></A>None of the multiplier structures we have

  discussed take into account the possibility of staggered arrival times

  for different bits of the multiplicand or the multiplier. Optimization

  then requires a logic-synthesis tool.

</OL>



<P><HR ALIGN=LEFT></P>



<P><A HREF="CH02.12.htm">Chapter&nbsp;&nbsp;start</A>&nbsp;&nbsp;&nbsp;<A

HREF="CH02.15.htm">Previous&nbsp;&nbsp;page</A>&nbsp;&nbsp;<A HREF="CH02.17.htm">Next&nbsp;&nbsp;page</A>

</BODY>



<!--#include file="Copyright.html"--><!--#include file="footer.html"-->

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -