ch02.6.htm

来自「介绍asci设计的一本书」· HTM 代码 · 共 2,255 行 · 第 1/5 页
HTM
2,255 行
i</SPAN>

] <SPAN CLASS="Symbol">

&#8853;</SPAN>

 A2[<SPAN CLASS="EquationVariables">

i</SPAN>

] <SPAN CLASS="Symbol">

&#8853;</SPAN>

 A3[<SPAN CLASS="EquationVariables">

i</SPAN>

 ] = PARITY(A1[<SPAN CLASS="EquationVariables">

i</SPAN>

], A2[<SPAN CLASS="EquationVariables">

i</SPAN>

], A3[<SPAN CLASS="EquationVariables">

i</SPAN>

 ]) ,</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381005">

 </A>

(2.52)</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381007">

 </A>

COUT = A1[<SPAN CLASS="EquationVariables">

i</SPAN>

] &#183; A2[<SPAN CLASS="EquationVariables">

i</SPAN>

] + [(A1[<SPAN CLASS="EquationVariables">

i</SPAN>

] + A2[<SPAN CLASS="EquationVariables">

i</SPAN>

]) &#183; A3[<SPAN CLASS="EquationVariables">

i</SPAN>

 ]] = MAJ(A1[<SPAN CLASS="EquationVariables">

i</SPAN>

], A2[<SPAN CLASS="EquationVariables">

i</SPAN>

], A3[<SPAN CLASS="EquationVariables">

i</SPAN>

 ]) .</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381009">

 </A>

(2.53)</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=176807">

 </A>

The inputs, A1, A2, and A3; and outputs, S1 and S2, are buses. The input, CIN, is the carry from stage (<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 1). The carry in, CIN, is connected directly to the output bus S1&#8212;indicated by the schematic symbol (Figure&nbsp;2.23a). We connect CIN[0] to VSS. The output, COUT, is the carry out to stage (<SPAN CLASS="EquationVariables">

i</SPAN>

 + 1).</P>

<P CLASS="Body">

<A NAME="pgfId=177617">

 </A>

A 4-bit CSA is shown in Figure&nbsp;2.23(b). The arithmetic overflow signal for ones&#8217; complement or two&#8217;s complement arithmetic, OV, is XOR(COUT[MSB], COUT[MSB &#8211; 1]) as shown in Figure&nbsp;2.23(c). In a CSA the carries are &#8220;saved&#8221; at each stage and shifted left onto the bus S1. There is thus no carry propagation and the delay of a CSA is constant. At the output of a CSA we still need to add the S1 bus (all the saved carries) and the S2 bus (all the sums) to get an <SPAN CLASS="EquationVariables">

n</SPAN>

-bit result using a final stage that is not shown in Figure&nbsp;2.23(c). We might regard the <SPAN CLASS="EquationVariables">

n</SPAN>

-bit sum as being encoded in the two buses, S1 and S2, in the form of the parity and majority functions.</P>

<P CLASS="Body">

<A NAME="pgfId=176824">

 </A>

We can use a CSA to add multiple inputs&#8212;as an example, an adder with four 4-bit inputs is shown in Figure&nbsp;2.23(d). The last stage sums two input buses using a <SPAN CLASS="Definition">

carry-propagate adder</SPAN>

 (<SPAN CLASS="Definition">

CPA</SPAN>

). We have used an RCA as the CPA in Figure&nbsp;2.23(d) and (e), but we can use any type of adder. Notice in Figure&nbsp;2.23(e) how the two CSA cells and the RCA cell abut together horizontally to form a <SPAN CLASS="Definition">

bit slice</SPAN>

 (or slice) and then the slices are stacked vertically to form the datapath.</P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableFigure">

<A NAME="pgfId=176846">

 </A>

&nbsp;</P>

<DIV>

<IMG SRC="CH02-34.gif">

</DIV>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableFigureTitle">

<A NAME="pgfId=176850">

 </A>

FIGURE&nbsp;2.22&nbsp;The carry-save adder (CSA). (a)&nbsp;A CSA cell. (b)&nbsp;A 4-bit CSA. (c)&nbsp;Symbol for a CSA. (d)&nbsp;A four-input CSA. (e)&nbsp;The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA) as the final stage. (f)&nbsp;A pipelined adder. (g)&nbsp;The datapath for the pipelined version showing the pipeline registers as well as the clock control lines that use m2.</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=99434">

 </A>

We can register the CSA stages by adding vectors of flip-flops as shown in Figure&nbsp;2.23(f). This reduces the adder delay to that of the slowest adder stage, usually the CPA. By using registers between stages of combinational logic we use <SPAN CLASS="Definition">

pipelining</SPAN>

 to increase the speed and pay a price of increased area (for the registers) and introduce <SPAN CLASS="Definition">

latency</SPAN>

. It takes a few clock cycles (the latency, equal to <SPAN CLASS="EquationVariables">

n</SPAN>

 clock cycles for an <SPAN CLASS="EquationVariables">

n</SPAN>

-stage pipeline) to fill the pipeline, but once it is filled, the answers emerge every clock cycle. Ferris wheels work much the same way. When the fair opens it takes a while (latency) to fill the wheel, but once it is full the people can get on and off every few seconds. (We can also pipeline the RCA of Figure&nbsp;2.20. We add <SPAN CLASS="EquationVariables">

 i</SPAN>

&nbsp;registers on the A and B inputs before ADD[<SPAN CLASS="EquationVariables">

i</SPAN>

] and add (<SPAN CLASS="EquationVariables">

n</SPAN>

 <SPAN CLASS="EquationVariables">

&#8211;</SPAN>

 <SPAN CLASS="EquationVariables">

i</SPAN>

)&nbsp;registers after the output S[<SPAN CLASS="EquationVariables">

i</SPAN>

], with a single register before each C[<SPAN CLASS="EquationVariables">

i</SPAN>

].)</P>

<P CLASS="Body">

<A NAME="pgfId=170268">

 </A>

The problem with an RCA is that every stage has to wait to make its carry decision, C[<SPAN CLASS="EquationVariables">

i</SPAN>

], until the previous stage has calculated C[<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 1]. If we examine the propagate signals we can bypass this critical path. Thus, for example, to bypass the carries for bits 4&#8211;7 (stages 5&#8211;8) of an adder we can compute BYPASS = P[4].P[5].P[6].P[7] and then use a MUX as follows: &nbsp;</P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381127">

 </A>

C[7] = (G[7] + P[7] &#183; C[6]) &#183; BYPASS' + C[3] &#183; BYPASS .</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381129">

 </A>

(2.54)</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=165238">

 </A>

Adders based on this principle are called <SPAN CLASS="Definition">

carry-bypass adders</SPAN>

 (<SPAN CLASS="Definition">

CBA</SPAN>

) [Sato et al., 1992]. Large, custom adders employ <SPAN CLASS="Definition">

Manchester-carry chains</SPAN>

 to compute the carries and the bypass operation using TGs or just pass transistors [Weste and Eshraghian, 1993, pp.&nbsp;530&#8211;531]. These types of carry chains may be part of a predesigned ASIC adder cell, but are not used by ASIC designers. </P>

<P CLASS="Body">

<A NAME="pgfId=177373">

 </A>

Instead of checking the propagate signals we can check the inputs. For example we can compute SKIP = (A[<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 1] <SPAN CLASS="Symbol">

&#8853;</SPAN>

 B[<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 1]) + (A[<SPAN CLASS="EquationVariables">

i</SPAN>

] <SPAN CLASS="Symbol">

&#8853;</SPAN>

 B[<SPAN CLASS="EquationVariables">

i</SPAN>

] ) and then use a 2:1 MUX to select C[<SPAN CLASS="EquationVariables">

i</SPAN>

]. Thus,  </P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381144">

 </A>

CSKIP[<SPAN CLASS="EquationVariables">

i</SPAN>

] = (G[<SPAN CLASS="EquationVariables">

i</SPAN>

] + P[<SPAN CLASS="EquationVariables">

i</SPAN>

] &#183; C[<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 1]) &#183; SKIP' + C[<SPAN CLASS="EquationVariables">

i</SPAN>

 &#8211; 2] &#183; SKIP .</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381146">

 </A>

(2.55)</P>

</TD>

</TR>

</TABLE>

<P CLASS="BodyAfterHead">

<A NAME="pgfId=165350">

 </A>

This is a <SPAN CLASS="Definition">

carry-skip adder</SPAN>

 [Keutzer, Malik, and Saldanha, 1991; Lehman, 1961]. Carry-bypass and carry-skip adders may include redundant logic (since the carry is computed in two different ways&#8212;we just take the first signal to arrive). We must be careful that the redundant logic is not optimized away during logic synthesis.</P>

<P CLASS="Body">

<A NAME="pgfId=165312">

 </A>

If we evaluate Eq.&nbsp;2.44 recursively for <SPAN CLASS="EquationVariables">

i</SPAN>

 = 1, we get the following:  </P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381301">

 </A>

C[1]</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381303">

 </A>

=</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381269">

 </A>

G[1] + P[1] &#183; C[0]</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqn">

<A NAME="pgfId=381271">

 </A>

&nbsp;</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381329">

 </A>

&nbsp;</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381331">

 </A>

=</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381333">

 </A>

G[1] + P[1] &#183; (G[0] + P[1] &#183; C[&#8211;1])</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqn">

<A NAME="pgfId=381335">

 </A>

&nbsp;</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381305">

 </A>

&nbsp;</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381307">

 </A>

=</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381297">

 </A>

G[1] + P[1] &#183; G[0] .</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381299">

 </A>

(2.56)</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=85691">

 </A>

This result means that we can &#8220;look ahead&#8221; by two stages and calculate the carry into the third stage (bit 2), which is C[1], using only the first-stage inputs (to calculate G[0]) and the second-stage inputs. This is a <SPAN CLASS="Definition">

carry-lookahead adder</SPAN>

 (<SPAN CLASS="Definition">

CLA</SPAN>

) [MacSorley, 1961]. If we continue expanding Eq.&nbsp;2.44, we find:  </P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381347">

 </A>

C[2]</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381349">

 </A>

=</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381351">

 </A>

G[2] + P[2] &#183; G[1] + P[2] &#183; P[1] &#183; G[0] ,</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqn">

<A NAME="pgfId=381353">

 </A>

&nbsp;</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381355">

 </A>

&nbsp;</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381357">

 </A>

&nbsp;</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381359">

 </A>

&nbsp;</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqn">

<A NAME="pgfId=381361">

 </A>

&nbsp;</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnRight">

<A NAME="pgfId=381363">

 </A>

C[3]</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnCenter">

<A NAME="pgfId=381365">

 </A>

=</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnLeft">

<A NAME="pgfId=381367">

 </A>

G[3] + P[2] &#183; G[2] + P[2] &#183; P[1] &#183; G[1] + P[3] &#183; P[2] &#183; P[1] &#183; G[0] .</P>

</TD>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableEqnNumber">

<A NAME="pgfId=381369">

 </A>

(2.57)</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=170855">

 </A>

As we look ahead further these equations become more complex, take longer to calculate, and the logic becomes less regular when implemented using cells with a limited number of inputs. Datapath layout must fit in a bit slice, so the physical and logical structure of each bit must be similar. In a standard cell or gate array we are not so concerned about a regular physical structure, but a regular logical structure simplifies design. The Brent&#8211;Kung adder reduces the delay and increases the regularity of the carry-lookahead scheme [Brent and Kung, 1982]. Figure&nbsp;2.24(a) shows a regular 4-bit CLA, using the carry-lookahead generator cell (CLG) shown in Figure&nbsp;2.24(b).</P>

<TABLE>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableFigure">

<A NAME="pgfId=176735">

 </A>

<IMG SRC="CH02-35.gif" ALIGN="BASELINE">

&nbsp;</P>

</TD>

</TR>

<TR>

<TD ROWSPAN="1" COLSPAN="1">

<P CLASS="TableFigureTitle">

<A NAME="pgfId=176738">

 </A>

FIGURE&nbsp;2.23&nbsp;The Brent&#8211;Kung carry-lookahead adder (CLA). (a)&nbsp;Carry generation in a 4-bit CLA. (b)&nbsp;A cell to generate the lookahead terms, C[0]&#8211;C[3]. (c)&nbsp;Cells L1, L2, and L3 are rearranged into a tree that has less delay. Cell L4 is added to calculate C[2] that is lost in the translation. (d) and (e)&nbsp;Simplified representations of parts a and c. (f)&nbsp;The lookahead logic for an 8-bit adder. The inputs, 0&#8211;7, are the propagate and carry terms formed from the inputs to the adder. (g)&nbsp;An 8-bit Brent&#8211;Kung CLA. The outputs of the lookahead logic are the carry bits that (together with the inputs) form the sum. One advantage of this adder is that delays from the inputs to the outputs are more nearly equal than in other adders. This tends to reduce the number of unwanted and unnecessary switching events and thus reduces power dissipation.</P>

</TD>

</TR>

</TABLE>

<P CLASS="Body">

<A NAME="pgfId=165169">

 </A>

In a <SPAN CLASS="Definition">

carry-select adder </SPAN>

we duplicate two small adders (usually 4-bit or 8-bit adders&#8212;often CLAs) for the cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need&#8212;wasteful, but fast [Bedrij, 1962]. A carry-select adder is often used as the fast adder in a datapath library because its layout is regular. </P>

<P CLASS="Body">

<A NAME="pgfId=177413">

 </A>

We can use the carry-select, carry-bypass, and carry-skip architectures to split a 12-bit adder, for example, into three blocks. The delay of the adder is then partly dependent on the delays of the MUX between each block. Suppose the delay due to 1-bit in an adder block (we shall call this a bit delay) is approximately equal to the MUX delay. In this case may be faster to make the blocks 3, 4, and 5-bits long instead of being equal in size. Now the delays into the final MUX are equal&#8212;3&nbsp;bit-delays plus 2 MUX delays for the carry signal from bits 0&#8211;6 and 5 bit-delays for the carry from bits 7&#8211;11. Adjusting the block size reduces the delay of large adders (more than 16 bits).</P>

<P CLASS="Body">

<A NAME="pgfId=165654">

 </A>

We can extend the idea behind a carry-select adder as follows. Suppose we have an <SPAN CLASS="EquationVariables">

n</SPAN>

-bit adder that generates two sums: One sum assumes a carry-in condition of '0', the other sum assumes a carry-in condition of '1'. We can split this <SPAN CLASS="EquationVariables">

n</SPAN>

-bit adder into an <SPAN CLASS="EquationVariables">

i</SPAN>

-bit adder for the <SPAN CLASS="EquationVariables">

i</SPAN>

 LSBs and an (<SPAN CLASS="EquationVariables">

n</SPAN>

 &#8211; <SPAN CLASS="EquationVariables">

i</SPAN>

)-bit adder for the <SPAN CLASS="EquationVariables">

n</SPAN>

 &#8211; <SPAN CLASS="EquationVariables">

i</SPAN>

 MSBs. Both of the smaller adders generate two conditional sums as well as true and complement carry signals. The two (true and complement) carry signals from the LSB adder are used to select between the two (<SPAN CLASS="EquationVariables">

n</SPAN>

 &#8211; <SPAN CLASS="EquationVariables">

i</SPAN>

 + 1)-bit conditional sums from the MSB adder using 2(<SPAN CLASS="EquationVariables">

n</SPAN>

 &#8211; <SPAN CLASS="EquationVariables">

i</SPAN>

 + 1) two-input MUXes. This is a <SPAN CLASS="Definition">

conditional-sum adder</SPAN>

 (also often abbreviated to CSA) [Sklansky, 1960]. We can recursively apply this technique. For example, we can split a 16-bit adder using <SPAN CLASS="EquationVariables">

i</SPAN>

 = 8 and <SPAN CLASS="EquationVariables">
ch02.6.htm - 源码说明

本页面展示了「介绍asci设计的一本书」中的 ch02.6.htm 源码文件，采用 HTM 编程语言编写，共 2,255 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与asci相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?