<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML EXPERIMENTAL 970324//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="Adobe FrameMaker 5.5/HTML Export Filter">
<TITLE> 12.12 Optimization of the Viterbi Decoder</TITLE></HEAD><!--#include file="top.html"--><!--#include file="header.html"-->
<DIV>
<P>[ <A HREF="CH12.htm">Chapter start</A> ] [ <A HREF="CH12.b.htm">Previous page</A> ] [ <A HREF="CH12.d.htm">Next page</A> ]</P><!--#include file="AmazonAsic.html"--><HR></DIV>
<H1 CLASS="Heading1">
<A NAME="pgfId=262995">
</A>
12.12 <A NAME="28656">
</A>
Optimization of the Viterbi Decoder</H1>
<P CLASS="BodyAfterHead">
<A NAME="pgfId=268408">
</A>
Returning to the Viterbi decoder example (from Section 12.4), we first set the <SPAN CLASS="Definition">
environment</SPAN>
<A NAME="marker=269311">
</A>
for the design using the following worst-case conditions: a die temperature of 25<SPAN CLASS="Symbol">
&deg;</SPAN>
C (fastest logic) to 120<SPAN CLASS="Symbol">
&deg;</SPAN>
C (slowest logic); a power supply voltage of <SPAN CLASS="EquationVariables">
V</SPAN>
<SUB CLASS="Subscript">
DD</SUB>
= 5.5 V (fastest logic) to <SPAN CLASS="EquationVariables">
V</SPAN>
<SUB CLASS="Subscript">
DD</SUB>
= 4.5 V (slowest logic); and worst process (slowest logic) to best process (fastest logic). Assume that this ASIC should run at a clock frequency of at least 33 MHz (a clock period of about 30 ns). An initial synthesis run, using a high-density 0.6 <SPAN CLASS="Symbol">
&micro;</SPAN>
m standard-cell target library, gives a critical path delay of about 25 ns at nominal conditions (the default setting) and nearly 35 ns under worst-case conditions. </P>
<P CLASS="Body">
<A NAME="pgfId=269147">
</A>
Estimates (from simulation and calculation) show that data arrives at the input pins 5 ns (worst case) after the rising edge of the clock, and that the reset signal arrives 10 ns (worst case) after the rising edge. The outputs of the Viterbi decoder must be stable at least 4 ns before the rising edge of the clock, so that they can be driven to another ASIC in time to be clocked. These timing constraints are severe: the 5 ns input delay and the 4 ns output requirement together reduce the usable clock period by 9 ns. However, these figures are typical for board-level delays.</P>
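<P CLASS="Body">
The budget arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names <SPAN CLASS="BodyComputer">usable_period</SPAN> and <SPAN CLASS="BodyComputer">slack</SPAN> are invented here, and the sketch assumes, for simplicity, that the whole critical-path delay is charged against the reduced budget, which a real timing analyzer would refine path by path.</P>
<PRE>
```python
# Timing-budget sketch using the figures quoted in the text (all times in ns).
CLOCK_PERIOD = 30.0    # target period for roughly 33 MHz
INPUT_ARRIVAL = 5.0    # data arrives this long after the rising clock edge
OUTPUT_SETUP = 4.0     # outputs must be stable this long before the edge


def usable_period(period, input_arrival, output_setup):
    """Time left for logic once the board-level delays are subtracted."""
    return period - input_arrival - output_setup


def slack(budget, path_delay):
    """Positive slack meets timing; negative slack misses it."""
    return budget - path_delay


budget = usable_period(CLOCK_PERIOD, INPUT_ARRIVAL, OUTPUT_SETUP)
# Critical-path delays reported by the initial synthesis run.
for label, delay in [("nominal", 25.0), ("worst-case", 35.0)]:
    print(f"{label}: budget {budget} ns, delay {delay} ns, "
          f"slack {slack(budget, delay)} ns")
```
</PRE>
<P CLASS="Body">
Under this simplifying assumption the usable period is 21 ns, so both the nominal and the worst-case delays miss the budget, which is why further optimization is needed.</P>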
<P CLASS="Body">
<A NAME="pgfId=264788">
</A>
The initial synthesis runs reveal that the critical path passes through the following six modules:</P>
<P CLASS="ComputerOneLine">
<A NAME="pgfId=269141">
</A>
subset_decode -> compute_metric -> <BR>
compare_select -> reduce -> metric -> output_decision</P>
<P CLASS="BodyAfterHead">
<A NAME="pgfId=264785">
</A>
The logic synthesizer can do little or no optimization across these module boundaries. The next step, then, is to rearrange the design hierarchy for synthesis. <SPAN CLASS="Definition">
Flattening</SPAN>
<A NAME="marker=319750">
</A>
(<A NAME="marker=319751">
</A>
merging or <A NAME="marker=319752">
</A>
ungrouping) the six modules into a new cell, called <SPAN CLASS="BodyComputer">
critical</SPAN>
, allows the synthesizer to reduce the critical path delay by optimizing one large module. </P>
<P CLASS="Body">
<A NAME="pgfId=297751">
</A>
At present the last module in the critical path is <SPAN CLASS="BodyComputer">
output_decision</SPAN>
. This combinational logic adds 2–3 ns to the output delay requirement of 4 ns (this means the outputs of the module <SPAN CLASS="BodyComputer">
metric</SPAN>
must be stable 6–7 ns before the rising clock edge). Registering the output reduces this overhead and removes the module <SPAN CLASS="BodyComputer">
output_decision</SPAN>
from the critical path. The disadvantage is that latency increases by one clock cycle, but the latency is already 12 clock cycles in this design. Provided registering the output shortens the critical path delay to less than 12/13 of its present value, the total latency measured in nanoseconds still decreases, so performance will still improve.</P>
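<P CLASS="Body">
The 12/13 break-even point can be checked numerically. The following Python sketch is illustrative only: the function names are invented, the 35 ns starting period is borrowed from the worst-case critical path delay quoted earlier, and the 32 ns and 33 ns trial periods are hypothetical values chosen to fall on either side of the break-even point.</P>
<PRE>
```python
OLD_CYCLES = 12   # latency before registering the output
NEW_CYCLES = 13   # latency after adding the output register


def total_latency_ns(cycles, period_ns):
    """Latency in nanoseconds is the cycle count times the clock period."""
    return cycles * period_ns


def break_even_period(old_period_ns):
    """Longest new period at which 13 cycles take no longer than 12 did."""
    return old_period_ns * OLD_CYCLES / NEW_CYCLES


old_period = 35.0  # ns, assumed equal to the worst-case critical path delay
print("break-even period:", round(break_even_period(old_period), 2), "ns")
for new_period in (32.0, 33.0):  # hypothetical periods after registering
    before = total_latency_ns(OLD_CYCLES, old_period)
    after = total_latency_ns(NEW_CYCLES, new_period)
    verdict = "better" if before > after else "not better"
    print(f"{new_period} ns: {after} ns versus {before} ns, {verdict}")
```
</PRE>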
<P CLASS="Body">
<A NAME="pgfId=269241">
</A>
To register the output, alter the code (on pages 575–576) as follows:</P>
<P CLASS="ComputerFirst">
<A NAME="pgfId=251151">
</A>
<B CLASS="Keyword">
module</B>
viterbi_ASIC</P>
<P CLASS="Computer">
<A NAME="pgfId=251156">
</A>
... </P>
<P CLASS="Computer">
<A NAME="pgfId=251244">
</A>
<B CLASS="Keyword">
wire</B>
[2:0] Out, Out_r; // Change: add Out_r.</P>
<P CLASS="Computer">
<A NAME="pgfId=251175">
</A>
...<B CLASS="Keyword">
</B>
</P>
<P CLASS="Computer">
<A NAME="pgfId=251243">
</A>
asPadOut #(3,"30,31,32") u30 (padOut, Out_r); // Change: Out_r.</P>
<P CLASS="Computer">
<A NAME="pgfId=251185">
</A>
Outreg o_1 (Out, Out_r, Clk, Res); // Change: add output register.</P>
<P CLASS="ComputerLast">
<A NAME="pgfId=251179">
</A>
...</P>
<P CLASS="ComputerLast">
<A NAME="pgfId=251896">
</A>
<B CLASS="Keyword">
endmodule</B>
</P>
<P CLASS="Computer">
<A NAME="pgfId=251897">
</A>
<B CLASS="Keyword">
module</B>
Outreg (Out, Out_r, Clk, Res); // Change: add this module.</P>
<P CLASS="Computer">
<A NAME="pgfId=251898">
</A>
<B CLASS="Keyword">
input</B>
[2:0] Out; <B CLASS="Keyword">
input</B>
Clk, Res; <B CLASS="Keyword">
output</B>
[2:0] Out_r; </P>
<P CLASS="Computer">
<A NAME="pgfId=251882">
</A>
dff #(3) reg1(Out, Out_r, Clk, Res);</P>
<P CLASS="ComputerLast">
<A NAME="pgfId=251902">
</A>
<B CLASS="Keyword">
endmodule</B>
</P>
<P CLASS="BodyAfterHead">
<A NAME="pgfId=269126">
</A>
These changes move the performance closer to the target. Prelayout estimates indicate the die perimeter required for the I/O pads will allow more than enough area to hold the core logic. Since there is unused area in the core, it makes sense to switch to a high-performance standard-cell library with a slightly larger cell height (96<SPAN CLASS="Symbol">
&lambda;</SPAN>
versus 72<SPAN CLASS="Symbol">
&lambda;</SPAN>
). This cell library is less dense, but faster.</P>
<P CLASS="Body">
<A NAME="pgfId=251974">
</A>
Typically, at this point, the design is improved by altering the HDL, the hierarchy, and the synthesis controls in an iterative manner until the desired performance is achieved. However, remember there is still no information from the layout. The best that can be done is to estimate the contribution of the interconnect using wire-load models. As soon as possible the netlist should be passed to the floorplanner (or the place-and-route software in the absence of a floorplanner) to generate better estimates of interconnect delays.</P>
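<P CLASS="Body">
To make the idea of a wire-load model concrete, the Python sketch below treats it as a table that maps the fanout of a net to an estimated wiring capacitance, and combines that estimate with a lumped RC delay. Everything in it is an assumption made for illustration: the table values, the extrapolation slope, the drive resistance, the pin capacitance, and the function names are invented and do not come from any real cell library or synthesis tool.</P>
<PRE>
```python
# Hypothetical wire-load table: fanout -> estimated net capacitance (pF).
# Every number here is invented for illustration, not taken from a library.
WIRE_LOAD_PF = {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.09}
EXTRA_PF_PER_FANOUT = 0.03  # assumed slope for fanouts beyond the table


def wire_cap_pf(fanout):
    """Look up (or linearly extrapolate) the estimated wiring capacitance."""
    if not fanout >= 1:
        raise ValueError("fanout must be at least 1")
    if fanout in WIRE_LOAD_PF:
        return WIRE_LOAD_PF[fanout]
    largest = max(WIRE_LOAD_PF)
    return WIRE_LOAD_PF[largest] + EXTRA_PF_PER_FANOUT * (fanout - largest)


def net_delay_ns(fanout, drive_res_kohm, pin_cap_pf):
    """Lumped RC estimate; kilohms times picofarads gives nanoseconds."""
    load_pf = wire_cap_pf(fanout) + fanout * pin_cap_pf
    return drive_res_kohm * load_pf


# Assumed driver resistance of 2 kilohms and 0.01 pF per driven input pin.
for fanout in (1, 4, 8):
    print(fanout, round(net_delay_ns(fanout, 2.0, 0.01), 3))
```
</PRE>
<P CLASS="Body">
Because such a table is only a statistical estimate made before layout, the delays it produces should be replaced by floorplanning or place-and-route estimates as soon as they are available.</P>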
<TABLE>
<TR>
<TD ROWSPAN="1" COLSPAN="2">
<P CLASS="TableTitle">
<A NAME="pgfId=393981">
</A>
TABLE 12.13 <A NAME="26932">
</A>
Critical-path timing report for the Viterbi decoder.</P>
</TD>
</TR>
<TR>
<TD ROWSPAN="1" COLSPAN="1">
<P CLASS="TableFirst">
<A NAME="pgfId=393985">
</A>
Instance name</P>
</TD>
<TD ROWSPAN="1" COLSPAN="1">
<P CLASS="TableFirst">
<A NAME="pgfId=393990">
</A>
Delay information<A HREF="#pgfId=393989" CLASS="footnote">
1</A>
</P>
</TD>
</TR>
<TR>
<TD ROWSPAN="1" COLSPAN="1">
<P CLASS="Computer">
<A NAME="pgfId=393992">
</A>
v_1.u100</P>
<P CLASS="Computer">
<A NAME="pgfId=393993">
</A>
</P>
<P CLASS="Computer">
<A NAME="pgfId=393994">
</A>
u1.subout5.Q_ff_b0</P>
<P CLASS="Computer">
<A NAME="pgfId=393995">
</A>
B1_i67 </P>
<P CLASS="Computer">
<A NAME="pgfId=393996">
</A>
B1_i66 </P>
<P CLASS="Computer">
<A NAME="pgfId=393997">
</A>
B1_i64 </P>
<P CLASS="Computer">
<A NAME="pgfId=393998">
</A>
B1_i68 </P>
<P CLASS="Computer">
<A NAME="pgfId=393999">
</A>
B1_i316</P>
<P CLASS="Computer">
<A NAME="pgfId=394000">
</A>
u3.add_rip1.u4</P>
<P CLASS="Computer">