From: <Saved by Windows Internet Explorer 7>
Subject: Interfacing to External Static Ram
Date: Mon, 25 May 2009 14:50:43 +0330
MIME-Version: 1.0
Content-Type: multipart/related;
	type="text/html";
	boundary="----=_NextPart_000_0000_01C9DD48.2E3E7A00"
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01C9DD48.2E3E7A00
Content-Type: text/html;
	charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://www.birdcomputer.ca/Documents/interfacing_to_external_static_ram.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Interfacing to External Static Ram</TITLE>
<META http-equiv=3DContent-Language content=3Den-us>
<META content=3D"MSHTML 6.00.6000.20591" name=3DGENERATOR>
<META content=3DFrontPage.Editor.Document name=3DProgId>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dwindows-1252"><!--mstheme--><LINK=20
href=3D"http://www.birdcomputer.ca/_themes/sandston/sand1011.css" =
type=3Dtext/css=20
rel=3Dstylesheet>
<META content=3D"sandston 1011, default" name=3D"Microsoft =
Theme"></HEAD>
<BODY>
<P><FONT size=3D5>Interfacing to External Static Ram</FONT></P>
<P>Robert Finch</P>
<P><A =
href=3D"mailto:email@birdcomputer.ca">email@birdcomputer.ca</A></P>
<P><FONT size=3D2>November, 2004</FONT></P>
<P>&nbsp;</P>
<P><FONT size=3D4>Abstract</FONT></P>
<P>This document covers the interfacing of an external asynchronous =
static ram=20
to a system on a programmable chip (SoC). It is assumed that the system =
is=20
moderately complex, and the ram is shared between two or more system=20
devices.</P>
<P>&nbsp;</P>
<P><FONT size=3D4>Introduction</FONT></P>
<P>Many systems need to make use of an external RAM as the memory =
provided on=20
the programmable chip may be insufficient to meet the needs of the =
application.=20
Static RAM is often used because it offers the highest speed, low power=20
consumption and simple interfacing. For true random access cycles such =
as when=20
multiple devices access the same RAM, static RAM provides the highest=20
performance.</P>
<P>The external static RAM is likely to be shared by several devices in =
the=20
system, for example a cpu and separate DMA channels for a video =
controller and=20
disk controller. For the best system performance, it is desirable to =
obtain=20
maximum use of the RAM=92s memory bandwidth. One way to minimize the =
cycle time of=20
the RAM is to drive it directly from flip-flops (ff=92s) without any =
intervening=20
logic, and receive the output of the RAM directly into ff=92s, once =
again without=20
any intervening logic. Placing the RAM directly between ff=92s allows =
for the use=20
of registers close to the I/O ports interfaced to the RAM. This =
minimizes=20
propagation delays transferring data between the FPGA and external =
static=20
RAM.</P>
<P>&nbsp;</P>
<P><FONT size=3D4>Interfacing Options</FONT></P>
<P>There are several different ways to interface to an external RAM.
Probably the simplest method is to drive the RAM inputs directly from the
SoC address bus and place the RAM outputs directly on the SoC databus.
Here "directly" means without registering the signals beforehand. The
advantage of this approach is that it is fairly straightforward to
implement; the disadvantage is that it can be severely lacking in
performance. System performance is limited by the combination of bus
multiplexing, routing and the RAM access time, which together result in
a long cycle time for each RAM access. Without registering the signals,
the RAM access time includes address multiplexing, routing, and decoding
times, as well as data input bus multiplexing and routing times. One way
to decrease the cycle time, and so increase performance, is to break the
RAM access across multiple cycles rather than having one long cycle. See
the following example for cycle time calculations.</P>
<P>Example (cycle time):</P>
<P>Access time for read access by a cpu (the long cycle time):</P>
<FONT size=3D2>
<OL>
  <LI>Request from cpu 0 ns
  <LI>Address bus mux 10 ns (muxing in other address sources such as
  video controller, disk dma)
  <LI>Routing delay 10 ns for the address bus (assuming a reasonably
  sized system)
  <LI>I/O pad delay 5 ns (out to the ram)
  <LI>Ram read access time 15 ns
  <LI>I/O pad delay 5 ns (back onto the FPGA)
  <LI>Databus routing delay 10 ns
  <LI>Databus input mux delay 10 ns </LI></OL></FONT>
<P>Total time for read access by cpu: 65ns. With a 16 bit SRAM this =
limits=20
system performance to about 30MB/s.</P>
<P>Breaking the access into multiple cycles reduces the cycle time, =
because the=20
cycle time becomes limited by the slowest stage.</P>
<P>Access time for read using registered inputs and outputs:</P><FONT =
size=3D2>
<P>Stage 1:</P>
<OL>
  <LI>Request from cpu 0 ns=20
  <LI>address bus mux&nbsp;&nbsp;&nbsp; 10 ns=20
  <LI>routing delay&nbsp;&nbsp;&nbsp; 10 ns </LI></OL>
<P>Stage 2:</P>
<OL>
  <LI>Latched address from previous stage 0 ns=20
  <LI>I/O pad delay 5ns=20
  <LI>Ram read access time 15 ns=20
  <LI>I/O pad delay 5 ns </LI></OL>
<P>Stage 3:</P></FONT>
<OL>
  <LI><FONT size=3D2>databus routing delay 10ns</FONT>=20
  <LI><FONT size=3D2>databus input mux delay 10 ns</FONT> =
</LI></OL><FONT size=3D2>
<P>&nbsp;</P></FONT>
<P>As can be seen from stage 2, the longest delay (which sets the cycle
time) is 25 ns.</P>
<P>Cycle time for read access: 25 ns. With a 16 bit SRAM this limits
system performance to 80 MB/s.</P>
<P>&nbsp;</P>
<P>Simply by breaking the RAM access into stages, the cycle time has
dropped from 65 ns to 25 ns, so throughput has improved by a factor of
about 2.6. However, it now requires three clock cycles (75 ns) to
complete a single RAM access.</P>
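<P>The arithmetic in the two examples above can be checked with a short
simulation-only Verilog sketch. The module and parameter names are made
up for illustration; the delay values are the ones assumed in the
examples, and integer division rounds the bandwidth figures down.</P>
<P>module cycle_time_calc;<BR>
&nbsp;&nbsp;&nbsp; // delays in ns, as assumed in the examples<BR>
&nbsp;&nbsp;&nbsp; localparam ADDR_MUX =3D 10;<BR>
&nbsp;&nbsp;&nbsp; localparam ADDR_ROUTE =3D 10;<BR>
&nbsp;&nbsp;&nbsp; localparam PAD =3D 5;<BR>
&nbsp;&nbsp;&nbsp; localparam RAM_ACCESS =3D 15;<BR>
&nbsp;&nbsp;&nbsp; localparam DATA_ROUTE =3D 10;<BR>
&nbsp;&nbsp;&nbsp; localparam DATA_MUX =3D 10;<BR>
&nbsp;&nbsp;&nbsp; localparam BYTES =3D 2; // 16 bit SRAM<BR>
&nbsp;&nbsp;&nbsp; // one total per pipeline stage<BR>
&nbsp;&nbsp;&nbsp; localparam STAGE1 =3D ADDR_MUX + ADDR_ROUTE;<BR>
&nbsp;&nbsp;&nbsp; localparam STAGE2 =3D PAD + RAM_ACCESS + PAD;<BR>
&nbsp;&nbsp;&nbsp; localparam STAGE3 =3D DATA_ROUTE + DATA_MUX;<BR>
&nbsp;&nbsp;&nbsp; // unregistered: every delay falls in one cycle<BR>
&nbsp;&nbsp;&nbsp; localparam LONG_CYCLE =3D STAGE1 + STAGE2 + STAGE3;<BR>
&nbsp;&nbsp;&nbsp; // registered: the slowest stage sets the cycle<BR>
&nbsp;&nbsp;&nbsp; localparam MAX12 =3D (STAGE1 &gt; STAGE2) ? STAGE1 =
: STAGE2;<BR>
&nbsp;&nbsp;&nbsp; localparam PIPE_CYCLE =3D (MAX12 &gt; STAGE3) ? =
MAX12 : STAGE3;<BR>
&nbsp;&nbsp;&nbsp; // bytes per ns times 1000 gives MB/s<BR>
&nbsp;&nbsp;&nbsp; localparam LONG_MBS =3D BYTES * 1000 / LONG_CYCLE;<BR>
&nbsp;&nbsp;&nbsp; localparam PIPE_MBS =3D BYTES * 1000 / PIPE_CYCLE;<BR>
&nbsp;&nbsp;&nbsp; initial begin<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $display("unregistered: =
%0d ns, %0d MB/s",<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; =
LONG_CYCLE, LONG_MBS);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $display("registered: =
%0d ns, %0d MB/s, latency %0d ns",<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; =
PIPE_CYCLE, PIPE_MBS, 3 * PIPE_CYCLE);<BR>
&nbsp;&nbsp;&nbsp; end<BR>
endmodule</P>
<P>With the delays listed, the sketch should report a 65 ns cycle at
30 MB/s for the unregistered case, and a 25 ns cycle at 80 MB/s with a
75 ns read latency for the registered case.</P>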
<P>&nbsp;</P>
<P><FONT size=3D4>Pipelining</FONT></P>
<P>It would seem that spreading the RAM access across three clock
cycles, rather than using a single clock cycle that is three times as
long, wouldn't have any impact on performance, but it does. Even if
nothing else is done to the RAM interface, the clock cycle time can now
be roughly one-third of what it was before. This means that operations
that don't use the external static RAM can now occur almost three times
as fast. For example, a cpu that executes some code from ROM internal to
the chip can now benefit from a faster clock cycle.</P>
<P>Another way to significantly increase performance is to pipeline =
access to=20
the RAM. With access occurring across multiple clock cycles and because =
the RAM=20
inputs and outputs are being registered, it is possible to pipeline the =
access=20
to RAM. The RAM access really does still occur within a single clock =
cycle that=20
is now three times as fast due to registering. The only problem is the =
delay=20
caused by the multiple stages. However, with the access broken into =
stages, each=20
stage can represent a <I>different</I> RAM access. For instance, stage =
one could=20
be in the process of servicing a request for a video controller, while =
stage two=20
is performing an access for the cpu, while stage three is providing the =
result=20
for a DMA controller. When there are multiple devices accessing the RAM, =
the=20
best use of the RAM bandwidth can be obtained, even when the devices =
accessing=20
the RAM are not pipelined themselves. While one device (bus master) is =
waiting=20
for a response from the RAM, the RAM subsystem can be busy accessing =
data for=20
another device.</P>
<P>Example:</P>
<P>In a small SoC the devices consist of a non-pipelined 6502 =
compatible=20
processor, and a bitmapped video display controller. The video display=20
controller requires approximately 50% of the RAM bandwidth (an access =
every=20
other cycle). So the cpu may get access to RAM every other cycle. =
However=20
because the cpu is not pipelined, it must wait for an acknowledge from =
RAM=20
before proceeding. It takes three clock cycles from the time the cpu =
requests=20
access until a ready response is received from the RAM. The video =
display=20
controller is pipelined. The display controller accesses are effectively =
hidden=20
because they are interspersed with the cpu accesses in a pipelined =
fashion. The=20
result is to effectively double the system performance over what would =
be=20
obtainable without pipelined RAM access.</P>
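<P>A minimal sketch of how the two bus masters in this example might be
multiplexed onto a single pipelined RAM port. The module and all of its
signal names are hypothetical. It assumes that the video controller
always wins arbitration, that the cpu holds its request until cpu_gnt is
seen, and that bit 0 of the request vector stands for the cpu and bit 1
for the video controller (an arbitrary choice).</P>
<P>module ram_arb(vid_req, vid_addr, cpu_req, cpu_wr, cpu_addr,<BR>
&nbsp;&nbsp;&nbsp; cpu_gnt, cs, req, wr, addr);<BR>
&nbsp;&nbsp;&nbsp; input vid_req; // video wants a read this cycle<BR>
&nbsp;&nbsp;&nbsp; input [17:0] vid_addr;<BR>
&nbsp;&nbsp;&nbsp; input cpu_req; // cpu wants an access this cycle<BR>
&nbsp;&nbsp;&nbsp; input cpu_wr; // cpu access is a write<BR>
&nbsp;&nbsp;&nbsp; input [17:0] cpu_addr;<BR>
&nbsp;&nbsp;&nbsp; output cpu_gnt; // cpu request accepted this cycle<BR>
&nbsp;&nbsp;&nbsp; output cs; // select for the RAM controller<BR>
&nbsp;&nbsp;&nbsp; output [1:0] req; // one bit per requesting device<BR>
&nbsp;&nbsp;&nbsp; output wr;<BR>
&nbsp;&nbsp;&nbsp; output [17:0] addr;<BR>
<BR>
// the cpu only gets the cycles that the video controller leaves free<BR>
assign cpu_gnt =3D cpu_req &amp; ~vid_req;<BR>
assign cs =3D vid_req | cpu_req;<BR>
assign req =3D {vid_req, cpu_gnt};<BR>
// the video controller only reads, so writes come from the cpu<BR>
assign wr =3D cpu_gnt &amp; cpu_wr;<BR>
assign addr =3D vid_req ? vid_addr : cpu_addr;<BR>
<BR>
endmodule</P>
<P>Because the request vector travels down the pipeline with each
access, the acknowledge that comes back two clocks later identifies
which device the returned data belongs to, even though the accesses of
the two devices are interleaved.</P>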
<P>One nice thing about pipelined RAM access is that write cycles may =
return a=20
ready status in a single cycle, as soon as they are posted to the RAM =
subsystem.=20
Once the write address and data are latched into the system, there is no =
need to=20
wait any longer. The write will eventually take place.</P>
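<P>The ready logic for a non-pipelined bus master could look something
like the following sketch. The module and signal names are hypothetical,
and it assumes that gnt is only asserted in a cycle where issue is
asserted. A write is ready in the cycle it is accepted; a read is ready
only once the acknowledge for it comes back.</P>
<P>module posted_rdy(rst, clk, req, wr, gnt, ack, issue, rdy);<BR>
&nbsp;&nbsp;&nbsp; input rst;<BR>
&nbsp;&nbsp;&nbsp; input clk;<BR>
&nbsp;&nbsp;&nbsp; input req; // bus master wants a RAM access<BR>
&nbsp;&nbsp;&nbsp; input wr; // the access is a write<BR>
&nbsp;&nbsp;&nbsp; input gnt; // request is latched at the next clock<BR>
&nbsp;&nbsp;&nbsp; input ack; // read data for this master is valid<BR>
&nbsp;&nbsp;&nbsp; output issue; // request passed to the RAM subsystem<BR>
&nbsp;&nbsp;&nbsp; output rdy; // master may finish its bus cycle<BR>
<BR>
reg rd_pending; // a read was issued and is not yet acknowledged<BR>
<BR>
// do not issue the same read again while waiting for its data<BR>
assign issue =3D req &amp; ~rd_pending;<BR>
// writes are posted, reads wait for the acknowledge<BR>
assign rdy =3D ~req | (wr &amp; gnt) | (rd_pending &amp; ack);<BR>
<BR>
always @(posedge clk)<BR>
&nbsp;&nbsp;&nbsp; if (rst | ack) rd_pending &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp; else if (gnt &amp; ~wr) rd_pending &lt;=3D 1;<BR>
<BR>
endmodule</P>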
<P>&nbsp;</P>
<P><FONT size=3D4>Sample Code</FONT></P>
<P>The following RAM controller is written in Verilog and allows =
pipelined=20
access to an external asynchronous static RAM. One feature of the RAM =
controller=20
is that it uses a bit-vector to track which device is requesting access, =
and to=20
which device an acknowledge should be sent. In this case there are only =
two=20
devices (a cpu and video controller). It is assumed that an acknowledge =
is not=20
required for write accesses, as the external system will provide a write =

acknowledge as soon as the write is posted. The sample code also =
interfaces an=20
eight bit SoC bus to an external sixteen bit RAM, so there is some =
multiplexing=20
involved. There are two sets of input / output signals, signals that =
interface=20
to the SoC and signals that interface to the RAM, so the controller acts =
as a kind=20
of bridge.</P>
<P>This interface has been tested at 28.636 MHz; it will probably work =
at=20
upwards of 40 MHz.</P>
<P>&nbsp;</P>
<P>module RAMCtrl4(rst, clk, clk90, cs, req, ack, addr, wr, din, dout,<BR>
&nbsp;&nbsp;&nbsp; ram_we0, ram_we1, ram_we, ram_oe, ram_ce, ram_a, =
ram_d);<BR>
&nbsp;&nbsp;&nbsp; // system side connections<BR>
&nbsp;&nbsp;&nbsp; input rst; // reset<BR>
&nbsp;&nbsp;&nbsp; input clk; // system clock<BR>
&nbsp;&nbsp;&nbsp; input clk90; // 90 deg. phase shifted clock for =
write timing<BR>
&nbsp;&nbsp;&nbsp; input cs; // circuit select<BR>
&nbsp;&nbsp;&nbsp; input [1:0] req; // identifies device requesting =
access<BR>
&nbsp;&nbsp;&nbsp; output [1:0] ack; // identifies device for which =
data is available<BR>
&nbsp;&nbsp;&nbsp; reg [1:0] ack;<BR>
&nbsp;&nbsp;&nbsp; input [17:0] addr; // byte address<BR>
&nbsp;&nbsp;&nbsp; input wr; // write signal<BR>
&nbsp;&nbsp;&nbsp; input [7:0] din; // data input<BR>
&nbsp;&nbsp;&nbsp; output [7:0] dout; // data output<BR>
&nbsp;&nbsp;&nbsp; reg [7:0] dout;<BR>
&nbsp;&nbsp;&nbsp; // RAM side connections (control signals active low)<BR>
&nbsp;&nbsp;&nbsp; output ram_we0; // low byte write<BR>
&nbsp;&nbsp;&nbsp; output ram_we1; // high byte write<BR>
&nbsp;&nbsp;&nbsp; output ram_we; // generic write<BR>
&nbsp;&nbsp;&nbsp; output ram_oe; // output enable<BR>
&nbsp;&nbsp;&nbsp; output ram_ce; // chip enable<BR>
&nbsp;&nbsp;&nbsp; output [17:0] ram_a; // word address<BR>
&nbsp;&nbsp;&nbsp; inout [15:0] ram_d; // data<BR>
<BR>
reg [1:0] req1; // holds request id for intermediate pipeline stage<BR>
reg ram_ce;<BR>
reg ram_oe;<BR>
reg ce1; // unused<BR>
reg we; // registered signals<BR>
reg we0, we1; // unused<BR>
reg wr0, wr1;<BR>
reg [17:0] ram_a;<BR>
reg [7:0] dol; // data output latch<BR>
reg addr0; // intermediate: address bit zero<BR>
<BR>
// address bit zero selects which byte of the 16 bit word is written<BR>
wire wr0x =3D wr &amp; ~addr[0];<BR>
wire wr1x =3D wr &amp; addr[0];<BR>
// drive the data bus only on the byte lane being written<BR>
assign ram_d[7:0] =3D wr0 ? dol : 8'bz;<BR>
assign ram_d[15:8] =3D wr1 ? dol : 8'bz;<BR>
// write strobes are gated with clk90 so they pulse inside the cycle<BR>
assign ram_we =3D ~(we &amp; clk90);<BR>
assign ram_we0 =3D ~(wr0 &amp; clk90);<BR>
assign ram_we1 =3D ~(wr1 &amp; clk90);<BR>
<BR>
always @(posedge clk)<BR>
&nbsp;&nbsp;&nbsp; if (rst) begin<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; we &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; wr0 &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; wr1 &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_ce &lt;=3D 1;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_oe &lt;=3D 1;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_a &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dol &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; req1 &lt;=3D 2'b0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ack &lt;=3D 2'b0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; addr0 &lt;=3D 0;<BR>
&nbsp;&nbsp;&nbsp; end<BR>
&nbsp;&nbsp;&nbsp; else begin<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // register RAM inputs<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // stage 1<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; addr0 &lt;=3D addr[0];<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // word address, zero =
extended to the width of ram_a<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_a &lt;=3D {1'b0, =
addr[17:1]};<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_ce &lt;=3D ~cs;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ram_oe &lt;=3D wr;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; we &lt;=3D wr;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // On a write cycle we =
assume no ack is required<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; req1 &lt;=3D wr ? 2'b00 : =
cs ? req : 2'b00;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dol &lt;=3D din;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; wr0 &lt;=3D wr0x;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; wr1 &lt;=3D wr1x;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // stage 2<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dout &lt;=3D addr0 ? =
ram_d[15:8] : ram_d[7:0];<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ack &lt;=3D req1;<BR>
&nbsp;&nbsp;&nbsp; end<BR>
<BR>
endmodule</P>
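<P>The controller can be exercised in simulation with a small test
bench. The sketch below is not part of the original design: the test
bench name, the 50 MHz clock, and the zero-delay 256-word behavioural
SRAM are all assumptions made for illustration, and the generic ram_we
strobe is left unused. It posts a byte write through the system port,
reads the byte back as device 0, and checks that the acknowledge and
data are present two clock edges after the read is presented.</P>
<P>`timescale 1ns / 1ns<BR>
module RAMCtrl4_tb;<BR>
reg rst, clk, cs, wr;<BR>
reg [1:0] req;<BR>
reg [17:0] addr;<BR>
reg [7:0] din;<BR>
wire clk90;<BR>
wire [1:0] ack;<BR>
wire [7:0] dout;<BR>
wire ram_we0, ram_we1, ram_we, ram_oe, ram_ce;<BR>
wire [17:0] ram_a;<BR>
wire [15:0] ram_d;<BR>
<BR>
RAMCtrl4 uut(rst, clk, clk90, cs, req, ack, addr, wr, din, dout,<BR>
&nbsp;&nbsp;&nbsp; ram_we0, ram_we1, ram_we, ram_oe, ram_ce, ram_a, =
ram_d);<BR>
<BR>
// 50 MHz clock plus a copy delayed by a quarter period<BR>
initial clk =3D 0;<BR>
always #10 clk =3D ~clk;<BR>
assign #5 clk90 =3D clk;<BR>
<BR>
// behavioural SRAM: 256 words of two bytes, zero access time<BR>
reg [7:0] mem_lo [0:255];<BR>
reg [7:0] mem_hi [0:255];<BR>
wire [7:0] wa =3D ram_a[7:0];<BR>
wire rd =3D ~ram_ce &amp; ~ram_oe;<BR>
assign ram_d =3D rd ? {mem_hi[wa], mem_lo[wa]} : 16'bz;<BR>
// a byte is stored when its write strobe returns high<BR>
always @(posedge ram_we0)<BR>
&nbsp;&nbsp;&nbsp; if (~ram_ce) mem_lo[wa] &lt;=3D ram_d[7:0];<BR>
always @(posedge ram_we1)<BR>
&nbsp;&nbsp;&nbsp; if (~ram_ce) mem_hi[wa] &lt;=3D ram_d[15:8];<BR>
<BR>
// inputs change on the falling edge, away from the sampling edge<BR>
initial begin<BR>
&nbsp;&nbsp;&nbsp; rst =3D 1; cs =3D 0; wr =3D 0;<BR>
&nbsp;&nbsp;&nbsp; req =3D 0; addr =3D 0; din =3D 0;<BR>
&nbsp;&nbsp;&nbsp; repeat (2) @(negedge clk);<BR>
&nbsp;&nbsp;&nbsp; // write A5 to byte address 3 (high byte of word 1)<BR>
&nbsp;&nbsp;&nbsp; rst =3D 0; cs =3D 1; wr =3D 1;<BR>
&nbsp;&nbsp;&nbsp; req =3D 2'b01; addr =3D 18'd3; din =3D 8'hA5;<BR>
&nbsp;&nbsp;&nbsp; @(negedge clk);<BR>
&nbsp;&nbsp;&nbsp; // read the same byte back as device 0<BR>
&nbsp;&nbsp;&nbsp; wr =3D 0;<BR>
&nbsp;&nbsp;&nbsp; @(negedge clk);<BR>
&nbsp;&nbsp;&nbsp; cs =3D 0; req =3D 0;<BR>
&nbsp;&nbsp;&nbsp; @(negedge clk);<BR>
&nbsp;&nbsp;&nbsp; if (ack =3D=3D 2'b01 &amp;&amp; dout =3D=3D 8'hA5)<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $display("PASS");<BR>
&nbsp;&nbsp;&nbsp; else<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $display("FAIL: ack=3D%b =
dout=3D%h", ack, dout);<BR>
&nbsp;&nbsp;&nbsp; $finish;<BR>
end<BR>
endmodule</P>
<P>Because the SRAM model has no access time, a passing run only
confirms the pipeline sequencing and byte lane selection. It says
nothing about whether a real part meets timing at a given clock
rate.</P>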
<P>&nbsp;</P></BODY></HTML>

------=_NextPart_000_0000_01C9DD48.2E3E7A00
Content-Type: text/css;
	charset="windows-1256"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://www.birdcomputer.ca/_themes/sandston/sand1011.css

.mstheme {
	nav-banner-image: url(astonbnr.gif); separator-image: =
url(astonrul.gif); list-image-1: url(astonbu1.gif); list-image-2: =
url(astonbu2.gif); list-image-3: url(astonbu3.gif); =
navbutton-horiz-pushed: url(astonhbs.gif); navbutton-horiz-normal: =
url(astonhb.gif); navbutton-horiz-hovered: url(astonhbh.gif); =
navbutton-vert-pushed: url(astonvbs.gif); navbutton-vert-normal: =
url(astonvb.gif); navbutton-vert-hovered: url(astonvbh.gif); =
navbutton-home-normal: url(astonhom.gif); navbutton-home-hovered: =
url(astonhomh.gif); navbutton-home-pushed: url(blhomep.gif); =
navbutton-up-normal: url(astonup.gif); navbutton-up-hovered: =
url(astonuph.gif); navbutton-up-pushed: url(blupp.gif); =
navbutton-prev-normal: url(astonpre.gif); navbutton-prev-hovered: =
url(astonpreh.gif); navbutton-prev-pushed: url(blprevp.gif); =
navbutton-next-normal: url(astonnxt.gif); navbutton-next-hovered: =
url(astonnxth.gif); navbutton-next-pushed: url(blnextp.gif)
}
