<td><center>
<h3>Figure 4.2:<p>Plane Separation</h3>
<p>
<img src="gnd_plane_seperation.gif"width="185"height="588">
</table>

</center>
<p>
When working with a PCB assembly like this one, which combines high-frequency digital edges with analog video conversion, it is extremely important to keep the analog and digital zones separated.  The CAD snippet depicted in Fig.4.2, above, shows one of the inner copper layers of the board.  This layer is the ground plane layer of the right side of the board.  If the reader compares this edge of the board artwork to the image of Fig.2, it should stand out that the separation of the ground planes runs directly under the VIPs.  At its narrowest point the separation is 0.036".
<p>
The Video Input Processor is designed with all of the analog connections emanating from one side, grouped conveniently for shielding and termination.  The analog ground plane is mirrored on a second internal layer by an analog power plane.  The connection between these analog power and return planes and their digital counterparts is made through a PI filter whose center notch is tuned to the 5th harmonic of the digital edges from the FPGA's driving circuitry.  Note: Recalling Fourier theory from a signals class, the critical point in the design of the filter is that it is tuned to the frequency content of the rising digital edge, NOT to the rate of the signals running through the drivers.
<p>
Keeping digital noise out of the analog signals is only half the battle in the board layout for this project.  The digital signals themselves contain very high-speed edges, and the setup / hold margins for the 10-ns SRAMs are very small even when the SRAMs are accessed only every other cycle, so the digital routing also requires careful layout.  Finally, the power supplies are two PCBs away, which drives the need for adequate low-impedance decoupling and again makes PCB layout extremely important.
<p>
When configured for 16-bit R,G,B operation, the SAA7111A VIP actually outputs 6 bits of green and only 5 bits each of blue and red.  Since green correlates most directly with black-and-white video (contrast), many video systems encode it with more bits.  For this application, the extra LSB of green is discarded so that the color space can be treated as a flat [5,5,5] bit encoding.  The VIP supports other modes, including [8,8,8]; however, these are unused in this application.  The physical connections for [8,8,8] data are the same, but the data rate increases from 13.5M-words/sec to 27M-words/sec.  The FPGA can easily handle this data rate, but the author could not justify this level of quality in an amateur robotics implementation.
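<p>
As a rough illustration of discarding the green LSB, the conversion can be sketched in C.  The bit positions used here (red in bits 15..11, green in bits 10..5, blue in bits 4..0) and the function name are assumptions made for the example; the actual bit ordering on the VIP's output bus should be taken from the SAA7111A data sheet, and in the real design this step is done in FPGA logic rather than software.

```c
#include <stdint.h>

/* Convert a [5,6,5] pixel to a flat [5,5,5] pixel by dropping the
   least significant green bit.
   Assumed input layout:  R[15:11] G[10:5] B[4:0]
   Assumed output layout: R[14:10] G[9:5]  B[4:0], bit 15 clear. */
uint16_t rgb565_to_rgb555(uint16_t px565)
{
    uint16_t r = (px565 >> 11) & 0x1F;  /* 5 bits of red                */
    uint16_t g = (px565 >> 6) & 0x1F;   /* upper 5 of the 6 green bits  */
    uint16_t b = px565 & 0x1F;          /* 5 bits of blue               */

    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

A pixel whose only set bit is the green LSB (0x0020 in the assumed layout) therefore converts to zero, which is exactly the information that the flat encoding gives up.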
<p>






<p>
<h2>FPGAs:</h2>
<p>
<table align="right"width="80"background="grid.gif"><tr><td><center>
<a href="ds077_2.pdf"><img src="adobe.gif"><p>Xilinx Data Sheet</a>
</table>
The Video Input Processor reviewed above spews forth data at a very high rate.  The remaining components of the video capture system must handle this stream of digital video data + clock + syncs (13.5M-words/sec), which needs to be transformed before it can be used.  The next functional piece in the design, the FPGA, implements these transformations as well as the rest of the functionality in the design.
<p>
It is not the intent of this article to provide an entry-level introduction to FPGAs; rather, the article will focus on the specifics of implementing a generic color vision system.  For those wanting to dig a little further into the inner workings of an FPGA, the link, right, is a PDF data sheet from Xilinx.  This will of course need to be supplemented with further reading on the languages used to configure FPGAs and a firm understanding of how such a language synthesizes into logic constructs within the fabric of the device.  Several references for further reading are provided at the end of this article.
<p>
As mentioned above, the FPGA used in this design is from the Spartan IIE family of FPGAs by Xilinx.  In particular, this design is implemented with a <u>XCS2E-300</u>.  All logic internal to the FPGA is clocked from a single 50MHz source through the FPGA's clock distribution network.  The clock source is PLL locked to the processor clock, so the interface to the processor signals is direct.  The external VIPs, however, operate on separate asynchronous clocks, so their input control signals require double buffering to cross clock boundaries.  Note: <i>Driving signals across clock boundaries in a completely synchronous design is a deceptively complex problem and warrants further research on the part of the reader.</i>
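<p>
To make the idea of double buffering concrete, here is a small behavioural model in C.  It assumes that "double buffering" means the common arrangement of two registers in series, both clocked by the destination (50MHz) clock; the type and function names are invented for the example, and a software model cannot show the metastability that the first register absorbs in real hardware.

```c
#include <stdint.h>

/* Behavioural model of a two-register synchronizer (an assumed
   interpretation of "double buffering"). */
typedef struct {
    uint8_t stage1;   /* first register: samples the asynchronous input   */
    uint8_t stage2;   /* second register: the value internal logic may use */
} sync2_t;

/* Call once per rising edge of the destination clock.
   Returns the synchronized value as seen after this edge. */
uint8_t sync2_clock(sync2_t *s, uint8_t async_in)
{
    s->stage2 = s->stage1;        /* second register takes the OLD stage1 */
    s->stage1 = async_in & 1u;    /* first register samples the input     */
    return s->stage2;
}
```

In this model a change on the input reaches the output on the second clock edge after it occurs, which is the latency such a synchronizer trades for a settled signal.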
<p>
<table align="left"><tr><td><center>
<b>Figure 5.1: FPGA Loading</b>
<p>
<img src="proc_fpga.gif"width="214"height="181"></table>
<p>
Xilinx FPGAs are RAM-based devices.  RAM-based technology requires that all RAM cells within the device be re-configured (set/cleared) upon every power cycle.  There are many ways to configure an FPGA device.  For this robot application, the u-Processor is used to configure the FPGA through the connection of a few general purpose IO pins on the u-Processor to special programming port pins on the FPGA.  With this set-up the FPGA's configuration (code/logic) is stored in a static array which is compiled and linked with the processor's 'C' code, so that each time new processor code is loaded into FLASH, a new copy of the FPGA logic configuration is also loaded.
<p>
Loading configuration from a u-Processor is an acceptable means of FPGA configuration; however, there are a few items to take note of.  The XCS2E-300 FPGA used in this project requires over 234K-Bytes of code storage space.  This is more than the typical 8-bit processor's entire address range.  In such cases, a flash chip that is loaded directly by a PC using JTAG or some other means, and read automatically by the FPGA at start-up, may be better suited.  Of course, smaller FPGAs require much less configuration space.  The data output from the Xilinx tools (after some parsing / massaging) used to configure the FPGA looks like the following:
<p>
<center><table background="grid.gif"border="8"bordercolor="blue"><tr><td>
<pre><font color="black">unsigned char fpgadata_Data[] = {
    0xff, 0xff, 0xff, 0xff, 0x55, 0x99, 0xaa, 0x66,
    0x0c, 0x80, 0x06, 0x80, 0x00, 0x00, 0x00, 0x88,
    0x0c, 0x00, 0x03, 0x80, 0x00, 0x00, 0x00, 0x00,
    .
    .
    .};
unsigned long fpgadata_ByteCount = 180252;
</pre>
</table>
</center>
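<p>
A minimal sketch of how such an array might be shifted into the FPGA over a few IO pins follows.  It is not the author's loader (that is linked further down); it assumes the slave-serial style of configuration, with bytes sent most significant bit first and data sampled on the rising edge of CCLK.  The <code>fpga_pins_t</code> structure, the function names, the timeout count and the simulated pins are all hypothetical, and the delays a real port needs (for example the minimum PROGRAM pulse width from the data sheet) are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin-access hooks; a real port maps these onto the
   processor's general purpose IO registers. */
typedef struct {
    void (*set_program_n)(int level);
    void (*set_cclk)(int level);
    void (*set_din)(int level);
    int  (*get_init_n)(void);
    int  (*get_done)(void);
} fpga_pins_t;

/* Returns 0 on success, -1 if INIT never rises, -2 if DONE stays low. */
int fpga_load_serial(const fpga_pins_t *io,
                     const unsigned char *data, unsigned long count)
{
    unsigned long i;
    long timeout;
    int bit;

    io->set_cclk(0);
    io->set_program_n(0);            /* start clearing configuration RAM  */
    io->set_program_n(1);            /* (real code must add a delay here) */
    for (timeout = 100000; timeout > 0 && !io->get_init_n(); timeout--) {
        /* wait for the FPGA to signal that it is ready for data */
    }
    if (timeout == 0)
        return -1;

    for (i = 0; i < count; i++) {
        for (bit = 7; bit >= 0; bit--) {        /* MSB first (assumed)    */
            io->set_din((data[i] >> bit) & 1);
            io->set_cclk(1);                    /* data taken on the rise */
            io->set_cclk(0);
        }
    }
    for (i = 0; i < 8; i++) {        /* a few extra clocks for start-up   */
        io->set_cclk(1);
        io->set_cclk(0);
    }
    return io->get_done() ? 0 : -2;
}

/* Simulated pins, only for exercising the loader on a PC. */
static unsigned char sim_rx[16];     /* bytes the simulated FPGA received */
static unsigned long sim_bits;       /* number of bits clocked in so far  */
static int sim_din, sim_cclk;

static void sim_set_program_n(int level) { if (!level) sim_bits = 0; }
static void sim_set_din(int level)       { sim_din = level; }
static void sim_set_cclk(int level)
{
    if (level && !sim_cclk && sim_bits < 8 * sizeof sim_rx) {
        size_t idx = sim_bits / 8;   /* rising edge: shift one bit in     */
        sim_rx[idx] = (unsigned char)((sim_rx[idx] << 1) | sim_din);
        sim_bits++;
    }
    sim_cclk = level;
}
static int sim_get_init_n(void) { return 1; }
static int sim_get_done(void)   { return sim_bits > 0; }

static const fpga_pins_t sim_pins = {
    sim_set_program_n, sim_set_cclk, sim_set_din,
    sim_get_init_n, sim_get_done
};
```

With real pin hooks in place of <code>sim_pins</code>, the call would be along the lines of <code>fpga_load_serial(&amp;pins, fpgadata_Data, fpgadata_ByteCount)</code>.  Passing the pin hooks in as a structure keeps the shifting logic independent of any particular processor.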
<p>
Another detail for the amateur to take note of is the state of the FPGA's IO pins upon power-up.  Since the device is RAM based and does not retain its configuration, it follows that the IO pins are not configured out of reset either.  In this implementation, a boot mode is applied that brings the device up with all of its IO pins in tri-state mode with very weak pull-ups.
<p>
<br clear="left">
<a name="FPGAload"></a>
<table align="right"width="160"background="grid.gif">
<tr><td><center><a href="xapp176.pdf"><img src="adobe.gif"><p>XApp176 FPGA Config</a>
<td><center><a href="FPGA_load.htm"><img src="code.gif"><p>C-Code for FPGA loading</a>
</table>
<table align="left"width="120"><tr><td><center>
<a href="http://www.xilinx.com/xlnx/xebiz/designResources/ip_product_details.jsp?sGlobalNavPick=PRODUCTS&sSecondaryNavPick=Design+Tools&key=DS-ISE-WEBPACK"><img src="ise_webpack.gif"><p>Free Xilinx Software:</a>
</table>


On their web site, Xilinx provides several excellent application notes describing the different algorithms used to configure their FPGAs.  The 'C' code used to load the FPGA from the u-Processor in the generic color vision application is provided through the link to the right.
<p>
Xilinx offers free tools to compile and work with their FPGAs up to a certain size.  Unlike the free tools from most companies, these can handle some pretty large / complex programmable logic.  The user will have to register the tools with Xilinx to get them up and running; however, there is no charge for them.  These tools will compile ABEL, VHDL and Verilog.  They require either Windows 2000 or Windows XP.  All of the code examples, including the final robot application reviewed in this article with 3x vision systems and display drive functionality operating concurrently, can be synthesized / fit by the freely downloaded Xilinx WebPACK tools.
<p>

<br clear="right">
<br clear="left">
<h2>Verilog Source:</h2>
<p>
<a name="djv"></a>
<table align="left"background="grid.gif"width="150">
<tr><td><center><a href="djv1.htm"><img src="code.gif"><p>dj_top.v</a>
<td><center><a href="djv2.htm"><img src="code.gif"><p>ram_ scheduler.v</a>
<tr><td><center><a href="djv3.htm"><img src="code.gif"><p>fifo_34.v</a>
<td><center><a href="djv4.htm"><img src="code.gif"><p>video_ capture.v</a>
<tr><td><center><a href="djv5.htm"><img src="code.gif"><p>blob_ detection.v</a>
<td><center><a href="djv6.htm"><img src="code.gif"><p>serial_ divide.v</a>
<tr><td><center><a href="djv7.htm"><img src="code.gif"><p>dj_vid_ pins.ucf</a>
<td>
<tr><td colspan="2">A complete project file (600+ KBytes when zipped) with all files, including source, synthesized and fitted output and place & route parameters, is available on the author's web site. Files #15
</table>

As the reader might imagine, programmable logic of this complexity takes a substantial amount of HDL code to describe.  The language used by the author to describe the logic in this application is Verilog, a 'C'-like Hardware Description Language.  The files and their structure are outlined in Fig.5.3, below.
<p>
The overview below, Fig.5.2, graphically depicts the connectivity between the files listed at left.  Each file encompasses a block of functionality, and the arrows delineate the interconnectivity of the ports on these modules.  As the graphic shows, the highest level of data traffic is through the RAM scheduler module.  The data flow through that module is compounded when the other two vision systems (not shown here) and the LCD display drivers are layered in with their respective FIFOs.  Each of these modules moves data to & from the external SRAM through the same RAM scheduler function.
<p>
<center>
<h3>Figure 5.2: FPGA Functional Structure</h3>
<p>
<img src="floor_plan.gif">
</center>
<br clear="left"><p>

<img src="exclam.gif"align="left">
Fig.5.2, above, goes a long way towards giving the reader an understanding of the data paths between the different modules used within the FPGA.  Understanding these linkages is important to getting the "bigger picture" of the FPGA's overall architecture.  Each module (green block) within the FPGA represents a separate function and a separate Verilog source file, above-left.
<p>
As the article progresses, Fig.5.2 will be revisited many times.  Each time, a new layer of functionality will be overlaid onto the existing code base.  The modularity / encapsulation of the language makes for easy work in these matters, as the article will demonstrate in coming sections.
<p>
Fig.5.3, right, depicts the project files and structure.  The structure is important as it delineates the hierarchical relationship and port / instantiation inter-connectivity between Verilog modules.
<table align="right"><tr><td>
<center>
<h3>Figure 5.3: Verilog Hierarchy</h3>
<p>
<img src="ver_struct.gif"width="324"height="246">
</center>
</table>
<ol>
<li>The top level file contains the processor bus interface as well as the instantiation templates for all of the other hierarchically nested modules.  As mentioned previously, there is nothing special about this u-Processor bus interface.  It could be interfaced to any standard 8-bit processor with an external address & data bus.
<p>
<li>The next file (ram_scheduler.v) contains the logic required to manage the external SRAM interface.  This logic moves data bi-directionally into and out of the external SRAM upon prioritized request by several sources / destinations within the design.  Examples of this include data coming in from 3x VIPs, data going out to the LCD display being re-drawn 80 times per second, and data coming in for video overlay from the u-Processor.  This version does not implement a fairness algorithm, as calculations have shown that there are no bottlenecks at maximum throughput loading.
<p>
<li>Fifo_34.V is a very shallow (4-entries deep) FIFO that is 33-bits wide.  It holds data waiting to move out to the external SRAM from each of the aforementioned sources, along with the addresses at which to store that data.  Multiple instantiations of this function are implemented in this design.  Due to their small size, these FIFOs are implemented using distributed RAM bits across the FPGA's fabric rather than in dedicated block RAMs, which would consume those resources and entail longer routing paths.
<p>
<li>The file Video_Capture.V implements the basic conversion algorithm and will be reviewed in detail below.
<p>
<li>Blob_Detection.V is a simple function that stores a running sum in X&Y of the coordinates of all pixels that meet the filtering criteria.  The filtering criteria are simply upper and lower cut-off values for each of R, G & B, set by writing to registers through the u-Processor bus interface.
<p>
<li>Serial_Divide_UU.V is a function that, as its name implies, implements a straightforward integer divide unit capable of giving binary weighted fractional results.  Serial divide is used to average the sums (both X & Y) tracked in blob detect.  Two parallel instantiations are implemented, one each for the X & Y calculations.  The function is considered a serial divider as it generates one bit of output for each clock cycle.  It is scalable to any number of bits for both inputs and outputs.  (This function was originally obtained from <a href="http://www.opencores.org">www.opencores.Org</a>; it was written by <u>John Clayton</u>.  The file is redistributed here with his permission and with the GNU GPL notice intact.)
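<p>
The interplay between the shallow FIFOs and the prioritized scheduler described in the list above can be modelled in a few lines of C.  This is a sketch under stated assumptions, not a transcription of ram_scheduler.v or fifo_34.v: the number of ports, the split of each entry into an address and a 16-bit data word, the rule that the lowest-numbered port has the highest priority, and all of the names are invented for the example.

```c
#include <stdint.h>

#define FIFO_DEPTH 4   /* matches the 4-entry depth described above */
#define NUM_PORTS  3   /* hypothetical number of requesters         */

typedef struct {
    uint32_t addr;     /* SRAM address for this word (assumed width) */
    uint16_t data;     /* data word to store (assumed width)         */
} ram_req_t;

typedef struct {
    ram_req_t slot[FIFO_DEPTH];
    unsigned  head;    /* index of the oldest entry   */
    unsigned  count;   /* number of entries in use    */
} fifo_t;

/* Returns 1 if the entry was queued, 0 if the FIFO is full. */
int fifo_push(fifo_t *f, ram_req_t r)
{
    if (f->count == FIFO_DEPTH)
        return 0;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = r;
    f->count++;
    return 1;
}

/* Returns 1 and fills *out with the oldest entry, or 0 if empty. */
int fifo_pop(fifo_t *f, ram_req_t *out)
{
    if (f->count == 0)
        return 0;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* One SRAM access slot: serve the lowest-numbered port that has an
   entry waiting (fixed priority, no fairness).  Returns the port that
   was served, or -1 if every FIFO is empty. */
int ram_schedule(fifo_t ports[NUM_PORTS], ram_req_t *out)
{
    int p;

    for (p = 0; p < NUM_PORTS; p++) {
        if (fifo_pop(&ports[p], out))
            return p;
    }
    return -1;
}
```

With a fixed priority like this, a low-priority port can only be starved if the higher-priority ports together saturate the SRAM, which is the situation the throughput calculations mentioned above rule out.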
<p>
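The blob detection and serial divide steps described in the list above can also be sketched together in C.  The code below is an illustration only: the structure and function names are hypothetical, the divider is a generic shift-and-subtract (restoring) divider that produces one quotient bit per loop iteration rather than a port of the OpenCores module, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

typedef struct {
    uint32_t sum_x, sum_y;  /* running sums of matching pixel coordinates */
    uint32_t count;         /* number of matching pixels                  */
} blob_t;

typedef struct { uint8_t lo, hi; } range_t;  /* inclusive cut-off window */

/* Add one pixel to the sums if every channel lies inside its window. */
void blob_pixel(blob_t *b, uint16_t x, uint16_t y,
                uint8_t r, uint8_t g, uint8_t bl,
                range_t rr, range_t gr, range_t br)
{
    if (r  >= rr.lo && r  <= rr.hi &&
        g  >= gr.lo && g  <= gr.hi &&
        bl >= br.lo && bl <= br.hi) {
        b->sum_x += x;
        b->sum_y += y;
        b->count++;
    }
}

/* Unsigned shift-and-subtract divide, one quotient bit per iteration
   (one per clock in hardware).  The dividend must fit in int_bits bits
   and int_bits + frac_bits must not exceed 32.  The result is a
   fixed-point number: quotient = return value / 2^frac_bits. */
uint32_t serial_divide(uint32_t dividend, uint32_t divisor,
                       int int_bits, int frac_bits)
{
    uint64_t rem = 0;
    uint32_t q = 0;
    int i;

    for (i = int_bits - 1; i >= -frac_bits; i--) {
        /* shift in the next dividend bit, or 0 for fractional bits */
        rem = (rem << 1) | (i >= 0 ? ((dividend >> i) & 1u) : 0u);
        q <<= 1;
        if (rem >= divisor) {
            rem -= divisor;
            q |= 1u;
        }
    }
    return q;
}
```

Dividing <code>sum_x</code> and <code>sum_y</code> by <code>count</code> in this way gives the average X and Y position of the matching pixels; each fractional bit requested adds one more iteration and halves the resolution step of the result.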