<!doctype html public "-//W3C//DTD HTML 3.2//EN"><html><head><title>The Pipeline</title><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" /><meta http-equiv="content-language" content="en" /><meta name="resource-type" content="document"><meta name="copyright" content="This document copyright 2001 by Richard Murray. Use for non-profit and education purposes explicitly granted."><meta name="author" content="Richard Murray"><meta name="rating" content="general"></head><!-- /assembler/pipeline.html --><!-- --><!-- (C) Copyright 2001 Richard Murray --><!-- Designed by Richard Murray --><!-- rmurray@heyrick.co.uk --><!-- --><body bgcolor="#f0f0f0" text="#000000" link="#0022dd" vlink="#002288"><table border = "0" width="100%"> <tr> <td align=center width=100> <img src="arm3.gif" width=79 height=78 align = middle> </td> <td> <h1 align="center"><font color="#800080">The Pipeline</font></h1> </td> <td align=center width=100> <img src="arm3.gif" width=79 height=78 align = middle> </td> </tr></table>
<p>A conventional processor executes instructions one at a time, just as you expect it to when you write your code. Each execution can be broken down into three parts, which anybody who has learned this stuff at college will have <i>fetch, decode, execute</i> burned into their memory.
<p>In English...
<ol> <li> <i>Fetch</i><br> Retrieve the instruction from memory.<br> Don't get all techie - whether the instruction comes from system memory or the processor cache is irrelevant; the instruction is not loaded 'into' the processor until it is specifically requested. The cache simply serves to speed things up. By loading chunks of system memory into the cache, the processor can satisfy many more of its instruction fetches by pulling instructions from the cache. This is necessary because processors are very fast (StrongARMs, 200MHz+; Pentiums up to GHz!) and system memory is not (33, 66, or 133MHz).
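To put rough numbers on this, here is a small sketch of the average fetch time as the cache hit rate varies. The latency figures are illustrative assumptions for the sake of the arithmetic, not measurements of any particular machine:

```python
# Illustrative latencies only (assumed, not measured): a fast on-chip
# cache versus much slower system memory.
CACHE_NS = 10.0    # assumed cache access time, in nanoseconds
MEMORY_NS = 60.0   # assumed system RAM access time, in nanoseconds

def average_fetch_ns(hit_rate, cache_ns=CACHE_NS, memory_ns=MEMORY_NS):
    """Average memory access time: hits are served by the cache,
    misses must go all the way out to system memory."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

# A warm cache satisfies most fetches at cache speed...
print(round(average_fetch_ns(0.95), 1))   # 12.5
# ...while no cache at all leaves every fetch at memory speed.
print(round(average_fetch_ns(0.0), 1))    # 60.0
```

With a 95% hit rate the processor sees memory as nearly five times faster than it really is, which is the whole point of the cache.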
To see the effect the cache has on your processor, use <code>*Cache Off</code>. <br> <br> <li> <i>Decode</i><br> Work out what the instruction is, and what is supposed to be done. <br> <br> <li> <i>Execute</i><br> Perform the requested operation.</ol>
<p>Each of these operations is performed in step with the electronic 'heartbeat', the clock rate. Example clock rates for several microprocessors included in Acorn products:
<center><table border = 0 cellspacing = 8>
 <tr><td> BBC microcomputer <td> 6502 <td> 2MHz
 <tr><td> Acorn A310-A3000 <td> ARM 2 <td> 8MHz
 <tr><td> Acorn A5000 <td> ARM 3 <td> 25MHz
 <tr><td> Acorn A5000/I <td> ARM 3 <td> 30MHz
 <tr><td> RiscPC600 <td> ARM610 <td> 33MHz
 <tr><td> RiscPC700 <td> ARM710 <td> 40MHz
 <tr><td> Early PC co-processor <td> 486SXL-40 <td> 33MHz (not 40!)
 <tr><td> RiscPC (StrongARM) <td> SA110 <td> 202MHz - 278MHz+
</table></center>
As the PC world shows, processors are reaching GHz speeds (1,000,000,000 ticks/sec), which will necessitate much in the way of speed tweaks (huge amounts of cache, an extremely optimised pipeline) because there is no way the rest of the system can keep up. Indeed, the rest of the system is likely to be operating at a quarter of the speed of the processor. The RiscPC is designed to work, I believe, at 33MHz. That is why people thought the StrongARM wouldn't give much of a speed boost. However, the small size of ARM programs, coupled with a rather large cache, made the StrongARM a viable proposition in the RiscPC. It bottlenecked horribly, but other factors meant that this wasn't so visible to the end user, and the result was a system much faster than the ARM710. More recently there is the Kinetic StrongARM processor card, which attempts to alleviate the bottleneck by installing a big wodge of memory directly on the processor card and using that.
It even goes so far as to install the entirety of RISC OS into that memory, so you aren't kept waiting for the ROMs (which are slower even than RAM).
<p>There is an obvious solution. Since these three stages (fetch, decode, execute) are fairly independent, would it not be possible to:<pre>
   <i>fetch</i> instruction #3   <i>decode</i> instruction #2   <i>execute</i> instruction #1

      ...then, on the next clock tick...

   <i>fetch</i> instruction #4   <i>decode</i> instruction #3   <i>execute</i> instruction #2

      ...tick...

   <i>fetch</i> instruction #5   <i>decode</i> instruction #4   <i>execute</i> instruction #3
</pre>In practice, the answer is yes, and this is exactly what a pipeline is. Simply by doing this, you have made your processor (in the ideal case) three times faster!
<p>Now, it isn't a perfect solution.
<ul> <li> When it comes to a branch, the pipeline is dumped, as the instructions fetched after the branch are not required. This is why it is preferable to use conditional execution rather than branching. <br> <br> <li> Next, you have to keep in mind that the program counter is ahead of the instruction currently being executed. So if you see an error at 'x', then the real error is quite possibly at 'x-8' (or 'x-12' for StrongARM).</ul>
<hr size = 3><a href="index.html#03">Return to assembler index</a><hr size = 3><address>Copyright © 2001 Richard Murray</address></body></html>