<html><body><a href="doc088.html"><img src=../icons/next.gif alt="Next"></a><a href="doc000.html"><img src=../icons/up.gif alt="Up"></a><a href="doc086.html"><img src=../icons/previous.gif alt="Previous"></a><a href="doc000.html"><img src=../icons/contents.gif alt="Contents"></a><a href="doc123.html"><img src=../icons/index.gif alt="Index"></a><hr><h2><a name="sC.4">C.4 21064 performance vs 21066 performance</a></h2><title>21064 performance vs 21066 performance</title>
<p> The 21064 and the 21066 have the same (EV4) CPU core. If the same program
is run on a 21064 and a 21066 at the same CPU speed, then the
difference in performance comes only from system
Bcache/memory bandwidth. Any code thread that has a high hit rate on
the <em>internal</em> caches will perform the same on both. There are 2 big
performance killers:<p>
<ol><p>
<li> Code that is write-intensive. Even though the 21064 and the 21066
have write buffers to swallow some of the delays, code that is
write-intensive will be throttled by write bandwidth at the system
bus. This arises because the on-chip caches are write-through.<p>
<li> Code that wants to treat floats as integers. The Alpha
architecture does not allow register-register transfers between integer
registers and floating point registers. Such a conversion has to be
done via memory (and therefore, because the on-chip caches are
write-through, via the Bcache). (Editor's note: it seems that both
the EV4 and EV45 can perform the conversion through the primary data
cache (Dcache), provided that the memory is already cached. In such a
case, the store in the conversion sequence will update the Dcache and
the subsequent load is, under certain circumstances, able to read the
updated Dcache value, thus avoiding a costly round trip to the Bcache.
In particular, it seems best to execute the stq/ldt or stt/ldq
instructions back-to-back, which is somewhat counter-intuitive.)<p>
</ol>
<p> If you make the same comparison between a 21064A and a 21066A, there is an
additional factor due to the different Icache and Dcache sizes of the two
chips.
<p> Now, the 21164 solves both these problems: it achieves <em>much</em>
higher system bus bandwidths (despite having the same number of signal
pins - yes, I <em>know</em> it's got about twice as many pins as a
21064, but all those extra ones are power and ground! (yes, really!!))
and it has write-back caches. The only remaining problem is the answer
to the question ``how much does it cost?''<p>
<p><hr><a href="doc088.html"><img src=../icons/next.gif alt="Next"></a><a href="doc000.html"><img src=../icons/up.gif alt="Up"></a><a href="doc086.html"><img src=../icons/previous.gif alt="Previous"></a><a href="doc000.html"><img src=../icons/contents.gif alt="Contents"></a><a href="doc123.html"><img src=../icons/index.gif alt="Index"></a><hr></body></html>