<html>
<head>
<title>"Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor"</title>
</head>
<body>
<h2>Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor</h2>
<hr>
<a href="http://www.cs.washington.edu/homes/tullsen">Dean M. Tullsen</a>,
<a href="http://www.cs.washington.edu/homes/eggers">Susan J. Eggers</a>,
Joel S. Emer,
<a href="http://www.cs.washington.edu/homes/levy">Henry M. Levy</a>,
<a href="http://www.cs.washington.edu/homes/jlo">Jack L. Lo</a>,
and Rebecca L. Stamm
<hr>
<p>
Simultaneous multithreading is a technique that permits multiple independent
threads to issue multiple instructions each cycle.
Previous work has demonstrated the performance potential of simultaneous
multithreading, based on a somewhat idealized model. In this paper we show that
the throughput gains from simultaneous multithreading can be achieved
<i>without</i> extensive changes to a conventional wide-issue superscalar,
either in hardware structures or sizes.
We present an architecture for simultaneous multithreading
that achieves three goals: (1) it minimizes the architectural impact on the
conventional superscalar design, (2) it has minimal performance impact on
a single thread executing alone, and (3) it achieves significant throughput
gains when running multiple threads.
Our simultaneous multithreading
architecture achieves a throughput of 5.4 instructions per cycle,
a 2.5-fold improvement
over an unmodified superscalar with the same hardware resources.
This speedup is enabled by an advantage of
multithreading previously unexploited in other
architectures: the ability to favor for fetch and issue
those threads most efficiently using the processor each cycle, thereby
providing the "best" instructions to the processor. We examine
several heuristics that allow us to identify and use the best threads for fetch
and issue, and show that such heuristics can increase throughput by
as much as 37%. Using the best fetch and issue
alternatives, we then use bottleneck analysis to identify opportunities for
further gains on the improved architecture.
<p>
<hr>
<i><br>
Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May 1996.
</i>
<p>
To get the PostScript file, click
<a href="http://www.cs.washington.edu/homes/jlo/papers/isca96.ps">here</a>.
<hr>
<address>
<em>jlo@cs.washington.edu</em><br>
</address>
</body>
</html>