<html><head><title>"Simultaneous Multithreading: Maximizing On-Chip Parallelism"</title></head><body><h2>Simultaneous Multithreading: Maximizing On-Chip Parallelism</h2><hr><a href="http://www.cs.washington.edu/homes/tullsen">Dean M. Tullsen</a>, <a href="http://www.cs.washington.edu/homes/eggers">Susan J. Eggers</a>, and <a href="http://www.cs.washington.edu/homes/levy">Henry M. Levy</a><hr><p>The increase in component density on modern microprocessors has led to a substantial increase in on-chip parallelism. In particular, modern superscalar RISCs can issue several instructions to independent functional units each cycle. However, the benefit of such superscalar architectures is ultimately limited by the parallelism available in a single thread. <p>This paper examines <i>simultaneous multithreading</i>, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. In the most general case, the binding between thread and functional unit is completely dynamic. We present several models of simultaneous multithreading and compare them with wide superscalar, fine-grain multithreaded, and single-chip, multiple-issue multiprocessing architectures. To perform these evaluations, we simulate a simultaneous multithreaded architecture based on the DEC Alpha 21164 design, and execute code generated by the Multiflow trace scheduling compiler. Our results show that: (1) No single latency-hiding technique is likely to produce acceptable utilization of wide superscalar processors.
Increasing processor utilization will therefore require a new approach, one that attacks multiple causes of processor idle cycles. (2) Simultaneous multithreading is such a technique. With our machine model, an 8-thread, 8-issue simultaneous multithreaded processor sustains over 5 instructions per cycle, while a single-threaded processor can sustain fewer than 1.5 instructions per cycle with similar resources and issue bandwidth. (3) Multithreaded workloads degrade cache performance relative to single-thread performance, as previous studies have shown. We evaluate several cache configurations and demonstrate that private instruction and shared data caches provide excellent performance regardless of the number of threads. (4) Simultaneous multithreading is an attractive alternative to single-chip multiprocessors. We show that simultaneous multithreaded processors with a variety of organizations are all superior to conventional multiprocessors with similar resources.<p>While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.<p><hr><i><br>Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995.</i><p>To get the PostScript file, click <a href="http://www.cs.washington.edu/research/smt/papers/ISCA95.ps">here</a>.<hr><address><em>jlo@cs.washington.edu </em> <br></address></body></html>