<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />	<style type="text/css">	body { font-family: Verdana, Arial, Helvetica, sans-serif;}	a.at-term {	font-style: italic; }	</style>	<title>OpenMP Laplace Solver Performance Characteristics</title>	<meta name="Generator" content="ATutor">	<meta name="Keywords" content=""></head><body> <p> The following graphs show the performance of the OpenMP implementation of the Laplace solver code on three platforms: an SGI Origin 2000 with 32 MIPS R12k processors, a server-class PC with 4 Intel Pentium III Xeon processors, and a 16-processor Cray SV1e. The first graph, Figure 1, shows parallel speedup: the ratio of the wallclock time needed to run the program on a single processor to that required at a given processor count. The second graph, Figure 2, shows <em><a href="../glossary.html#parallel+efficiency" target="body" class="at-term">parallel efficiency</a></em>, which is the ratio of speedup to processor count; this is a measure of how far the program is from linear speedup (parallel efficiency 
  == 1).</p>
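<p> The two definitions above can be sketched as a short calculation. This is a minimal illustration only: the function names and the timing values are hypothetical placeholders, not measurements taken from the graphs below.</p>

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: single-processor wallclock time divided by
    the wallclock time at a given processor count."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Parallel efficiency: speedup divided by processor count.
    A value of 1.0 corresponds to linear speedup."""
    return speedup(t_serial, t_parallel) / nprocs


# Hypothetical wallclock times in seconds, keyed by processor count.
# These are made-up values chosen for illustration only.
timings = {1: 120.0, 2: 64.0, 4: 40.0, 8: 12.0}


def summarize(timings):
    """Return {nprocs: (speedup, efficiency)} relative to the 1-processor run."""
    t1 = timings[1]
    return {
        p: (speedup(t1, t), efficiency(t1, t, p))
        for p, t in sorted(timings.items())
    }


if __name__ == "__main__":
    for p, (s, e) in summarize(timings).items():
        print(f"{p:2d} procs: speedup {s:5.2f}, efficiency {e:4.2f}")
```

<p> With these placeholder timings, the 8-processor entry has an efficiency greater than 1, which is what a superlinear speedup looks like in this measure.</p>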

<div class="figure">
<img src="omp-speedup.JPG" alt="openmp speedup chart" width="623" height="313"><p><strong>Figure 1:</strong> Parallel Speedup</p>
  </div>

<div class="figure">
<img src="omp-pareff.JPG" alt="openmp parallel efficiency chart" align="center" width="646" height="341">

<p><strong>Figure 2:</strong> Parallel Efficiency</p>
  </div>

<p> As with the original serial implementation of the Laplace solver, the performance of the OpenMP implementation is bound primarily by memory bandwidth. This is particularly apparent on the quad Xeon system, where performance drops when going from 2 to 4 processors. For cache-based systems, a large L2 cache helps; this is evident on the Origin 2000 at 8 processors, where the speedup over the 4-processor case is superlinear because each thread's dataset fits in the R12k's 8 MB L2 cache in the 8-processor case. </p></body></html>