📄 node32.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 2 页
字号:
上一页 12
compete for bandwidth, thereby increasing total communication costs.
This issue is discussed in Section <A HREF="node33.html#secperfrefine" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#secperfrefine">3.7</A>.
<P>
</UL><H2><A NAME=SECTION02462000000000000000>3.6.2 Speedup Anomalies</A></H2>
<P>
<A NAME=3659>&#160;</A>
An implementation may execute faster than
predicted by a performance model.  If this effect becomes more marked
as the number of processors increases, this phenomenon is termed a
<em> speedup anomaly</em>---the observed speedup is greater than
predicted.  Sometimes, we may see a speedup that is greater
than linear, or <em> superlinear</em>.  Situations in which this can occur
<A NAME=3662>&#160;</A>
include the following:
<A NAME=3663>&#160;</A>
<P>
<UL><LI>
<em> Cache effects.</em> On most parallel computers, each processor has
<A NAME=3666>&#160;</A>
a small amount of fast memory (cache) and a larger amount of
slower memory.  When a problem is executed on a greater number of
processors, more of its data can be placed in fast memory.  As a
result, total computation time (<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img514.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img514.gif">) will tend to decrease.
<A NAME=3668>&#160;</A>
If the reduction in <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img515.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img515.gif"> from this <em> cache effect
 </em>
offsets increases in <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img516.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img516.gif"> and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img517.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img517.gif"> resulting from
the use of additional processors, then efficiency will be greater than
1 and speedup will be superlinear. Similarly, the increased physical
memory available in a multiprocessor may reduce the cost of memory
accesses by avoiding the need for virtual memory paging.
<P>
<LI>
<em> Search anomalies.</em> This phenomenon is encountered in some
parallel search algorithms, such as the branch-and-bound search of
Section <A HREF="node21.html#secfloor" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node21.html#secfloor">2.7</A>.  If a search tree contains solutions at
varying depths, then multiple depth-first searches will, on average,
explore fewer tree nodes before finding a solution than will a sequential
depth-first search.  Notice that in this case, the parallel algorithm
executed is not the same as the sequential algorithm. In fact, the
best uniprocessor algorithm may be one that pursues several searches
concurrently.
<P>
</UL>
<P>
<A NAME=3677>&#160;</A>
<P><A NAME=4861>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img522.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img522.gif">
<BR><STRONG>Figure 3.10:</STRONG> <em> Speedup of an implementation of the 1-D finite difference
algorithm with <em> N=512</em>
 and <em> Z=10</em>
 as measured on the Intel
DELTA and as predicted by both a simple performance model that does
not account for load imbalances and a more sophisticated model that
does; both models assume <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img520.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img520.gif">sec and
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img521.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img521.gif">sec.</em><A NAME=figfddeltaspeedup2>&#160;</A><BR>
<P>
<P>
<BR><HR>
<b> Example <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img535.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img535.gif">.<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img523.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img523.gif">    Evaluating a Finite Difference Program</b>:<A NAME=exfdi>&#160;</A>
<P>
We consider the behavior of an implementation of the 1-D finite
difference algorithm.  Figure <A HREF="node32.html#figfddeltaspeedup2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html#figfddeltaspeedup2">3.10</A> shows
observed performance, performance predicted by Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A>,
and performance predicted by a refined model that we shall develop in
the following.  We present speedups rather than raw execution times in
order to make results clearer for larger <em> P</em>
.  The predicted
performance curves use machine parameter values obtained by a fitting
process so as to take into account additional overheads not accounted
for by the ``best possible'' parameter values of Table <A HREF="node29.html#tabmach" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#tabmach">3.1</A>.
A comparison of the two sets of parameter values (<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img524.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img524.gif">sec
versus 77 <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img525.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img525.gif">sec, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img526.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img526.gif">sec versus 0.54 <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img527.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img527.gif">sec) indicates
that the finite difference implementation incurs significant overhead.
This suggests that there may be opportunities for optimization.
<P>
Figure <A HREF="node32.html#figfddeltaspeedup2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html#figfddeltaspeedup2">3.10</A> shows that Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A> is
inaccurate for <em> N=512</em>
 and larger values of <em> P</em>
. The observed
speedup does not increase continuously, as predicted, but in a
stepwise fashion.  This observation suggests that the model is
incorrect in assuming that some aspect of program performance varies
continuously with <em> P</em>
.  Examining Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A>, we see
that only computation cost depends on <em> P</em>
; both the number of
messages and message size per processor are constant and hence
independent of <em> P</em>
.  The problem then becomes clear.
Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A> assumes that each processor has
<em> N/P</em>
 columns of the grid.  In reality, <em> P</em>
 does not always divide
<em> N</em>
.  More specifically, some tasks will be allocated <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img528.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img528.gif"> grid points and others <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img529.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img529.gif"> points.
For example, if <em> N=8</em>
, <em> Z=1</em>
, and <em> P=3</em>
, some will have
<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img530.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img530.gif"> and others <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img531.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img531.gif"> grid points.  Hence, while total
computation costs are as given by Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A>, the maximum
computation costs on any processor are as follows:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img532.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img532.gif"><P>
<P>
This uneven distribution of computation leads to idle time, since at each
step processors with less computation will terminate before those with more.
Total idle time is the difference between the maximum computation time
and the mean computation times, multipled by the number of processors:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img533.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img533.gif"><P>
<P>
Incorporating this idle time into Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A>, we obtain the
following more general performance model:
<P>
<P><A NAME=eqfdti>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img534.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img534.gif"><P>
<P>
The second predicted performance curve in
Figure <A HREF="node32.html#figfddeltaspeedup2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html#figfddeltaspeedup2">3.10</A> is obtained using this refined
model.  Notice that the two models are equivalent when <em> N</em>
 is an
integer multiple of <em> P</em>
.
<P>
<BR><HR>
<P>

<BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2237 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2245 HREF="node33.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2243 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2247 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2248 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2246 HREF="node33.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html">3.7 A Refined Communication Cost Model</A>
<B>Up:</B> <A NAME=tex2html2244 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2238 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html">3.5 Experimental Studies</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169 Copyright 1995 by <A href="msgs0.htm#6" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#6">Ian Foster</a></I>
</ADDRESS>
</BODY>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -