📄 node30.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 2 页
字号:
上一页 12
keep efficiency constant.  In contrast, an algorithm with a quadratic
or exponential isoefficiency function would be poorly scalable.
<P>
Recall that efficiency <em> E</em>
 is defined as the ratio between
execution time on a single processor and total execution time summed
over <em> P</em>
 processors:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img452.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img452.gif"><P>
Hence, to maintain constant efficiency <em> E</em>
, the
following relation must hold for increasing <em> P</em>
:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img453.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img453.gif"><P>
That is, uniprocessor time must increase at the same rate as total
parallel time or, equivalently, the amount of essential computation
must increase at the same rate as overheads due to replicated
computation, communication, and idle time.
<P>
Scaled problem analysis does not make sense for all problems.
Real-time constraints, for example in weather forecasting, may require
that computation be completed in a fixed amount of time.  In other
applications, scaling is not possible because of physical constraints
on problem size.  For example, in molecular modeling, the number of
atoms in a molecule is fixed, as is the number of pixels in
image-processing applications.
<P>
<BR><HR>
<b> Example <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img477.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img477.gif">.<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img454.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img454.gif">    Isoefficiency of Finite Difference Algorithms</b>:<A NAME=experfgrid>&#160;</A>
<P>
We use isoefficiency analysis to examine the scalability of two
<A NAME=3439>&#160;</A>
parallel finite difference algorithms.  Recall that the efficiency of
an algorithm based on a 1-D decomposition of an <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img455.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img455.gif">
grid is given by Equation <A HREF="node29.html#eqfdte" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdte">3.7</A>.  For constant efficiency, a
function of <em> P</em>
, when substituted for <em> N</em>
, must satisfy the
following relation for increasing <em> P</em>
 and constant <em> E</em>
:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img456.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img456.gif"><P>
<P>
The function <em> N=P</em>
 satisfies this requirement, and yields the
following relation, which is valid for all except small
<em> P</em>
, when the <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img457.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img457.gif"> term becomes significant:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img458.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img458.gif"><P>
Because the finite difference computation operates on a square grid,
scaling <em> N</em>
 with <em> P</em>
 causes the number of grid points and thus
the amount of computation to scale as <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img459.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img459.gif">.  Hence, we say that the
isoefficiency function for this algorithm is <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img460.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img460.gif">, meaning that
the amount of computation must increase as the square of the number of
processors in order for constant efficiency to be maintained.
Figure <A HREF="node30.html#figfd15" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html#figfd15">3.7</A> illustrates why this is so.
<P>
<P><A NAME=4756>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img463.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img463.gif">
<BR><STRONG>Figure 3.7:</STRONG> <em> Scaling a finite difference algorithm based on a 1-D
decomposition.  In (a), <em> N=8</em>
 and <em> P=2</em>
.  Each task has 32
grid points and must communicate with two neighbors.  In (b),
<em> P</em>
 is doubled while <em> N</em>
 stays the same.  Total computation costs stay
the same but communication costs double, so efficiency is reduced.  In
(c), both <em> P</em>
 and <em> N</em>
 are doubled, thereby increasing both
computation costs and the <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img462.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img462.gif"> component of communication costs by a
factor of four; hence, efficiency remains the
same.</em><A NAME=figfd15>&#160;</A><BR>
<P>
<P>
As a second example, consider a two-dimensional decomposition of the
finite difference problem.  Here, each task is responsible for
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img464.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img464.gif"> points and must exchange
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img465.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img465.gif"> points with each of four neighbors at each time step.
Hence,
<P><A NAME=eqfd2>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img466.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img466.gif"><P>
For constant efficiency, a function of <em> P</em>
, when
substituted for <em> N</em>
, must satisfy the following relation for
increasing <em> P</em>
:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img467.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img467.gif"><P>
The function <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img468.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img468.gif"> meets this requirement and gives the
following relation which is valid for all values of <em> P</em>
:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img469.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img469.gif"><P>
Because total computation is again proportional to <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img470.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img470.gif">, the
isoefficiency function is <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img471.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img471.gif">.  This analysis shows that a 2-D
decomposition is more scalable than a 1-D decomposition.
<P>
This example illustrates a general rule: Higher-dimensional
decompositions tend to be more efficient than lower-dimensional
decompositions.  To understand why, consider Equations <A HREF="node29.html#eqfdte" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdte">3.7</A>
<A NAME=3479>&#160;</A>
and <A HREF="node30.html#eqfd2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html#eqfd2">3.8</A>.  While the 2-D decomposition sends slightly more
messages (four instead of two), data volume is reduced by a factor of
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img472.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img472.gif">, from <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img473.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img473.gif"> to <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img474.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img474.gif">.  Total communication
costs are reduced unless <em> P</em>
 and <em> N</em>
 are small or <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img475.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img475.gif"> is
much larger than <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img476.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img476.gif">.
<P>
<BR><HR><H2><A NAME=SECTION02443000000000000000>3.4.3 Execution Profiles</A></H2>
<P>
<A NAME=sec3prof>&#160;</A>
<P>
If scalability analysis suggests that performance is poor on problem
sizes and computers of interest, we can use models to identify likely
sources of inefficiency and hence areas in which an algorithm can be
improved.
<P>
Poor performance may be due to excessive replicated computation, idle
time, message startups, data transfer costs, or some combination of
these factors.  An important first step when attempting to improve an
algorithm is to identify which of these factors is dominant.  We can
do this by computing an expected <em> execution profile
 </em> for the
<A NAME=3486>&#160;</A>
algorithm, indicating the contributions of these different factors to
execution time as a function of <em> N</em>
 and/or <em> P</em>
.  This approach
is illustrated in Figure <A HREF="node30.html#figfdcomp" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html#figfdcomp">3.8</A> for the 1-D finite
difference algorithm.  The model predicts that when this algorithm is
executed on a multicomputer with a single vertical layer (<em> Z=1</em>
),
data transfer costs dominate execution time when <em> P</em>
 is large;
message startup costs are also significant.  If the number of vertical
levels is increased, message startup costs become negligible, and
overall efficiency improves.
<P>
<P><A NAME=4794>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img484.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img484.gif">
<BR><STRONG>Figure 3.8:</STRONG> <em> Contributions of computation, message startup, and message
transfer costs to total execution time in the 1-D finite difference
algorithm, for <em> N=512</em>
, <em> Z=1</em>
, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img481.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img481.gif">sec,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img482.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img482.gif">sec, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img483.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img483.gif">sec, and varying <em> P</em>
.  There is no
replicated computation or idle time in this
example.</em><A NAME=figfdcomp>&#160;</A><BR>
<P>
<P>
Cost information of this sort can be used to guide a redesign of an
algorithm.  Often, it may motivate us to reconsider decisions made
earlier in the design process.  For example, if replicated computation
is reducing performance, then we may wish to reconsider an alternative
algorithm, previously discarded, that avoids replicated computation at
the expense of increased communication.  Alternatively, high message
startup costs may suggest further agglomeration so as to increase
granularity.  Similarly, if data transfer costs are high, we may seek
to replicate computation or send more, smaller messages if doing so
can reduce the total volume of data transferred.
<P>
<A NAME=3498>&#160;</A>
<P>

<BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2213 HREF="node29.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2221 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2219 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2223 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2224 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2222 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html">3.5 Experimental Studies</A>
<B>Up:</B> <A NAME=tex2html2220 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2214 HREF="node29.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html">3.3 Developing Models</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169 Copyright 1995 by <A href="msgs0.htm#6" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#6">Ian Foster</a></I>
</ADDRESS>
</BODY>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -