📄 node29.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 3 页
字号:
上一页 1 23
represent computation and dashed lines represent communication
operations.  In both (a) and (b), processor P1 generates a request to
processor P2 at time <em> t+2</em>
 and receives a reply at time <em> t+8</em>
.
In both cases, the cost of actually sending the message is assumed to
be 1 time unit.  In (a), P1 has no other useful work to do while
waiting for the reply and hence is idle for five time units after
sending the message.  In (b), P1 switches to another task as soon the
request is generated.  As this task requires five time units to
complete, P1 is never idle.</em><A NAME=figover>&#160;</A><BR>
<P>
<P>
<A NAME=3287>&#160;</A>
<BR><HR>
<b> Example <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img423.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img423.gif">.<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img415.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img415.gif">    Finite Difference</b>:<A NAME=exfdx>&#160;</A>
<P>
Throughout this chapter, we use a parallel finite difference algorithm
similar to the atmosphere model considered in Section <A HREF="node20.html#secatmsum" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node20.html#secatmsum">2.6</A>
to illustrate how performance models are developed and used.  For
simplicity, we assume a grid of size <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img416.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img416.gif"> points, where
<em> Z</em>
 is the number of points in the vertical dimension.  Initially,
we assume that this grid is decomposed in one horizontal dimension and
partitioned among <em> P</em>
 tasks, with each task responsible for a
subgrid of size <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img417.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img417.gif">.  Each task performs the same
computation on each grid point and at each time step.  Because the parallel
algorithm does not replicate computation, we can model computation
time in a single time step as
<P><A NAME=eq87>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img418.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img418.gif"><P>
where <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img419.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img419.gif"> is the average computation time at a single grid point.
<P>
As in Section <A HREF="node20.html#secatmsum" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node20.html#secatmsum">2.6</A>, we consider a nine-point stencil,
meaning that each task must exchange <em> 2 N Z</em>
 data points with two
neighboring tasks, for a total of two messages and <em> 4 N Z</em>
 data.
(We assume that each processor is allocated at least a
2<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img420.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img420.gif"><em> N</em>
 subgrid; if not, communications will be required
with more than two neighbors.  Hence, the performance model that we
develop does not apply on more than <em> N/2</em>
 processors.)  The total
communication cost, summed over <em> P</em>
 processors, is
<P>
<P><A NAME=eq76>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img421.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img421.gif"><P>
If <em> P</em>
 divides <em> N</em>
 and the amount of computation per grid
point is a constant, idle time can be expected to be negligible
in this example.  In these circumstances, we can combine
Equations <A HREF="node29.html#eq87" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq87">3.2</A> and <A HREF="node29.html#eq76" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq76">3.3</A> to obtain the following
performance model:
<P><A NAME=eqfdt>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img422.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img422.gif"><P>
<P>
<BR><HR>
<P>
<A NAME=3320>&#160;</A>
<H2><A NAME=SECTION02432000000000000000>3.3.2 Efficiency and Speedup</A></H2>
<P>
<A NAME=3322>&#160;</A>
Execution time is not always the most convenient metric by which to
evaluate parallel algorithm performance.  As execution time tends to
<A NAME=3323>&#160;</A>
vary with problem size, execution times must be normalized when
comparing algorithm performance at different problem sizes.
<A NAME=3324>&#160;</A>
Efficiency---the fraction of time that processors spend doing useful
work---is a related metric that can sometimes provide a more
convenient measure of parallel algorithm quality.  It characterizes
the effectiveness with which an algorithm uses the computational
resources of a parallel computer in a way that is independent of
problem size.  We define <em> relative efficiency</em> as
<P>
<P><A NAME=eq45>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img424.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img424.gif"><P>
where <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img425.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img425.gif"> is the execution time on one processor and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img426.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img426.gif"> is the
time on <em> P</em>
 processors.  The related quantity <em> relative
speedup</em>,
<P><A NAME=eq46>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img427.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img427.gif"><P>
is the factor by which execution time is reduced on <em> P</em>
 processors:
<P>
<A NAME=3339>&#160;</A>
The quantities defined by Equations <A HREF="node29.html#eq45" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq45">3.5</A> and <A HREF="node29.html#eq46" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq46">3.6</A>
<A NAME=3342>&#160;</A>
are called <em> relative
 </em> efficiency and speedup because they are
defined with respect to the parallel algorithm executing on a single
processor.  They are useful when exploring the scalability of an
algorithm but do not constitute an absolute figure of merit.  For
example, assume that we have a parallel algorithm that takes 10,000
seconds on 1 processor and 20 seconds on 1000 processors.  Another
algorithm takes 1000 seconds on 1 processor and 5 seconds on 1000
processors. Clearly, the second algorithm is superior for <em> P</em>
 in
the range 1 to 1000.  Yet it achieves a relative speedup of only 200,
compared with 500 for the first algorithm.
<P>
When comparing two algorithms, it can be useful to have an
algorithm-independent metric other than execution time.  Hence, we
<A NAME=3345>&#160;</A>
define <em> absolute
 </em> efficiency and speedup, using as the
baseline <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img428.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img428.gif"> the uniprocessor time for the best-known algorithm.  In
many cases, this ``best'' algorithm will be the best-known
uniprocessor (sequential) algorithm.  From this point forward, we
shall frequently use the terms efficiency and speedup without
<A NAME=3347>&#160;</A>
qualifying them as relative or absolute.  However, the context will
always make clear which is meant.
<P>
<BR><HR>
<b> Example <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img432.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img432.gif">.<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img429.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img429.gif">    Efficiency of Finite Difference Algorithm</b>:<A NAME=exfdy>&#160;</A>
<P>
<A NAME=3350>&#160;</A>
In the finite difference algorithm, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img430.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img430.gif">, and so from
Equation <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A> we have the following model for efficiency in the
absence of load imbalances and when <em> P</em>
 divides <em> N</em>
:
<P><A NAME=eqfdte>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img431.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img431.gif"><P>
Because the uniprocessor algorithm is identical to the parallel algorithm
when <em> P=1</em>
, this equation represents absolute efficiency.
<P>
<BR><HR>
<P>

<BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2201 HREF="node28.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node28.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2209 HREF="node30.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2207 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2211 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2212 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2210 HREF="node30.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html">3.4 Scalability Analysis</A>
<B>Up:</B> <A NAME=tex2html2208 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2202 HREF="node28.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node28.html">3.2 Approaches to Performance Modeling</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169 Copyright 1995 by <A href="msgs0.htm#6" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#6">Ian Foster</a></I>
</ADDRESS>
</BODY>
上一页 1 23
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -