<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>3.4 Scalability Analysis</TITLE>
</HEAD>
<BODY>
<meta name="description" content="3.4 Scalability Analysis">
<meta name="keywords" content="book">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<P>
<BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a> <A NAME=tex2html2213 HREF="node29.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2221 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2219 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2223 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2224 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a> <BR>
<B> Next:</B> <A NAME=tex2html2222 HREF="node31.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html">3.5 Experimental Studies</A>
<B>Up:</B> <A NAME=tex2html2220 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2214 HREF="node29.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html">3.3 Developing Models</A>
<BR><HR><P>
<H1><A NAME=SECTION02440000000000000000>3.4 Scalability Analysis</A></H1>
<P>
<A NAME=3361> </A>
<P>
Performance models of the type developed in preceding sections are
tools that we can use to explore and refine a parallel algorithm
design. These models can be used without further refinement to
perform <em> qualitative
</em> analyses of performance. For example,
from Equations <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A> and <A HREF="node29.html#eqfdte" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdte">3.7</A> the following
observations can be made about the finite difference algorithm:
<A NAME=3365> </A>
<P>
<UL><LI>
Efficiency decreases with increasing <em> P</em>
, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img433.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img433.gif">, and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img434.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img434.gif">.
<P>
<LI>
Efficiency increases with increasing <em> N</em>
, <em> Z</em>
, and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img435.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img435.gif">.
<P>
<LI>
Execution time decreases with increasing <em> P</em>
but is bounded
from below by the cost of exchanging two array slices.
<P>
<LI>
Execution time increases with increasing <em> N</em>
, <em> Z</em>
, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img436.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img436.gif">,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img437.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img437.gif">, and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img438.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img438.gif">.
</UL>
<P>
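These trends can be checked numerically. The sketch below uses a model of the same general form as Equations 3.4 and 3.7 (per-step computation plus the cost of exchanging two array slices); the machine parameter values are illustrative assumptions, not the measured values used in this chapter.

```python
# A model of the same general form as Equations 3.4 and 3.7 for the 1-D
# finite difference algorithm on an N x N x Z grid:
#   T(P) = tc*N*N*Z/P + 2*(ts + tw*N*Z)    (compute + two slice exchanges)
#   E(P) = tc*N*N*Z / (P*T(P))
# The default values of tc, ts, tw are illustrative, not measured.

def T(P, N, Z, tc=1e-6, ts=1e-4, tw=1e-6):
    return tc * N * N * Z / P + 2 * (ts + tw * N * Z)

def E(P, N, Z, tc=1e-6, ts=1e-4, tw=1e-6):
    return tc * N * N * Z / (P * T(P, N, Z, tc, ts, tw))

# Efficiency decreases with increasing P and ts ...
assert E(64, 128, 10) < E(16, 128, 10)
assert E(64, 128, 10, ts=1e-3) < E(64, 128, 10)
# ... and increases with increasing N and Z.
assert E(64, 512, 10) > E(64, 128, 10)
assert E(64, 128, 100) > E(64, 128, 10)
# Execution time decreases with P but is bounded below by the exchange cost.
assert T(64, 128, 10) < T(16, 128, 10)
assert T(10**6, 128, 10) > 2 * (1e-4 + 1e-6 * 128 * 10)
```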
These observations provide interesting insights into algorithm
characteristics. However, they are not a sufficient basis for making
<A NAME=3374> </A>
design tradeoffs. This task requires quantitative results, which in
turn require that we substitute machine-specific numeric values for
the various parameters in performance models. In general, we seek to
obtain these numeric values by empirical studies, as will be discussed
in Section <A HREF="node31.html#secexper" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html#secexper">3.5</A>. Once augmented with empirical data,
models can be used to answer questions such as the following.
<P>
<UL><LI>
<A NAME=3377> </A>
Does the algorithm meet design requirements (for execution time,
memory requirements, etc.) on the target parallel computer?
<P>
<LI>
How adaptable is the algorithm? That is, how well does it adapt to
increases in problem size and processor counts? How sensitive is it to
machine parameters such as <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img439.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img439.gif"> and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img440.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img440.gif">?
<P>
<LI>
How does the algorithm compare with other algorithms for the same
problem? What difference in execution time can be expected from
different algorithms?
</UL>
<P>
It is important to remember that performance models are idealizations
of more complex phenomena. Once an algorithm has been implemented, we
are able to validate our models and hence increase our confidence in
their quality. However, in the early stages of a design we must
necessarily be cautious, especially if we are making quantitative
predictions or if the target computer has an architecture very
different from the idealized multicomputer.
<P>
<H2><A NAME=SECTION02441000000000000000>3.4.1 Scalability with Fixed Problem Size</A></H2>
<P>
An important aspect of performance analysis is the study of how
algorithm performance varies with parameters such as problem size,
processor count, and message startup cost. In particular, we may
evaluate the scalability of a parallel algorithm, that is, how
effectively it can use an increased number of processors. One
approach to quantifying scalability is to determine how execution time
<em> T</em>
and efficiency <em> E</em>
vary with increasing processor count
<A NAME=3382> </A>
<em> P</em>
for a fixed problem size and machine parameters. This <em>
fixed problem
</em> analysis allows us to answer questions such as:
How quickly can I solve problem <em> A</em>
 on computer <em> X</em>?
and What is the greatest number of processors I can utilize if I want
to maintain an efficiency of 50 percent? The latter question may be
of interest if a computer is shared and there is a charge for each
processor used.
<P>
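The second question can be answered directly from an efficiency model. The sketch below solves it in closed form for a model of the same general shape as Equation 3.7; the machine parameters are illustrative assumptions.

```python
# Sketch: largest P that keeps efficiency at or above 50% for a model of
# the form E(P) = 1/(1 + 2*P*(ts + tw*N*Z)/(tc*N*N*Z)).  The parameter
# values tc, ts, tw are illustrative assumptions, not measured values.

def E(P, N, Z, tc=1e-6, ts=1e-4, tw=1e-6):
    return 1.0 / (1.0 + 2 * P * (ts + tw * N * Z) / (tc * N * N * Z))

def max_procs_at_half_efficiency(N, Z, tc=1e-6, ts=1e-4, tw=1e-6):
    # E(P) >= 1/2  is equivalent to  2*P*(ts + tw*N*Z) <= tc*N*N*Z
    return int(tc * N * N * Z / (2 * (ts + tw * N * Z)))

P_max = max_procs_at_half_efficiency(128, 10)
assert E(P_max, 128, 10) >= 0.5 > E(P_max + 1, 128, 10)
```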
It is important to consider both <em> E</em>
and <em> T</em>
when evaluating
scalability. While <em> E</em>
will generally decrease monotonically with
<em> P</em>
, <em> T</em>
may actually increase if the performance model includes a
term proportional to a positive power of <em> P</em>
. In such cases, it
may not be productive to use more than some maximum number of
processors for a particular problem size and choice of machine
parameters.
<P>
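This effect is easy to exhibit with a hypothetical model containing such a term:

```python
import math

# Sketch: a hypothetical model whose overhead grows with P,
#   T(P) = a/P + b + c*P,
# is minimized at P* = sqrt(a/c); using more than P* processors actually
# increases execution time.  The coefficients are purely illustrative.
a, b, c = 10.0, 0.01, 0.001

def T(P):
    return a / P + b + c * P

P_star = round(math.sqrt(a / c))   # the minimizing processor count
assert T(P_star) <= T(P_star // 2)
assert T(P_star) <= T(2 * P_star)
```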
<BR><HR>
<b> Example <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img443.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img443.gif">.<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img441.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img441.gif"> Scalability of Finite Difference</b>:<A NAME=experffdfd> </A>
<P>
Figure <A HREF="node30.html#figfd9" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html#figfd9">3.6</A> illustrates fixed problem analysis applied to the
finite difference algorithm (Equations <A HREF="node29.html#eqfdt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdt">3.4</A>
and <A HREF="node29.html#eqfdte" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eqfdte">3.7</A>). This figure plots <em> T</em>
and <em> E</em>
as a
function of <em> P</em>
and <em> N</em>
, using machine parameters
characteristic of a relatively fine-grained multicomputer. The
computation cost <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img442.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img442.gif">sec has been obtained by experiment, as
will be described in Example <A HREF="node31.html#exfdc" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node31.html#exfdc">3.5</A>. Recall that because the
algorithm requires each task to have at least two grid columns, at
most 64 processors can be used productively when <em> N=128</em>
and 256
processors when <em> N=512</em>
. Later in the chapter, we shall see how
these predictions compare with observed performance.
<P>
<BR><HR>
<P>
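The processor limit quoted in this example follows directly from the two-columns-per-task requirement, as this small sketch shows:

```python
# Sketch: the requirement that each task hold at least two grid columns
# caps the number of productively usable processors at N/2.
def max_usable_procs(N):
    return N // 2

assert max_usable_procs(128) == 64
assert max_usable_procs(512) == 256
```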
<P><A NAME=4704> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img450.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img450.gif">
<BR><STRONG>Figure 3.6:</STRONG> <em> Scalability of the 1-D decomposition finite difference
algorithm, as predicted by Equations 3.4 and 3.7 when <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img447.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img447.gif">sec,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img448.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img448.gif">sec, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img449.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img449.gif">sec, and <em> Z=10</em>
: (a) execution time
as a function of <em> P</em>
; (b) efficiency as a function of <em> P</em>
.
Note that when <b>N=128</b>, only 64 processors can be used
productively.</em><A NAME=figfd9> </A><BR>
<P><H2><A NAME=SECTION02442000000000000000>3.4.2 Scalability with Scaled Problem Size</A></H2>
<P>
Large parallel computers are frequently used not only to solve
fixed-size problems faster, but also to solve larger problems. This
observation encourages a different approach to the analysis of
algorithms called <em> scaled problem
</em> analysis, whereby we
consider not how <em> E</em>
varies with <em> P</em>
, but how the amount of
computation performed must scale with <em> P</em>
to keep
<em> E</em>
constant. This function of <em> N</em>
is called an
<A NAME=3419> </A>
algorithm's <em> isoefficiency function
</em> and can provide valuable
insights into algorithm behavior. An algorithm with an isoefficiency
function of <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img451.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img451.gif"> is highly scalable, since the amount of
computation needs to increase only linearly with respect to <em> P</em>
to
keep <em> E</em>
 constant.
<P>
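An isoefficiency curve can also be estimated numerically. The sketch below uses an efficiency model of the 1-D finite difference form with illustrative (not measured) machine parameters, and searches for the smallest grid size that sustains a target efficiency at each processor count.

```python
# Sketch: estimating an isoefficiency curve numerically.  E(P, N) has the
# 1-D finite difference form with illustrative machine parameters; for
# each P we search for the smallest N that keeps E at or above E0.

def E(P, N, Z=10, tc=1e-6, ts=1e-4, tw=1e-6):
    return 1.0 / (1.0 + 2 * P * (ts + tw * N * Z) / (tc * N * N * Z))

def iso_N(P, E0=0.5):
    N = 2 * P                # each task needs at least two grid columns
    while E(P, N) < E0:      # E is increasing in N, so the loop terminates
        N += 1
    return N

# The grid size N (and hence the computation) must grow as P grows:
assert iso_N(16) < iso_N(32) < iso_N(64)
```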