<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>3.2 Approaches to Performance Modeling</TITLE>
<META NAME="description" CONTENT="3.2 Approaches to Performance Modeling">
<META NAME="keywords" CONTENT="book">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
</HEAD>
<BODY>
<P>
<HR>
<B>Next:</B> <A HREF="node29.html">3.3 Developing Models</A>
<B>Up:</B> <A HREF="node26.html">3 A Quantitative Basis for Design</A>
<B>Previous:</B> <A HREF="node27.html">3.1 Defining Performance</A>
<BR><HR><P>
<H1><A NAME=SECTION02420000000000000000>3.2 Approaches to Performance Modeling</A></H1>
<P>
We introduce the topic of performance modeling by describing three
techniques sometimes used to characterize the performance of parallel
algorithms. We explain why each is inadequate for our purposes.
<P>
<H2><A NAME=SECTION02421000000000000000>3.2.1 Amdahl's Law</A></H2>
<P>
<A NAME=secamdahl> </A>
<P>
<A NAME=3097> </A>
A common observation regarding parallel processing is that every
<A NAME=3098> </A>
algorithm has a sequential component that will eventually limit the
speedup that can be achieved on a parallel computer. (Speedup, as we
shall soon define more formally, is the ratio between execution time
on a single processor and execution time on multiple processors.)
This observation is often codified as <i>Amdahl's law</i>, which
can be stated as follows: if the sequential component of an algorithm
accounts for <em>1/s</em> of the program's execution time, then the
maximum possible speedup that can be achieved on a parallel computer
is <em>s</em>. For example, if the sequential component is 5 percent,
then the maximum speedup that can be achieved is 20.
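<P>
The bound is straightforward to compute. The following Python sketch
(ours, not from the text; the function name is illustrative) evaluates
the speedup predicted by Amdahl's law for a given sequential fraction
and processor count, showing how the 5 percent example approaches, but
never exceeds, a speedup of 20:
<PRE>
def amdahl_speedup(seq_fraction, p):
    """Speedup predicted by Amdahl's law.

    seq_fraction -- fraction of execution time that is sequential (1/s)
    p            -- number of processors
    """
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / p)

# Sequential component of 5 percent: the bound is 1/0.05 = 20.
for p in (1, 10, 100, 1000, 10**6):
    print(p, round(amdahl_speedup(0.05, p), 2))
</PRE>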
<P>
In the early days of parallel computing, it was widely believed that
this effect would limit the utility of parallel computing to a small
number of specialized applications. However, practical experience
shows that this inherently sequential way of thinking is of little
relevance to real problems. To understand why, let us consider a
noncomputing problem. Assume that 999 of 1000 workers on an
expressway construction project are idle while a single worker
completes a ``sequential component'' of the project. We would not
view this as an inherent attribute of the problem to be solved, but as
a failure in management. For example, if the time required for a
truck to pour concrete at a single point is a bottleneck, we could
argue that the road should be under construction at several points
simultaneously. Doing this would undoubtedly introduce some
inefficiency---for example, some trucks would have to travel further
to get to their point of work---but would allow the entire task to be
finished more quickly. Similarly, it appears that almost all
computational problems admit parallel solutions. The scalability of
some solutions may be limited, but this is due to communication costs,
idle time, or replicated computation rather than the existence of
``sequential components.''
<P>
<A NAME=3102> </A>
Amdahl's law can be relevant when sequential programs are parallelized
incrementally. In this approach to parallel software
<A NAME=3103> </A>
development, a sequential program is first profiled to identify
computationally demanding components. These
components are then adapted for parallel execution, one by one, until
acceptable performance is achieved. Amdahl's law clearly applies in
this situation, because the computational costs of the components that are
not parallelized provide a lower bound on the execution time of the
parallel program. Therefore, this ``partial,'' or
``incremental,'' parallelization strategy is generally effective only
on small parallel computers. Amdahl's law can also be useful when
analyzing the performance of data-parallel programs, in which some
components may not be amenable to a data-parallel formulation (see
Chapter <A HREF="node82.html#chaphpf">7</A>).
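<P>
A small numeric sketch makes the point. In the following Python
fragment (the profile fractions are invented for illustration), a
program spends 90 percent of its time in components that have been
parallelized and 10 percent in components that have not; the residual
10 percent bounds the speedup at 10, no matter how many processors are
used:
<PRE>
def incremental_time(parallel_fraction, serial_fraction, p):
    """Normalized execution time after incremental parallelization.

    Communication costs and idle time are ignored here; only the
    unparallelized residue limits performance.
    """
    return parallel_fraction / p + serial_fraction

for p in (4, 16, 64, 256):
    t = incremental_time(0.9, 0.1, p)
    print(p, "speedup =", round(1.0 / t, 2))
</PRE>
Even at <em>P=256</em> the speedup remains below 9.7, and larger
machines yield rapidly diminishing returns, which is why the strategy
suits small parallel computers.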
<P>
<H2><A NAME=SECTION02422000000000000000>3.2.2 Extrapolation from Observations</A></H2>
<P>
<A NAME=3106> </A>
Descriptions of parallel algorithms often characterize performance by
stating something like the following:
<blockquote>
We implemented the algorithm on parallel computer <em>X</em> and
achieved a speedup of 10.8 on 12 processors with problem size
<em>N=100</em>.
</blockquote>
Presumably, this single data point on a small number of processors is
intended as a measure of algorithm quality. A speedup of 10.8 on 12
processors may or may not be regarded as ``good.'' However, a single
performance measurement (or even several measurements) serves only to
determine performance in one narrow region of what is a large
multidimensional space, and is often a poor indicator of performance
in other situations. What happens on 1000 processors? What if
<em>N=10</em> or <em>N=1000</em>? What if communication costs are ten
times higher?
Answering these questions requires a deeper understanding of the
parallel algorithm.
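<P>
To illustrate how little a single observation constrains a model,
consider the following Python sketch. The two models in it are invented
for illustration (they are not the models analyzed in the text): both
reproduce the observed speedup of 10.8 on 12 processors with
<em>N=100</em>, yet they predict very different behavior on 1000
processors:
<PRE>
def model_fixed_overhead(n, p):
    # Partitioned n**2 computation plus a fixed overhead of 102.
    return n**2 / p + 102.0

def model_growing_overhead(n, p):
    # Partitioned n**2 computation plus an overhead that grows with p.
    return n**2 / p + 7.8 * p

for model in (model_fixed_overhead, model_growing_overhead):
    t1 = model(100, 1)  # single-processor execution time
    for p in (12, 1000):
        print(model.__name__, p, "speedup =",
              round(t1 / model(100, p), 1))
</PRE>
Both models report a speedup of 10.8 at <em>P=12</em>, but at
<em>P=1000</em> the first predicts a speedup near 90 while the second
predicts a speedup of about 1.3.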
<P>
The following three equations emphasize the limitations of
observations as a tool for understanding parallel performance. Each
is a simple performance model that specifies execution time <em>T</em> as
a function of processor count <em>P</em> and problem size <em>N</em>.
</BODY>
</HTML>