<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>3.5 Experimental Studies</TITLE>
</HEAD>
<BODY>
<meta name="description" content="3.5 Experimental Studies">
<meta name="keywords" content="book">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<P>
 <BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2225 HREF="node30.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2233 HREF="node32.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2231 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2235 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2236 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2234 HREF="node32.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html">3.6 Evaluating Implementations</A>
<B>Up:</B> <A NAME=tex2html2232 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2226 HREF="node30.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node30.html">3.4 Scalability Analysis</A>
<BR><HR><P>
<H1><A NAME=SECTION02450000000000000000>3.5 Experimental Studies</A></H1>
<P>
<A NAME=secexper>&#160;</A>
<P>
<A NAME=3501>&#160;</A>
Discussion in preceding sections has emphasized analytic modeling of
performance.  Yet parallel programming is first and foremost an
experimental discipline.  The flexibility and ease of modification of
software on the one hand, and the complexity of parallel computer
systems on the other, mean that approaches to parallel program design
based entirely on theory are rarely cost effective.  The role of
modeling is most often to assist in what is essentially an
experimental process, by guiding experiment and explaining results.
<P>
Experimental studies can be used in early design stages to determine
values for parameters used in performance models, such as computation
time per grid point, average depth of a search tree, message startup
cost, and message transfer costs.  They can also be used after
programming begins, to compare observed and modeled performance.
<P>
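For example, the computation time per grid point in a finite difference code might be calibrated with a short sequential timing program such as the one sketched below.  This is a minimal illustrative sketch, not code from the text; the five-point update kernel, the grid size, and the repetition count are assumptions chosen for the example.
<PRE>
/* Minimal sketch (illustrative only): estimate computation time per grid
   point by timing many sweeps of an assumed five-point update kernel and
   dividing by the total number of points updated. */
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;

#define N      1000     /* grid points per dimension (assumed) */
#define SWEEPS 100      /* repeat to overcome limited timer resolution */

static double grid[N][N];

int main(void) {
    clock_t start = clock();
    for (int s = 0; s &lt; SWEEPS; s++)
        for (int i = 1; i &lt; N - 1; i++)
            for (int j = 1; j &lt; N - 1; j++)
                grid[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    double per_point = elapsed / ((double)SWEEPS * (N - 2) * (N - 2));
    printf("computation time per grid point: %g seconds\n", per_point);
    return 0;
}
</PRE>
<P>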
Next we review several sometimes subtle issues that can arise during
experimental studies.
<P>
<H2><A NAME=SECTION02451000000000000000>3.5.1 Experimental Design</A></H2>
<P>
<A NAME=3503>&#160;</A>
The first step in an experimental study is to <em> identify the
data
 </em> that we wish to obtain.  For example, when calibrating a
performance model we may be interested in determining the execution
time of a sequential version of our application as a function of
problem size in order to determine <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img485.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img485.gif">.  Or, we may need to measure
the execution time of a simple message-passing testbed program in
order to determine <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img486.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img486.gif"> and <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img487.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img487.gif">.
<P>
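A testbed of the second kind might resemble the following "ping-pong" program, which bounces messages of increasing length between two processors and reports the average one-way time for each length.  This is a minimal sketch that assumes the MPI message-passing library; the repetition count and the range of message lengths are arbitrary choices.
<PRE>
/* Minimal sketch (assumes the MPI library; illustrative only): a
   two-process "ping-pong" program for estimating message startup and
   per-byte transfer costs.  Round-trip times are halved and averaged
   over many repetitions for each message length.
   Run with at least two processes. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

#define REPS   1000
#define MAXLEN (1 &lt;&lt; 20)

static char buf[MAXLEN];

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    for (int len = 1; len &lt;= MAXLEN; len *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();
        for (int r = 0; r &lt; REPS; r++) {
            if (rank == 0) {
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            printf("%8d bytes: %g seconds one-way\n", len,
                   (MPI_Wtime() - start) / (2.0 * REPS));
    }
    MPI_Finalize();
    return 0;
}
</PRE>
<P>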
Normally, experiments are performed for a range of data
points---different problem sizes and/or processor counts.  By
maximizing the number of data points obtained, we reduce the impact of
errors in individual measurements.  When empirical data are used to
evaluate the quality of an implementation, a range of data points also
allows us to estimate the accuracy of the model and to identify
regimes in which it is inadequate.
<P>
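For example, if one-way times have been measured for a range of message lengths, the startup and per-word (here, per-byte) transfer costs can be estimated by fitting a straight line to the data; with many data points, an error in any single measurement has little effect on the fit.  The sketch below is illustrative only, and the measurements it contains are invented placeholder values.
<PRE>
/* Minimal sketch (illustrative only): least-squares fit of the linear
   model  T(L) = ts + tw*L  to measured (message length, time) pairs.
   The data below are invented placeholders, not real measurements. */
#include &lt;stdio.h&gt;

int main(void) {
    double len[]  = {      8,     64,    512,   4096,  32768 };  /* bytes   */
    double secs[] = { 1.1e-4, 1.2e-4, 1.9e-4, 7.5e-4, 5.3e-3 };  /* seconds */
    int n = sizeof(len) / sizeof(len[0]);

    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i &lt; n; i++) {
        sx  += len[i];
        sy  += secs[i];
        sxx += len[i] * len[i];
        sxy += len[i] * secs[i];
    }
    double tw = (n * sxy - sx * sy) / (n * sxx - sx * sx);  /* per-byte cost */
    double ts = (sy - tw * sx) / n;                         /* startup cost  */
    printf("ts = %g seconds, tw = %g seconds/byte\n", ts, tw);
    return 0;
}
</PRE>
<P>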
The next step in an experimental study is to <em> design the
<A NAME=3505>&#160;</A>
experiments
 </em> that will be used to obtain the required data.  The
critical issue here is to ensure that our experiments measure what we
intend to measure.  For example, if a program comprises an
initialization step followed by a long series of iterations, and our
performance model deals only with the cost of an iteration, then that
is what we need to measure.
<P>
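A sketch of this situation follows: the timer is started only after initialization completes and stopped before any final output, so that the quantity measured is the quantity modeled.  The routines <tt>initialize</tt>, <tt>iterate</tt>, and <tt>output</tt> are hypothetical stand-ins for application code.
<PRE>
/* Minimal sketch (illustrative only): measure only the iteration phase
   described by the performance model.  initialize(), iterate(), and
   output() are hypothetical stand-ins for application routines. */
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;

#define NSTEPS 1000

static void initialize(void) { /* read input, set up data structures */ }
static void iterate(void)    { /* one iteration of the computation   */ }
static void output(void)     { /* write results                      */ }

int main(void) {
    initialize();                            /* excluded from measurement */
    clock_t start = clock();
    for (int step = 0; step &lt; NSTEPS; step++)
        iterate();
    double per_iter =
        (double)(clock() - start) / CLOCKS_PER_SEC / NSTEPS;
    output();                                /* also excluded */
    printf("time per iteration: %g seconds\n", per_iter);
    return 0;
}
</PRE>
<P>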
<H2><A NAME=SECTION02452000000000000000>3.5.2 Obtaining and Validating Experimental Data</A></H2>
<P>
The principal challenge when performing experiments is to obtain
accurate and reproducible results.  Execution times can be obtained in
various ways; which is best will depend on both our requirements and
the facilities available on the target computer.  A straightforward
but potentially time-consuming approach is to incorporate code into
our program that calls system timer routines to determine elapsed
time.  In principle, we should make these calls on every processor and
then take the maximum time.  However, we can often identify a
reference processor that does not start or finish appreciably later or
sooner than others, and measure times on this processor alone.
Alternatively, we can use a profiling or tracing tool that obtains
timing data automatically.  We discuss specific tools in
Chapter <A HREF="node106.html#chaptools" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node106.html#chaptools">9</A>.
<P>
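The first of these approaches might be coded as in the following sketch, which assumes the MPI library: each processor records its own elapsed time, and the maximum over all processors is reported, since the slowest processor determines when the computation completes.  The routine <tt>compute</tt> is a hypothetical stand-in for the phase being timed.
<PRE>
/* Minimal sketch (assumes the MPI library; illustrative only): each
   processor measures its own elapsed time, and the maximum over all
   processors is taken as the execution time.  compute() is a
   hypothetical stand-in for the timed phase. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

static void compute(void) { /* the work being timed */ }

int main(int argc, char *argv[]) {
    int rank;
    double local, maximum;

    MPI_Init(&amp;argc, &amp;argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &amp;rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* common starting point */
    local = MPI_Wtime();
    compute();
    local = MPI_Wtime() - local;

    /* collect the maximum elapsed time on processor 0 */
    MPI_Reduce(&amp;local, &amp;maximum, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("elapsed time (maximum over processors): %g seconds\n",
               maximum);

    MPI_Finalize();
    return 0;
}
</PRE>
<P>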
Experiments should always be repeated to
verify that results are reproducible.  Generally, results should not
vary by more than a small amount---2 or 3 percent is a lot if one is
trying to fine-tune algorithms.  Possible causes of variation include
the following.
<P>
<UL><LI>
<A NAME=3509>&#160;</A>
<em> A nondeterministic algorithm.</em> Programs may use random numbers
<A NAME=3511>&#160;</A>
or may explore a search space or allocate work to processors in a
time-dependent manner (as in the search algorithm of
Section <A HREF="node21.html#secfloor" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node21.html#secfloor">2.7</A>).  Nondeterminism due to random numbers can be
controlled by using a reproducible parallel generator
(Chapter <A HREF="node116.html#chaprandom" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node116.html#chaprandom">10</A>).  Nondeterminism due to time-dependent
execution is more problematic.  One solution is to perform a large
number of trials.  Another is to normalize execution times by dividing
them by some measure of the amount of work done, such as search tree
nodes visited.
<P>
<LI>
<em> An inaccurate timer.</em> The timer used to obtain execution times
may have limited resolution or be inaccurate.  If resolution is
limited, we can improve accuracy by increasing execution time, for
example by performing more iterations or by solving the same problem
several times.  If the timer is inaccurate, we need to find another
way to determine execution times.
<P>
<LI>
<em> Startup and shutdown costs.</em> The time required for system
management functions concerned with the acquisition of processors,
loading of code, allocation of virtual memory, deallocation of
processors, etc., can vary significantly according to the state of the
system.  Timings are often more accurate if these system-dependent
components are excluded.  Hence, we may start a timer after a program
is loaded and stop it once a solution has been computed.
<P>
<LI>
<em> Interference from other programs.</em> On a nondedicated machine,
other users may compete for shared resources such as processors,
network bandwidth, and I/O bandwidth.  Timings may also be perturbed
by system functions such as accounting and backups that execute
occasionally.  Note that competition can occur even when processors
are dedicated to our application.  For example, a computer in which
processors are connected in a 2-D mesh (Section <A HREF="node33.html#secperfinter" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#secperfinter">3.7.2</A>)
may be partitioned into disjoint submeshes.  Yet I/O requests generated
by programs executing in one submesh may need to traverse our submesh
to reach I/O devices, thereby consuming bandwidth.
<P>
<LI>
<em> Contention.</em> On some computers, the time required for a
communication operation can increase when several processors
communicate at the same time.  For example, an Ethernet-connected LAN
can carry only one message at a time, and senders must back off and
