Designing and Building Parallel Programs
<H4><A NAME=SECTION02431020000000000000> Communication Time.</A></H4>
<P>
<A NAME=3221>&#160;</A>
The <em> communication time
 </em> of an algorithm (<em>T<sub>comm</sub></em>) is
<A NAME=3224>&#160;</A>
the time that its tasks spend sending and receiving messages.  Two
types of communication can be distinguished: interprocessor
communication and intraprocessor communication.  In <em>
interprocessor
 </em> communication, two communicating tasks are
located on different processors.  This will always be the case if an
algorithm creates one task per processor.  In <em>
intraprocessor
 </em> communication, two communicating tasks are
located on the same processor.  For simplicity, we assume that
interprocessor and intraprocessor communication costs are comparable.
Perhaps surprisingly, this assumption is not unreasonable in many
multicomputers, unless intraprocessor communication is highly
optimized.  This is because the cost of the memory-to-memory copies
and context switches performed in a typical implementation of
intraprocessor communication is often comparable to the cost of an
interprocessor communication.  In other environments, such as
Ethernet-connected workstations, intraprocessor communication is much
faster.
<P>
In the idealized multicomputer architecture, the cost of sending a
message between two tasks located on different processors can be
represented by two parameters: the message startup time <em>t<sub>s</sub></em>, which
<A NAME=3227>&#160;</A>
is the time required to initiate the communication, and the transfer
time per (typically four-byte) word <em>t<sub>w</sub></em>, which is determined by the
<A NAME=3228>&#160;</A>
physical bandwidth of the communication channel linking the source and
destination processors.  As illustrated in Figure <A HREF="node29.html#figperf">3.3</A>, the
time required to send a message of size <em>L</em> words is then
<P><A NAME=eq88>&#160;</A><em>T<sub>msg</sub></em> = <em>t<sub>s</sub></em> + <em>t<sub>w</sub></em> <em>L</em>. &#160;&#160;&#160;(3.1)<P>
<P>
This idealized model of communication performance is adequate for many
purposes but does break down in some situations.  More detailed
models are described in Section <A HREF="node33.html#secperfrefine">3.7</A>.
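<P>
The linear model of Equation 3.1 is easily expressed in code.  The following Python sketch uses illustrative placeholder values for <em>t<sub>s</sub></em> and <em>t<sub>w</sub></em>, chosen for this example rather than taken from any machine in Table 3.1.

```python
def message_time(length_words, t_s, t_w):
    """Time to send one message of L words: T_msg = t_s + t_w * L."""
    return t_s + t_w * length_words

# Illustrative placeholder parameters (microseconds); not measurements
# for any real machine.
t_s = 100.0   # message startup time
t_w = 0.5     # transfer time per four-byte word

short_msg = message_time(1, t_s, t_w)       # 100.5: dominated by startup
long_msg = message_time(10_000, t_s, t_w)   # 5100.0: dominated by transfer
```

Note how, under these assumed parameters, startup cost accounts for nearly all of the short message's time but only about 2 percent of the long message's.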
<P>
<P><A NAME=4580>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img396.gif">
<BR><STRONG>Figure 3.3:</STRONG> <em> Simple communication cost model: T<sub>msg</sub> = t<sub>s</sub> + t<sub>w</sub> L.  In this plot of time versus message length, the slope of the line
corresponds to the cost per word transferred and the <b>y</b>-intercept to
the message startup cost.</em><A NAME=figperf>&#160;</A><BR>
<P>
<P>
Table <A HREF="node29.html#tabmach">3.1</A> lists approximate values for <em>t<sub>s</sub></em> and <em>t<sub>w</sub></em> for
some parallel computers.  Because these values tend to change rapidly
as hardware and systems software evolve, they should be verified
before being used in performance models.  Notice the considerable
variation in both <em>t<sub>s</sub></em> and <em>t<sub>w</sub></em> values.  Clearly, different
computers have very different communication performance
characteristics.
<P>
The values in Table <A HREF="node29.html#tabmach">3.1</A> were obtained either from the
literature or by fitting Equation <A HREF="node29.html#eq88">3.1</A> to execution times
measured for a small test program that sends messages back and forth
between two processors.  Figure <A HREF="node29.html#figperfd">3.4</A> presents some
representative experimental data obtained with this program.  These
times are for a single round trip and hence are twice those given by
Equation <A HREF="node29.html#eq88">3.1</A>.  The impact of startup and per-word costs on
communication time is clearly visible.  Notice the irregularities in
<A NAME=3245>&#160;</A>
both Ethernet and Fiber Distributed Data Interface (FDDI) times for
small messages, and the periodic jumps in Paragon times.  These are
due to details of the communication protocols and buffer management
strategies used in communication libraries.  Nevertheless, we see that
Equation <A HREF="node29.html#eq88">3.1</A> is a reasonably accurate representation of
message costs, particularly for larger messages.
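<P>
The fitting procedure just described amounts to an ordinary least-squares line fit of time against message length.  The sketch below recovers <em>t<sub>s</sub></em> and <em>t<sub>w</sub></em> from one-way times (half of a measured round trip, as in the text); the "measurements" here are synthetic values generated from the model itself, standing in for real benchmark data, and <code>fit_linear_model</code> is an illustrative helper, not part of any benchmark suite.

```python
def fit_linear_model(lengths, times):
    """Least-squares fit of times = t_s + t_w * lengths; returns (t_s, t_w)."""
    n = len(lengths)
    mean_l = sum(lengths) / n
    mean_t = sum(times) / n
    cov = sum((l - mean_l) * (t - mean_t) for l, t in zip(lengths, times))
    var = sum((l - mean_l) ** 2 for l in lengths)
    t_w = cov / var            # slope: transfer cost per word
    t_s = mean_t - t_w * mean_l  # intercept: startup cost
    return t_s, t_w

# Synthetic one-way times (microseconds) for messages of various lengths
# (in words), generated from made-up parameters t_s = 100, t_w = 0.5.
lengths = [0, 64, 256, 1024, 4096]
times = [100.0 + 0.5 * l for l in lengths]

t_s, t_w = fit_linear_model(lengths, times)
```

With real measurements the fit is only approximate, and the small-message irregularities visible in Figure 3.4 would appear as residuals around the fitted line.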
<P>
<A NAME=3247>&#160;</A>
<P><A NAME=4425>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img403.gif">
<BR><STRONG>Table 3.1:</STRONG>  Approximate machine parameters for some parallel computers,
in microseconds (&#181;sec).  Some of these data provided by
T. Dunigan.
<A NAME=tabmach>&#160;</A><BR>
<P>
<P>
<P><A NAME=4595>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img404.gif">
<BR><STRONG>Figure 3.4:</STRONG> <em> Round-trip time for a single message between two processors
as a function of message length on Ethernet-connected workstations,
FDDI-connected workstations, Intel Paragon, and IBM SP1.  Data
provided by W. Gropp.</em><A NAME=figperfd>&#160;</A><BR>
<P>
<P>
Notice that both the <em>t<sub>s</sub></em> and <em>t<sub>w</sub></em> terms are required in
Equation <A HREF="node29.html#eq88">3.1</A>.  Asymptotically (for large
<em>L</em>) only the <em>t<sub>w</sub></em> term is important; yet as <em>t<sub>s</sub></em> is generally
much larger than <em>t<sub>w</sub></em>, the <em>t<sub>s</sub></em> term can dominate in applications
that send mostly small messages.
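<P>
One practical consequence is that aggregating many small messages into one larger message, when an algorithm permits it, can reduce communication time dramatically, because the startup cost is paid once rather than once per message.  A sketch, again with invented parameter values:

```python
t_s, t_w = 100.0, 0.5  # illustrative values (microseconds), not from Table 3.1

def many_small(n, words_each):
    """n separate messages: each one pays the startup cost t_s."""
    return n * (t_s + t_w * words_each)

def one_large(n, words_each):
    """One aggregated message of n * words_each words: a single startup cost."""
    return t_s + t_w * n * words_each

# Sending 1000 one-word messages costs 1000 * 100.5 = 100500.0 microseconds,
# while one 1000-word message costs 100 + 0.5 * 1000 = 600.0.
```

Under these assumed parameters, aggregation wins by more than a factor of one hundred, even though the same number of words is transferred.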
<P>
The values in Table <A HREF="node29.html#tabmach">3.1</A> represent "best achievable"
performance and in general may be used as a lower bound on
communication costs when estimating performance.  Applications with
less regular or structured communication patterns may perform less
well.  In addition, the values in Table <A HREF="node29.html#tabmach">3.1</A> do not
incorporate other costs such as buffer management associated with
message passing.  However, these additional costs are typically
proportional to the number and size of messages communicated.  Hence,
it is generally possible, by fitting Equation <A HREF="node29.html#eq88">3.1</A> to empirical
data, to obtain system- and algorithm-dependent values for <em>t<sub>s</sub></em> and
<em>t<sub>w</sub></em> for which Equation <A HREF="node29.html#eq88">3.1</A> is valid for a large range of
problem and machine sizes.  This procedure is applied in several
examples later in this chapter.
<P>
<H4><A NAME=SECTION02431030000000000000> Idle Time.</A></H4>
<P>
<A NAME=3276>&#160;</A>
Both computation and communication times are specified explicitly in a
parallel algorithm; hence, it is generally straightforward to determine their
contribution to execution time.  Idle time (<em>T<sub>idle</sub></em>) can be
more difficult to determine, however, since it often depends on the order in
which operations are performed.
<P>
A processor may be idle due to lack of computation or lack of data. In
the first case, idle time may be avoided by using load-balancing
techniques such as those introduced in Section <A HREF="node19.html#seclbalgs">2.5.1</A>.  In
the second case, the processor is idle while the computation and
communication required to generate remote data are performed.  This
idle time can sometimes be avoided by structuring a program so that
processors perform other computation or communication while waiting
for remote data.  This technique is referred to as <em> overlapping
computation and communication</em>, since local computation is performed
concurrently with remote communication and computation
(Figure <A HREF="node29.html#figover">3.5</A>).  Such overlapping can be achieved in two
ways.  A simple approach is to create multiple tasks on each
processor.  When one task blocks waiting for remote data, execution
may be able to switch to another task for which data are already
available.  This approach has the advantage of simplicity but is
efficient only if the cost of scheduling a new task is less than the
idle time cost that is avoided.  Alternatively, a single task can be
structured so that requests for remote data are interleaved explicitly
with other computation.
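<P>
The multiple-task approach to overlapping can be illustrated with Python threads, where a simulated remote fetch proceeds while the main task computes locally.  The sleep stands in for message latency, and the whole "fetch" is a stand-in rather than a call to any real message-passing library.

```python
import threading
import time

result = {}

def fetch_remote_data():
    # Stand-in for a blocking receive: the sleep simulates message latency.
    time.sleep(0.2)
    result["data"] = [1, 2, 3]

start = time.perf_counter()
fetcher = threading.Thread(target=fetch_remote_data)
fetcher.start()                              # initiate the "communication"

local = sum(i * i for i in range(100_000))   # local computation proceeds meanwhile

fetcher.join()                               # wait only for any remaining latency
elapsed = time.perf_counter() - start
# elapsed is close to max(compute time, 0.2 s) rather than their sum,
# because the wait and the computation overlap.
```

This mirrors the trade-off noted above: the approach pays a scheduling (here, thread-switching) cost, so it is worthwhile only when that cost is smaller than the idle time it hides.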
<P>
<A NAME=3282>&#160;</A>
<P><A NAME=4613>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img414.gif">
<BR><STRONG>Figure 3.5:</STRONG> <em> Overlapping computation with communication.  Solid lines
