📄 node33.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<html><!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>3.7 A Refined Communication Cost Model</TITLE>
</HEAD>
<BODY>
<meta name="description" value="3.7 A Refined Communication Cost Model">
<meta name="keywords" value="book">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
 <BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2249 HREF="node32.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2257 HREF="node34.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node34.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2255 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2259 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2260 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2258 HREF="node34.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node34.html">3.8 Input/Output</A>
<B>Up:</B> <A NAME=tex2html2256 HREF="node26.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node26.html">3 A Quantitative Basis for Design</A>
<B> Previous:</B> <A NAME=tex2html2250 HREF="node32.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node32.html">3.6 Evaluating Implementations</A>
<BR><HR><P>
<H1><A NAME=SECTION02470000000000000000>3.7 A Refined Communication Cost Model</A></H1>
<P>
<A NAME=secperfrefine>&#160;</A>
<P>
Next we examine how the idealized communication cost model used in
preceding sections can be extended to account for characteristics of
realistic interconnection networks.  We review a range of network
architectures and develop a more detailed model of communication
performance that takes into account the impact of competition for
bandwidth on communication costs.  This more detailed model is
<A NAME=3729>&#160;</A>
still idealized but can be significantly more accurate in some
<A NAME=3730>&#160;</A>
circumstances.
<P>
<H2><A NAME=SECTION02471000000000000000>3.7.1 Competition for Bandwidth</A></H2>
<P>
<A NAME=3732>&#160;</A>
In the idealized multicomputer architecture introduced in
Chapter <A HREF="node6.html#chap1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node6.html#chap1">1</A>, the time required to send a message from one
processor to another is independent of both processor location and the
number of other processors that may be communicating at the same time.
These assumptions are reflected in the communication cost model,
Equation <A HREF="node29.html#eq88" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq88">3.1</A>:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img536.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img536.gif"><P>
<P>
While accurate for many algorithms and on many architectures, this
model can break down if a computer's interconnection network has
properties different from the ideal, particularly if an application
generates many messages.  In these cases, it is necessary to develop a
more detailed model of the interconnection network.
<P>
Most interconnection networks use fewer than <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img537.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img537.gif"> wires to connect
<em> N</em>
 processors.  Hence, they must include routing nodes, or <em>
switches
 </em>, to route messages from a source processor to a
destination.  A switching node may block or reroute messages when
several messages require access to the same wire at the same time.
The number of wires that must be traversed in order to get from one
processor to another is termed the <em> distance
 </em> between those
two processors.  (The distance is equal to the number of switches plus one.)
The maximum distance from any processor to any other processor is
termed the <em> diameter
 </em> of the network.  The distance between
<A NAME=3742>&#160;</A>
two processors and the length of the wires connecting them are not
normally significant factors in determining performance, although
networks with long wires may be more expensive to build.  (Wire length
can be important in networks extending over tens to thousands of
<A NAME=3743>&#160;</A>
kilometers, where the speed of light---about <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img538.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img538.gif">sec per
kilometer in optical fiber---places a lower limit on communication
latency.)
<P>
A factor that <em> can
 </em> have a significant impact on
communication performance and which we study here in some depth is
<em> competition for bandwidth</em>.  Two processors may need to send data
over the same wire at the same time.  Typically, only one message can
be transmitted simultaneously, so the other message will be delayed.
However, for many practical purposes it suffices to think of the two
processors as <em> sharing
 </em> the available bandwidth.  Hence, we
scale the data volume term of Equation <A HREF="node29.html#eq88" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node29.html#eq88">3.1</A> by <em> S</em>
, the
number of processors needing to send concurrently over the same wire:
<P><A NAME=eqcomm1>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img539.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img539.gif"><P>
The scaling factor reflects the idea that the effective bandwidth
available to each processor is <em> 1/S</em>
 of the true bandwidth.
<P>
Equation <A HREF="node33.html#eqcomm1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#eqcomm1">3.10</A> does not account for additional contention
costs that may be incurred if messages collide and must be
retransmitted.  (Network researchers have developed sophisticated
simulation techniques to account for such effects.)  However,
experience shows that Equation <A HREF="node33.html#eqcomm1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#eqcomm1">3.10</A> is sufficiently accurate
for many practical purposes.
<P>
The impact of competition for bandwidth is most severe in algorithms
that execute <em> synchronously</em>, that is, algorithms in which all
processors are sending and receiving messages at approximately the
same time and in which processors cannot proceed with other
computation while awaiting messages.  The finite difference problem
and many other SPMD algorithms have this property.  In algorithms such
as the search and Fock matrix construction algorithms described in
Chapter <A HREF="node14.html#chap2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node14.html#chap2">2</A>, processors execute asynchronously and are less
likely to compete for bandwidth.
<P>
<H2><A NAME=SECTION02472000000000000000>3.7.2 Interconnection Networks</A></H2>
<P>
<A NAME=secperfinter>&#160;</A>
<A NAME=4444>&#160;</A>
<P>
<A NAME=3761>&#160;</A>
The value <em> S</em>
 in Equation <A HREF="node33.html#eqcomm1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#eqcomm1">3.10</A> can depend on properties
<A NAME=3764>&#160;</A>
of both the parallel algorithm and the underlying interconnection
<A NAME=3765>&#160;</A>
network.  In the following discussion, we use two examples to
illustrate how the communication patterns of a particular algorithm
can be analyzed to determine an approximate value for <em> S</em>
 on
different networks.  We first consider properties of interconnection
networks.
<P>
<H4><A NAME=SECTION02472010000000000000> Crossbar Switching Network.</A></H4>
<P>
<A NAME=3768>&#160;</A>
A crossbar switch avoids competition for bandwidth by using <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img540.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img540.gif">
<A NAME=3769>&#160;</A>
switches to connect <em> N</em>
 inputs to <em> N</em>
 outputs
(Figure <A HREF="node33.html#figcrossbar" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node33.html#figcrossbar">3.11</A>).  In this case, <em> S=1</em>
.  Although
highly nonscalable, crossbar switches are a popular mechanism for
connecting small numbers of workstations, typically 20 or fewer.  For
<A NAME=3774>&#160;</A>
example, the DEC GIGAswitch can connect up to 22 workstations.  While
<A NAME=3775>&#160;</A>
larger crossbars can be constructed (for example, the Fujitsu VPP 500
uses a 224 <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img541.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img541.gif"> 224 crossbar to connect 224 processors), they are
very expensive.
<P>
<P><A NAME=4912>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img544.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img544.gif">
<BR><STRONG>Figure 3.11:</STRONG> <em> A 4<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img543.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img543.gif">4 nonblocking crossbar, used here to connect 4
processors.  On the right, two switching elements are expanded; the
top one is set to pass messages through and the lower one to switch
messages.  Notice that each processor is depicted twice, and that any
pair of processors can communicate without preventing other processor
pairs from communicating.</em><A NAME=figcrossbar>&#160;</A><BR>
<P><H4><A NAME=SECTION02472020000000000000> Bus-based Networks.</A></H4>
<P>
<A NAME=3781>&#160;</A>
In a bus-based network, processors share a single communication
<A NAME=3782>&#160;</A>
resource (the bus).  A bus is a highly nonscalable architecture,
because only one processor can communicate on the bus at a time.  The
competition factor <em> S</em>
 is equal to the number of processors trying
to communicate simultaneously.
<P>
<P><A NAME=4928>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img545.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img545.gif">
<BR><STRONG>Figure 3.12:</STRONG> <em> A bus-based interconnection network, here used to implement
a shared-memory parallel computer.  Each processor (P) is connected to
the bus, which in turn is connected to the global memory.  A cache
associated with each processor stores recently accessed memory values,
in an effort to reduce bus traffic.</em><A NAME=figbus>&#160;</A><BR>
<P>
<P>
<A NAME=3788>&#160;</A>
Buses are commonly used in shared-memory parallel computers to
<A NAME=3789>&#160;</A>
communicate read and write requests to a shared global memory.  In
principle, the use of a global memory in a shared-memory computer
simplifies parallel programming by making locality a nonissue.
However, as discussed in Section <A HREF="node8.html#sec1othermachine" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node8.html#sec1othermachine">1.2.2</A>, most
shared-memory parallel computers introduce caches in an attempt to
<A NAME=3791>&#160;</A>
reduce bus traffic; hence, locality continues to be important.
<P>
<P><A NAME=4943>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img546.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img546.gif">
<BR><STRONG>Figure 3.13:</STRONG> <em> An Ethernet LAN.  Multiple computers are connected to a
single Ethernet cable, which acts as a communication bus, carrying a
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -