<tt> compute</tt> <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img185.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img185.gif"> <tt> using Equation <A HREF="node17.html#eq5pt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pt">2.1</A></tt>
<P>
<tt> endfor</tt>
<P>
</TT></PRE>
<P>
We observed earlier that the best sequential and parallel solutions to
a problem may use different techniques. This situation arises in
finite difference problems. In sequential computing,
<A NAME=1293> </A>
Gauss-Seidel update strategies are often preferred over Jacobi
strategies because they allow solutions of comparable accuracy to be
obtained using fewer iterations. In a Gauss-Seidel scheme, elements
are updated in a particular order so that the computation of each
element can use the most up-to-date value of other elements. For
<A NAME=1294> </A>
example, the Jacobi update of Equation <A HREF="node17.html#eq5pt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pt">2.1</A> may be reformulated
as follows (notice the use of values <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img186.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img186.gif"> and
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img187.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img187.gif">):
<P><A NAME=eq5pta> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img188.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img188.gif"><P>
<P>
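<P>
The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch, not code from the text; it assumes Equation 2.1 has the common one-dimensional three-point form X<sub>i</sub> = (X<sub>i-1</sub> + 2X<sub>i</sub> + X<sub>i+1</sub>)/4.

```python
def jacobi_step(x):
    """Jacobi: every new value is computed from the step-t array only,
    so all interior points could in principle be updated concurrently."""
    new = x[:]                                   # keep step t intact
    for i in range(1, len(x) - 1):
        new[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    return new

def gauss_seidel_step(x):
    """Gauss-Seidel: updates happen in place, left to right, so each
    point reads the already-updated (step t+1) value of its left neighbor."""
    x = x[:]
    for i in range(1, len(x) - 1):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4    # x[i-1] is at step t+1
    return x

print(jacobi_step([0, 4, 0, 4, 0]))        # [0, 2.0, 2.0, 2.0, 0]
print(gauss_seidel_step([0, 4, 0, 4, 0]))  # [0, 2.0, 1.5, 2.375, 0]
```

The loop-carried dependence in <tt>gauss_seidel_step</tt> (each iteration reads the previous iteration's result) is precisely what prevents the straightforward concurrent update.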
While Jacobi schemes are trivial to parallelize (all grid points can
be updated concurrently), this is not the case for all Gauss-Seidel
<A NAME=1316> </A>
schemes. For example, the update scheme of Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> allows
only an average of around <em> N/2</em>
points within an
<em> N</em>
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img189.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img189.gif"><em> N</em>
grid to be updated concurrently. Fortunately,
many different Gauss-Seidel orderings are possible for most problems,
and we are usually free to choose the ordering that maximizes
available parallelism. In particular, we can choose to update first
the odd-numbered elements and then the even-numbered
elements of an array. Each update uses the most recent information,
yet the updates to the odd-numbered points are independent and can
proceed concurrently, as can the updates to the even-numbered points.
<A NAME=1321> </A>
This update strategy yields what is referred to as a <em> red-black</em>
algorithm, since the points can be thought of as being colored as on a
chess board: either red (odd) or black (even); points of the same
color can be updated concurrently. Figure <A HREF="node17.html#figjrb" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figjrb">2.5</A> illustrates
both the Gauss-Seidel scheme of Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> and a red-black
scheme, and shows how the latter scheme increases opportunities for
parallel execution.
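<P>
In one dimension, the red-black ordering reduces to updating the odd-indexed points and then the even-indexed points. The following sketch is illustrative only; it assumes a three-point update of the form X<sub>i</sub> = (X<sub>i-1</sub> + 2X<sub>i</sub> + X<sub>i+1</sub>)/4.

```python
def red_black_step(x):
    """One red-black sweep of a 1D grid (illustrative sketch)."""
    x = x[:]
    n = len(x)
    # "Red" phase: odd-indexed interior points depend only on
    # even-indexed neighbors, so all of them are independent.
    for i in range(1, n - 1, 2):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    # "Black" phase: even-indexed points now read the freshly updated
    # red values, preserving the Gauss-Seidel use of recent information.
    for i in range(2, n - 1, 2):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    return x

print(red_black_step([0, 4, 0, 4, 0]))   # [0, 2.0, 1.0, 2.0, 0]
```

Within each phase there is no dependence between the points being updated, so each phase can execute fully in parallel.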
<P>
<P><A NAME=2426> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img190.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img190.gif">
<BR><STRONG>Figure 2.5:</STRONG> <em> Two finite difference update strategies, here applied on a
two-dimensional grid with a five-point stencil. In both figures,
shaded grid points have already been updated to step <em>t+1</em>;
unshaded grid points are still at step <em>t</em>. The arrows show data
dependencies for one of the latter points. The figure on the left
illustrates a simple Gauss-Seidel scheme and highlights the five grid
points that can be updated at a particular point in time. In this
scheme, the update proceeds in a wavefront from the top left corner to
the bottom right. On the right, we show a red-black update scheme.
Here, all the grid points at step <em>t</em> can be updated
concurrently.</em><A NAME=figjrb> </A><BR>
<P>
<P>
This example indicates the important role that choice of solution
strategy can play in determining the performance of a parallel
program. While the simple Gauss-Seidel update strategy of
Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> may be appropriate in a sequential program, it
is not ideal on a parallel computer. The Jacobi update strategy is
efficient on a parallel computer but is inferior numerically. The
red-black scheme combines the advantages of both approaches.
<A NAME=1331> </A>
<P>
<P>
<H2><A NAME=SECTION02332000000000000000>2.3.2 Global Communication</A></H2>
<P>
<A NAME=seccommglobal> </A>
<P>
<P><A NAME=2447> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img191.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img191.gif">
<BR><STRONG>Figure 2.6:</STRONG> <em> A centralized summation algorithm that uses a central
manager task (S) to sum <em>N</em> numbers distributed among
<em>N</em> tasks. Here, <em>N=8</em>, and each of the 8 channels is labeled with the
number of the step in which it is used.</em><A NAME=figsum> </A><BR>
<P>
<P>
<A NAME=1340> </A>
A <i> global communication</i> operation is one in which many tasks must
participate. When such operations are implemented, it may not be
sufficient simply to identify individual producer/consumer pairs.
Such an approach may result in too many communications or may restrict
opportunities for concurrent execution. For example, consider the
<A NAME=1342> </A>
problem of performing a <em> parallel reduction
</em> operation, that
<A NAME=1344> </A>
is, an operation that reduces <em> N</em>
values distributed over
<em> N</em>
tasks using a commutative, associative operator such as addition:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img192.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img192.gif"><P>
Let us assume that a single "manager" task requires the result
<em> S</em>
of this operation. Taking a purely local view of
communication, we recognize that the manager requires values <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img193.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img193.gif">,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img194.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img194.gif">, etc., from tasks 0, 1, etc. Hence, we could define a
communication structure that allows each task to communicate its value
to the manager independently. The manager would then receive the
values and add them into an accumulator (Figure <A HREF="node17.html#figsum" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsum">2.6</A>).
However, because the manager can receive and sum only one number at a
time, this approach takes <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img195.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img195.gif"> time to sum <em> N</em>
numbers---not a
very good parallel algorithm!
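<P>
The bottleneck can be made concrete with a small sequential simulation (an illustrative sketch, not code from the text): no matter how many tasks send concurrently, the manager receives and adds only one value per step.

```python
from queue import Queue

def centralized_sum(values):
    """Simulate the centralized algorithm of Figure 2.6: each task
    sends its value on a channel; a single manager receives and adds
    one number per step, so N values cost N sequential steps."""
    channel = Queue()
    for v in values:              # every task sends independently...
        channel.put(v)
    total, steps = 0, 0
    while not channel.empty():    # ...but the manager serializes them
        total += channel.get()
        steps += 1
    return total, steps

print(centralized_sum(list(range(8))))   # (28, 8): 8 values, 8 steps
```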
<P>
<A NAME=1354> </A>
This example illustrates two general problems that can hinder
efficient parallel execution in algorithms based on a purely local
view of communication:
<OL><LI>
The algorithm is <em> centralized
</em>: it does not distribute
computation and communication. A single task (in this case, the
manager task) must participate in every operation.
<LI>
The algorithm is <em> sequential
</em>: it does not allow multiple
computation and communication operations to proceed concurrently.
</OL>
We must address both these problems to develop a good parallel
algorithm.
<P>
<H4><A NAME=SECTION02332010000000000000> Distributing Communication and Computation.</A></H4>
<P>
We first consider the problem of distributing the computation and
communication associated with the summation. We can distribute the
summation of the <em> N</em>
numbers by making each task <em> i</em>
,
<em> 0&lt;i&lt;N-1</em>
, compute the sum:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img196.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img196.gif"><P>
<P>
<P><A NAME=2475> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img197.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img197.gif">
<BR><STRONG>Figure 2.7:</STRONG> <em> A summation algorithm that connects <em>N</em> tasks in an
array in order to sum <em>N</em> numbers distributed among these tasks.
Each channel is labeled with the number of the step in which it is
used and the value that is communicated on it.</em><A NAME=figsumA> </A><BR>
<P>
<P>
The communication requirements associated with this algorithm can be
satisfied by connecting the <em> N</em>
tasks in a one-dimensional array
(Figure <A HREF="node17.html#figsumA" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsumA">2.7</A>). Task <em> N-1</em>
sends its value to its neighbor
in this array. Tasks 1 through <em> N-2</em>
each wait to receive a
partial sum from their right-hand neighbor, add this to their local
value, and send the result to their left-hand neighbor. Task 0
receives a partial sum and adds this to its local value to obtain the
complete sum. This algorithm distributes the <em> N-1</em>
communications
and additions, but permits concurrent execution only if multiple
summation operations are to be performed. (The array of tasks can
then be used as a pipeline through which partial sums flow.) A
single summation still takes <em> N-1</em>
steps.
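<P>
A sequential simulation of this array algorithm (an illustrative sketch, not code from the text) makes the <em>N-1</em> cost explicit; the step counter counts one communication-and-add per step.

```python
def array_sum(values):
    """Simulate the array algorithm of Figure 2.7: task N-1 starts the
    partial sum, which flows left one task per step; task 0 finishes."""
    partial = values[-1]
    steps = 0
    for i in range(len(values) - 2, -1, -1):
        partial += values[i]      # receive from the right, add local value
        steps += 1                # one communication-and-add per step
    return partial, steps

print(array_sum(list(range(8))))   # (28, 7): N-1 = 7 steps for N = 8
```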
<P>
<H4><A NAME=SECTION02332020000000000000> Uncovering Concurrency: Divide and Conquer.</A></H4>
<P>
Opportunities for concurrent computation and communication can often
be uncovered by applying a problem-solving strategy called <em> divide
and conquer</em>. To solve a complex problem (such as summing
<em> N</em>
numbers), we seek to partition it into two or more simpler problems of
roughly equivalent size (e.g., summing <em> N/2</em>
numbers). This
process is applied recursively to produce a set of subproblems that
cannot be subdivided further (e.g., summing two numbers). The
strategy is summarized in Algorithm <A HREF="node17.html#algdac" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#algdac">2.1</A>. The
divide-and-conquer technique is effective in parallel computing when
the subproblems generated by problem partitioning can be solved
concurrently. For example, in the summation problem, we can take
<A NAME=1382> </A>
advantage of the following identity (<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img198.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img198.gif">, <em> n</em>
an
integer):
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img199.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img199.gif"><P>
The two summations on the right-hand side can be performed
concurrently. They can also be further decomposed if <em> n>1</em>
, to
give the tree structure illustrated in Figure <A HREF="node17.html#figsum1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsum1">2.8</A>.
Summations at the same level in this tree of height <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img200.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img200.gif"> can be
performed concurrently, so the complete summation can be achieved in
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img201.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img201.gif"> rather than <em> N</em>
steps.
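<P>
The tree summation can be sketched recursively (an illustrative sketch, not code from the text). The second return value tracks the depth of the recursion tree, i.e. the number of parallel steps a concurrent implementation would need.

```python
def tree_sum(values):
    """Divide-and-conquer summation for N a power of two: split the
    problem, sum the halves (conceptually in parallel), then combine.
    Returns (sum, depth), where depth is log2(N) parallel steps."""
    if len(values) == 1:
        return values[0], 0
    mid = len(values) // 2
    left, d_left = tree_sum(values[:mid])
    right, d_right = tree_sum(values[mid:])
    # The two recursive sums are independent, so a parallel execution
    # pays only for the deeper of the two, plus one combining step.
    return left + right, max(d_left, d_right) + 1

print(tree_sum(list(range(8))))   # (28, 3): log2(8) = 3 parallel steps
```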