<tt> compute</tt> <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img185.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img185.gif"> <tt> using Equation <A HREF="node17.html#eq5pt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pt">2.1</A></tt>
<P>
<tt> endfor</tt>
<P>
</TT></PRE>
<P>
We observed earlier that the best sequential and parallel solutions to
a problem may use different techniques. This situation arises in
finite difference problems. In sequential computing,
<A NAME=1293> </A>
Gauss-Seidel update strategies are often preferred over Jacobi
strategies because they allow solutions of comparable accuracy to be
obtained using fewer iterations. In a Gauss-Seidel scheme, elements
are updated in a particular order so that the computation of each
element can use the most up-to-date value of other elements. For
<A NAME=1294> </A>
example, the Jacobi update of Equation <A HREF="node17.html#eq5pt" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pt">2.1</A> may be reformulated
as follows (notice the use of values <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img186.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img186.gif"> and
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img187.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img187.gif">):
<P><A NAME=eq5pta> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img188.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img188.gif"><P>
<P>
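<P>
The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch, not code from the text; it assumes Equation 2.1 has the common one-dimensional three-point form X<sub>i</sub> = (X<sub>i-1</sub> + 2X<sub>i</sub> + X<sub>i+1</sub>)/4.

```python
def jacobi_step(x):
    """Jacobi: every new value is computed from the step-t array only,
    so all interior points could in principle be updated concurrently."""
    new = x[:]                                   # keep step t intact
    for i in range(1, len(x) - 1):
        new[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    return new

def gauss_seidel_step(x):
    """Gauss-Seidel: updates happen in place, left to right, so each
    point reads the already-updated (step t+1) value of its left neighbor."""
    x = x[:]
    for i in range(1, len(x) - 1):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4    # x[i-1] is at step t+1
    return x

print(jacobi_step([0, 4, 0, 4, 0]))        # [0, 2.0, 2.0, 2.0, 0]
print(gauss_seidel_step([0, 4, 0, 4, 0]))  # [0, 2.0, 1.5, 2.375, 0]
```

The loop-carried dependence in <tt>gauss_seidel_step</tt> (each iteration reads the previous iteration's result) is precisely what prevents the straightforward concurrent update.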
While Jacobi schemes are trivial to parallelize (all grid points can
be updated concurrently), this is not the case for all Gauss-Seidel
<A NAME=1316> </A>
schemes. For example, the update scheme of Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> allows
only an average of around <em> N/2</em>
points within an
<em> N</em>
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img189.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img189.gif"><em> N</em>
grid to be updated concurrently. Fortunately,
many different Gauss-Seidel orderings are possible for most problems,
and we are usually free to choose the ordering that maximizes
available parallelism. In particular, we can choose to update first
the odd-numbered elements and then the even-numbered
elements of an array. Each update uses the most recent information,
yet the updates to the odd-numbered points are independent and can
proceed concurrently, as can the updates to the even-numbered points.
<A NAME=1321> </A>
This update strategy yields what is referred to as a <em> red-black</em>
algorithm, since the points can be thought of as being colored as on a
chess board: either red (odd) or black (even); points of the same
color can be updated concurrently. Figure <A HREF="node17.html#figjrb" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figjrb">2.5</A> illustrates
both the Gauss-Seidel scheme of Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> and a red-black
scheme, and shows how the latter scheme increases opportunities for
parallel execution.
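<P>
In one dimension, the red-black ordering reduces to updating the odd-indexed points and then the even-indexed points. The following sketch is illustrative only; it assumes a three-point update of the form X<sub>i</sub> = (X<sub>i-1</sub> + 2X<sub>i</sub> + X<sub>i+1</sub>)/4.

```python
def red_black_step(x):
    """One red-black sweep of a 1D grid (illustrative sketch)."""
    x = x[:]
    n = len(x)
    # "Red" phase: odd-indexed interior points depend only on
    # even-indexed neighbors, so all of them are independent.
    for i in range(1, n - 1, 2):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    # "Black" phase: even-indexed points now read the freshly updated
    # red values, preserving the Gauss-Seidel use of recent information.
    for i in range(2, n - 1, 2):
        x[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4
    return x

print(red_black_step([0, 4, 0, 4, 0]))   # [0, 2.0, 1.0, 2.0, 0]
```

Within each phase there is no dependence between the points being updated, so each phase can execute fully in parallel.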
<P>
<P><A NAME=2426> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img190.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img190.gif">
<BR><STRONG>Figure 2.5:</STRONG> <em> Two finite difference update strategies, here applied on a
two-dimensional grid with a five-point stencil. In both figures,
shaded grid points have already been updated to step <em>t+1</em>;
unshaded grid points are still at step <em>t</em>. The arrows show data
dependencies for one of the latter points. The figure on the left
illustrates a simple Gauss-Seidel scheme and highlights the five grid
points that can be updated at a particular point in time. In this
scheme, the update proceeds in a wavefront from the top left corner to
the bottom right. On the right, we show a red-black update scheme.
Here, all the grid points at step <em>t</em> can be updated
concurrently.</em><A NAME=figjrb> </A><BR>
<P>
<P>
This example indicates the important role that choice of solution
strategy can play in determining the performance of a parallel
program. While the simple Gauss-Seidel update strategy of
Equation <A HREF="node17.html#eq5pta" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#eq5pta">2.2</A> may be appropriate in a sequential program, it
is not ideal on a parallel computer. The Jacobi update strategy is
efficient on a parallel computer but is inferior numerically. The
red-black scheme combines the advantages of both approaches.
<A NAME=1331> </A>
<P>
<P>
<H2><A NAME=SECTION02332000000000000000>2.3.2 Global Communication</A></H2>
<P>
<A NAME=seccommglobal> </A>
<P>
<P><A NAME=2447> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img191.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img191.gif">
<BR><STRONG>Figure 2.6:</STRONG> <em> A centralized summation algorithm that uses a central
manager task (S) to sum <em>N</em> numbers distributed among
<em>N</em> tasks. Here, <em>N=8</em>, and each of the 8 channels is labeled with the
number of the step in which it is used.</em><A NAME=figsum> </A><BR>
<P>
<P>
<A NAME=1340> </A>
A <i> global communication</i> operation is one in which many tasks must
participate. When such operations are implemented, it may not be
sufficient simply to identify individual producer/consumer pairs.
Such an approach may result in too many communications or may restrict
opportunities for concurrent execution. For example, consider the
<A NAME=1342> </A>
problem of performing a <em> parallel reduction
</em> operation, that
<A NAME=1344> </A>
is, an operation that reduces <em> N</em>
values distributed over
<em> N</em>
tasks using a commutative, associative operator such as addition:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img192.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img192.gif"><P>
Let us assume that a single "manager" task requires the result
<em> S</em>
of this operation. Taking a purely local view of
communication, we recognize that the manager requires values <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img193.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img193.gif">,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img194.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img194.gif">, etc., from tasks 0, 1, etc. Hence, we could define a
communication structure that allows each task to communicate its value
to the manager independently. The manager would then receive the
values and add them into an accumulator (Figure <A HREF="node17.html#figsum" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsum">2.6</A>).
However, because the manager can receive and sum only one number at a
time, this approach takes <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img195.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img195.gif"> time to sum <em> N</em>
numbers---not a
very good parallel algorithm!
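<P>
The bottleneck can be made concrete with a small sequential simulation (an illustrative sketch, not code from the text): no matter how many tasks send concurrently, the manager receives and adds only one value per step.

```python
from queue import Queue

def centralized_sum(values):
    """Simulate the centralized algorithm of Figure 2.6: each task
    sends its value on a channel; a single manager receives and adds
    one number per step, so N values cost N sequential steps."""
    channel = Queue()
    for v in values:              # every task sends independently...
        channel.put(v)
    total, steps = 0, 0
    while not channel.empty():    # ...but the manager serializes them
        total += channel.get()
        steps += 1
    return total, steps

print(centralized_sum(list(range(8))))   # (28, 8): 8 values, 8 steps
```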
<P>
<A NAME=1354> </A>
This example illustrates two general problems that can hinder
efficient parallel execution in algorithms based on a purely local
view of communication:
<OL><LI>
The algorithm is <em> centralized
</em>: it does not distribute
computation and communication. A single task (in this case, the
manager task) must participate in every operation.
<LI>
The algorithm is <em> sequential
</em>: it does not allow multiple
computation and communication operations to proceed concurrently.
</OL>
We must address both these problems to develop a good parallel
algorithm.
<P>
<H4><A NAME=SECTION02332010000000000000> Distributing Communication and Computation.</A></H4>
<P>
We first consider the problem of distributing the computation and
communication associated with the summation. We can distribute the
summation of the <em> N</em>
numbers by making each task <em> i</em>
,
<em> 0&lt;i&lt;N-1</em>
, compute the sum:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img196.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img196.gif"><P>
<P>
<P><A NAME=2475> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img197.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img197.gif">
<BR><STRONG>Figure 2.7:</STRONG> <em> A summation algorithm that connects <em>N</em> tasks in an
array in order to sum <em>N</em> numbers distributed among these tasks.
Each channel is labeled with the number of the step in which it is
used and the value that is communicated on it.</em><A NAME=figsumA> </A><BR>
<P>
<P>
The communication requirements associated with this algorithm can be
satisfied by connecting the <em> N</em>
tasks in a one-dimensional array
(Figure <A HREF="node17.html#figsumA" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsumA">2.7</A>). Task <em> N-1</em>
sends its value to its neighbor
in this array. Tasks 1 through <em> N-2</em>
each wait to receive a
partial sum from their right-hand neighbor, add this to their local
value, and send the result to their left-hand neighbor. Task 0
receives a partial sum and adds this to its local value to obtain the
complete sum. This algorithm distributes the <em> N-1</em>
communications
and additions, but permits concurrent execution only if multiple
summation operations are to be performed. (The array of tasks can
then be used as a pipeline through which partial sums flow.) A
single summation still takes <em> N-1</em>
steps.
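<P>
A sequential simulation of this array algorithm (an illustrative sketch, not code from the text) makes the <em>N-1</em> cost explicit; the step counter counts one communication-and-add per step.

```python
def array_sum(values):
    """Simulate the array algorithm of Figure 2.7: task N-1 starts the
    partial sum, which flows left one task per step; task 0 finishes."""
    partial = values[-1]
    steps = 0
    for i in range(len(values) - 2, -1, -1):
        partial += values[i]      # receive from the right, add local value
        steps += 1                # one communication-and-add per step
    return partial, steps

print(array_sum(list(range(8))))   # (28, 7): N-1 = 7 steps for N = 8
```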
<P>
<H4><A NAME=SECTION02332020000000000000> Uncovering Concurrency: Divide and Conquer.</A></H4>
<P>
Opportunities for concurrent computation and communication can often
be uncovered by applying a problem-solving strategy called <em> divide
and conquer</em>. To solve a complex problem (such as summing
<em> N</em>
numbers), we seek to partition it into two or more simpler problems of
roughly equivalent size (e.g., summing <em> N/2</em>
numbers). This
process is applied recursively to produce a set of subproblems that
cannot be subdivided further (e.g., summing two numbers). The
strategy is summarized in Algorithm <A HREF="node17.html#algdac" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#algdac">2.1</A>. The
divide-and-conquer technique is effective in parallel computing when
the subproblems generated by problem partitioning can be solved
concurrently. For example, in the summation problem, we can take
<A NAME=1382> </A>
advantage of the following identity (<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img198.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img198.gif">, <em> n</em>
an
integer):
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img199.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img199.gif"><P>
The two summations on the right-hand side can be performed
concurrently. They can also be further decomposed if <em> n>1</em>
, to
give the tree structure illustrated in Figure <A HREF="node17.html#figsum1" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node17.html#figsum1">2.8</A>.
Summations at the same level in this tree of height <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img200.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img200.gif"> can be
performed concurrently, so the complete summation can be achieved in
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img201.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img201.gif"> rather than <em> N</em>
steps.
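<P>
The tree summation can be sketched recursively (an illustrative sketch, not code from the text). The second return value tracks the depth of the recursion tree, i.e. the number of parallel steps a concurrent implementation would need.

```python
def tree_sum(values):
    """Divide-and-conquer summation for N a power of two: split the
    problem, sum the halves (conceptually in parallel), then combine.
    Returns (sum, depth), where depth is log2(N) parallel steps."""
    if len(values) == 1:
        return values[0], 0
    mid = len(values) // 2
    left, d_left = tree_sum(values[:mid])
    right, d_right = tree_sum(values[mid:])
    # The two recursive sums are independent, so a parallel execution
    # pays only for the deeper of the two, plus one combining step.
    return left + right, max(d_left, d_right) + 1

print(tree_sum(list(range(8))))   # (28, 3): log2(8) = 3 parallel steps
```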