After <em>N-1</em> steps, the complete sum is replicated in every task.  This
strategy avoids the need for a subsequent broadcast operation, but at
the expense of <em>(N-1)<sup>2</sup></em> redundant additions and <em>(N-1)<sup>2</sup></em>
unnecessary communications.  However, the summation and broadcast
complete in <em>N-1</em> rather than <em>2(N-1)</em> steps.  Hence, the
strategy is faster if the processors would otherwise be idle waiting
for the result of the summation.
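<P>
The operation counts above can be checked with a small simulation.  The
following C program is an illustrative sketch, not taken from the book's
programs: it models the symmetric ring algorithm, in which each of the
<em>N</em> tasks forwards the value it received in the previous step to its
neighbor while accumulating its own copy of the sum.
<PRE>
#include &lt;stdio.h&gt;

#define N 8   /* number of tasks connected in a ring */

int main(void) {
    double value[N], sum[N], outgoing[N], incoming[N];
    long additions = 0, communications = 0;

    for (int i = 0; i &lt; N; i++) {       /* each task starts with one value */
        value[i] = i + 1.0;
        sum[i] = value[i];
        outgoing[i] = value[i];          /* datum task i sends in the next step */
    }

    for (int step = 0; step &lt; N - 1; step++) {
        for (int i = 0; i &lt; N; i++) {   /* task i receives from task (i-1) mod N */
            incoming[i] = outgoing[(i + N - 1) % N];
            communications++;
        }
        for (int i = 0; i &lt; N; i++) {   /* add it, then forward it next step */
            sum[i] += incoming[i];
            additions++;
            outgoing[i] = incoming[i];
        }
    }

    /* Every task now holds the complete sum.  Total work: N(N-1) additions
       and N(N-1) communications, i.e. (N-1)*(N-1) more additions than the
       N-1 strictly required by a sequential summation. */
    printf("sum at task 0 = %g, additions = %ld, communications = %ld\n",
           sum[0], additions, communications);
    return 0;
}
</PRE>
<P>
For <em>N=8</em> the program reports 56 additions and 56 communications,
compared with the 7 additions of a simple chain summation.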
<P>
The tree summation algorithm can be modified in a similar way to avoid
the need for a separate broadcast.  That is, multiple tree summations
are performed concurrently so that after <em>log N</em> steps each task has
a copy of the sum.  One might expect this approach to result in
<em>N(N-1)</em> additions and communications, as in the ring algorithm.
However, in this case we can exploit redundancies in both computation
and communication to perform the summation in just <em>N log N</em>
operations.  The resulting communication structure, termed a <em>
butterfly</em>, is illustrated in Figure <A HREF="node18.html#figbflyN">2.14</A>.  In each of the
<em>log N</em> stages, each task receives data from two tasks, performs a
single addition, and sends the result of this addition to two tasks in
the next stage.
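<P>
The butterfly can be written down directly as a table of partial sums, one
row per stage.  The following C sketch (illustrative only) computes row
<em>s</em> from row <em>s-1</em>: task <em>i</em> of stage <em>s</em> adds the values
held by tasks <em>i</em> and <em>i</em> XOR <em>2<sup>s-1</sup></em> of the previous
stage, which is exactly one addition per task per stage, or <em>N log N</em>
additions in all.
<PRE>
#include &lt;stdio.h&gt;

#define LOGN 3
#define N (1 &lt;&lt; LOGN)    /* number of values; must be a power of two */

int main(void) {
    double stage[LOGN + 1][N];   /* stage[0] is the bottom row of the butterfly */
    long additions = 0;

    for (int i = 0; i &lt; N; i++)
        stage[0][i] = i + 1.0;   /* one input value per bottom-row task */

    for (int s = 1; s &lt;= LOGN; s++) {
        int bit = 1 &lt;&lt; (s - 1);
        for (int i = 0; i &lt; N; i++) {
            /* task i of stage s receives from tasks i and i^bit of stage s-1,
               performs a single addition, and passes the result upward */
            stage[s][i] = stage[s - 1][i] + stage[s - 1][i ^ bit];
            additions++;
        }
    }

    /* every task in the top row holds the complete sum; additions = N log N */
    printf("top-row sum = %g, additions = %ld\n", stage[LOGN][0], additions);
    return 0;
}
</PRE>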
<A NAME=1513>&#160;</A>
<P>
<P><A NAME=2634>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 2.14" SRC="img238.gif">
<BR><STRONG>Figure 2.14:</STRONG> <em> The butterfly communication structure
can be used to sum <em>N</em> values in <em>log N</em> steps.  Numbers located
in the bottom row of tasks are propagated up through <em>log N - 1</em>
intermediate stages, thereby producing the complete sum in each task
in the top row.</em><A NAME=figbflyN>&#160;</A><BR>
<P>
<P>
<P><A NAME=2651>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 2.15" SRC="img239.gif">
<BR><STRONG>Figure 2.15:</STRONG> <em> The communication structures that result when tasks at
different levels in a tree or butterfly structure are agglomerated.
From top to bottom: a tree, a butterfly, and an equivalent
representation of the butterfly as a hypercube.  In each case,
<em> N=8</em>
, and each channel is labeled with the step in which it is
used for communication.</em><A NAME=figsumsagglom>&#160;</A><BR>
<P><H4><A NAME=SECTION02341030000000000000> Avoiding Communication.</A></H4>
<P>
Agglomeration is almost always beneficial if analysis of communication
requirements reveals that a set of tasks cannot execute concurrently.
For example, consider the tree and butterfly structures illustrated in
Figures <A HREF="node17.html#figsum1">2.8</A> and <A HREF="node18.html#figbflyN">2.14</A>.  When a single summation
problem is performed, only tasks at the same level in the tree or
butterfly can execute concurrently.  (Notice, however, that if many
summations are to be performed, in principle all tasks can be kept
<A NAME=1525>&#160;</A>
busy by pipelining multiple summation operations.)  Hence, tasks at
different levels can be agglomerated without reducing opportunities
for concurrent execution, thereby yielding the communication
structures represented in Figure <A HREF="node18.html#figsumsagglom">2.15</A>.  The hypercube
structure shown in this figure is a fundamental communication
structure that has many applications in parallel computing.  It is
discussed in greater detail in Chapter <A HREF="node123.html#chapcube">11</A>.
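<P>
Agglomerating the tasks in each column of the butterfly onto a single node
yields the hypercube exchange pattern of Figure 2.15: in step <em>d</em>, each
node exchanges its partial sum with the node whose number differs in bit
<em>d</em> and adds the value received.  The following sketch expresses this
with MPI purely for concreteness (message-passing tools are introduced later
in the book); it assumes the number of processes is a power of two.
<PRE>
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sum = rank + 1.0;   /* each node contributes one value */

    /* log P steps; in step d the partner differs in bit d */
    for (int bit = 1; bit &lt; size; bit &lt;&lt;= 1) {
        int partner = rank ^ bit;
        double received;
        MPI_Sendrecv(&sum, 1, MPI_DOUBLE, partner, 0,
                     &received, 1, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += received;       /* one addition per step per node */
    }

    printf("node %d holds the complete sum %g\n", rank, sum);
    MPI_Finalize();
    return 0;
}
</PRE>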
<P>
<H2><A NAME=SECTION02342000000000000000>2.4.2 Preserving Flexibility</A></H2>
<P>
It is easy when agglomerating tasks to make design decisions that
limit unnecessarily an algorithm's scalability.  For example, we might
choose to decompose a multidimensional data structure in just a single
dimension, reasoning that this provides more than enough concurrency
for the number of processors available.  However, this strategy is
shortsighted if our program must ultimately be ported to larger
parallel computers.  It may also lead to a less efficient algorithm,
as discussed in Section <A HREF="node18.html#secbfly">2.4.1</A>.
<P>
The ability to create a varying number of tasks is critical if a
program is to be portable and scalable.  As discussed in
Chapter <A HREF="node6.html#chap1">1</A>, good parallel algorithms are designed to be
resilient to changes in processor count.  This flexibility can also be
useful when tuning a code for a particular computer.  If tasks often
block waiting for remote data, it can be advantageous to map several
tasks to a processor.  Then, a blocked task need not result in a
processor becoming idle, since another task may be able to execute in
its place.  In this way, one task's communication is overlapped with
another task's computation.  This technique, termed <em> overlapping
<A NAME=1531>&#160;</A>
computation and communication,</em> is discussed in
Chapter <A HREF="node26.html#chapperf">3</A>.
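<P>
The following fragment sketches this effect (again using MPI purely for
illustration; the task names and data are invented for the example).  Process
0 hosts two logical tasks: task A, which needs a remote value, and task B,
which has independent local work.  While A's receive is pending, the processor
runs B, so A's communication is overlapped with B's computation.
<PRE>
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {                  /* remote task that supplies A's data */
        double remote = 42.0;
        MPI_Send(&remote, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {           /* processor hosting tasks A and B */
        double a_data, b_work = 0.0;
        MPI_Request request;
        int arrived = 0;

        /* task A: post the receive and yield instead of blocking */
        MPI_Irecv(&a_data, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &request);

        while (!arrived) {
            b_work += 1.0;            /* task B: stand-in for useful computation */
            MPI_Test(&request, &arrived, MPI_STATUS_IGNORE);
        }
        printf("task A received %g; task B completed %g work units meanwhile\n",
               a_data, b_work);
    }
    MPI_Finalize();
    return 0;
}
</PRE>
<P>
Run with at least two processes (for example, <TT>mpirun -np 2</TT>); any
additional processes simply do nothing.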
<P>
A third benefit of creating more tasks than processors is that doing so
provides greater scope for mapping strategies that balance
computational load over available processors.  As a general rule of
thumb, we could require that there be at least an order of magnitude
more tasks than processors.  This issue is discussed in the next
section.
<P>
The optimal number of tasks is typically best determined by a
combination of analytic modeling and empirical studies.  Flexibility
does not necessarily require that a design always create a large
number of tasks.  Granularity can be controlled by a compile-time or
runtime parameter.  What is important is that a design not incorporate
unnecessary limits on the number of tasks that can be created.
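<P>
Such a parameter can be as simple as a command-line argument.  In the
following sketch (the names and the one-dimensional decomposition are purely
illustrative), the number of tasks is chosen at run time rather than being
hard-wired to the processor count, so the same design can create 4 tasks or
4096 without modification.
<PRE>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main(int argc, char **argv) {
    long n = 1000000;                               /* problem size */
    int tasks = (argc > 1) ? atoi(argv[1]) : 16;    /* granularity parameter */

    for (int t = 0; t &lt; tasks; t++) {
        long lo = n * t / tasks;                    /* first element of task t */
        long hi = n * (t + 1) / tasks;              /* one past its last element */
        printf("task %d handles elements %ld to %ld\n", t, lo, hi - 1);
        /* a real program would create task t here to process [lo, hi) */
    }
    return 0;
}
</PRE>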
<P>
<H2><A NAME=SECTION02343000000000000000>2.4.3 Reducing Software Engineering Costs</A></H2>
<P>
So far, we have assumed that our choice of agglomeration strategy is
determined solely by a desire to improve the efficiency and
flexibility of a parallel algorithm.  An additional concern, which can
be particularly important when parallelizing existing sequential
codes, is the relative development costs associated with different
partitioning strategies.  From this perspective, the most interesting
strategies may be those that avoid extensive code changes.  For
example, in a code that operates on a multidimensional grid, it may be
advantageous to avoid partitioning altogether in one dimension, if
doing so allows existing routines to be reused unchanged in a parallel
program.
<P>
Frequently, we are concerned with designing a parallel algorithm that
must execute as part of a larger system.  In this case, another
software engineering issue that must be considered is the data
distributions utilized by other program components.  For example, the
best algorithm for some program component may require that an input
array data structure be decomposed in three dimensions, while a
preceding phase of the computation generates a two-dimensional
decomposition.  Either one or both algorithms must be changed, or an
explicit restructuring phase must be incorporated in the computation.
Each approach has different performance characteristics.  These issues
are discussed in Chapter <A HREF="node39.html#chapmod">4</A>.
<P>
<H2><A NAME=SECTION02344000000000000000>2.4.4 Agglomeration Design Checklist</A></H2>
<P>
<A NAME=1536>&#160;</A>
We have now revised the partitioning and communication decisions
<A NAME=1537>&#160;</A>
developed in the first two design stages by agglomerating tasks and
communication operations.  We may have agglomerated tasks because
analysis of communication requirements shows that the original
partition created tasks that cannot execute concurrently.
Alternatively, we may have used agglomeration to increase computation
and communication granularity and/or to decrease software engineering
costs, even though opportunities for concurrent execution are reduced.
At this stage, we evaluate our design with respect to the following
checklist.  Several of these questions emphasize quantitative
performance analysis, which becomes more important as we move from the
abstract to the concrete; this topic is addressed in
Chapter <A HREF="node26.html#chapperf">3</A>.
<P>
<OL><LI>
Has agglomeration reduced communication costs by increasing locality?
If not, examine your algorithm to determine whether this could be
achieved using an alternative agglomeration strategy.
<P>
<LI>
If agglomeration has replicated computation, have you verified that
the benefits of this replication outweigh its costs, for a range of
problem sizes and processor counts?
<P>
<LI>
If agglomeration replicates data, have you verified that this does not
compromise the scalability of your algorithm by restricting the range
of problem sizes or processor counts that it can address?
<P>
<LI>
Has agglomeration yielded tasks with similar computation and
communication costs?  The larger the tasks created by agglomeration,
the more important it is that they have similar costs.  If we have
created just one task per processor, then these tasks should have
nearly identical costs.
<P>
<LI>
Does the number of tasks still scale with problem size?  If not, then
your algorithm is no longer able to solve larger problems on larger
parallel computers.
<P>
<LI>
If agglomeration eliminated opportunities for concurrent execution,
have you verified that there is sufficient concurrency for current and
future target computers?  An algorithm with insufficient concurrency
may still be the most efficient, if other algorithms have excessive
communication costs; performance models can be used to quantify these
tradeoffs.
<P>
<LI>
Can the number of tasks be reduced still further, without introducing
load imbalances, increasing software engineering costs, or reducing
scalability?  Other things being equal, algorithms that create fewer
larger-grained tasks are often simpler and more efficient than those
that create many fine-grained tasks.
<P>
<LI>
If you are parallelizing an existing sequential program, have you
considered the cost of the modifications required to the sequential
code?  If these costs are high, consider alternative agglomeration
strategies that increase opportunities for code reuse.  If the
resulting algorithms are less efficient, use performance modeling
techniques to estimate cost tradeoffs.
<P>
</OL>
<A NAME=1541>&#160;</A>
<P>

<BR><HR><P>
<P><ADDRESS>
<I>&#169; Copyright 1995 by Ian Foster</I>
</ADDRESS>
</BODY>
