Local load-balancing algorithms compensate for changes in computational load
using only information obtained from a small number of neighboring
processors.  For example, processors may be organized in a logical
mesh; periodically, each processor compares its computational load
with that of its neighbors in the mesh and transfers computation if
the difference in load exceeds some threshold.  Figure <A HREF="node19.html#figload2">2.17</A>
and
<A HREF="#mpmm">Plate 3</A>

<P>
show load distributions produced by such
schemes.
<P>
<P><HR>
<A NAME=mpmm HREF="Indlocal.gif"> <img
ALIGN=MIDDLE src="Indlocal_small.gif"></A>
<P>
(GIF <A HREF="Indlocal.gif">27606</A> bytes; RGB <A
HREF="Indlocal.rgb">133673</A> bytes.)
Plate 3: A dynamic, local
load-balancing algorithm applied to a weather model.  This shows the
situation after grid points have migrated to compensate for a ``hot
spot'' slightly to the left of the center of the grid.  Image courtesy
of J. Michalakes.
<P><HR>

<P>
Because local algorithms are inexpensive to operate, they can be
useful in situations in which load is constantly changing.  However,
they are typically less effective at balancing load than global algorithms
and, in particular, can be slow to adjust to major changes in load
characteristics.  For example, if a high load suddenly appears on one
processor, multiple local load-balancing operations are
required before load ``diffuses'' to other processors.
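<P>
The toy simulation below (an illustration, not the book's code)
shows this diffusion behavior.  For brevity it models the loads of
<tt> P</tt> processors on a one-dimensional ring rather than a mesh:
in each step, every processor compares its load with a ring neighbor
and sheds half the surplus whenever the difference exceeds a
threshold.  The processor count, threshold, and initial ``hot spot''
values are arbitrary choices for the demonstration.
<P>
<PRE>
/* A toy simulation (not from the book) of local, threshold-based
   load balancing on a one-dimensional ring of P processors.  Each
   step, every processor compares its load with its ring neighbor
   and sheds half the surplus when the difference exceeds THRESHOLD,
   so an initial ``hot spot'' gradually diffuses around the ring. */
#include "stdio.h"

#define P         8      /* number of processors in the logical ring */
#define THRESHOLD 4      /* load difference that triggers a transfer */

/* One balancing step; returns the number of transfers performed. */
int balance_step(int load[P])
{
    int i, transfers = 0;
    for (i = 0; i != P; ++i) {
        int right = (i + 1) % P;              /* ring neighbor         */
        int diff  = load[i] - load[right];
        if (diff > THRESHOLD) {               /* shed half the surplus */
            load[i]     -= diff / 2;
            load[right] += diff / 2;
            ++transfers;
        } else if (-diff > THRESHOLD) {       /* neighbor is overloaded */
            load[right] -= -diff / 2;
            load[i]     += -diff / 2;
            ++transfers;
        }
    }
    return transfers;
}

int main(void)
{
    int load[P] = { 2, 2, 30, 2, 2, 2, 2, 2 };   /* a ``hot spot''   */
    int i, steps = 0;
    while (balance_step(load) > 0)               /* iterate to rest  */
        ++steps;
    printf("balanced after %d steps:", steps);
    for (i = 0; i != P; ++i)
        printf(" %d", load[i]);
    printf("\n");
    return 0;
}
</PRE>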
<P>
<H4><A NAME=SECTION02351030000000000000> Probabilistic Methods.</A></H4>
<P>
A particularly simple approach to load balancing is to allocate tasks
<A NAME=1615>&#160;</A>
to randomly selected processors.  If the number of tasks is large, we
<A NAME=1616>&#160;</A>
can expect that each processor will be allocated about the same amount
of computation.  Advantages of this strategy are its low cost and
scalability.  Disadvantages are that off-processor communication is
required for virtually every task and that acceptable load
distribution is achieved only if there are many more tasks than there
are processors.  The strategy tends to be most effective when there is
relatively little communication between tasks and/or little locality
in communication patterns.  In other cases, probabilistic methods
tend to result in considerably more communication than do other
techniques.
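<P>
As a minimal illustration of this idea (not from the book), the C
fragment below assigns each of <tt> NTASKS</tt> tasks to one of
<tt> P</tt> processors chosen at random, then prints the resulting
per-processor task counts; with many more tasks than processors,
the counts come out roughly equal.  The constants and the fixed
seed are arbitrary choices for the demonstration.
<P>
<PRE>
/* A minimal sketch (not from the book) of probabilistic mapping:
   each task goes to a processor chosen uniformly at random.  With
   many more tasks than processors, the per-processor totals are
   roughly equal. */
#include "stdio.h"
#include "stdlib.h"

int main(void)
{
    enum { P = 8, NTASKS = 1000 };
    int count[P] = { 0 };
    int t;

    srand(12345);                   /* fixed seed: reproducible demo */
    for (t = 0; t != NTASKS; ++t)
        ++count[rand() % P];        /* random owner for task t       */

    for (t = 0; t != P; ++t)
        printf("processor %d: %d tasks\n", t, count[t]);
    return 0;
}
</PRE>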
<P>
<H4><A NAME=SECTION02351040000000000000> Cyclic Mappings.</A></H4>
<P>
<A NAME=1618>&#160;</A>
If we know both that computational load per grid point varies and that
there is significant spatial locality in load levels, then a <em>
<A NAME=1619>&#160;</A>
cyclic
 </em> (or <em> scattered</em>, as it is sometimes called) mapping
of tasks to processors can be appropriate.  That is, each of
<em> P</em>
 processors is allocated every <em> P</em>
th task according to some
enumeration of the tasks (Figure <A HREF="node19.html#figload43">2.18</A>).  This technique is a form of
probabilistic mapping. The goal is that, on average, each processor
will be allocated about the same computational load.  The benefits of
improved load balance may need to be weighed against increased
communication costs due to reduced locality.  Block cyclic
distributions are also possible, in which blocks of tasks are
allocated to processors.
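<P>
The mapping functions themselves are one-liners, as the sketch below
suggests (an illustration, not the book's code): the owner of task
<tt> i</tt> is <tt> i mod P</tt> under a cyclic mapping, and
<tt> (i/B) mod P</tt> under a block-cyclic mapping with block size
<tt> B</tt>.  The constants are arbitrary.
<P>
<PRE>
/* A minimal sketch (not from the book) of cyclic and block-cyclic
   mappings: task i goes to processor i mod P or, with blocks of
   size B, to processor (i / B) mod P. */
#include "stdio.h"

int cyclic_owner(int task, int p)              { return task % p; }
int block_cyclic_owner(int task, int b, int p) { return (task / b) % p; }

int main(void)
{
    enum { P = 4, B = 3, NTASKS = 16 };
    int i;
    for (i = 0; i != NTASKS; ++i)
        printf("task %2d -> cyclic: %d   block-cyclic: %d\n",
               i, cyclic_owner(i, P), block_cyclic_owner(i, B, P));
    return 0;
}
</PRE>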
<P>
<P><A NAME=2705>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img243.gif">
<BR><STRONG>Figure 2.18:</STRONG> <em> Using a cyclic mapping for load balancing in a grid
problem, when executing on 12 processors.  Tasks mapped to a single
processor are shaded.  Notice that with this mapping, all
communications are with tasks located on different processors
(assuming a five-point stencil).</em><A NAME=figload43>&#160;</A><BR>
<P><H2><A NAME=SECTION02352000000000000000>2.5.2 Task-Scheduling Algorithms</A></H2>
<P>
<A NAME=secmwmw>&#160;</A>
<P>
<A NAME=1629>&#160;</A>
Task-scheduling algorithms can be used when a functional decomposition
yields many tasks, each with weak locality requirements.  A
centralized or distributed task pool is maintained, into
which new tasks are placed and from which tasks are taken for
allocation to processors.  In effect, we reformulate the parallel
algorithm so that what were originally conceived of as tasks become
data structures representing ``problems,'' to be solved by a set of
worker tasks, typically one per processor.
<P>
The most critical (and complicated) aspect of a task-scheduling
<A NAME=1630>&#160;</A>
algorithm is the strategy used to allocate problems to workers.
Generally, the chosen strategy will represent a compromise between the
conflicting requirements for independent operation (to reduce
communication costs) and global knowledge of computation state (to
improve load balance).  We discuss manager/worker, hierarchical
manager/worker, and decentralized approaches.
<P>
<P><A NAME=2721>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img244.gif">
<BR><STRONG>Figure 2.19:</STRONG> <em> Manager/worker load-balancing structure.  Workers
repeatedly request and process problem descriptions; the manager
maintains a pool of problem descriptions (<tt> p</tt>) and responds to
requests from workers.</em><A NAME=figLB2>&#160;</A><BR>
<P><H4><A NAME=SECTION02352010000000000000> Manager/Worker.</A></H4>
<P>
Figure <A HREF="node19.html#figLB2">2.19</A> illustrates a particularly simple task-scheduling
scheme that is nevertheless effective for moderate numbers of
processors.  This strategy was used previously in
Section <A HREF="node10.html#exdatabase">1.4.4</A>.  A central manager task is given
responsibility for problem allocation.  Each worker repeatedly
requests and executes a problem from the manager.  Workers can also
send new tasks to the manager for allocation to other workers.  The
efficiency of this strategy depends on the number of workers and the
<A NAME=1638>&#160;</A>
relative costs of obtaining and executing problems.  Efficiency can be
<A NAME=1639>&#160;</A>
improved by prefetching problems so as to overlap computation and
<A NAME=1640>&#160;</A>
communication, and by caching problems in workers, so that workers
communicate with the manager only when no problems are available
locally.
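<P>
A minimal MPI sketch of this structure follows (an illustration, not
the book's code; the tag names, pool size, and the stubbed-out
``execution'' of a problem are all hypothetical).  Process 0 acts as
the manager, handing out problem indices on request and answering
with a <tt> TAG_DONE</tt> message once the pool is exhausted; every
other process is a worker that repeatedly requests and ``solves''
problems.
<P>
<PRE>
/* A minimal manager/worker sketch in MPI (not the book's code). */
#include "mpi.h"
#include "stdio.h"

#define TAG_REQUEST 1   /* worker -&gt; manager: give me a problem    */
#define TAG_PROBLEM 2   /* manager -&gt; worker: here is a problem    */
#define TAG_DONE    3   /* manager -&gt; worker: the pool is empty    */
#define NPROBLEMS   100 /* size of the (hypothetical) problem pool */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* ---- manager ---- */
        int next = 0, active = size - 1, dummy;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE,
                     TAG_REQUEST, MPI_COMM_WORLD, &st);
            if (next != NPROBLEMS) {       /* allocate a problem */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE,
                         TAG_PROBLEM, MPI_COMM_WORLD);
                ++next;
            } else {                       /* pool exhausted     */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE,
                         TAG_DONE, MPI_COMM_WORLD);
                --active;
            }
        }
    } else {                               /* ---- worker ----- */
        int problem, dummy = 0, solved = 0;
        MPI_Status st;
        for (;;) {
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&problem, 1, MPI_INT, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            ++solved;                      /* stand-in for real work */
        }
        printf("worker %d solved %d problems\n", rank, solved);
    }
    MPI_Finalize();
    return 0;
}
</PRE>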
<P>
<H4><A NAME=SECTION02352020000000000000> Hierarchical Manager/Worker.</A></H4>
<P>
A variant of the manager/worker scheme divides workers into
<A NAME=1642>&#160;</A>
disjoint sets, each with a submanager.  Workers request tasks from
submanagers, which themselves communicate periodically with the
manager and with other submanagers to balance load between the sets of
processors for which they are responsible.
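<P>
The periodic submanager-to-submanager exchange can be as simple as the
following sketch (hypothetical, not the book's code), in which two
submanagers compare the sizes of their task pools and migrate tasks so
that each ends up with roughly half of the total.
<P>
<PRE>
/* One balancing exchange between two submanagers; pools are
   represented simply as task counts (a sketch, not the book's code). */
void rebalance(int *pool_a, int *pool_b)
{
    int total = *pool_a + *pool_b;
    *pool_a = total / 2;           /* tasks migrate between the two */
    *pool_b = total - total / 2;   /* sets of workers               */
}
</PRE>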
<P>
<H4><A NAME=SECTION02352030000000000000> Decentralized Schemes.</A></H4>
<P>
<A NAME=1644>&#160;</A>
In completely decentralized schemes, there is no central manager.
Instead, a separate task pool is maintained on each processor, and
idle workers request problems from other processors.  In effect, the
task pool becomes a distributed data structure that is accessed by the
different tasks in an asynchronous fashion.  A variety of access
policies can be defined.  For example, a worker may request work from
a small number of predefined ``neighbors'' or may select other
processors at random.  In a hybrid centralized/distributed scheme,
requests are sent to a central manager, which allocates them to
workers in a round-robin fashion.  Notice that while this manager will
certainly be a bottleneck on large numbers of processors, it will
typically be accessed less frequently than will the manager in a
manager/worker scheduler and hence is a more scalable construct.
<P>
As noted in Section <A HREF="node17.html#seccommas">2.3.4</A>, access to a distributed 
data structure, such as the task pool maintained by a decentralized
load-balancing scheme,
can be provided in several different ways.
<A NAME=1646>&#160;</A>
Workers can be made responsible for both computing and managing the
queue of problems. In this case, each worker must periodically
<A NAME=1647>&#160;</A>
poll to detect pending requests.  Alternatively, computation and
task pool management responsibilities can be encapsulated in separate
tasks.
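<P>
The first option can be sketched as follows (an illustration, not the
book's code).  A worker alternates between slices of computation and a
cheap, nonblocking check, via <tt> MPI_Iprobe</tt>, for pending work
requests from other processors; when a request is pending, it replies
with a task from its local pool, or with <tt> -1</tt> if the pool is
empty.  The tags and the trivial pool and compute stubs are
hypothetical.
<P>
<PRE>
/* A polling worker that both computes and manages its local task
   pool (a sketch, not the book's code). */
#include "mpi.h"

#define TAG_STEAL 10   /* idle processor asking for work          */
#define TAG_TASK  11   /* reply: a task index, or -1 for ``none'' */

static int pool[100];  /* hypothetical local task pool            */
static int pool_size = 0;

static int  pool_pop(void) { return pool_size ? pool[--pool_size] : -1; }
static void compute(void)  { /* a slice of local computation */ }

/* One iteration of the worker's compute-then-poll cycle. */
void worker_step(void)
{
    int flag;
    MPI_Status st;

    compute();                         /* do some local work first  */

    /* Poll for a pending work request; nonblocking, hence cheap.   */
    MPI_Iprobe(MPI_ANY_SOURCE, TAG_STEAL, MPI_COMM_WORLD, &flag, &st);
    if (flag) {
        int dummy, task;
        MPI_Recv(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_STEAL,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        task = pool_pop();             /* -1 if the pool is empty   */
        MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK,
                 MPI_COMM_WORLD);
    }
}
</PRE>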
<P>
<A NAME=1648>&#160;</A>
<H4><A NAME=SECTION02352040000000000000> Termination Detection.</A></H4>
<P>
Task-scheduling algorithms require a mechanism for determining when a
search is complete; otherwise, idle workers will never stop requesting
work from other workers.  This <em> termination
<A NAME=1650>&#160;</A>
detection
 </em> operation is straightforward in centralized schemes,
because the manager can easily determine when all workers are idle.
It is more difficult in decentralized algorithms, because not only is
there no central record of which workers are idle, but also messages
in transit may be carrying tasks even when all workers appear to be
idle.  See the chapter notes for references to termination-detection
algorithms.
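<P>
For the centralized case, the required bookkeeping is simple, as the
sketch below suggests (an illustration, not the book's algorithm): the
manager counts idle workers and declares termination only when every
worker is idle <em> and</em> its pool is empty, since a busy worker
may still send new problems to the manager.
<P>
<PRE>
/* Centralized termination detection (a sketch, not the book's
   algorithm).  The computation has terminated only when every
   worker is idle AND the central pool is empty. */
#include "stdbool.h"

typedef struct {
    int nworkers;    /* total number of workers             */
    int nidle;       /* workers currently waiting for work  */
    int pool_size;   /* problems held by the manager        */
} Manager;

/* Checked whenever a worker reports idle or the pool changes. */
bool terminated(const Manager *m)
{
    return m->nidle == m->nworkers && m->pool_size == 0;
}
</PRE>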
<P>
<H2><A NAME=SECTION02353000000000000000>2.5.3 Mapping Design Checklist</A></H2>
<P>
<A NAME=1652>&#160;</A>
We have now completed our parallel algorithm design by specifying how
<A NAME=1653>&#160;</A>
tasks defined in previous design stages are mapped to processors.  Our
mapping decisions seek to balance conflicting requirements for
equitable load distribution and low communication costs.  When
possible, we use a static mapping scheme that allocates each task to a
single processor.  However, when the number or size of tasks is
variable or not known until runtime, we may use a dynamic load
balancing scheme or reformulate the problem so that a task scheduling
structure can be used to schedule computation.
<P>
The following checklist can serve as a basis for an informal
evaluation of the mapping design.
<P>
<OL><LI>
If considering an SPMD design for a complex problem, have you also
considered an algorithm based on dynamic task creation and deletion?
The latter approach can yield a simpler algorithm (as will be
illustrated in Section <A HREF="node21.html#secfloor">2.7</A>); however, performance can be
problematic.
<P>
<LI>
If considering a design based on dynamic task creation and deletion,
have you also considered an SPMD algorithm?  An SPMD algorithm
provides greater control over the scheduling of communication and
computation, but can be more complex.
<P>
<LI>
If using a centralized load-balancing scheme, have you verified that
the manager will not become a bottleneck?  You may be able to reduce
communication costs in these schemes by passing pointers to tasks,
rather than the tasks themselves, to the manager.
<P>
<LI>
If using a dynamic load-balancing scheme, have you evaluated the
relative costs of different strategies?  Be sure to include the
implementation costs in your analysis.  Probabilistic or cyclic
mapping schemes are simple and should always be considered, because they
can avoid the need for repeated load-balancing operations.
<P>
<LI>
If using probabilistic or cyclic methods, do you have a large enough number
of tasks 
to ensure reasonable load balance?  Typically, at least ten
times as many tasks as processors are required.
<P>
</OL>
<P>
We have now completed the design of one or more parallel algorithms
for our problem.  However, we are not quite ready to start
writing code: several phases in the design process remain.  First, we
need to conduct some simple performance analyses in order to choose
between alternative algorithms and to verify that our design meets
performance goals.  We should also think hard about the implementation
costs of our designs, about opportunities for reusing existing code in
their implementation, and about how algorithms fit into larger systems
of which they may form a part.  These issues are discussed in detail
in Chapters <A HREF="node26.html#chapperf">3</A> and <A HREF="node39.html#chapmod">4</A>.
<P>
<A NAME=1659>&#160;</A>
<P>

<BR> <HR>
<B> Next:</B> <A NAME=tex2html2080 HREF="node20.html">2.6 Case Study: Atmosphere Model</A>
<B>Up:</B> <A NAME=tex2html2078 HREF="node14.html">2 Designing Parallel Algorithms</A>
<B> Previous:</B> <A NAME=tex2html2072 HREF="node18.html">2.4 Agglomeration</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169; Copyright 1995 by Ian Foster</I>
</ADDRESS>
</BODY>
