<P>
<OL><LI>
<em> Finite difference stencils.</em> If we assume a fine-grained
decomposition in which each task encapsulates a single grid point, the
nine-point stencil used in the horizontal dimension requires that each
task obtain values from eight neighboring tasks.  The corresponding
channel structure is illustrated in Figure <A HREF="node20.html#figsten9">2.23</A>.
Similarly, the three-point stencil used in the vertical dimension
requires that each task obtain values from two neighbors.
<P>
<LI>
<em> Global operations.</em> The atmosphere model periodically computes
the total mass of the atmosphere, in order to verify that the simulation is
proceeding correctly.  This quantity is defined as follows:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img267.gif"><P>
where <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img268.gif"> denotes the mass at grid point <em> (i,j,k)</em>.
This sum can be computed using one of the parallel summation algorithms
presented in Section <A HREF="node18.html#secbfly">2.4.1</A>.
<P>
<LI>
<em> Physics computations.</em> If each task encapsulates a single grid
point, then the physics component of the atmosphere model requires
considerable communication.  For example, the total clear sky (TCS) at
level <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img269.gif"> is defined as
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img270.gif"><P>
where level 0 is the top of the atmosphere and cld<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img271.gif"> is the cloud
fraction at level <em> i</em>.  This <em> prefix product</em> operation
<A NAME=1794>&#160;</A>
can be performed in <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img272.gif"> steps using a variant of the hypercube
algorithm of Section <A HREF="node18.html#secbfly">2.4.1</A>.  In total, the physics component
of the model requires on the order of 30 communications per grid point
and per time step.  (A short sketch of the summation and prefix
product operations appears after this list.)
</OL>
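<P>
Both global operations map onto standard collective communication
patterns.  The following fragment is a minimal sketch in C with MPI
rather than the task/channel notation used in this chapter; it assumes
one process per participating task, and the names <TT>local_mass</TT>,
<TT>cld</TT>, and <TT>comm_column</TT> are hypothetical, introduced
only for illustration.  The clear-sky routine is written assuming TCS
accumulates one factor of (1 - cld) per level, from the top of the
atmosphere downward; the exact formula is the one given by the
equation above.
<PRE>
#include &lt;mpi.h&gt;

/* Total mass diagnostic: a global summation in which every process
   contributes its local mass and obtains the total.  MPI_Allreduce
   uses a combining pattern comparable to the butterfly summation of
   Section 2.4.1. */
double total_mass(double local_mass)
{
    double total;
    MPI_Allreduce(&local_mass, &total, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    return total;
}

/* Prefix product down a column: comm_column is assumed to contain the
   tasks of one vertical column, ordered from the top of the
   atmosphere (rank 0) downward.  MPI_Scan combines values in a
   logarithmic number of steps, a variant of the hypercube algorithm
   of Section 2.4.1. */
double total_clear_sky(double cld, MPI_Comm comm_column)
{
    double local = 1.0 - cld, tcs;
    MPI_Scan(&local, &tcs, 1, MPI_DOUBLE, MPI_PROD, comm_column);
    return tcs;
}
</PRE>
Both collectives complete in a number of steps logarithmic in the
number of participating tasks, which is why these global operations
are not, by themselves, a serious obstacle to scalability.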
<P>
Let us evaluate this design by using the checklist of
Section <A HREF="node17.html#secrules2">2.3.5</A>.  The communication associated with the
finite difference stencil is distributed and hence can proceed
concurrently.  So is the communication required for the global
communication operation, thanks to the distributed algorithm developed
in Section <A HREF="node18.html#secbfly">2.4.1</A>.  (We might also consider performing this
global operation less frequently, since its value is intended only for
diagnostic purposes.)  The one component of our algorithm's
communication structure that is problematic is the physics.  However,
we shall see that the need for this communication can be avoided by
agglomeration.
<P>
<P><A NAME=2809>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img275.gif">
<BR><STRONG>Figure 2.24:</STRONG> <em> Using agglomeration to reduce communication requirements
in the atmosphere model.  In (a), each task handles a single point
and hence must obtain data from eight other tasks in order to implement the
nine-point stencil.  In (b), granularity is increased to <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img274.gif">
points, meaning that only 4 communications are required per
task.</em><A NAME=figfd2>&#160;</A><BR>
<P><H4><A NAME=SECTION02362030000000000000> Agglomeration.</A></H4>
<P>
<A NAME=1804>&#160;</A>
Our fine-grained domain decomposition of the atmosphere model has
created <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img276.gif"> tasks: between <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img277.gif"> and <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img278.gif">,
depending on problem size.  This is likely to be many more than we
require, and some degree of agglomeration can be considered.  We
identify three reasons for pursuing agglomeration:
<OL><LI>
As illustrated in Figure <A HREF="node20.html#figfd2">2.24</A>, a small amount of agglomeration
(from one to four grid points per task) can reduce the communication
requirements associated with the nine-point stencil from eight to four
messages per task per time step.
<P>
<LI>
Communication requirements in the horizontal dimension are relatively
small: a total of four messages containing eight data values.  In
contrast, the vertical dimension requires communication not only for
the finite difference stencil (2 messages, 2 data values) but also for
various ``physics'' computations (30 messages).  These communications
can be avoided by agglomerating tasks within each vertical column.
<P>
<LI>
Agglomeration in the vertical is also desirable from a software
engineering point of view.  Horizontal dependencies are restricted to
the dynamics component of the model; the physics component operates
within individual columns only.  Hence, a two-dimensional horizontal
decomposition would allow existing sequential physics code to be
reused in a parallel program without modification (see the sketch
following this list).
</OL>
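<P>
To make the software engineering point concrete, the following is a
minimal sketch of the data a column-agglomerated task might hold; the
type and routine names, the field list, and the number of levels are
all hypothetical, and the loop body is only indicative.  Because every
value needed by the physics lives in the same column, the routine
involves no communication, and existing sequential physics code could
be called in its place.
<PRE>
#define NLEV 18                 /* number of vertical levels (illustrative) */

/* State owned by one task after agglomeration in the vertical:
   a complete column of grid points. */
typedef struct {
    double temperature[NLEV];
    double humidity[NLEV];
    double cloud_fraction[NLEV];
} Column;

/* Column physics: reads and writes only the column it is given. */
double physics_column(Column *col)
{
    double tcs = 1.0;
    for (int k = 0; k < NLEV; k++) {
        /* The prefix product for total clear sky is now an ordinary
           local loop rather than a communication operation. */
        tcs *= 1.0 - col->cloud_fraction[k];
        /* ... radiation, convection, etc. for level k ... */
    }
    return tcs;   /* clear sky through the whole column (illustrative) */
}
</PRE>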
<P>
This analysis makes it appear sensible to refine our parallel
algorithm to utilize a two-dimensional horizontal decomposition of the
model grid in which each task encapsulates at least four grid points.
Communication requirements are then reduced to those associated with
the nine-point stencil and the summation operation.  Notice that this
algorithm can create at most <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img279.gif"> tasks: between <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img280.gif">
and <IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img281.gif">, depending on problem size.  This number is likely to be enough
for most practical purposes.
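<P>
One way to realize this two-dimensional horizontal decomposition is
sketched below in C with MPI, using a Cartesian process topology.  The
parameters <TT>NX</TT> and <TT>NY</TT> (the horizontal grid
dimensions) are hypothetical names, and the sketch assumes the grid
divides evenly among processes.  Each process owns a rectangular block
of whole columns and exchanges halo data with its west/east and
south/north neighbors for the nine-point stencil.
<PRE>
#include &lt;mpi.h&gt;

/* Sketch: block decomposition of an NX x NY horizontal grid over the
   available processes; each process owns a bx x by block of columns. */
void setup_decomposition(int NX, int NY)
{
    int nprocs, rank, dims[2] = {0, 0}, coords[2];
    int periods[2] = {1, 0};  /* periodic east-west; pole handling is
                                 model-specific, so this is an assumption */
    int west, east, south, north;
    MPI_Comm grid_comm;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);        /* choose a PX x PY process grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);
    MPI_Comm_rank(grid_comm, &rank);
    MPI_Cart_coords(grid_comm, rank, 2, coords);

    /* Extent and origin of this process's block of columns. */
    int bx = NX / dims[0], by = NY / dims[1];
    int x0 = coords[0] * bx, y0 = coords[1] * by;

    /* Neighbors for the halo exchange: four messages per task per time
       step, as discussed in the text. */
    MPI_Cart_shift(grid_comm, 0, 1, &west, &east);
    MPI_Cart_shift(grid_comm, 1, 1, &south, &north);

    /* ... allocate the block starting at (x0, y0), with a halo wide
       enough for the stencil; each step, exchange edge data with the
       west/east/south/north neighbors, then apply the nine-point
       stencil to the interior ... */
}
</PRE>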
<P>
<H4><A NAME=SECTION02362040000000000000> Mapping.</A></H4>
<P>
<A NAME=1809>&#160;</A>
In the absence of load imbalances, the simple mapping strategy
illustrated in Figure <A HREF="node19.html#figmap1">2.16</A> can be used.  It is clear from
the figure that in this case, further agglomeration can be performed;
in the limit, each processor can be assigned a single task responsible
for many columns, thereby yielding an SPMD program.
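<P>
In this SPMD form, every process executes the same time-stepping
program on its own block of columns.  The skeleton below is a sketch
under the same assumptions as the previous fragments: the component
routines are stubs with hypothetical names, standing in for the real
dynamics and physics code, and the step counts are illustrative.
<PRE>
#include &lt;mpi.h&gt;

#define NSTEPS        480   /* number of time steps (illustrative)      */
#define DIAG_INTERVAL  48   /* how often to compute the mass diagnostic */

/* Stubs standing in for the real model components. */
static void   exchange_halos(void)  { /* four stencil messages per task   */ }
static void   dynamics_step(void)   { /* horizontal and vertical stencils */ }
static void   physics_columns(void) { /* column-local physics, no messages */ }
static double local_mass(void)      { return 0.0; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    for (int step = 0; step < NSTEPS; step++) {
        exchange_halos();
        dynamics_step();
        physics_columns();
        if (step % DIAG_INTERVAL == 0) {      /* occasional diagnostic only */
            double m = local_mass(), total;
            MPI_Allreduce(&m, &total, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
</PRE>
Computing the mass only every <TT>DIAG_INTERVAL</TT> steps reflects the
earlier observation that this global operation is needed only for
diagnostic purposes.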
<P>
This mapping strategy is efficient if each grid column task performs
the same amount of computation at each time step.  This assumption is
valid for many finite difference problems but turns out to be invalid
for some atmosphere models.  The reason is that the cost of
physics computations can vary significantly depending on model
state variables.  For example, radiation calculations are not
performed at night, and clouds are formed only when humidity exceeds a
certain threshold.  The sort of variation in computational load that
can result is illustrated in
<A HREF="#ccm2">Plate 5</A>.

<P>
<P><HR>

<A NAME=ccm2 HREF="rad.gif"> <img
src="rad_small.gif"></A>
<A HREF="norad.gif"> <img
src="norad_small.gif"></A>
<P>
(GIF <A HREF="rad.gif">28536</A> and <A
HREF="norad.gif">156403</A> bytes; RGB <A
HREF="rad.rgb">116173</A> and <A
HREF="norad.rgb">919250</A> bytes.) Plate 5: Load distribution
in an atmosphere model with a 64&#215;128 grid.  The figure shows per-point
computational load at a single time step, with the histogram giving
relative frequency of different load values.  The left-hand image
shows a time step in which radiation calculations are performed, and the
right-hand image an ordinary time step.  Diurnal, land/ocean, and
local variations are visible.  Images courtesy of J. Michalakes.
<P><HR>

<P>
<P><A NAME=2824>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img284.gif">
<BR><STRONG>Figure 2.25:</STRONG> <em> Load distribution in the physics component of an atmosphere
model in the absence of load balancing.  In the top part of the
figure, shading is used to indicate computational load in each of
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img283.gif"> processors.  A strong spatial variation is evident.  This
effect is due to the night/day cycle (radiation calculations are
performed only in sunlight); hence, there is a temporal variation
also.  The bottom part of the figure is a histogram showing the
distribution of computation times, which vary by a factor of 5.  These
results were obtained by using a parallel version of the National
Center for Atmospheric Research (NCAR) Community Climate Model (CCM2)
on the 512-processor Intel DELTA computer.</em><A NAME=figcmlb1>&#160;</A><BR>
<P>
<P>
<P><A NAME=2839>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img285.gif">
<BR><STRONG>Figure 2.26:</STRONG> <em> Load distribution in the physics component of CCM2
when using a cyclic mapping.  A comparison with Figure 2.25 shows that
load imbalances are reduced significantly.</em><A NAME=figcmlb2>&#160;</A><BR>
<P>
<P>
<A NAME=1825>&#160;</A>
Empirical studies suggest that these load imbalances can reduce
computational efficiency by 20 percent or more
(Figure <A HREF="node20.html#figcmlb1">2.25</A>; see also
<A HREF="#ccm2">Plate 5</A>).

<P>
In many circumstances, this performance loss may be regarded as
acceptable.  However, if a model is to be used extensively, it is
worthwhile to spend time improving efficiency.  One approach is to use
a form of cyclic mapping: for example, allocating to each processor
tasks from both the western and eastern hemispheres and from both the
northern and southern hemispheres.
Figure <A HREF="node20.html#figcmlb2">2.26</A> shows the reduction in load imbalance that can
be achieved with this technique; this reduction must be weighed
against the resulting increase in communication costs.
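<P>
The sketch below illustrates one such cyclic mapping in C; the names
and sizes are hypothetical.  A processor at position (px, py) in a
PX x PY processor grid receives every PX-th column task in longitude
and every PY-th in latitude, and so obtains a mixture of day-side and
night-side columns and of northern and southern columns.
<PRE>
#include &lt;stdio.h&gt;

/* Cyclic mapping: column task (ti, tj) in a TX x TY task grid is
   assigned to processor (ti mod PX, tj mod PY). */
int owner(int ti, int tj, int PX, int PY)
{
    return (ti % PX) * PY + (tj % PY);
}

int main(void)
{
    /* Illustrative sizes: an 8 x 8 task grid mapped onto 2 x 2 processors. */
    int TX = 8, TY = 8, PX = 2, PY = 2;
    for (int ti = 0; ti < TX; ti++)
        for (int tj = 0; tj < TY; tj++)
            printf("task (%d,%d) -> processor %d\n",
                   ti, tj, owner(ti, tj, PX, PY));
    return 0;
}
</PRE>
Each processor's tasks are scattered over the globe, which evens out
the day/night and latitude-dependent physics costs at the price of
stencil communication between tasks that are no longer mapped to the
same or neighboring processors.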
<P>
<H2><A NAME=SECTION02363000000000000000>2.6.3 Atmosphere Model Summary</A></H2>
<P>
The design of a parallel atmosphere model has proved to be a
straightforward process, in that most design choices are clear-cut.  A
two-dimensional domain decomposition of the model grid results in a
need for both local communication between tasks handling neighboring
grid points and a parallel summation operation.
<P>
One unanswered question concerns whether load-balancing algorithms
<A NAME=1833>&#160;</A>
should be incorporated into the model.  Because load balancing adds to the
complexity of the overall design, this decision requires both
performance data (of the sort presented in Figure <A HREF="node20.html#figcmlb1">2.25</A>) and
information about the expected use of the parallel model.  Another
question, addressed in Chapter <A HREF="node39.html#chapmod">4</A>, is how the
atmosphere model will fit into a larger framework such as the climate
model of Figure <A HREF="node16.html#figenv">2.3</A>.
<A NAME=1837>&#160;</A>
<P>

<BR><HR>
<B>Next:</B> <A NAME=tex2html2092 HREF="node21.html">2.7 Case Study: Floorplan Optimization</A>
<B>Up:</B> <A NAME=tex2html2090 HREF="node14.html">2 Designing Parallel Algorithms</A>
<B>Previous:</B> <A NAME=tex2html2084 HREF="node19.html">2.5 Mapping</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169; Copyright 1995 by Ian Foster</I>
</ADDRESS>
</BODY>
