⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 node43.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 2 页
字号:
<P>
Notice that communication costs are not necessarily distributed
equally among processors in the second algorithm, since the sending and
receiving processors form distinct groups.  Nevertheless,
Equations <A HREF="node43.html#eq877" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#eq877">4.1</A> and <A HREF="node43.html#eq878" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#eq878">4.2</A> give a rough idea of the
relative performance of the two algorithms.  The second algorithm
sends significantly fewer messages and hence should be more efficient
in situations in which message startup costs are dominant, for example,
when <em> N</em>
 and/or <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img758.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img758.gif"> are small or when
<em> P</em>
 or <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img759.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img759.gif"> are large.  On the other hand, the first algorithm
probably incurs lower data transfer costs and hence may be superior in
other situations.
<P>
<H2><A NAME=SECTION02542000000000000000>4.4.2 Composing Components</A></H2>
<P>
Having designed two alternative algorithms for the 2-D FFT, we now
<A NAME=5754>&#160;</A>
consider the parallel convolution algorithm proper.  Its four
<A NAME=5755>&#160;</A>
components---two parallel 2-D FFTs, one matrix multiplication, and one
inverse 2-D FFT (Figure <A HREF="node43.html#figmodpipe" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#figmodpipe">4.4</A>)---can be combined using
either sequential or parallel composition.  If sequential composition
is used, the parallel convolution algorithm can be represented as
follows, with the <tt> fft</tt> and <tt> fft<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img760.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img760.gif"></tt> calls invoking the
transpose 2-D parallel FFT.
<P>

<PRE><TT> 
		<tt> for each image</tt>
<P>
				<tt> A<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img761.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img761.gif"> = fft(A)</tt>
<P>
				<tt> B<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img762.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img762.gif"> = fft(B)</tt>
<P>
				<tt> C<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img763.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img763.gif"> = A<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img764.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img764.gif">.B<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img765.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img765.gif"></tt>
<P>
				<tt> C = fft<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img766.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img766.gif">(C<IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img767.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img767.gif">)</tt>
<P>
		<tt> endfor</tt>
<P>
</TT></PRE>

<P>
If the input to this algorithm is decomposed appropriately (in one
dimension, by rows), then because each FFT involves a transpose, total
communication requirements are three times the cost of a single
transpose:
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img768.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img768.gif"><P>
Notice that because the forward FFT operates first on rows and then on
columns, the inverse FFT must operate first on columns and then on
rows, so as to avoid the need for an additional transpose operation
between the forward and inverse FFTs.
<P>
If parallel composition is used, the three FFTs execute
concurrently, each on one third of the available processors.  (Because the
<A NAME=5772>&#160;</A>
multiplication involves <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img769.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img769.gif"> rather than <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img770.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img770.gif">
<A NAME=5773>&#160;</A>
operations, we regard it as insignificant and compose it sequentially
with the inverse FFT.)  Communication costing <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img771.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img771.gif"> is
required to move data from the processors handling the forward FFTs to
the processors handling the inverse FFT.
<P>
The 2-D FFTs within the parallel composition can be implemented by
using either the transpose or pipeline algorithms, yielding two
algorithm variants.  Costs are as specified by Equations <A HREF="node43.html#eq877" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#eq877">4.1</A>
and <A HREF="node43.html#eq878" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#eq878">4.2</A>, except that each algorithm is executed three times,
with <em> P/3</em>
 rather than <em> P</em>
 processors involved in each
execution.  Combining these costs with the cost of the data movement
between components, we obtain the following models.
<P>
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img772.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img772.gif"><P>
<P>
<P><A NAME=6173>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img773.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img773.gif">
<BR><STRONG>Table 4.1:</STRONG>  Approximate total message counts and data volumes for
three parallel convolution algorithms, summed over <em> P</em>
 processors
and assuming that <em> P</em>
 is reasonably large.
<A NAME=tabmodconv>&#160;</A><BR>
<P>
<P>
<P><A NAME=6364>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img774.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img774.gif">
<BR><STRONG>Figure 4.6:</STRONG> <em> Performance of the sequential/transpose (Sequential)
and parallel/transpose (Parallel) convolution algorithms on an IBM
SP computer for different problem sizes and numbers of processors.
The latter algorithm is more efficient for smaller problems and larger
numbers of processors.</em><A NAME=figbook>&#160;</A><BR>
<P>
<P>
The results of this brief analysis are summarized in
Table <A HREF="node43.html#tabmodconv" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#tabmodconv">4.1</A>.  We see that the second and third algorithms
perform fewer message startups but send more data.  Hence, they can be
expected to be more efficient on smaller problems and larger numbers
of processors.  This result can be confirmed experimentally, as
illustrated in Figure <A HREF="node43.html#figbook" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#figbook">4.6</A>.  A flexible parallel program
might incorporate all three algorithms and select the appropriate
alternative at runtime.  Of course, a programming tool that supports
only sequential composition will allow only the first algorithm to be
used.
<P>
<H2><A NAME=SECTION02543000000000000000>4.4.3 Convolution Problem Summary</A></H2>
<P>
The convolution problem illustrates the design tradeoffs that can
arise when constructing even relatively simple parallel programs.
These tradeoffs can arise at multiple levels.  First, we must identify
candidate parallel algorithms for the component modules: in this case,
the 2-D FFT.  Then, we must decide how to compose these building
blocks so as to construct a complete parallel algorithm.  Aspects of
the complete design in turn influence the techniques used within
components, requiring for example that we operate on columns before
rows in the inverse FFT.  The performance analysis must take into
account all these design decisions.
<A NAME=5810>&#160;</A>
<P>

<BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2374 HREF="node42.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node42.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2382 HREF="node44.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node44.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2380 HREF="node39.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node39.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2384 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2385 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2383 HREF="node44.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node44.html">4.5 Case Study: Tuple Space</A>
<B>Up:</B> <A NAME=tex2html2381 HREF="node39.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node39.html">4 Putting Components Together</A>
<B> Previous:</B> <A NAME=tex2html2375 HREF="node42.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node42.html">4.3 Performance Analysis</A>
<BR><HR><P>
<P><ADDRESS>
<I>&#169 Copyright 1995 by <A href="msgs0.htm#6" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#6">Ian Foster</a></I>
</ADDRESS>
</BODY>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -