📄 node43.html

📁 Design and building parallel program
💻 HTML
📖 第 1 页 / 共 2 页
字号:
12 下一页
<html><!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>4.4 Case Study: Convolution</TITLE>
</HEAD>
<BODY>
<meta name="description" value="4.4 Case Study: Convolution">
<meta name="keywords" value="book">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
 <BR> <HR><a href="msgs0.htm#2" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/asm_color_tiny.gif" alt="[DBPP]"></a>    <A NAME=tex2html2374 HREF="node42.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node42.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/previous_motif.gif"></A> <A NAME=tex2html2382 HREF="node44.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node44.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/next_motif.gif"></A> <A NAME=tex2html2380 HREF="node39.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node39.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/up_motif.gif"></A> <A NAME=tex2html2384 HREF="node1.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/contents_motif.gif"></A> <A NAME=tex2html2385 HREF="node133.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/index_motif.gif"></A> <a href="msgs0.htm#3" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/tppmsgs/msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/search_motif.gif" alt="[Search]"></a>   <BR>
<B> Next:</B> <A NAME=tex2html2383 HREF="node44.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node44.html">4.5 Case Study: Tuple Space</A>
<B>Up:</B> <A NAME=tex2html2381 HREF="node39.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node39.html">4 Putting Components Together</A>
<B> Previous:</B> <A NAME=tex2html2375 HREF="node42.html" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node42.html">4.3 Performance Analysis</A>
<BR><HR><P>
<H1><A NAME=SECTION02540000000000000000>4.4 Case Study: Convolution</A></H1>
<P>
<A NAME=eximage>&#160;</A>
<P>
<A NAME=5692>&#160;</A>
In the remainder of this chapter, we apply the modular design
<A NAME=5693>&#160;</A>
techniques discussed in preceding sections in three case studies.  We
<A NAME=5694>&#160;</A>
start with an example from image processing, which we
<A NAME=5695>&#160;</A>
use to study design tradeoffs that can arise when constructing
<A NAME=5696>&#160;</A>
parallel programs from several components.  We consider the problem of
<A NAME=5697>&#160;</A>
applying a series of <em> convolution
 </em> operations to a sequence
of images.  Images, represented as arrays of size
<em> N</em>
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img745.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img745.gif"><em> N</em>
, are input in pairs on streams <em> A</em>
 and
<em> B</em>
; convolution generates a new array of the same size that is output
on stream <em> C</em>
 (Figure <A HREF="node43.html#figmodpipe" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#figmodpipe">4.4</A>).  A single convolution
operation involves the transformation of two input arrays using
independent two-dimensional fast Fourier transforms (2-D FFTs), a
pointwise multiplication of the two transformed arrays, and the
transformation of the resulting array using an inverse 2-D FFT,
thereby generating an output array.  A 2-D FFT performs 1-D FFTs first
on each row and then on each column of an array.  A 1-D Fourier
transform, <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img746.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img746.gif">, of a sequence of <em> N</em>
 values,
<IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img747.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img747.gif">, is given by
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img748.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img748.gif"><P>
where <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img749.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img749.gif">.  The FFT exploits symmetry to perform this
computation in <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img750.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img750.gif"> steps, each involving <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img751.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img751.gif"> operations.
<P>
<P><A NAME=6308>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img752.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img752.gif">
<BR><STRONG>Figure 4.4:</STRONG> <em> Dataflow diagram for an image-processing pipeline.
Two streams of images, <em> A</em>
 and <em> B</em>
, are passed through FFT
modules and then into an inverse FFT module, which first multiplies
them and then applies an inverse FFT.</em><A NAME=figmodpipe>&#160;</A><BR>
<P><H2><A NAME=SECTION02541000000000000000>4.4.1 Components</A></H2>
<P>
<A NAME=secmodfft2>&#160;</A>
<P>
<A NAME=5721>&#160;</A>
We first consider the three components from which the convolution
algorithm is constructed: forward 2-D FFT, multiplication, and inverse
2-D FFT.  The pointwise multiplication is the simplest: Its
communication requirements are zero as long as the arrays on which it
operates have the same data distribution.
<P>
<P><A NAME=6325>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img753.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img753.gif">
<BR><STRONG>Figure 4.5:</STRONG> <em> Two parallel algorithms for computing a series of
2-D FFTs.  In each case, the activity on each of four
processors (P0--3) is shown over time, with arrows denoting communication and I
and O denoting input and output operations.  The algorithm illustrated
in the upper part of the figure is a sequential composition of program
components that perform 1-D FFTs, first on rows and then on columns of
each input 2-D array; all-to-all communication is required to
transpose the array after performing row FFTs and before performing
column FFTs.  In the second algorithm, data flow from a first set of
processors performing row FFTs (P0, P1) to a second set performing
column FFTs (P2, P3).  Communication is required to move data from P0
and P1 to P2 and P3.</em><A NAME=figmodfft>&#160;</A><BR>
<P>
<P>
A variety of parallel algorithms are possible for the forward and
inverse 2-D FFTs.  A fine-grained algorithm can exploit concurrency
within the 1-D FFTs performed on individual rows and columns of the
input array, at the cost of considerable communication.  A more
coarse-grained algorithm performs independent 1-D FFTs in parallel,
thereby avoiding a need for communication within the 1-D FFTs, but
requiring communication when moving from a row-based to a column-based
decomposition.  We consider two algorithms based on the latter
strategy.  The first processes the input image stream sequentially,
performing row FFTs and then column FFTs on each image in turn.  The
second algorithm pipelines the image stream, performing column FFTs
for one image in parallel with the row FFTs for the next
(Figure <A HREF="node43.html#figmodfft" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/node43.html#figmodfft">4.5</A>).  These two algorithms are in effect
sequential and parallel compositions, respectively, of code fragments
that perform 1-D FFTs on the rows and columns of a two-dimensional
array.
<P>
<A NAME=5727>&#160;</A>
The first parallel algorithm is termed the <em> transpose
 </em>
<A NAME=6150>&#160;</A>
algorithm, since it performs a series of one-dimensional transforms on
<A NAME=5730>&#160;</A>
<em> P</em>
 processors while the array is partitioned in one dimension,
then transposes the array and performs transforms in the second
dimension using the same processors.  The transpose operation requires
that each processor send one message of size <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img754.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img754.gif"> to each of the
<em> P-1</em>
 other processors.  Hence, total communication costs summed
over <em> P</em>
 processors are
<P><A NAME=eq877>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img755.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img755.gif"><P>
<P>
<A NAME=5740>&#160;</A>
The second algorithm is termed the <em> pipeline
 </em> algorithm, since it partitions processors into
two sets of size <em> P/2</em>
 which perform FFTs on rows and columns,
respectively.  Each processor in the first set must communicate with
each processor in the other set, for a total of <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img756.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img756.gif"> messages.  The
entire array is communicated.  Hence, total communication costs are
<P><A NAME=eq878>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img757.gif" tppabs="http://www.dit.hcmut.edu.vn/books/system/par_anl/img757.gif"><P>
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -