To complete the second parallel algorithm, we need to design a
strategy
<A NAME=5964>&#160;</A>
for communicating the submatrices between tasks.  One approach is for
each task to execute the following logic (Figure <A HREF="node45.html#figlamatmul">4.12</A>):
<P>
<tt>
<PRE><TT>
   set B' = B_local
   for j = 0 to √P - 1
      in each row i, the ((i+j) mod √P)th task broadcasts
         A' = A_local to the other tasks in the row
      accumulate A'.B'
      send B' to upward neighbor
   endfor
</TT></PRE>
</tt>
<P>
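<P>
As a concrete illustration, the following minimal Python/NumPy sketch simulates this
logic serially: task (i,j) is modelled as owning the (i,j)-th N/√P &#215; N/√P block of each
matrix, and the row broadcast and upward shift are modelled by array indexing rather
than by messages.  The helper names and the serial simulation are assumptions of the
sketch, not part of the algorithm description above.
<P>
<PRE>
# Serial NumPy simulation of the broadcast/shift algorithm above.
# q = sqrt(P); task (i,j) owns block (i,j) of A, B, and C.
import numpy as np

def block_multiply(A, B, q):
    """Multiply square matrices A and B on a simulated q x q task mesh."""
    N = A.shape[0]
    nb = N // q                                       # block size N/sqrt(P)
    C = np.zeros_like(A)
    # set B' = B_local in every task
    Bp = [[B[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy() for j in range(q)]
          for i in range(q)]
    for step in range(q):                             # for j = 0 to sqrt(P)-1
        for i in range(q):
            root = (i + step) % q                     # ((i+j) mod sqrt(P))th task of row i
            Ap = A[i*nb:(i+1)*nb, root*nb:(root+1)*nb]   # A' broadcast along row i
            for j in range(q):
                C[i*nb:(i+1)*nb, j*nb:(j+1)*nb] += Ap @ Bp[i][j]  # accumulate A'.B'
        # send B' to upward neighbor: row i now holds what row (i+1) mod q held
        Bp = [[Bp[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, q = 8, 4                                       # N divisible by sqrt(P)
    A, B = rng.random((N, N)), rng.random((N, N))
    assert np.allclose(block_multiply(A, B, q), A @ B)
</PRE>
<P>
In a real implementation each task holds only its own blocks and the broadcast and
shift are message-passing operations; the serial loop nest above only checks that the
sequence of block products reproduces A.B.
<P>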
Each of the √P steps in this algorithm involves a broadcast
to √P tasks (for <em>A'</em>) and a nearest-neighbor
communication (for <em>B'</em>).  Both communications involve
N<sup>2</sup>/P data.  Because the broadcast can be accomplished in log √P steps
using a tree structure, the per-processor communication cost is
<P>
<P><A NAME=eqlamatmulc>&#160;</A>T<sub>comm</sub> = √P ( log √P + 1 ) ( t<sub>s</sub> + t<sub>w</sub> N<sup>2</sup>/P ) &nbsp;&nbsp;&nbsp;&nbsp; (4.4)<P>
Notice that because every task in each row must serve as the root of a
broadcast tree, the total communication structure required for this
algorithm combines a hypercube (butterfly) structure within each row
of the two-dimensional task mesh and a ring within each column.
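<P>
The claim that a broadcast among the √P tasks of a row completes in log √P steps can be
illustrated with a small recursive-doubling sketch (plain Python; the task numbering
0..√P-1 within a row and the choice of task 0 as the root are assumptions made only for
the illustration):
<P>
<PRE>
# Recursive-doubling broadcast among q tasks: in each round, every task that
# already holds the block forwards it to one task that does not, so the set of
# holders doubles each round (until it covers all q tasks) and ceil(log2(q))
# rounds suffice.
def broadcast_rounds(q):
    have = {0}                            # the root (task 0) holds the block initially
    rounds = []
    while len(have) < q:
        sends = [(src, src + len(have))   # task src forwards to task src + 2**round
                 for src in sorted(have) if src + len(have) < q]
        have |= {dst for _, dst in sends}
        rounds.append(sends)
    return rounds

for r, sends in enumerate(broadcast_rounds(8)):
    print("round", r, ":", sends)         # 8 tasks -> 3 rounds (log2 8)
</PRE>
<P>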
<P>
<P><A NAME=6575>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img811.gif">
<BR><STRONG>Figure 4.13:</STRONG> <em>Reorganizing from a one-dimensional to a two-dimensional
decomposition of a square matrix when <b>P=16</b>.  Shading indicates one
set of four tasks that must exchange data during the
reorganization.</em><A NAME=figmodmm>&#160;</A><BR>
<P><H2><A NAME=SECTION02562000000000000000>4.6.2 Redistribution Costs</A></H2>
<P>
Comparing Equations <A HREF="node45.html#eqmm1d">4.3</A> and <A HREF="node45.html#eqlamatmulc">4.4</A>,
<A NAME=5993>&#160;</A>
we see that the two-dimensional decomposition yields the more efficient
parallel algorithm.  Does this mean that our parallel library should
convert input arrays decomposed in one dimension to a two-dimensional
decomposition before performing the matrix multiplication?  To answer
this question, we need to know the cost of the reorganization.  The
communication costs associated with the reorganization of a single
array are as follows: each task exchanges data with √P - 1 other
tasks, with each message having size N<sup>2</sup>/(P √P)
(Figure <A HREF="node45.html#figmodmm">4.13</A>):
<P>
<P>( √P - 1 ) ( t<sub>s</sub> + t<sub>w</sub> N<sup>2</sup>/(P √P) )<P>
<P>
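<P>
The message pattern behind these figures can be made concrete with a short sketch.
Assume a row-slab 1-D decomposition (task t owns rows tN/P through (t+1)N/P - 1) and a
row-major numbering of the 2-D blocks over the same P task identifiers; both conventions
are assumptions of the sketch rather than details fixed by the text.  The code counts,
for each task, how many other tasks it must send data to and how large each message is:
<P>
<PRE>
# Count the messages needed to reorganize an N x N matrix from a 1-D row-slab
# decomposition over P tasks to a 2-D decomposition over a sqrt(P) x sqrt(P)
# task mesh (block (i,j) lives on task i*sqrt(P)+j).
import math

def reorg_messages(N, P):
    q = math.isqrt(P)                  # sqrt(P); P assumed to be a perfect square
    slab, nb = N // P, N // q          # 1-D slab height and 2-D block size
    msgs = {}                          # (source task, destination task) -> element count
    for row in range(N):
        src = row // slab              # 1-D owner of this matrix row
        i = row // nb                  # 2-D block row containing it
        for j in range(q):             # the row is split among q block columns
            dst = i * q + j
            if dst != src:             # the piece a task keeps needs no message
                msgs[(src, dst)] = msgs.get((src, dst), 0) + nb
    per_task = {}
    for (src, _), count in msgs.items():
        per_task.setdefault(src, []).append(count)
    return per_task

for src, sizes in sorted(reorg_messages(16, 16).items()):
    print("task", src, ":", len(sizes), "messages of sizes", sizes)
# With N = P = 16: each task sends sqrt(P)-1 = 3 messages of N**2/(P*sqrt(P)) = 4 elements.
</PRE>
<P>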
If <em> A</em>
, <em> B</em>
, and <em> C</em>
 are all decomposed in one dimension,
we must perform three such conversions.  This gives a worst-case total
communication cost for reorganization and multiplication using the
two-dimensional algorithm of
<P>
<P>3 ( √P - 1 ) ( t<sub>s</sub> + t<sub>w</sub> N<sup>2</sup>/(P √P) ) &nbsp;+&nbsp; √P ( log √P + 1 ) ( t<sub>s</sub> + t<sub>w</sub> N<sup>2</sup>/P )<P>
Comparing this expression with Equation <A HREF="node45.html#eqmm1d">4.3</A>, we see that
the algorithm that reorganizes data structures to a 2-D decomposition
before performing the multiplication will be more efficient than an
algorithm that does not, when
<P><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img816.gif"><P>
<P>
This condition holds for all except small <em> P</em>
.  Hence, we conclude
that our parallel matrix multiply library should convert to a
two-dimensional decomposition before performing computation, as
follows.
<P>

<PRE><TT>
   procedure matrix_multiply(A, B, C)
   begin
      if 1d_distributed(A) then reorg_to_2d(A)
      if 1d_distributed(B) then reorg_to_2d(B)
      2d_matrix_multiply(A, B, C)
      if 1d_distributed(C) then reorg_to_1d(C)
   end
</TT></PRE>
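<P>
In the same spirit, the following serial sketch mirrors the wrapper: a "distributed"
matrix is modelled as a dictionary of blocks keyed either by slab index (1-D) or by
block coordinates (2-D), and the layout flag, helper names, and the direct block
product used in place of the parallel 2-D multiplication are all assumptions of the
sketch rather than the library's actual interface.
<P>
<PRE>
# Sketch of the matrix_multiply wrapper: reorganize 1-D operands to 2-D blocks,
# multiply block-wise, then restore C to the 1-D layout.
import numpy as np

def to_2d(slabs, N, P):                 # reorg_to_2d: P row slabs -> q x q blocks
    q = int(round(P ** 0.5)); nb = N // q
    full = np.vstack([slabs[t] for t in range(P)])
    return {(i, j): full[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy()
            for i in range(q) for j in range(q)}

def to_1d(blocks, N, P):                # reorg_to_1d: q x q blocks -> P row slabs
    q = int(round(P ** 0.5)); slab = N // P
    full = np.block([[blocks[(i, j)] for j in range(q)] for i in range(q)])
    return {t: full[t*slab:(t+1)*slab, :].copy() for t in range(P)}

def matrix_multiply(A, B, N, P, layout="1d"):
    A2 = to_2d(A, N, P) if layout == "1d" else A      # if 1d_distributed(A) ...
    B2 = to_2d(B, N, P) if layout == "1d" else B      # if 1d_distributed(B) ...
    q = int(round(P ** 0.5))
    # 2d_matrix_multiply: C(i,j) is the sum over k of A(i,k).B(k,j)
    C2 = {(i, j): sum(A2[(i, k)] @ B2[(k, j)] for k in range(q))
          for i in range(q) for j in range(q)}
    return to_1d(C2, N, P) if layout == "1d" else C2  # if 1d_distributed(C) ...

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, P = 16, 16
    A, B = rng.random((N, N)), rng.random((N, N))
    A1 = {t: A[t*(N//P):(t+1)*(N//P), :] for t in range(P)}
    B1 = {t: B[t*(N//P):(t+1)*(N//P), :] for t in range(P)}
    C1 = matrix_multiply(A1, B1, N, P)
    assert np.allclose(np.vstack([C1[t] for t in range(P)]), A @ B)
</PRE>
<P>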

<P>
<P><A NAME=6600>&#160;</A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img819.gif">
<BR><STRONG>Figure 4.14:</STRONG> <em>Layout of the <em>A</em> and <em>B</em> matrices in the systolic
matrix-matrix multiplication algorithm for a <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img818.gif"> task mesh.
The arrows show the direction of data movement during execution of the
systolic algorithm.</em><A NAME=figlasystolic>&#160;</A><BR>
<P><H2><A NAME=SECTION02563000000000000000>4.6.3 A Systolic Algorithm</A></H2>
<P>
We still have not said the last word about the ideal data distribution
for matrix-matrix multiplication!  An alternative algorithm
<A NAME=6039>&#160;</A>
allows the broadcast operations used in the preceding algorithm to be
<A NAME=6040>&#160;</A>
replaced with regular, nearest-neighbor ("systolic") communications.
However, data must be distributed among tasks in a different fashion.
As before, we assume that <em>A</em>, <em>B</em>, and
<em>C</em> are decomposed into √P &#215; √P submatrices.  Each task
<em>(i,j)</em> contains submatrices <em>A<sub>ik</sub></em>, <em>B<sub>kj</sub></em>, and <em>C<sub>ij</sub></em>, where
<em>k</em> = (<em>i</em>+<em>j</em>) mod √P.  This data layout is
illustrated in Figure <A HREF="node45.html#figlasystolic">4.14</A>.
<P>
Computation proceeds in √P steps.  In each step, contributions
to <em>C</em> are accumulated in each task, after which values of
<em>A</em> move down and values of <em>B</em> move right.  The entire computation
requires a total of 2√P messages per task, each of size
N<sup>2</sup>/P, for a cost of
<P>
<P>T<sub>comm</sub> = 2 √P ( t<sub>s</sub> + t<sub>w</sub> N<sup>2</sup>/P )<P>
<P>
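<P>
A minimal serial NumPy simulation of the systolic scheme, assuming the skewed layout
just described with i as the block-row and j as the block-column index, is shown below.
Under that indexing the blocks of A circulate cyclically along mesh rows and the blocks
of B along mesh columns; how those directions are drawn depends on the orientation used
in Figure 4.14.  The serial modelling of the shifts is an assumption of the sketch.
<P>
<PRE>
# Serial NumPy simulation of the systolic algorithm: skewed initial layout,
# then sqrt(P) accumulate-and-shift steps.
import numpy as np

def systolic_multiply(A, B, q):
    """Multiply N x N matrices on a simulated q x q task mesh (q = sqrt(P))."""
    N = A.shape[0]
    nb = N // q
    blk = lambda M, i, j: M[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy()
    # Initial layout: task (i,j) holds A[i,k] and B[k,j] with k = (i+j) mod q.
    Ab = [[blk(A, i, (i + j) % q) for j in range(q)] for i in range(q)]
    Bb = [[blk(B, (i + j) % q, j) for j in range(q)] for i in range(q)]
    Cb = [[np.zeros((nb, nb)) for _ in range(q)] for _ in range(q)]
    for _ in range(q):                                    # sqrt(P) steps
        for i in range(q):
            for j in range(q):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]           # accumulate contribution to C
        # shift: each task passes its A block along its mesh row and its B block
        # along its mesh column, so task (i,j) next holds A[i,k'] and B[k',j]
        # with k' = (i+j+1) mod q
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return np.block(Cb)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    N, q = 12, 3
    A, B = rng.random((N, N)), rng.random((N, N))
    assert np.allclose(systolic_multiply(A, B, q), A @ B)
</PRE>
<P>
Each task exchanges blocks only with its mesh neighbors in every step, which is what
gives the 2√P nearest-neighbor messages per task counted above.
<P>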
Communication costs are less by a factor of about (log √P)/2 than in
Equation <A HREF="node45.html#eqlamatmulc">4.4</A>.  Again, this benefit must be weighed
against the cost of converting matrices <em> A</em>
, <em> B</em>
, and
<em> C</em>
 into the layout required by this algorithm.  This analysis is left as
<A NAME=6062>&#160;</A>
an exercise.
<A NAME=6063>&#160;</A>
<P>
<A NAME=6064>&#160;</A>
<P>

<HR>
<P><ADDRESS>
<I>&#169; Copyright 1995 by Ian Foster</I>
</ADDRESS>
</BODY>
