<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>4.6 Case Study: Matrix Multiplication</TITLE>
<meta name="description" content="4.6 Case Study: Matrix Multiplication">
<meta name="keywords" content="book">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
</HEAD>
<BODY>
<P>
<BR> <HR><a href="msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" alt="[DBPP]"></a> <A NAME=tex2html2398 HREF="node44.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif"></A> <A NAME=tex2html2406 HREF="node46.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif"></A> <A NAME=tex2html2404 HREF="node39.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif"></A> <A NAME=tex2html2408 HREF="node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif"></A> <A NAME=tex2html2409 HREF="node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif"></A> <a href="msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" alt="[Search]"></a> <BR>
<B> Next:</B> <A NAME=tex2html2407 HREF="node46.html">4.7 Summary</A>
<B>Up:</B> <A NAME=tex2html2405 HREF="node39.html">4 Putting Components Together</A>
<B> Previous:</B> <A NAME=tex2html2399 HREF="node44.html">4.5 Case Study: Tuple Space</A>
<BR><HR><P>
<H1><A NAME=SECTION02560000000000000000>4.6 Case Study: Matrix Multiplication</A></H1>
<P>
<A NAME=seclamm> </A>
<P>
<A NAME=5868> </A>
<A NAME=5869> </A><A NAME=5870> </A><A NAME=5871> </A><A NAME=5872> </A>
In our third case study, we use the example of matrix-matrix
multiplication to illustrate issues that arise when developing
data-distribution-neutral libraries. In particular, we consider the
problem of developing a library to compute <em>C = A.B</em>, where
<em>A</em>, <em>B</em>, and <em>C</em> are dense matrices of size
<em>N</em>&#215;<em>N</em>. (A <i>dense matrix</i> is a matrix in which most of the
entries are nonzero.) This matrix-matrix multiplication involves
<em>O</em>(<em>N</em><sup>3</sup>) operations, since for each of the
<em>N</em><sup>2</sup> elements <em>c<sub>ij</sub></em> of <em>C</em>, we must compute
<P><em>c<sub>ij</sub></em> = &#8721;<sub><em>k</em>=1</sub><sup><em>N</em></sup> <em>a<sub>ik</sub></em> <em>b<sub>kj</sub></em> .<P>
<P>
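<P>
For reference, a direct sequential implementation of this definition is
simply a triple loop over <em>i</em>, <em>j</em>, and <em>k</em>. The C routine
below is a minimal illustrative sketch, assuming the matrices are stored
row-major in one-dimensional arrays.
<PRE>
/* Sequential C = A.B for N x N matrices stored row-major in 1-D arrays.
   The triple loop performs O(N^3) multiply-add operations. */
void matmul_seq(int N, const double *A, const double *B, double *C)
{
    for (int i = 0; i &lt; N; i++)
        for (int j = 0; j &lt; N; j++) {
            double sum = 0.0;
            for (int k = 0; k &lt; N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}
</PRE>
<P>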
We wish to develop a library that will allow each of the arrays
<em>A</em>, <em>B</em>, and <em>C</em> to be distributed over <em>P</em>
tasks in one of three ways: blocked by row, blocked by column, or
blocked by row and column. This library may be defined with a
subroutine interface suitable for sequential composition in an SPMD
program or with a channel interface suitable for parallel or
concurrent composition. The basic algorithmic issue remains the same:
Does the library need to incorporate different algorithms for
different distributions of <em>A</em>, <em>B</em>, and <em>C</em>, or
should incoming data structures be converted to a standard
distribution before calling a single algorithm?
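<P>
As a hypothetical illustration of what a distribution-neutral subroutine
interface might look like in C, each matrix argument could carry a
descriptor recording how it is blocked, so that the library can either
dispatch on the three distributions or redistribute its inputs first.
The names below (<TT>dist_matrix</TT>, <TT>matmul</TT>, and the blocking
tags) are purely illustrative.
<PRE>
/* Hypothetical distribution-neutral interface (illustrative only). */
typedef enum { ROW_BLOCKED, COL_BLOCKED, BLOCK_2D } dist_kind;

typedef struct {
    dist_kind kind;   /* how the N x N matrix is partitioned over P tasks */
    int       N;      /* global matrix dimension                          */
    int       P;      /* number of tasks sharing the matrix               */
    double   *local;  /* this task's block, stored contiguously           */
} dist_matrix;

/* Compute C = A.B.  Internally, the library must either select an
   algorithm matching the three descriptors or first convert A and B
   to one standard distribution. */
void matmul(const dist_matrix *A, const dist_matrix *B, dist_matrix *C);
</PRE>
<P>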
<P>
<P><A NAME=6466> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.10" SRC="img784.gif">
<BR><STRONG>Figure 4.10:</STRONG> <em>Matrix-matrix multiplication <em>A.B=C</em> with matrices <em>A</em>,
<em>B</em>, and <em>C</em> decomposed in one dimension. The components of
<em>A</em>, <em>B</em>, and <em>C</em> allocated to a single task are shaded
black. During execution, this task requires all of matrix
<em>A</em> (shown stippled).</em><A NAME=figlamat1> </A><BR>
<P><H2><A NAME=SECTION02561000000000000000>4.6.1 Parallel Matrix-Matrix Multiplication</A></H2>
<P>
<A NAME=secmodmm2> </A>
<P>
<A NAME=5910> </A>
We start by examining algorithms for various distributions of <em>A</em>,
<em>B</em>, and <em>C</em>. We first consider a one-dimensional, columnwise
decomposition in which each task encapsulates corresponding columns
from <em>A</em>, <em>B</em>, and <em>C</em>. One parallel algorithm makes each
task responsible for all computation associated with its <em>c<sub>ij</sub></em> values. As
shown in Figure <A HREF="node45.html#figlamat1">4.10</A>, each task requires all of matrix
<em>A</em> in order to compute its <em>c<sub>ij</sub></em> values. Since every task holds just
<em>N</em><sup>2</sup>/<em>P</em> elements of <em>A</em>, <em>N</em><sup>2</sup>/<em>P</em> data are required
from each of the <em>P</em>-1 other tasks, giving the following per-processor
communication cost:
<P>
<P><A NAME=eqmm1d> </A><em>T</em><sub>comm</sub> = (<em>P</em>-1)(<em>t<sub>s</sub></em> + <em>t<sub>w</sub></em> <em>N</em><sup>2</sup>/<em>P</em>)<P>
<P>
Note that as each task performs <em>O</em>(<em>N</em><sup>3</sup>/<em>P</em>) computation, if
<em>N</em> &#8776; <em>P</em>, then the algorithm will have to transfer roughly
one word of data for each multiplication and addition performed.
Hence, the algorithm can be expected to be efficient only when
<em>N</em> is much larger than <em>P</em> or the cost of computation is much larger
than <em>t<sub>w</sub></em>.
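<P>
To make the one-dimensional algorithm concrete, the following C sketch
uses MPI, a notation different from those used in this chapter, and is
purely illustrative. Matrices are stored column-major so that each
task's block of <em>N</em>/<em>P</em> columns is contiguous; an all-gather
operation plays the role of obtaining the rest of <em>A</em> from the
other <em>P</em>-1 tasks.
<PRE>
/* Illustrative sketch: 1-D columnwise matrix multiplication in C with MPI.
   Assumes N is divisible by P and column-major storage, so each task's
   N/P columns of A, B, and C form a contiguous block of N*N/P doubles. */
#include &lt;mpi.h&gt;
#include &lt;stdlib.h&gt;

void matmul_1d(int N, double *Acols, double *Bcols, double *Ccols,
               MPI_Comm comm)
{
    int P;
    MPI_Comm_size(comm, &amp;P);
    int w = N / P;                     /* width of each column block */

    /* Obtain all of A: N*N/P words from each of the P-1 other tasks. */
    double *A = malloc((size_t)N * N * sizeof(double));
    MPI_Allgather(Acols, N * w, MPI_DOUBLE, A, N * w, MPI_DOUBLE, comm);

    /* Local computation: C(:,j) = sum over k of A(:,k)*B(k,j)
       for this task's w columns j. */
    for (int j = 0; j &lt; w; j++)
        for (int i = 0; i &lt; N; i++) {
            double sum = 0.0;
            for (int k = 0; k &lt; N; k++)
                sum += A[i + (size_t)k * N] * Bcols[k + (size_t)j * N];
            Ccols[i + (size_t)j * N] = sum;
        }
    free(A);
}
</PRE>
<P>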
<P>
<P><A NAME=6514> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.11" SRC="img792.gif">
<BR><STRONG>Figure 4.11:</STRONG> <em>Matrix-matrix multiplication <em>A.B=C</em> with matrices <em>A</em>,
<A NAME=6502> </A>
<em>B</em>, and <em>C</em> decomposed in two dimensions. The components of
<em>A</em>, <em>B</em>, and <em>C</em> allocated to a single task are shaded
black. During execution, this task requires corresponding rows and
columns of matrices <em>A</em> and <em>B</em>, respectively
(shown stippled).</em><A NAME=figlamat2> </A><BR>
<P>
<P>
Next, we consider a two-dimensional decomposition of <em>A</em>, <em>B</em>,
and <em>C</em>. As in the one-dimensional algorithm, we assume that a task
encapsulates corresponding elements of <em>A</em>, <em>B</em>, and
<em>C</em> and that each task is responsible for all computation associated with
its <em>c<sub>ij</sub></em> values. The computation of a single element <em>c<sub>ij</sub></em> requires
an entire row of <em>A</em> (row <em>i</em>) and an entire column of <em>B</em>
(column <em>j</em>). Hence, as shown in Figure <A HREF="node45.html#figlamat2">4.11</A>, the
computation performed within a single task requires the <em>A</em> and
<em>B</em> submatrices allocated to tasks in the same row and column,
respectively. This is a total of about 2<em>N</em><sup>2</sup>/&#8730;<em>P</em> data,
considerably less than in the one-dimensional algorithm.
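<P>
The difference is easy to quantify. Using the estimates above (each task
fetches the rest of <em>A</em> in the one-dimensional case, and the
remaining row and column of submatrices in the two-dimensional case), a
small illustrative calculation in C is:
<PRE>
/* Rough per-task communication volume, in matrix elements, implied by
   the estimates above (illustrative only). */
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

double words_1d(double N, double P) { return (P - 1.0) * N * N / P; }

double words_2d(double N, double P) {
    double q = sqrt(P);                  /* tasks per grid row/column */
    return 2.0 * (q - 1.0) * N * N / P;  /* about 2*N*N/sqrt(P)       */
}

int main(void) {
    double N = 1024.0, P = 64.0;
    printf("1-D: %.0f elements   2-D: %.0f elements\n",
           words_1d(N, P), words_2d(N, P));
    return 0;
}
</PRE>
For <em>N</em>=1024 and <em>P</em>=64, the one-dimensional algorithm moves
about 4.5 times as much data per task as the two-dimensional one;
asymptotically, the ratio grows as &#8730;<em>P</em>/2.
<P>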
<P>
<P><A NAME=6551> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.12" SRC="img798.gif">
<BR><STRONG>Figure 4.12:</STRONG> <em>Matrix-matrix multiplication algorithm based on two-dimensional
decompositions. Each step involves three stages: (a) an
<em>A</em> submatrix is broadcast to other tasks in the same row; (b) local
computation is performed; and (c) the <em>B</em> submatrix is rotated
upwards within each column.</em><A NAME=figlamatmul> </A><BR>
<P>
<P>
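<P>
The per-step structure sketched in Figure 4.12 can be made concrete as
follows. The C/MPI code below is an illustrative sketch only (again, the
notation differs from that used in this chapter): it assumes
<em>P</em>=<em>q</em><sup>2</sup> tasks arranged in a <em>q</em>&#215;<em>q</em> grid,
each holding <em>n</em>&#215;<em>n</em> submatrices with <em>n</em>=<em>N</em>/<em>q</em>,
and separate communicators for the task's grid row and grid column.
<PRE>
/* Illustrative sketch of the broadcast-and-rotate steps of Figure 4.12.
   Blocks are stored row-major; Cblk is assumed to be zeroed by the caller.
   Ranks in row_comm equal the task's column index; ranks in col_comm
   equal the task's row index. */
#include &lt;mpi.h&gt;
#include &lt;string.h&gt;

void matmul_2d(int n, int q, int myrow, int mycol,
               const double *Ablk, double *Bblk, double *Cblk, double *Atmp,
               MPI_Comm row_comm, MPI_Comm col_comm)
{
    for (int step = 0; step &lt; q; step++) {
        /* (a) One A submatrix is broadcast to the other tasks in its row. */
        int root = (myrow + step) % q;
        if (mycol == root)
            memcpy(Atmp, Ablk, (size_t)n * n * sizeof(double));
        MPI_Bcast(Atmp, n * n, MPI_DOUBLE, root, row_comm);

        /* (b) Local computation: Cblk += Atmp * Bblk. */
        for (int i = 0; i &lt; n; i++)
            for (int k = 0; k &lt; n; k++)
                for (int j = 0; j &lt; n; j++)
                    Cblk[i * n + j] += Atmp[i * n + k] * Bblk[k * n + j];

        /* (c) The B submatrix is rotated upward within its column. */
        int up   = (myrow + q - 1) % q;
        int down = (myrow + 1) % q;
        MPI_Sendrecv_replace(Bblk, n * n, MPI_DOUBLE, up, 0, down, 0,
                             col_comm, MPI_STATUS_IGNORE);
    }
}
</PRE>
After the last step the rotation leaves each <em>B</em> submatrix back in
its original position, so the routine can be called repeatedly.
<P>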