<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>4.6 Case Study: Matrix Multiplication</TITLE>
<meta name="description" content="4.6 Case Study: Matrix Multiplication">
<meta name="keywords" content="book">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
</HEAD>
<BODY>
<P>
<BR> <HR><a href="msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" alt="[DBPP]"></a> <A NAME=tex2html2398 HREF="node44.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif"></A> <A NAME=tex2html2406 HREF="node46.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif"></A> <A NAME=tex2html2404 HREF="node39.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif"></A> <A NAME=tex2html2408 HREF="node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif"></A> <A NAME=tex2html2409 HREF="node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif"></A> <a href="msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" alt="[Search]"></a> <BR>
<B> Next:</B> <A NAME=tex2html2407 HREF="node46.html">4.7 Summary</A>
<B>Up:</B> <A NAME=tex2html2405 HREF="node39.html">4 Putting Components Together</A>
<B> Previous:</B> <A NAME=tex2html2399 HREF="node44.html">4.5 Case Study: Tuple Space</A>
<BR><HR><P>
<H1><A NAME=SECTION02560000000000000000>4.6 Case Study: Matrix Multiplication</A></H1>
<P>
<A NAME=seclamm> </A>
<P>
<A NAME=5868> </A>
<A NAME=5869> </A><A NAME=5870> </A><A NAME=5871> </A><A NAME=5872> </A>
In our third case study, we use the example of matrix-matrix
multiplication to illustrate issues that arise when developing
data-distribution-neutral libraries. In particular, we consider the
problem of developing a library to compute <em>C = A.B</em>, where
<em>A</em>, <em>B</em>, and <em>C</em> are dense matrices of size
<em>N</em>&#215;<em>N</em>. (A <i>dense matrix</i> is a matrix in which most of the
entries are nonzero.) This matrix-matrix multiplication involves
<em>O</em>(<em>N</em><sup>3</sup>) operations, since for each of the
<em>N</em><sup>2</sup> elements <em>c<sub>ij</sub></em> of <em>C</em>, we must compute
<P><em>c<sub>ij</sub></em> = &#8721;<sub><em>k</em>=1</sub><sup><em>N</em></sup> <em>a<sub>ik</sub></em> <em>b<sub>kj</sub></em> .<P>
<P>
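<P>
For reference, a direct sequential implementation of this definition is
simply a triple loop over <em>i</em>, <em>j</em>, and <em>k</em>. The C routine
below is a minimal illustrative sketch, assuming the matrices are stored
row-major in one-dimensional arrays.
<PRE>
/* Sequential C = A.B for N x N matrices stored row-major in 1-D arrays.
   The triple loop performs O(N^3) multiply-add operations. */
void matmul_seq(int N, const double *A, const double *B, double *C)
{
    for (int i = 0; i &lt; N; i++)
        for (int j = 0; j &lt; N; j++) {
            double sum = 0.0;
            for (int k = 0; k &lt; N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}
</PRE>
<P>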
We wish to develop a library that will allow each of the arrays
<em>A</em>, <em>B</em>, and <em>C</em> to be distributed over <em>P</em>
tasks in one of three ways: blocked by row, blocked by column, or
blocked by row and column. This library may be defined with a
subroutine interface suitable for sequential composition in an SPMD
program or with a channel interface suitable for parallel or
concurrent composition. The basic algorithmic issue remains the same:
Does the library need to incorporate different algorithms for
different distributions of <em>A</em>, <em>B</em>, and <em>C</em>, or
should incoming data structures be converted to a standard
distribution before calling a single algorithm?
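<P>
As a hypothetical illustration of what a distribution-neutral subroutine
interface might look like in C, each matrix argument could carry a
descriptor recording how it is blocked, so that the library can either
dispatch on the three distributions or redistribute its inputs first.
The names below (<TT>dist_matrix</TT>, <TT>matmul</TT>, and the blocking
tags) are purely illustrative.
<PRE>
/* Hypothetical distribution-neutral interface (illustrative only). */
typedef enum { ROW_BLOCKED, COL_BLOCKED, BLOCK_2D } dist_kind;

typedef struct {
    dist_kind kind;   /* how the N x N matrix is partitioned over P tasks */
    int       N;      /* global matrix dimension                          */
    int       P;      /* number of tasks sharing the matrix               */
    double   *local;  /* this task's block, stored contiguously           */
} dist_matrix;

/* Compute C = A.B.  Internally, the library must either select an
   algorithm matching the three descriptors or first convert A and B
   to one standard distribution. */
void matmul(const dist_matrix *A, const dist_matrix *B, dist_matrix *C);
</PRE>
<P>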
<P>
<P><A NAME=6466> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.10" SRC="img784.gif">
<BR><STRONG>Figure 4.10:</STRONG> <em>Matrix-matrix multiplication <em>A.B=C</em> with matrices <em>A</em>,
<em>B</em>, and <em>C</em> decomposed in one dimension. The components of
<em>A</em>, <em>B</em>, and <em>C</em> allocated to a single task are shaded
black. During execution, this task requires all of matrix
<em>A</em> (shown stippled).</em><A NAME=figlamat1> </A><BR>
<P><H2><A NAME=SECTION02561000000000000000>4.6.1 Parallel Matrix-Matrix Multiplication</A></H2>
<P>
<A NAME=secmodmm2> </A>
<P>
<A NAME=5910> </A>
We start by examining algorithms for various distributions of <em>A</em>,
<em>B</em>, and <em>C</em>. We first consider a one-dimensional, columnwise
decomposition in which each task encapsulates corresponding columns
from <em>A</em>, <em>B</em>, and <em>C</em>. One parallel algorithm makes each
task responsible for all computation associated with its <em>c<sub>ij</sub></em> values. As
shown in Figure <A HREF="node45.html#figlamat1">4.10</A>, each task requires all of matrix
<em>A</em> in order to compute its <em>c<sub>ij</sub></em> values. Since every task holds just
<em>N</em><sup>2</sup>/<em>P</em> elements of <em>A</em>, <em>N</em><sup>2</sup>/<em>P</em> data are required
from each of the <em>P</em>-1 other tasks, giving the following per-processor
communication cost:
<P>
<P><A NAME=eqmm1d> </A><em>T</em><sub>comm</sub> = (<em>P</em>-1)(<em>t<sub>s</sub></em> + <em>t<sub>w</sub></em> <em>N</em><sup>2</sup>/<em>P</em>)<P>
<P>
Note that as each task performs <em>O</em>(<em>N</em><sup>3</sup>/<em>P</em>) computation, if
<em>N</em> &#8776; <em>P</em>, then the algorithm will have to transfer roughly
one word of data for each multiplication and addition performed.
Hence, the algorithm can be expected to be efficient only when
<em>N</em> is much larger than <em>P</em> or the cost of computation is much larger
than <em>t<sub>w</sub></em>.
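<P>
To make the one-dimensional algorithm concrete, the following C sketch
uses MPI, a notation different from those used in this chapter, and is
purely illustrative. Matrices are stored column-major so that each
task's block of <em>N</em>/<em>P</em> columns is contiguous; an all-gather
operation plays the role of obtaining the rest of <em>A</em> from the
other <em>P</em>-1 tasks.
<PRE>
/* Illustrative sketch: 1-D columnwise matrix multiplication in C with MPI.
   Assumes N is divisible by P and column-major storage, so each task's
   N/P columns of A, B, and C form a contiguous block of N*N/P doubles. */
#include &lt;mpi.h&gt;
#include &lt;stdlib.h&gt;

void matmul_1d(int N, double *Acols, double *Bcols, double *Ccols,
               MPI_Comm comm)
{
    int P;
    MPI_Comm_size(comm, &amp;P);
    int w = N / P;                     /* width of each column block */

    /* Obtain all of A: N*N/P words from each of the P-1 other tasks. */
    double *A = malloc((size_t)N * N * sizeof(double));
    MPI_Allgather(Acols, N * w, MPI_DOUBLE, A, N * w, MPI_DOUBLE, comm);

    /* Local computation: C(:,j) = sum over k of A(:,k)*B(k,j)
       for this task's w columns j. */
    for (int j = 0; j &lt; w; j++)
        for (int i = 0; i &lt; N; i++) {
            double sum = 0.0;
            for (int k = 0; k &lt; N; k++)
                sum += A[i + (size_t)k * N] * Bcols[k + (size_t)j * N];
            Ccols[i + (size_t)j * N] = sum;
        }
    free(A);
}
</PRE>
<P>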
<P>
<P><A NAME=6514> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.11" SRC="img792.gif">
<BR><STRONG>Figure 4.11:</STRONG> <em>Matrix-matrix multiplication <em>A.B=C</em> with matrices <em>A</em>,
<A NAME=6502> </A>
<em>B</em>, and <em>C</em> decomposed in two dimensions. The components of
<em>A</em>, <em>B</em>, and <em>C</em> allocated to a single task are shaded
black. During execution, this task requires corresponding rows and
columns of matrices <em>A</em> and <em>B</em>, respectively
(shown stippled).</em><A NAME=figlamat2> </A><BR>
<P>
<P>
Next, we consider a two-dimensional decomposition of <em>A</em>, <em>B</em>,
and <em>C</em>. As in the one-dimensional algorithm, we assume that a task
encapsulates corresponding elements of <em>A</em>, <em>B</em>, and
<em>C</em> and that each task is responsible for all computation associated with
its <em>c<sub>ij</sub></em> values. The computation of a single element <em>c<sub>ij</sub></em> requires
an entire row of <em>A</em> (row <em>i</em>) and an entire column of <em>B</em>
(column <em>j</em>). Hence, as shown in Figure <A HREF="node45.html#figlamat2">4.11</A>, the
computation performed within a single task requires the <em>A</em> and
<em>B</em> submatrices allocated to tasks in the same row and column,
respectively. This is a total of about 2<em>N</em><sup>2</sup>/&#8730;<em>P</em> data,
considerably less than in the one-dimensional algorithm.
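<P>
The difference is easy to quantify. Using the estimates above (each task
fetches the rest of <em>A</em> in the one-dimensional case, and the
remaining row and column of submatrices in the two-dimensional case), a
small illustrative calculation in C is:
<PRE>
/* Rough per-task communication volume, in matrix elements, implied by
   the estimates above (illustrative only). */
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

double words_1d(double N, double P) { return (P - 1.0) * N * N / P; }

double words_2d(double N, double P) {
    double q = sqrt(P);                  /* tasks per grid row/column */
    return 2.0 * (q - 1.0) * N * N / P;  /* about 2*N*N/sqrt(P)       */
}

int main(void) {
    double N = 1024.0, P = 64.0;
    printf("1-D: %.0f elements   2-D: %.0f elements\n",
           words_1d(N, P), words_2d(N, P));
    return 0;
}
</PRE>
For <em>N</em>=1024 and <em>P</em>=64, the one-dimensional algorithm moves
about 4.5 times as much data per task as the two-dimensional one;
asymptotically, the ratio grows as &#8730;<em>P</em>/2.
<P>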
<P>
<P><A NAME=6551> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="Figure 4.12" SRC="img798.gif">
<BR><STRONG>Figure 4.12:</STRONG> <em>Matrix-matrix multiplication algorithm based on two-dimensional
decompositions. Each step involves three stages: (a) an
<em>A</em> submatrix is broadcast to other tasks in the same row; (b) local
computation is performed; and (c) the <em>B</em> submatrix is rotated
upwards within each column.</em><A NAME=figlamatmul> </A><BR>
<P>
<P>
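<P>
The per-step structure sketched in Figure 4.12 can be made concrete as
follows. The C/MPI code below is an illustrative sketch only (again, the
notation differs from that used in this chapter): it assumes
<em>P</em>=<em>q</em><sup>2</sup> tasks arranged in a <em>q</em>&#215;<em>q</em> grid,
each holding <em>n</em>&#215;<em>n</em> submatrices with <em>n</em>=<em>N</em>/<em>q</em>,
and separate communicators for the task's grid row and grid column.
<PRE>
/* Illustrative sketch of the broadcast-and-rotate steps of Figure 4.12.
   Blocks are stored row-major; Cblk is assumed to be zeroed by the caller.
   Ranks in row_comm equal the task's column index; ranks in col_comm
   equal the task's row index. */
#include &lt;mpi.h&gt;
#include &lt;string.h&gt;

void matmul_2d(int n, int q, int myrow, int mycol,
               const double *Ablk, double *Bblk, double *Cblk, double *Atmp,
               MPI_Comm row_comm, MPI_Comm col_comm)
{
    for (int step = 0; step &lt; q; step++) {
        /* (a) One A submatrix is broadcast to the other tasks in its row. */
        int root = (myrow + step) % q;
        if (mycol == root)
            memcpy(Atmp, Ablk, (size_t)n * n * sizeof(double));
        MPI_Bcast(Atmp, n * n, MPI_DOUBLE, root, row_comm);

        /* (b) Local computation: Cblk += Atmp * Bblk. */
        for (int i = 0; i &lt; n; i++)
            for (int k = 0; k &lt; n; k++)
                for (int j = 0; j &lt; n; j++)
                    Cblk[i * n + j] += Atmp[i * n + k] * Bblk[k * n + j];

        /* (c) The B submatrix is rotated upward within its column. */
        int up   = (myrow + q - 1) % q;
        int down = (myrow + 1) % q;
        MPI_Sendrecv_replace(Bblk, n * n, MPI_DOUBLE, up, 0, down, 0,
                             col_comm, MPI_STATUS_IGNORE);
    }
}
</PRE>
After the last step the rotation leaves each <em>B</em> submatrix back in
its original position, so the routine can be called repeatedly.
<P>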