<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>7.1 Data Parallelism</TITLE>
</HEAD>
<BODY>
<meta name="description" value="7.1 Data Parallelism">
<meta name="keywords" value="book">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
<BR> <HR><a href="msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" alt="[DBPP]"></a> <A NAME=tex2html2952 HREF="node82.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif"></A> <A NAME=tex2html2960 HREF="node84.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif"></A> <A NAME=tex2html2958 HREF="node82.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif"></A> <A NAME=tex2html2962 HREF="node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif"></A> <A NAME=tex2html2963 HREF="node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif"></A> <a href="msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" alt="[Search]"></a> <BR>
<B> Next:</B> <A NAME=tex2html2961 HREF="node84.html">7.2 Fortran 90</A>
<B>Up:</B> <A NAME=tex2html2959 HREF="node82.html">7 High Performance Fortran</A>
<B> Previous:</B> <A NAME=tex2html2953 HREF="node82.html">7 High Performance Fortran</A>
<BR><HR><P>
<H1><A NAME=SECTION03410000000000000000>7.1 Data Parallelism</A></H1>
<P>
We first provide a general introduction to data parallelism and
data-parallel languages, focusing on concurrency, locality, and
algorithm design.
<P>
<H2><A NAME=SECTION03411000000000000000>7.1.1 Concurrency</A></H2>
<P>
<A NAME=10592> </A>
Depending on the programming language used, the data ensembles
operated on in a data-parallel program may be regular (e.g., an array)
or irregular (e.g., a tree or sparse matrix). In F90 and HPF, the
data structures operated on are arrays. In contrast, the
<A NAME=10593> </A>
data-parallel language pC++
allows programs to operate not only on
arrays but also on trees, sets, and other more complex data
structures.
<P>
<A NAME=10594> </A>
Concurrency may be implicit or may be expressed by using explicit
<A NAME=10595> </A>
parallel constructs. For example, the F90 array assignment statement
is an <em> explicitly
</em> parallel construct; we write
<PRE>
   A = B*C    ! A, B, C are arrays
</PRE>
to specify that each element of array <tt> A</tt> is to be assigned the
product of the corresponding elements of arrays <tt> B</tt> and <tt> C</tt>.
<A NAME=10606> </A>
This statement also implies <em> conformality
</em>; that is, the three
<A NAME=10608> </A>
arrays have the same size and shape.
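<P>
As a small illustration (ours, not the book's), conformality requires
only that the operands have the same shape; they may be sections of
larger arrays:
<P>
<PRE>
   real A(4,8), B(4,8), C(8,8)
   A = B*C(1:4,:)    ! conformable: every operand has shape (4,8)
</PRE>
<P>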
In contrast, the following do-loop
<A NAME=10609> </A>
is <em> implicitly
</em> parallel: a compiler may be able to detect
that the various do-loop iterations are independent (meaning that one
iteration does not write a variable read or written by another) and
hence can be performed in parallel, but this detection requires some
analysis.
<P>
<PRE>
   do i = 1,m
      do j = 1,n
         A(i,j) = B(i,j)*C(i,j)
      enddo
   enddo
</PRE>
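<P>
Not every loop admits such treatment. As a contrasting (hypothetical)
example, the following loop carries a dependence from each iteration to
the next and so cannot be parallelized by this analysis:
<P>
<PRE>
   do i = 2,n
      A(i) = A(i-1) + B(i)   ! iteration i reads A(i-1), written by iteration i-1
   enddo
</PRE>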
<P>
A data-parallel program is a sequence of explicitly and implicitly
parallel statements. On a distributed-memory parallel computer,
compilation typically translates these statements into an SPMD
<A NAME=10618> </A>
program (Section <A HREF="node9.html#sec1other">1.3.2</A>), in which each processor executes
the same code on a subset of the data structures. In many cases, the
compiler can construct this program by first partitioning data
structures into disjoint subdomains, one per processor, and then
<A NAME=10620> </A>
applying the owner computes rule to determine which operations should
be performed on each processor. This rule states that the computation
required to produce a datum is performed on the processor on which
that datum is located. The owner computes rule is only one possible
strategy, however; compilers may use other techniques to assign
computation to processors.
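<P>
To make this concrete, here is a minimal sketch (our illustration, not
taken from the text) of the SPMD code a compiler might generate for the
assignment <tt> A = B*C</tt>, assuming all three arrays of size
<tt> n</tt> are partitioned blockwise so that each of <tt> nprocs</tt>
processors owns one contiguous block:
<P>
<PRE>
   ! Hypothetical compiler-generated SPMD fragment; every processor
   ! executes the same code on its own block of the arrays.
   nlocal = n/nprocs              ! assume nprocs divides n
   lo = my_id*nlocal + 1          ! my_id in 0..nprocs-1
   hi = lo + nlocal - 1
   do i = lo, hi
      A(i) = B(i)*C(i)            ! owner computes: all operands local
   enddo
</PRE>
<P>
Because corresponding elements of <tt> A</tt>, <tt> B</tt>, and
<tt> C</tt> are owned by the same processor, this fragment requires no
communication.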
<P>
Compilation also introduces communication operations when computation
mapped to one processor requires data mapped to another processor. As
an example, consider the following program.
<P>
<PRE>
   real y, s, X(100)    ! y, s scalars; X an array
   X = X*y              ! Multiply each X(i) by y
   do i = 2,99
      X(i) = (X(i-1) + X(i+1))/2   ! Communication required
   enddo
   s = SUM(X)           ! Communication required
</PRE>
<P>
The communication requirements of this program depend on how the
variables <tt> X</tt>, <tt> y</tt>, and <tt> s</tt> are distributed over
processors. If <tt> X</tt> is distributed while <tt> y</tt> and <tt> s</tt> are
replicated, then the first assignment can proceed without
communication, with each <tt> X(i)</tt> being computed by the processor
that owns <tt> X(i)</tt>. The second assignment (in the do-loop) requires
communication: the processor computing <tt> X(i)</tt> requires the values
of <tt> X(i-1)</tt> and <tt> X(i+1)</tt>, which may be located on different
processors. The summation also requires communication.
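<P>
As an illustration, the following sketch (ours, using MPI message
passing for concreteness; the text does not prescribe a particular
mechanism) shows one way the do-loop and summation might translate:
each processor exchanges the boundary elements of its block of
<tt> X</tt> with its neighbors, updates its local elements, and
contributes to a global reduction.
<P>
<PRE>
! Hypothetical SPMD translation, assuming X(100) is distributed
! blockwise over nprocs processors (nprocs dividing 100); each
! processor keeps its block in xloc(1:nlocal) plus halo cells
! xloc(0) and xloc(nlocal+1) for neighbors' boundary values.
program halo_sketch
   include 'mpif.h'
   integer, parameter :: n = 100
   integer :: rank, nprocs, nlocal, left, right, i, ilo, ihi, ierr
   integer :: status(MPI_STATUS_SIZE)
   real, allocatable :: xloc(:), xnew(:)
   real :: slocal, s

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
   nlocal = n/nprocs
   allocate(xloc(0:nlocal+1), xnew(1:nlocal))
   xloc(1:nlocal) = 1.0            ! placeholder initial values

   ! Neighbor ranks; MPI_PROC_NULL makes boundary exchanges no-ops.
   left  = rank - 1
   right = rank + 1
   if (rank == 0)          left  = MPI_PROC_NULL
   if (rank == nprocs - 1) right = MPI_PROC_NULL

   ! Communication: exchange boundary elements with both neighbors.
   call MPI_SENDRECV(xloc(1), 1, MPI_REAL, left, 0, xloc(nlocal+1), 1, MPI_REAL, right, 0, MPI_COMM_WORLD, status, ierr)
   call MPI_SENDRECV(xloc(nlocal), 1, MPI_REAL, right, 1, xloc(0), 1, MPI_REAL, left, 1, MPI_COMM_WORLD, status, ierr)

   ! Local computation; endpoints X(1) and X(100) are not updated,
   ! matching do i = 2,99. (New values are computed from old values,
   ! a data-parallel reading of the loop.)
   ilo = 1
   ihi = nlocal
   if (rank == 0)          ilo = 2
   if (rank == nprocs - 1) ihi = nlocal - 1
   do i = ilo, ihi
      xnew(i) = (xloc(i-1) + xloc(i+1))/2
   enddo
   xloc(ilo:ihi) = xnew(ilo:ihi)

   ! Summation: local sum, then a global reduction.
   slocal = sum(xloc(1:nlocal))
   call MPI_ALLREDUCE(slocal, s, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)

   call MPI_FINALIZE(ierr)
end program halo_sketch
</PRE>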
<P>
<H2><A NAME=SECTION03412000000000000000>7.1.2 Locality</A></H2>
<P>
<A NAME=10646> </A>
Data placement is an essential part of a data-parallel algorithm, since
the mapping of data to processors determines the <em> locality
</em>
of data references and hence, to a large extent, the performance of a
parallel program. For example, the simple array assignment <tt>
A = B*C</tt> either can proceed without any communication or can require
communication for every assignment, depending on whether corresponding
elements of the arrays <tt> A</tt>, <tt> B</tt>, and <tt> C</tt> are located on
the same or different processors.
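<P>
For instance, using the HPF data distribution directives introduced
later in this chapter (a sketch of ours, not the book's example), a
programmer can co-locate corresponding elements so that the assignment
proceeds without communication:
<P>
<PRE>
   real A(100), B(100), C(100)
!HPF$ PROCESSORS pr(4)
!HPF$ DISTRIBUTE A(BLOCK) ONTO pr
!HPF$ ALIGN B(i) WITH A(i)
!HPF$ ALIGN C(i) WITH A(i)

   A = B*C    ! corresponding elements co-located: no communication
</PRE>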
<P>