<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<HTML>
<!-- Converted with LaTeX2HTML 95.1 (Fri Jan 20 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds -->
<HEAD>
<TITLE>7.7 Performance Issues</TITLE>
<META name="description" content="7.7 Performance Issues">
<META name="keywords" content="book">
<META name="resource-type" content="document">
<META name="distribution" content="global">
</HEAD>
<BODY>
<P>
<BR> <HR><a href="msgs0.htm#2"><img ALIGN=MIDDLE src="asm_color_tiny.gif" alt="[DBPP]"></a> <A NAME=tex2html3024 HREF="node88.html"><IMG ALIGN=MIDDLE ALT="previous" SRC="previous_motif.gif"></A> <A NAME=tex2html3032 HREF="node90.html"><IMG ALIGN=MIDDLE ALT="next" SRC="next_motif.gif"></A> <A NAME=tex2html3030 HREF="node82.html"><IMG ALIGN=MIDDLE ALT="up" SRC="up_motif.gif"></A> <A NAME=tex2html3034 HREF="node1.html"><IMG ALIGN=MIDDLE ALT="contents" SRC="contents_motif.gif"></A> <A NAME=tex2html3035 HREF="node133.html"><IMG ALIGN=MIDDLE ALT="index" SRC="index_motif.gif"></A> <a href="msgs0.htm#3"><img ALIGN=MIDDLE src="search_motif.gif" alt="[Search]"></a> <BR>
<B> Next:</B> <A NAME=tex2html3033 HREF="node90.html">7.8 Case Study: Gaussian Elimination</A>
<B>Up:</B> <A NAME=tex2html3031 HREF="node82.html">7 High Performance Fortran</A>
<B> Previous:</B> <A NAME=tex2html3025 HREF="node88.html">7.6 Other HPF Features</A>
<BR><HR><P>
<H1><A NAME=SECTION03470000000000000000>7.7 Performance Issues</A></H1>
<P>
<A NAME=secdpperf> </A>
<P>
<A NAME=11386> </A>
The performance of an HPF program depends not only on the skill of the
programmer but also on the capabilities of the compiler, which in
effect generates the actual parallel program from a high-level
specification provided by the programmer. The structure and hence the
performance of this program may not be intuitively obvious to the
programmer. However, a good HPF compiler should provide feedback
identifying hard-to-parallelize components, and of course intuition
can be developed with experience.
<P>
Two major obstacles impede the efficient execution of an HPF
program: sequential bottlenecks and excessive communication costs. In
the following sections, we first examine the compilation process and then
discuss these two obstacles in turn.
<P>
<H2><A NAME=SECTION03471000000000000000>7.7.1 HPF Compilation</A></H2>
<P>
<A NAME=11388> </A>
Compilers for HPF and related languages generally proceed roughly as
<A NAME=11389> </A>
follows. Data decomposition statements are analyzed to determine the
<A NAME=11390> </A>
decomposition of each array in a program. Computation is then
partitioned across processors, typically (but not always) using the
owner computes rule.
This process allows nonlocal references, and hence
communication requirements, to be identified. Communication
operations are then optimized. In particular, an attempt is made to
move messages out of loops so as to reduce communication costs.
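<P>
The partitioning step above can be illustrated with a short sketch. The following Python fragment (not HPF, and not taken from any actual compiler) shows how a compiler might assign loop iterations under a one-dimensional <tt>BLOCK</tt> distribution and the owner-computes rule; the function names and the block-size formula are illustrative assumptions.

```python
# Illustrative sketch: given an array of n elements distributed
# (BLOCK) over p processors, determine which processor "owns" each
# element, and hence which loop iterations each processor executes
# under the owner-computes rule.

def block_owner(i, n, p):
    """Processor that owns element i (0-based) of an n-element
    array distributed (BLOCK) over p processors."""
    block = (n + p - 1) // p          # ceiling(n/p): the BLOCK size
    return i // block

def my_iterations(me, n, p):
    """Iterations assigned to processor `me`: those whose left-hand
    side element it owns (the owner-computes rule)."""
    return [i for i in range(n) if block_owner(i, n, p) == me]

# Example: 8 elements on 4 processors -> blocks of size 2.
print([block_owner(i, 8, 4) for i in range(8)])  # [0, 0, 1, 1, 2, 2, 3, 3]
print(my_iterations(1, 8, 4))                    # [2, 3]
```

Any iteration whose right-hand side references an element owned by another processor then represents a nonlocal reference, which is how the communication requirements mentioned above are identified.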
<P>
<P><A NAME=fighpf2> </A><IMG BORDER=0 ALIGN=BOTTOM ALT="" SRC="img983.gif"><P>
<P>
As an illustration of how an HPF compiler operates,
Program <A HREF="node89.html#fighpf2">7.6</A> gives the code that might be generated for
Program <A HREF="node85.html#proghpf1">7.2</A>. Recall that Program <A HREF="node85.html#proghpf1">7.2</A>
implements a parallel algorithm based on a one-dimensional
decomposition of a two-dimensional finite-difference problem and
executes on four processors. The generated code is a mixture of F90
statements and calls to library routines that perform communication
operations. In this example, two such routines are called: <tt>
stencil_exchange_1d</tt> and <tt> reduce_real</tt>. The first routine
exchanges data with the processors handling neighboring parts of the
finite difference grid, and the second performs the reduction
operation required to compute a maximum difference. These routines
account for the communication requirements of the program.
<P>
In this example, communication costs are easy to determine. The
nearest-neighbor exchange sends two messages with a total size
of 200 words; the reduction generates <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img984.gif"> communications,
each of size one word. Hence, total costs are <IMG BORDER=0 ALIGN=MIDDLE ALT="" SRC="img985.gif"> 202. Since,
in addition, this program decomposes computation evenly across
processors, it can be expected to execute with reasonable efficiency.
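<P>
The word count quoted above can be checked with a one-line cost model. The sketch below assumes, consistently with the totals in the text, that each boundary message carries 100 words and that the reduction proceeds as a tree over <em>P</em> = 4 processors, contributing log <em>P</em> one-word messages; those concrete figures are assumptions, not part of the generated code.

```python
import math

# Per-iteration communication volume (in words) for the program
# discussed above: a nearest-neighbor exchange of two boundary rows
# plus a tree reduction for the maximum difference.  Row size 100 and
# P = 4 are assumptions consistent with the totals quoted in the text.

def comm_words(row_size, p):
    exchange = 2 * row_size              # two boundary-row messages
    reduction = int(math.log2(p))        # log P one-word messages
    return exchange + reduction

print(comm_words(100, 4))  # 202
```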
<P>
<H2><A NAME=SECTION03472000000000000000>7.7.2 Sequential Bottlenecks</A></H2>
<P>
A <em> sequential bottleneck
</em> occurs when a code fragment does not
<A NAME=11416> </A>
incorporate sufficient parallelism or when parallelism exists (in the
<A NAME=11417> </A>
sense that data dependencies do not prevent concurrent execution) but
cannot be detected by the compiler. In either case, the code fragment
<A NAME=11418> </A>
cannot be executed in parallel. Sequential bottlenecks of this sort
may not be serious if a program is intended to execute on only a small
number of processors, but they inevitably reduce a program's
scalability. More precisely, if some fraction
<em> 1/s</em>
of a program's total execution time executes sequentially, then
<A NAME=11420> </A>
Amdahl's law applies, and the maximum possible speedup that can be
achieved on a parallel computer is <em> s</em>
(Section <A HREF="node28.html#secamdahl">3.2.1</A>).
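<P>
The bound can be made concrete with a small calculation. Writing <em>f</em> = 1/<em>s</em> for the sequential fraction, Amdahl's law gives a speedup of 1/(<em>f</em> + (1-<em>f</em>)/<em>P</em>) on <em>P</em> processors, which approaches <em>s</em> as <em>P</em> grows; the fraction <em>f</em> = 0.1 used below is an illustrative assumption.

```python
# Amdahl's law: if a fraction f = 1/s of the execution time is
# sequential, the speedup on P processors is 1/(f + (1-f)/P),
# which is bounded above by s no matter how large P becomes.

def amdahl_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

f = 0.1                                     # sequential fraction, so s = 10
print(round(amdahl_speedup(f, 10), 2))      # 5.26: well short of 10-fold
print(round(amdahl_speedup(f, 10**6), 2))   # approaches the limit s = 10
```

Even with a million processors, a program that is 10 percent sequential can never run more than ten times faster than its sequential version.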
<P>
An HPF compiler should provide information about constructs that it
was unable to parallelize. The programmer may then be able to
restructure the code in question to enable parallelization.
<P>
<H2><A NAME=SECTION03473000000000000000>7.7.3 Communication Costs</A></H2>
<P>
We next discuss a number of issues that affect the communication
<A NAME=11424> </A>
performance of HPF programs.
<A NAME=11425> </A>
<P>
<A NAME=11427> </A>
<em> Intrinsics.</em> Many F90 and HPF intrinsic functions combine data