<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <style type="text/css"> body { font-family: Verdana, Arial, Helvetica, sans-serif;} a.at-term { font-style: italic; } </style> <title>Motivations for Multilevel Parallel Programming</title> <meta name="Generator" content="ATutor" /> <meta name="Keywords" content="" /></head><body> <p> There are several reasons the developer of a parallel scientific application may want to consider a multilevel parallel approach. These reasons relate to the desired performance of the application, the architecture of the system it runs on, and the algorithms it uses. Most of them assume that there is an existing message passing application upon which to base the multilevel version. </p>
<h3> Performance Considerations </h3>
<p> Scalability is possibly the most compelling performance argument for multilevel parallel programming. Several researchers have observed that, for a fixed problem size, a multilevel parallel code will often scale to a higher processor count than either a pure message passing or a pure directive-based code before the speedup gained from additional processors is overwhelmed by communication overhead. One example of this is the Laplace application, which is discussed in the following chapters. </p>
<p> However, multilevel parallel programming is not a cure-all for scalability. In particular, message passing applications that already stress a system's memory bandwidth to its limits are not always helped by a multilevel approach, since adding a shared memory component simply increases the pressure on the memory system. In addition, multilevel approaches may not be effective in message passing applications whose performance is limited by the speed of the interconnect rather than by the processors. </p>
<p> On NUMA systems and clusters of larger SMPs (8+ CPUs/node), another potential performance motivation is that of <em><a href="../glossary.html#load+balancing" target="body" class="at-term">load balancing</a></em> for message passing applications.
If a message passing application has "hot spots" where some processes have much more computation to perform than others, the load can be balanced to some degree by having the more heavily loaded processes spawn additional threads. </p>
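<p> As an illustration of this idea (a minimal sketch, not taken from any particular application; the function name and workload figures are hypothetical), thread counts can be assigned to processes in proportion to their estimated workloads: </p>

```python
# Hypothetical sketch: give each MPI process a share of the available
# threads proportional to its estimated workload, so the per-thread
# load is roughly equalized. Every process gets at least one thread.

def balance_threads(workloads, total_threads):
    """Return a thread count per process, proportional to workload."""
    total_work = sum(workloads)
    return [max(1, round(total_threads * w / total_work)) for w in workloads]

# Four processes; the first is a "hot spot" with twice the work of the
# second and four times the work of each of the last two.
workloads = [100, 50, 25, 25]
threads = balance_threads(workloads, total_threads=8)
print(threads)                                       # [4, 2, 1, 1]
print([w / t for w, t in zip(workloads, threads)])   # all 25.0 per thread
```

<p> With this allocation every thread carries the same load, whereas a one-thread-per-process assignment would leave the hot-spot process four times slower than the others. </p>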
<h3> Architectural Considerations </h3>
<p> On some systems, architectural limitations force the use of multilevel parallel programming. For instance, on a few IBM SP3 installations there is a maximum limit of 4 MPI processes per 8- or 16-processor node because of a flaw in the SP's high performance network. To work around this, OpenMP is used to create multiple threads per MPI process in order to use all of the processors on the allocated nodes. </p>
<p> Multilevel programming can also be motivated by interactions between an application's algorithms and the architecture. On systems where the local memory bandwidth is much greater than that of the message passing network, applications that require all-to-all communication (such as parallel FFTs) are often performance limited at higher processor counts due to the O(N<sup>2</sup>) growth in the required bandwidth. With a multilevel approach, the number of message passing processes can be lowered while still using the same total number of processors. A <a href="http://www.sc2000.org/techpapr/papers/pap.pap214.pdf">paper
given at the Supercomputing 2000 conference</a> demonstrated that a hybrid MPI/OpenMP approach performed better than a pure MPI approach on IBM SPs with SMP nodes for the "FT" code from the <a href="http://www.nas.nasa.gov/Software/NPB/">NAS
Parallel Benchmark suite</a>, which is exactly this type of application. </p>
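<p> To make the O(N<sup>2</sup>) point concrete, a back-of-the-envelope message count can be sketched as follows (the process and thread counts here are illustrative, not figures from the benchmark paper): </p>

```python
# In a naive all-to-all exchange, each of the p processes sends a
# message to the other p - 1, so the number of messages crossing the
# network grows as O(p^2).

def alltoall_messages(p):
    return p * (p - 1)

# Pure MPI on 64 processors: 64 communicating processes.
print(alltoall_messages(64))   # 4032 messages

# Hybrid on the same 64 processors: 16 MPI processes x 4 threads each,
# so far fewer (though larger) messages cross the interconnect.
print(alltoall_messages(16))   # 240 messages
```

<p> The same total data still moves, but it moves in far fewer, larger messages, which reduces per-message latency and software overhead on the network. </p>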
<h3> Algorithm Considerations </h3>
<p> Some algorithms map more or less naturally onto a multilevel parallel approach, in that they have an outer structure that looks like a message passing code and an inner structure that can benefit from having compiler directives applied to it. For instance, many computational fluid dynamics codes now use a "multi-grid" or "chimera" approach in which the solution domain is broken into several interacting zones. In a parallel implementation of this approach, the interaction between zones can be performed using message passing, while the computations
within a zone can be accelerated using compiler directives. One of the first codes to be implemented using a multilevel chimera approach was the <a href="http://www.supercomp.org/sc97/proceedings/TECH/HATAY/INDEX.HTM">OVERFLOW</a>
code from the NAS facility at NASA Ames. </p></body></html>