<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <style type="text/css"> body { font-family: Verdana, Arial, Helvetica, sans-serif;} a.at-term { font-style: italic; } </style> <title>Applying MPI to the Laplace Solver</title> <meta name="Generator" content="ATutor"> <meta name="Keywords" content=""></head><body> <p>As opposed to the case of OpenMP, where the compiler does most of the hard work of parallelization, MPI requires much more work on the part of the developer to parallelize an application. Since no data is shared between MPI processes except through the MPI communication routines, it is up to the developer to determine the patterns of communication. The upside to this is that the developer also has much more control over how an application is parallelized, which can
lead to much better performance and scalability.</p>
<h3>Domain Decomposition</h3>
<p>Fundamental to parallelizing many applications with MPI is the concept of <em><a href="../glossary.html#domain+decomposition" target="body" class="at-term">domain decomposition</a></em>, in which the overall domain of the solution is broken into pieces and the calculations for each piece are assigned to an MPI process. Consider again the diagram of the domain shown in Chapter 2:</p>
<img src="domain.jpg">
<p>This domain can be decomposed in three ways. The first is a 1D partitioning along the <i>y</i>-axis (a.k.a. the <code>j</code> index):</p>
<img src="domain-1d-horz.jpg">
<p>Another way of decomposing the domain is a 1D partitioning along the <i>x</i>-axis (a.k.a. the <code>i</code> index):
</p>
<img src="domain-1d-vert.jpg">
<p>The third way of decomposing the domain is a 2D decomposition where the domain is partitioned along both axes:
</p>
<img src="domain-2d.jpg">
<p>In all three diagrams, the number in each stripe or block represents the rank of the MPI process responsible for that area. MPI's virtual topology functionality can also be used to assign areas to MPI processes and determine nearest neighbor connectivity.
</p>
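<p>As a rough sketch of how the virtual topology routines might be used for the 2D decomposition pictured above (the implementation later in this chapter uses the 1D equivalent of these calls), a process grid and its nearest neighbors could be set up as follows. This fragment assumes the same <code>include 'mpif.h'</code> environment as the full listing below, and its variable names are illustrative only:</p>
<pre><code>
! Sketch only: create a 2D Cartesian process grid and locate the four
! nearest neighbors of this rank.  Neighbors that fall off the edge of
! the grid come back as MPI_PROC_NULL, marking a physical boundary.
integer comm2d,ierr,nprocs,dims(2),left,right,below,above
logical periodic(2)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
dims(1)=0                  ! let MPI_DIMS_CREATE choose the grid shape
dims(2)=0
call MPI_DIMS_CREATE(nprocs,2,dims,ierr)
periodic(1)=.FALSE.        ! the Laplace domain is not periodic
periodic(2)=.FALSE.
call MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periodic,.TRUE.,comm2d,ierr)
call MPI_CART_SHIFT(comm2d,0,1,left,right,ierr)
call MPI_CART_SHIFT(comm2d,1,1,below,above,ierr)
</code></pre>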
<p>A second concept that often goes hand in hand with domain decomposition is <em>ghost cells</em>. In most numerical solutions of differential equations, the solution at a point depends on neighboring points. With a decomposed domain, the solution on the edge of a particular subdomain will usually depend on values in a neighboring subdomain owned by a different MPI process. To handle this, each MPI process maintains ghost cells on the edges of its subdomain that hold the solution values from neighboring MPI processes. This requires pairs of neighboring MPI processes to exchange solution values along the boundary between them.
</p>
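<p>For the 1D decomposition used in the implementation later in this chapter, the ghost cells are simply one extra row of <code>j</code> values on each side of the rows a process owns. The following sketch (using the index names from that implementation) shows the resulting layout of each rank's portion of <code>u</code>:</p>
<pre><code>
! Sketch only: layout of one rank's portion of u for a 1D split along j.
!
!   u(:,jstart-1)        ghost row, copy of the "down" neighbor's row jend
!   u(:,jstart..jend)    rows owned and updated by this rank
!   u(:,jend+1)          ghost row, copy of the "up" neighbor's row jstart
!
! The two ghost rows are refreshed once per iteration by exchanging
! boundary rows with the neighboring ranks.
</code></pre>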
<h3>Algorithmic Changes and Implementation Details</h3>
<p>Domain decomposition requires a number of algorithmic changes to the approach used in the serial implementation. First, each MPI process must determine the ranges of the <code>i</code> and <code>j</code> indices for which it is responsible, as well as which MPI processes are responsible for the neighboring areas. The latter can be achieved using MPI's virtual topology functionality. Second, the main computation loop runs only over the ranges in <code>i</code> and <code>j</code> for which the MPI process is responsible. Third, determining the maximum change from one iteration to the next becomes a two-step process: each MPI process first determines its own local maximum change, and then a global maximum among all the MPI processes is computed using the MPI_REDUCE call. Finally, to maintain current values in the ghost cells, the solution values along boundaries shared with other MPI processes must be sent to the neighboring processes, and values received from those neighboring processes must be placed in the appropriate ghost cells. There are a number of ways these data exchanges can be implemented; the code below uses nonblocking sends (MPI_ISEND) followed by blocking receives (MPI_RECV) and a wait for the completion of the send (MPI_WAIT). However, these communications could also be implemented using a blocking exchange (MPI_SENDRECV), or by preposting nonblocking receives (MPI_IRECV) followed by blocking sends (MPI_SEND).
</p>
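<p>For comparison, the blocking-exchange alternative mentioned above would replace the separate send and receive phases with one MPI_SENDRECV call per neighbor. The sketch below is an illustration only; it reuses the buffer and neighbor names from the implementation that follows, and relies on the fact that communication with MPI_PROC_NULL is a no-op, so no explicit tests for physical boundaries are needed:</p>
<pre><code>
! Sketch only: ghost-cell exchange using blocking MPI_SENDRECV.
! dobuf holds this rank's row jstart and uobuf its row jend, packed as
! in the listing below; dibuf and uibuf receive the neighbors' rows.
length=iend-istart+1
call MPI_SENDRECV(dobuf,length,MPI_REAL8,down,it, &
     uibuf,length,MPI_REAL8,up,it,mpigrid,istat,ierr)
call MPI_SENDRECV(uobuf,length,MPI_REAL8,up,it, &
     dibuf,length,MPI_REAL8,down,it,mpigrid,istat,ierr)
! Afterwards dibuf is unpacked into u(:,jstart-1) and uibuf into
! u(:,jend+1), exactly as in the receive phase of the listing.
</code></pre>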
<p>The particular domain decomposition used for the implementation below is a 1D decomposition along the <i>y</i>-axis (the <code>j</code> index). This is advantageous because of how Fortran 90 lays out arrays in memory: Fortran is column-major, so array elements at successive values of <code>i</code> are contiguous in memory. By keeping the <code>i</code> index unbroken, the compiler is more readily able to generate prefetch instructions and avoid stalling on memory references.
</p>
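<p>The implementation below computes each rank's share of the <code>j</code> range with the simple division <code>jsize=jmax/size</code>, which assumes that <code>jmax</code> splits evenly across the MPI processes. For reference, a more general block distribution that handles a remainder might look like the following sketch; the variables <code>nrows</code>, <code>base</code>, and <code>rem</code> are introduced here purely for illustration:</p>
<pre><code>
! Sketch only: divide the jmax-2 interior rows (j = 2 .. jmax-1) among
! 'size' ranks when they do not divide evenly; the first 'rem' ranks
! each take one extra row.  grdrnk is this rank's Cartesian coordinate.
integer nrows,base,rem
nrows=jmax-2
base=nrows/size
rem=mod(nrows,size)
jstart=2+grdrnk*base+min(grdrnk,rem)
jend=jstart+base-1
if (grdrnk.LT.rem) jend=jend+1
</code></pre>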
<h3>MPI Implementation</h3>
<pre><code>
program lpmpi
include 'mpif.h'
integer imax,jmax,im1,im2,jm1,jm2,it,itmax
parameter (imax=2001,jmax=2001)
parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
parameter (itmax=100)
real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
! Additional MPI parameters
integer istart,iend,jstart,jend
integer size,rank,ierr,istat(MPI_STATUS_SIZE),mpigrid,length
integer grdrnk,dims(1),gloc(1),up,down,isize,jsize
integer ureq,dreq
integer ustat(MPI_STATUS_SIZE),dstat(MPI_STATUS_SIZE)
real*8 tstart,tend,gdumax
logical cyclic(1)
real*8 uibuf(imax),uobuf(imax),dibuf(imax),dobuf(imax)
! Initialize
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
! 1D linear topology
dims(1)=size
cyclic(1)=.FALSE.
call MPI_CART_CREATE(MPI_COMM_WORLD,1,dims,cyclic,.true.,mpigrid &
,ierr)
call MPI_COMM_RANK(mpigrid,grdrnk,ierr)
call MPI_CART_COORDS(mpigrid,grdrnk,1,gloc,ierr)
call MPI_CART_SHIFT(mpigrid,0,1,down,up,ierr)
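! Local index ranges for this rank: the full interior range in i,
! and a block of rows in j based on the Cartesian coordinate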
istart=2
iend=imax-1
jsize=jmax/size
jstart=gloc(1)*jsize+1
if (jstart.LE.1) jstart=2
jend=(gloc(1)+1)*jsize
if (jend.GE.jmax) jend=jmax-1
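! Initialize the local block (including ghost rows) and apply the
! fixed boundary condition along i=imax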
do j=jstart-1,jend+1
do i=istart-1,iend+1
u(i,j)=0.0
du(i,j)=0.0
enddo
u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
enddo
! Main computation loop
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tstart=MPI_WTIME()
do it=1,itmax
dumax=0.0
do j=jstart,jend
do i=istart,iend
du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
dumax=max(dumax,abs(du(i,j)))
enddo
enddo
do j=jstart,jend
do i=istart,iend
u(i,j)=u(i,j)+du(i,j)
enddo
enddo
! Compute the overall residual
call MPI_REDUCE(dumax,gdumax,1,MPI_REAL8,MPI_MAX,0 &
,MPI_COMM_WORLD,ierr)
! Send phase
if (down.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
dobuf(j)=u(i,jstart)
j=j+1
enddo
length=j-1
call MPI_ISEND(dobuf,length,MPI_REAL8,down,it,mpigrid, &
dreq,ierr)
endif
if (up.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
uobuf(j)=u(i,jend)
j=j+1
enddo
length=j-1
call MPI_ISEND(uobuf,length,MPI_REAL8,up,it,mpigrid, &
ureq,ierr)
endif
! Receive phase
if (down.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(dibuf,length,MPI_REAL8,down,it, &
mpigrid,istat,ierr)
call MPI_WAIT(dreq,dstat,ierr)
j=1
do i=istart,iend
u(i,jstart-1)=dibuf(j)
j=j+1
enddo
endif
if (up.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(uibuf,length,MPI_REAL8,up,it, &
mpigrid,istat,ierr)
call MPI_WAIT(ureq,ustat,ierr)
j=1
do i=istart,iend
u(i,jend+1)=uibuf(j)
j=j+1
enddo
endif
! gdumax is defined only on the MPI_REDUCE root (rank 0), so each rank
! logs its local maximum change to its own file
write (rank+10,*) rank,it,dumax
if (rank.eq.0) write (1,*) it,gdumax
enddo
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tend=MPI_WTIME()
if (rank.EQ.0) then
write(*,*) 'Calculation took ',tend-tstart,'s. on ',size, &
' MPI processes'
endif
call MPI_FINALIZE(ierr)
stop
end</code></pre></body></html>