<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<style type="text/css">
body { font-family: Verdana, Arial, Helvetica, sans-serif; }
a.at-term { font-style: italic; }
</style>
<title>Applying OpenMP to the MPI Laplace Solver</title>
<meta name="Generator" content="ATutor">
<meta name="Keywords" content="">
</head>
<body>
<p>If we look closely at the MPI implementation of the Laplace solver, it has the same loop structures as the serial version of the code, except with different index bounds. Thus, in principle we can apply OpenMP to the same two loop structures as in the serial code:</p>
<ul>
<li><em>The outer (<var>j</var>) initialization loop at the beginning of the program</em>: Although this loop is run only once, it must be run in parallel to ensure proper first-touch memory placement on ccNUMA systems such as the SGI Origin 2000 and 3000. </li>
<li> <p><em>The outer (<var>j</var>) spatial loop within the main iterative loop</em>: This loop comprises the bulk of the work in the program and thus should be parallelized if possible. Because of the running maximum accumulated in <var>dumax</var>, it requires a REDUCTION clause (see the sketch after this list). </p>
</li>
</ul>
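<p>For reference, the sketch below shows roughly how these two directives look when applied to the serial form of the loops. It is a sketch only; the declarations, loop bounds, and the boundary-condition assignment are assumed from the serial version of the solver.</p>
<code><pre>! Sketch only: OpenMP applied to the serial loop nests
! Initialization loop -- run in parallel so that pages are first
! touched by the threads that will later work on them (ccNUMA)
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,j)
      do j=1,jmax
         do i=1,imax
            u(i,j)=0.0
            du(i,j)=0.0
         enddo
      enddo
!$OMP END PARALLEL DO

! Main spatial loop -- the running maximum in dumax needs a REDUCTION
      dumax=0.0
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,j) REDUCTION(max:dumax)
      do j=2,jmax-1
         do i=2,imax-1
            du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
            dumax=max(dumax,abs(du(i,j)))
         enddo
      enddo
!$OMP END PARALLEL DO</pre></code>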
<h3> Caveats and Implementation Details </h3>
<p>Unfortunately, things are not quite as simple as putting directives around the loop structures mentioned above. Care must be taken to ensure that no MPI communication routines are called within an OpenMP parallel region. There are two simple ways to do this. The first is to create OpenMP parallel regions only around the loops being parallelized with OpenMP and to keep the rest of the program single-threaded within each MPI process; this is the approach taken in the code below. The second is to use a single OpenMP parallel region for the entire program and place all MPI routines in OpenMP SINGLE or MASTER regions.</p>
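<p>A rough sketch of that second approach is shown below. The grid variables and loop bounds are assumed from the hybrid code later on this page, and <var>exchange_halos</var> is a hypothetical wrapper for the MPI send/receive calls; the point is simply that only one thread per MPI process ever touches MPI, and the implied barrier at END SINGLE keeps the other threads in step.</p>
<code><pre>! Sketch only: one parallel region spanning the whole iteration loop,
! with all MPI communication confined to a SINGLE region
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j,it)
      do it=1,itmax
!$OMP SINGLE
         dumax=0.0
!$OMP END SINGLE
!$OMP DO REDUCTION(max:dumax)
         do j=jstart,jend
            do i=istart,iend
               du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
               dumax=max(dumax,abs(du(i,j)))
            enddo
         enddo
!$OMP END DO
!$OMP DO
         do j=jstart,jend
            do i=istart,iend
               u(i,j)=u(i,j)+du(i,j)
            enddo
         enddo
!$OMP END DO
!$OMP SINGLE
!        Only one thread per process makes MPI calls here (the halo
!        exchange; the MPI_REDUCE of dumax would also go here)
         call exchange_halos(u,jstart,jend)   ! hypothetical wrapper
!$OMP END SINGLE
      enddo
!$OMP END PARALLEL</pre></code>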
<p>Another feature of the implementation shown below is that the number of threads per MPI process is hardcoded rather than taken from the <var>OMP_NUM_THREADS</var> environment variable. This is done because on many clusters of SMPs, the MPI launch environment does not propagate environment variables to all nodes in a job.</p>
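<p>If hardcoding is undesirable, one possible workaround (a sketch only, not part of the code below) is to have rank 0 obtain the desired thread count, for example from standard input, and broadcast it to the other processes before the first parallel region:</p>
<code><pre>! Sketch only: propagate the per-process thread count from rank 0
      integer nthrds,rank,ierr
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      if (rank.EQ.0) read (*,*) nthrds
      call MPI_BCAST(nthrds,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
      call omp_set_num_threads(nthrds)</pre></code>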
<h3> Hybrid MPI/OpenMP Implementation </h3>
<code><pre> program lpmlp
include 'mpif.h'
integer imax,jmax,im1,im2,jm1,jm2,it,itmax
parameter (imax=2001,jmax=2001)
parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
parameter (itmax=100)
real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
! Additional MPI parameters
integer istart,iend,jstart,jend
integer size,rank,ierr,istat(MPI_STATUS_SIZE),mpigrid,length
integer grdrnk,dims(1),gloc(1),up,down,isize,jsize
integer ureq,dreq
integer ustat(MPI_STATUS_SIZE),dstat(MPI_STATUS_SIZE)
real*8 tstart,tend,gdumax
logical cyclic(1)
real*8 uibuf(imax),uobuf(imax),dibuf(imax),dobuf(imax)
! OpenMP parameters
integer nthrds
! Initialize
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
! 1D linear topology
dims(1)=size
cyclic(1)=.FALSE.
call MPI_CART_CREATE(MPI_COMM_WORLD,1,dims,cyclic,.true.,mpigrid
+ ,ierr)
call MPI_COMM_RANK(mpigrid,grdrnk,ierr)
call MPI_CART_COORDS(mpigrid,grdrnk,1,gloc,ierr)
call MPI_CART_SHIFT(mpigrid,0,1,down,up,ierr)
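! 1D domain decomposition in j: each process owns rows jstart..jend
! of the grid, plus rows jstart-1 and jend+1 holding either copies of
! its neighbors' boundary data or the physical boundary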
istart=2
iend=imax-1
jsize=jmax/size
jstart=gloc(1)*jsize+1
if (jstart.LE.1) jstart=2
jend=(gloc(1)+1)*jsize
if (jend.GE.jmax) jend=jmax-1
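! Number of OpenMP threads per MPI process is hardcoded here rather
! than taken from OMP_NUM_THREADS (see discussion above)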
nthrds=2
call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
! Initialize -- done in parallel to force "first-touch" distribution
! on ccNUMA machines (i.e. O2k)
!$OMP DO
do j=jstart-1,jend+1
do i=istart-1,iend+1
u(i,j)=0.0
du(i,j)=0.0
enddo
u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
enddo
!$OMP END DO
!$OMP END PARALLEL
! Main computation loop
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tstart=MPI_WTIME()
do it=1,itmax
! We have to keep the OpenMP and MPI calls segregated...
call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
!$OMP MASTER
dumax=0.0
!$OMP END MASTER
! Barrier ensures dumax has been zeroed before any thread's partial
! result is combined into it at the end of the reduction loop below
!$OMP BARRIER
!$OMP DO REDUCTION(max:dumax)
do j=jstart,jend
do i=istart,iend
du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
dumax=max(dumax,abs(du(i,j)))
enddo
enddo
!$OMP END DO
!$OMP DO
do j=jstart,jend
do i=istart,iend
u(i,j)=u(i,j)+du(i,j)
enddo
enddo
!$OMP END DO
!$OMP END PARALLEL
! Compute the overall residual
call MPI_REDUCE(dumax,gdumax,1,MPI_REAL8,MPI_MAX,0
+ ,MPI_COMM_WORLD,ierr)
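! Halo exchange: non-blocking sends of each process's boundary rows,
! then blocking receives into the ghost rows. All MPI calls are made
! outside the OpenMP parallel region, as discussed above.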
! Send phase
if (down.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
dobuf(j)=u(i,jstart)
j=j+1
enddo
length=j-1
call MPI_ISEND(dobuf,length,MPI_REAL8,down,it,mpigrid,
+ dreq,ierr)
endif
if (up.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
uobuf(j)=u(i,jend)
j=j+1
enddo
length=j-1
call MPI_ISEND(uobuf,length,MPI_REAL8,up,it,mpigrid,
+ ureq,ierr)
endif
! Receive phase
if (down.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(dibuf,length,MPI_REAL8,down,it,
+ mpigrid,istat,ierr)
call MPI_WAIT(dreq,dstat,ierr)
j=1
do i=istart,iend
u(i,jstart-1)=dibuf(j)
j=j+1
enddo
endif
if (up.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(uibuf,length,MPI_REAL8,up,it,
+ mpigrid,istat,ierr)
call MPI_WAIT(ureq,ustat,ierr)
j=1
do i=istart,iend
u(i,jend+1)=uibuf(j)
j=j+1
enddo
endif
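! Convergence history: each rank logs to unit rank+10; note that
! gdumax is only meaningful on rank 0 since MPI_REDUCE was used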
write (rank+10,*) rank,it,dumax,gdumax
if (rank.eq.0) write (1,*) it,gdumax
enddo
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tend=MPI_WTIME()
if (rank.EQ.0) then
write(*,*) 'Calculation took ',tend-tstart,'s. on ',size,
+ ' MPI processes'
+ ,' with ',nthrds,' OpenMP threads per process'
endif
call MPI_FINALIZE(ierr)
stop
end</pre></code></body></html>