<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<style type="text/css">
body { font-family: Verdana, Arial, Helvetica, sans-serif; }
a.at-term { font-style: italic; }
</style>
<title>Applying OpenMP to the MPI Laplace Solver</title>
<meta name="Generator" content="ATutor">
<meta name="Keywords" content="">
</head>
<body>
<p>If we look closely at the MPI implementation of the Laplace solver, it has the same loop structures as the serial version of the code, except with different index bounds. Thus, in principle we can apply OpenMP to the same two loop structures as in the serial code:</p>
<ul>
<li><em>The outer (<var>j</var>) initialization loop at the beginning of the program</em>: Although this loop is run only once, it must be run in parallel to ensure proper first-touch memory placement on ccNUMA systems such as the SGI Origin 2000 and 3000. </li>
<li> <p><em>The outer (<var>j</var>) spatial loop within the main iterative loop</em>: This loop comprises the bulk of the work in the program and thus should be parallelized if possible. Because of the running maximum accumulated in <var>dumax</var>, it requires a REDUCTION clause (see the sketch after this list). </p>
</li>
</ul>
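<p>For reference, the sketch below shows roughly how these two directives look when applied to the serial form of the loops. It is a sketch only; the declarations, loop bounds, and the boundary-condition assignment are assumed from the serial version of the solver.</p>
<code><pre>! Sketch only: OpenMP applied to the serial loop nests
! Initialization loop -- run in parallel so that pages are first
! touched by the threads that will later work on them (ccNUMA)
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,j)
      do j=1,jmax
         do i=1,imax
            u(i,j)=0.0
            du(i,j)=0.0
         enddo
      enddo
!$OMP END PARALLEL DO

! Main spatial loop -- the running maximum in dumax needs a REDUCTION
      dumax=0.0
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,j) REDUCTION(max:dumax)
      do j=2,jmax-1
         do i=2,imax-1
            du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
            dumax=max(dumax,abs(du(i,j)))
         enddo
      enddo
!$OMP END PARALLEL DO</pre></code>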
<h3> Caveats and Implementation Details </h3>
<p>Unfortunately, things are not quite as simple as putting directives around the loop structures mentioned above. Care must be taken to ensure that no MPI communication routines are called within an OpenMP parallel region. There are two simple ways to do this. The first is to create OpenMP parallel regions only around the loops being parallelized with OpenMP and to keep the rest of the program single-threaded within each MPI process; this is the approach taken in the code below. The second is to use a single OpenMP parallel region for the entire program and place all MPI routines in OpenMP SINGLE or MASTER regions.</p>
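<p>A rough sketch of that second approach is shown below. The grid variables and loop bounds are assumed from the hybrid code later on this page, and <var>exchange_halos</var> is a hypothetical wrapper for the MPI send/receive calls; the point is simply that only one thread per MPI process ever touches MPI, and the implied barrier at END SINGLE keeps the other threads in step.</p>
<code><pre>! Sketch only: one parallel region spanning the whole iteration loop,
! with all MPI communication confined to a SINGLE region
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j,it)
      do it=1,itmax
!$OMP SINGLE
         dumax=0.0
!$OMP END SINGLE
!$OMP DO REDUCTION(max:dumax)
         do j=jstart,jend
            do i=istart,iend
               du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
               dumax=max(dumax,abs(du(i,j)))
            enddo
         enddo
!$OMP END DO
!$OMP DO
         do j=jstart,jend
            do i=istart,iend
               u(i,j)=u(i,j)+du(i,j)
            enddo
         enddo
!$OMP END DO
!$OMP SINGLE
!        Only one thread per process makes MPI calls here (the halo
!        exchange; the MPI_REDUCE of dumax would also go here)
         call exchange_halos(u,jstart,jend)   ! hypothetical wrapper
!$OMP END SINGLE
      enddo
!$OMP END PARALLEL</pre></code>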
<p>Another feature of the implementation shown below is that the number of threads per MPI process is hardcoded rather than taken from the <var>OMP_NUM_THREADS</var> environment variable. This is done because on many clusters of SMPs, the MPI launch environment does not propagate environment variables to all nodes in a job.</p>
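<p>If hardcoding is undesirable, one possible workaround (a sketch only, not part of the code below) is to have rank 0 obtain the desired thread count, for example from standard input, and broadcast it to the other processes before the first parallel region:</p>
<code><pre>! Sketch only: propagate the per-process thread count from rank 0
      integer nthrds,rank,ierr
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      if (rank.EQ.0) read (*,*) nthrds
      call MPI_BCAST(nthrds,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
      call omp_set_num_threads(nthrds)</pre></code>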
<h3> Hybrid MPI/OpenMP Implementation </h3>
<code><pre> program lpmlp
include 'mpif.h'
integer imax,jmax,im1,im2,jm1,jm2,it,itmax
parameter (imax=2001,jmax=2001)
parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
parameter (itmax=100)
real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
! Additional MPI parameters
integer istart,iend,jstart,jend
integer size,rank,ierr,istat(MPI_STATUS_SIZE),mpigrid,length
integer grdrnk,dims(1),gloc(1),up,down,isize,jsize
integer ureq,dreq
integer ustat(MPI_STATUS_SIZE),dstat(MPI_STATUS_SIZE)
real*8 tstart,tend,gdumax
logical cyclic(1)
real*8 uibuf(imax),uobuf(imax),dibuf(imax),dobuf(imax)
! OpenMP parameters
integer nthrds
! Initialize
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
! 1D linear topology
dims(1)=size
cyclic(1)=.FALSE.
call MPI_CART_CREATE(MPI_COMM_WORLD,1,dims,cyclic,.true.,mpigrid
+ ,ierr)
call MPI_COMM_RANK(mpigrid,grdrnk,ierr)
call MPI_CART_COORDS(mpigrid,grdrnk,1,gloc,ierr)
call MPI_CART_SHIFT(mpigrid,0,1,down,up,ierr)
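! 1D domain decomposition in j: each process owns rows jstart..jend
! of the grid, plus rows jstart-1 and jend+1 holding either copies of
! its neighbors' boundary data or the physical boundary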
istart=2
iend=imax-1
jsize=jmax/size
jstart=gloc(1)*jsize+1
if (jstart.LE.1) jstart=2
jend=(gloc(1)+1)*jsize
if (jend.GE.jmax) jend=jmax-1
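! Number of OpenMP threads per MPI process is hardcoded here rather
! than taken from OMP_NUM_THREADS (see discussion above)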
nthrds=2
call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
! Initialize -- done in parallel to force "first-touch" distribution
! on ccNUMA machines (i.e. O2k)
!$OMP DO
do j=jstart-1,jend+1
do i=istart-1,iend+1
u(i,j)=0.0
du(i,j)=0.0
enddo
u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
enddo
!$OMP END DO
!$OMP END PARALLEL
! Main computation loop
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tstart=MPI_WTIME()
do it=1,itmax
! We have to keep the OpenMP and MPI calls segregated...
call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
!$OMP MASTER
dumax=0.0
!$OMP END MASTER
! Barrier ensures dumax has been zeroed before any thread's partial
! result is combined into it at the end of the reduction loop below
!$OMP BARRIER
!$OMP DO REDUCTION(max:dumax)
do j=jstart,jend
do i=istart,iend
du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
dumax=max(dumax,abs(du(i,j)))
enddo
enddo
!$OMP END DO
!$OMP DO
do j=jstart,jend
do i=istart,iend
u(i,j)=u(i,j)+du(i,j)
enddo
enddo
!$OMP END DO
!$OMP END PARALLEL
! Compute the overall residual
call MPI_REDUCE(dumax,gdumax,1,MPI_REAL8,MPI_MAX,0
+ ,MPI_COMM_WORLD,ierr)
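! Halo exchange: non-blocking sends of each process's boundary rows,
! then blocking receives into the ghost rows. All MPI calls are made
! outside the OpenMP parallel region, as discussed above.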
! Send phase
if (down.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
dobuf(j)=u(i,jstart)
j=j+1
enddo
length=j-1
call MPI_ISEND(dobuf,length,MPI_REAL8,down,it,mpigrid,
+ dreq,ierr)
endif
if (up.NE.MPI_PROC_NULL) then
j=1
do i=istart,iend
uobuf(j)=u(i,jend)
j=j+1
enddo
length=j-1
call MPI_ISEND(uobuf,length,MPI_REAL8,up,it,mpigrid,
+ ureq,ierr)
endif
! Receive phase
if (down.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(dibuf,length,MPI_REAL8,down,it,
+ mpigrid,istat,ierr)
call MPI_WAIT(dreq,dstat,ierr)
j=1
do i=istart,iend
u(i,jstart-1)=dibuf(j)
j=j+1
enddo
endif
if (up.NE.MPI_PROC_NULL) then
length=iend-istart+1
call MPI_RECV(uibuf,length,MPI_REAL8,up,it,
+ mpigrid,istat,ierr)
call MPI_WAIT(ureq,ustat,ierr)
j=1
do i=istart,iend
u(i,jend+1)=uibuf(j)
j=j+1
enddo
endif
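! Convergence history: each rank logs to unit rank+10; note that
! gdumax is only meaningful on rank 0 since MPI_REDUCE was used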
write (rank+10,*) rank,it,dumax,gdumax
if (rank.eq.0) write (1,*) it,gdumax
enddo
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
tend=MPI_WTIME()
if (rank.EQ.0) then
write(*,*) 'Calculation took ',tend-tstart,'s. on ',size,
+ ' MPI processes'
+ ,' with ',nthrds,' OpenMP threads per process'
endif
call MPI_FINALIZE(ierr)
stop
end</pre></code></body></html>