<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />	<style type="text/css">	body { font-family: Verdana, Arial, Helvetica, sans-serif;}	a.at-term {	font-style: italic; }	</style>	<title>Applying OpenMP to the Laplace Solver Code</title>	<meta name="Generator" content="ATutor" />	<meta name="Keywords" content="" /></head><body> <p>In light of the preceding chapter on OpenMP, two loop structures in the cache-friendly implementation of the Laplace solver are worth parallelizing with OpenMP directives:</p>
  <ul>
    
  <li> 
    <p><em>The outer (<code>j</code>) initialization loop at the beginning of the program</em>: Although this loop runs only once, it must run in parallel to ensure proper memory placement on <em><a href="../glossary.html#ccNUMA" target="body" class="at-term">ccNUMA</a></em> systems like the SGI Origin 2000 and 3000, where the first thread to touch a piece of memory determines where that memory is placed. </p>
  </li>
  <li> 
    <p><em>The outer (<code>j</code>) spatial loop within the main iterative loop</em>: This loop comprises the bulk of the work in the program and thus should be parallelized if possible. It requires a REDUCTION clause because every iteration compares its result against the running maximum <var>dumax</var>. </p>
  </li>
  </ul>
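  <p>As a rough illustration of the two loops described above, here is a minimal sketch in C rather than the page's Fortran. The grid sizes <code>NI</code> and <code>NJ</code> and the function names <code>init_grid</code> and <code>sweep</code> are hypothetical, chosen only for this example, and the <code>max</code> reduction assumes a compiler supporting OpenMP 3.1 or later. The array is indexed as <code>u[j][i]</code> so that <code>i</code> stays the contiguous index, as it is in the Fortran <code>u(i,j)</code>.</p>

```c
/* Hypothetical small grid for illustration only;
   the Fortran program on this page uses 2001 x 2001. */
enum { NI = 5, NJ = 4 };

/* Initialize in parallel over j. On a first-touch ccNUMA system this
   tends to place each block of rows near the thread that touches it. */
static void init_grid(double u[NJ][NI], double du[NJ][NI], double umax)
{
    #pragma omp parallel for
    for (int j = 0; j < NJ; j++) {
        for (int i = 0; i < NI - 1; i++) {
            u[j][i] = 0.0;
            du[j][i] = 0.0;
        }
        u[j][NI - 1] = umax;   /* fixed boundary value on the last column */
    }
}

/* Compute du on the interior and return the largest |du|.
   reduction(max:dumax) gives each thread a private running maximum
   and combines them into dumax when the loop ends. */
static double sweep(double u[NJ][NI], double du[NJ][NI])
{
    double dumax = 0.0;

    #pragma omp parallel for reduction(max:dumax)
    for (int j = 1; j < NJ - 1; j++) {
        for (int i = 1; i < NI - 1; i++) {
            double d = 0.25 * (u[j][i - 1] + u[j][i + 1]
                             + u[j - 1][i] + u[j + 1][i]) - u[j][i];
            du[j][i] = d;
            if (d < 0.0) d = -d;        /* absolute value, no math.h needed */
            if (d > dumax) dumax = d;
        }
    }
    return dumax;
}
```

  <p>Calling <code>init_grid(u, du, 10.0)</code> followed by <code>sweep(u, du)</code> mirrors the initialization and the first half of one iteration. Without the reduction clause, all threads would update one shared <code>dumax</code> at the same time, which is a data race. If OpenMP is not enabled at compile time, the pragmas are ignored and the functions run serially with the same results.</p>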
  <p>While there are other loop structures in the code, parallelizing those could do more harm than good. For instance, the inner (<var>i</var>) loops in the initialization and main loop sections could be parallelized in the same way as the outer (<var>j</var>) loops, but doing this would likely break stride-1 memory access patterns and lower cache reuse, causing a drop in performance. The iterative (<var>it</var>) loop cannot be parallelized, because each iteration starts from the values of <var>u</var> produced by the previous iteration.</p>
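  <p>The loop-ordering argument can be sketched in C as follows. This is an illustration under stated assumptions, not the page's program: the sizes <code>NI</code> and <code>NJ</code> and the names <code>jacobi_step</code> and <code>jacobi_run</code> are hypothetical. Because C stores arrays row by row, the array is written <code>u[j][i]</code> so that the inner <code>i</code> loop remains the stride-1 loop, matching the Fortran layout of <code>u(i,j)</code>.</p>

```c
/* Hypothetical tiny grid for illustration only;
   the Fortran program on this page uses 2001 x 2001. */
enum { NI = 4, NJ = 4 };

/* One Jacobi step on u[j][i]; in C the last index (i) is contiguous. */
static void jacobi_step(double u[NJ][NI], double du[NJ][NI])
{
    /* Threads split the outer j loop, so each thread owns whole rows
       and its inner i loop still walks memory with stride 1. */
    #pragma omp parallel for
    for (int j = 1; j < NJ - 1; j++) {
        for (int i = 1; i < NI - 1; i++) {
            du[j][i] = 0.25 * (u[j][i - 1] + u[j][i + 1]
                             + u[j - 1][i] + u[j + 1][i]) - u[j][i];
        }
    }

    /* The parallel loop above ends with a synchronization point, so
       every du was computed from the old u before u is overwritten. */
    #pragma omp parallel for
    for (int j = 1; j < NJ - 1; j++) {
        for (int i = 1; i < NI - 1; i++) {
            u[j][i] += du[j][i];
        }
    }
}

/* The iteration loop stays sequential: step it needs the u
   written by step it - 1, so the steps cannot overlap. */
static void jacobi_run(double u[NJ][NI], double du[NJ][NI], int itmax)
{
    for (int it = 0; it < itmax; it++) {
        jacobi_step(u, du);
    }
}
```

  <p>Placing the pragma on the inner <code>i</code> loop instead would hand neighboring elements of the same row to different threads, which is the loss of stride-1 access and cache reuse described above.</p>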

<h3> OpenMP Implementation </h3>

<pre><code>      program lpomp
      integer imax,jmax,im1,jm1,it,itmax
      parameter (imax=2001,jmax=2001)
      parameter (im1=imax-1,jm1=jmax-1)
      parameter (itmax=100)
      real*8 u(imax,jmax),du(imax,jmax),umax,dumax
      parameter (umax=10.0)

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
! Initialize -- done in parallel to force "first-touch" distribution
! on ccNUMA machines (e.g. the SGI Origin 2000)

!$OMP DO
      do j=1,jmax
         do i=1,imax-1
            u(i,j)=0.0
            du(i,j)=0.0
         enddo
         u(imax,j)=umax
      enddo
!$OMP END DO 

! Main computation loop
      do it=1,itmax

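! One thread resets the shared dumax; the implicit barrier at
! END SINGLE makes the reset visible before the reduction starts.
!$OMP SINGLE
         dumax=0.0
!$OMP END SINGLE

! Each thread keeps a private maximum, merged into dumax at loop end.
!$OMP DO REDUCTION (max:dumax)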
         do j=2,jm1
            do i=2,im1
               du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
               dumax=max(dumax,abs(du(i,j)))
            enddo
         enddo
!$OMP END DO

!$OMP DO
         do j=2,jm1
            do i=2,im1
               u(i,j)=u(i,j)+du(i,j)
            enddo
         enddo
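!$OMP END DO NOWAIT

! NOWAIT skips the barrier above; dumax is already final, so the
! master thread can print it while other threads finish updating u.
!$OMP MASTER
         write (1,*) it,dumax
!$OMP END MASTER

! All updates to u must complete before the next iteration starts.
!$OMP BARRIER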
      enddo

!$OMP END PARALLEL
      stop
      end</code></pre>
<p>To see the C version of this code, click <a href="mlp.lpomp.html" target="_blank">C Code</a>.</p></body></html>