<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <style type="text/css"> body { font-family: Verdana, Arial, Helvetica, sans-serif;} a.at-term { font-style: italic; } </style> <title>Applying OpenMP to the Laplace Solver Code</title> <meta name="Generator" content="ATutor"> <meta name="Keywords" content=""></head><body> <p>If we consider the cache-friendly implementation of the Laplace solver in light of the preceding chapter on OpenMP, there are two loop structures that make sense to parallelize using OpenMP directives:</p>
<ul>
<li>
<p><em>The outer (<code>j</code>) initialization loop at the beginning of the program</em>: Although this loop runs only once, it must run in parallel to ensure proper memory placement on <em><a href="../glossary.html#ccNUMA" target="body" class="at-term">ccNUMA</a></em> systems like the SGI Origin 2000 and 3000, where data is placed near the processor that first touches it. </p>
</li>
<li>
<p><em>The outer (<code>j</code>) spatial loop within the main iterative loop</em>: This loop accounts for the bulk of the work in the program and thus should be parallelized if possible. It requires a REDUCTION clause because every iteration compares against and updates the running maximum <var>dumax</var>. </p>
</li>
</ul>
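<p>The two loops above can be sketched in C, the language of the alternate version this page links to. This is a minimal illustration rather than that linked code: the helper names <code>init_first_touch</code> and <code>max_abs</code>, the flat <code>u[j*ni + i]</code> layout, and the array sizes are assumptions made for the example. The first function splits the initialization over <code>j</code> so that, on a system with a first-touch placement policy, each thread touches the part of the arrays it later works on. The second shows a <code>max</code> reduction of the kind the main loop needs for <var>dumax</var>.</p>

```c
#include <stddef.h>

/* Hypothetical helper: zero u and du and set the i = ni-1 edge of u to umax.
   Storage is u[j*ni + i], so i is the contiguous index, as in the Fortran
   array u(i,j). The j loop is split across threads; like the Fortran code,
   the last column of du is left untouched. */
void init_first_touch(double *u, double *du, int ni, int nj, double umax)
{
    #pragma omp parallel for
    for (int j = 0; j < nj; j++) {
        for (int i = 0; i < ni - 1; i++) {
            u[(size_t)j * ni + i] = 0.0;
            du[(size_t)j * ni + i] = 0.0;
        }
        u[(size_t)j * ni + (ni - 1)] = umax;
    }
}

/* Hypothetical helper: largest |du[k]| over n entries. Each thread keeps a
   private running maximum; OpenMP combines them when the loop ends. */
double max_abs(const double *du, int n)
{
    double dumax = 0.0;
    #pragma omp parallel for reduction(max:dumax)
    for (int k = 0; k < n; k++) {
        double a = du[k] < 0.0 ? -du[k] : du[k];
        if (a > dumax) dumax = a;
    }
    return dumax;
}
```

<p>In C, <code>reduction(max:...)</code> requires OpenMP 3.1 or later. If the file is compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same results.</p>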
<p>While there are other loop structures in the code, parallelizing those could do more harm than good. For instance, the inner (<var>i</var>) loops in the initialization and main loop sections could be parallelized in the same way as the outer (<var>j</var>) loops, but doing this would likely break stride-1 memory access patterns and lower cache reuse, causing a drop in performance. The iterative (<var>it</var>) loop cannot be parallelized, because one iteration in that loop depends on the previous iteration.</p>
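<p>The choice of loop can be illustrated with a short C sketch of the update sweep. It assumes a flat array indexed as <code>u[j*ni + i]</code>, so that <code>i</code> is the contiguous index just as it is for the Fortran array <code>u(i,j)</code>; the function name <code>apply_update</code> is hypothetical. The directive is placed on the outer <code>j</code> loop, which leaves each thread a block of whole rows to traverse with stride-1 access in the inner loop.</p>

```c
#include <stddef.h>

/* Hypothetical helper: u += du on the interior of an ni-by-nj grid stored
   as u[j*ni + i]. */
void apply_update(double *u, const double *du, int ni, int nj)
{
    /* The outer j loop is shared among the threads; each thread walks its
       rows with the contiguous index i innermost (stride-1 access). */
    #pragma omp parallel for
    for (int j = 1; j < nj - 1; j++) {
        for (int i = 1; i < ni - 1; i++) {
            size_t k = (size_t)j * ni + i;
            u[k] += du[k];
        }
    }
}
```

<p>Moving the directive to the inner loop would split each row among the threads and start and end a work-sharing loop once per row, which is the pattern the paragraph above warns against.</p>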
<h3> OpenMP Implementation </h3>
<pre><code>      program lpomp
      integer imax,jmax,im1,jm1,it,itmax
      parameter (imax=2001,jmax=2001)
      parameter (im1=imax-1,jm1=jmax-1)
      parameter (itmax=100)
      real*8 u(imax,jmax),du(imax,jmax),umax,dumax
      parameter (umax=10.0)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
! Initialize -- done in parallel to force "first-touch" distribution
! on ccNUMA machines (e.g. O2k)
!$OMP DO
      do j=1,jmax
        do i=1,imax-1
          u(i,j)=0.0
          du(i,j)=0.0
        enddo
        u(imax,j)=umax
      enddo
!$OMP END DO
! Main computation loop -- every thread executes the it loop
      do it=1,itmax
! One thread resets dumax; END SINGLE implies a barrier
!$OMP SINGLE
        dumax=0.0
!$OMP END SINGLE
!$OMP DO REDUCTION (max:dumax)
        do j=2,jm1
          do i=2,im1
            du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
            dumax=max(dumax,abs(du(i,j)))
          enddo
        enddo
!$OMP END DO
!$OMP DO
        do j=2,jm1
          do i=2,im1
            u(i,j)=u(i,j)+du(i,j)
          enddo
        enddo
!$OMP END DO NOWAIT
! NOWAIT lets the master thread write while others finish the update;
! the explicit barrier below resynchronizes before the next iteration
!$OMP MASTER
        write (1,*) it,dumax
!$OMP END MASTER
!$OMP BARRIER
      enddo
!$OMP END PARALLEL
      stop
      end</code></pre>
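<p>For readers who think in C, the structure of the parallel region above can be sketched as follows. This is an illustration written for this page under stated assumptions, not the linked C version: the function name <code>laplace_solve</code>, its parameters, the flat <code>u[j*n + i]</code> layout, and the <code>verbose</code> flag are all hypothetical, and output goes to standard output rather than unit 1.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical driver: run itmax sweeps on an n-by-n grid and return the
   last dumax, or -1.0 if allocation fails. Storage is u[j*n + i]. */
double laplace_solve(int n, int itmax, double umax, int verbose)
{
    double *u  = malloc((size_t)n * n * sizeof *u);
    double *du = malloc((size_t)n * n * sizeof *du);
    double dumax = 0.0;              /* declared outside the region: shared */
    if (u == NULL || du == NULL) { free(u); free(du); return -1.0; }

    #pragma omp parallel
    {
        /* Parallel initialization over j (first touch) */
        #pragma omp for
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n - 1; i++) {
                u[(size_t)j * n + i] = 0.0;
                du[(size_t)j * n + i] = 0.0;
            }
            u[(size_t)j * n + (n - 1)] = umax;
        }

        /* Every thread runs the iteration loop; it is private per thread */
        for (int it = 1; it <= itmax; it++) {
            #pragma omp single
            dumax = 0.0;             /* implicit barrier after single */

            #pragma omp for reduction(max:dumax)
            for (int j = 1; j < n - 1; j++) {
                for (int i = 1; i < n - 1; i++) {
                    size_t k = (size_t)j * n + i;
                    du[k] = 0.25 * (u[k - 1] + u[k + 1] + u[k - n] + u[k + n]) - u[k];
                    double a = du[k] < 0.0 ? -du[k] : du[k];
                    if (a > dumax) dumax = a;
                }
            }                        /* implicit barrier: dumax is combined */

            #pragma omp for nowait
            for (int j = 1; j < n - 1; j++) {
                for (int i = 1; i < n - 1; i++) {
                    size_t k = (size_t)j * n + i;
                    u[k] += du[k];
                }
            }

            #pragma omp master
            {
                if (verbose) printf("%d %g\n", it, dumax);
            }
            /* All updates must finish before the next sweep reads u */
            #pragma omp barrier
        }
    }

    free(u);
    free(du);
    return dumax;
}
```

<p>A call such as <code>laplace_solve(2001, 100, 10.0, 1)</code> would mirror the problem size used in the Fortran program. Compiled without OpenMP support, the pragmas are ignored and the function runs serially.</p>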
<p>To see the C version of this code, click on <a href="mlp.lpomp.html" target="_blank">C Code</a>.</p></body></html>