📄 17.html
字号:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <style type="text/css"> body { font-family: Verdana, Arial, Helvetica, sans-serif;} a.at-term { font-style: italic; } </style> <title>Two Implementations of Laplace Solvers</title> <meta name="Generator" content="ATutor"> <meta name="Keywords" content=""></head><body> <p> There are at least two ways to implement the main computation loop of a Laplace solver: </p>
<ol>
<li>
<p> Compute all <code>du</code> values, then find a maximum <code>du</code>, then update all <code>u</code> values. This is known as the <em>vectorized</em> approach, as it tends to run well on vector supercomputers such as the Cray T90 series where the compilers look for long vectors of operations. On microprocessor-based systems with cached hierarchical memory, this approach tends to perform poorly due to low cache reuse. </p>
</li>
<li>
<p> Incrementally compute <code>du</code> values and compare with the current maximum, then update all <code>u</code> values. This is a <em>cache-friendly</em> approach, as the comparison operation can usually take place without additional memory operations. This approach often performs poorly on vector systems because
compilers often have trouble locating vectorizable operations in this style of code. </p>
</li>
</ol>
<p>Fortran 90 implementations of both approaches follow. </p>
<h3>Vectorized Implementation</h3>
<pre><code>program lpvect
integer imax,jmax,im1,im2,jm1,jm2,it,itmax
parameter (imax=2001,jmax=2001)
parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
parameter (itmax=100)
real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
do j=1,jmax
do i=1,imax-1
u(i,j)=0.0
du(i,j)=0.0
enddo
u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
enddo
! Main computation loop
do it=1,itmax
dumax=0.0
do j=2,jm1
do i=2,im1
du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
enddo
enddo
do j=2,jm1
do i=2,im1
dumax=max(dumax,abs(du(i,j)))
enddo
enddo
do j=2,jm1
do i=2,im1
u(i,j)=u(i,j)+du(i,j)
enddo
enddo
write (1,*) it,dumax
enddo
stop
end</code></pre>
<p>To see the C version of this code click on <a href="mlp_lpvect.html" target="_blank">C Code</a>.
</p>
<br>
<h3>Cache-Friendly Implementation</h3>
<pre><code>program lpcache
integer imax,jmax,im1,im2,jm1,jm2,it,itmax
parameter (imax=2001,jmax=2001)
parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
parameter (itmax=100)
real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
! Initialize
do j=1,jmax
do i=1,imax-1
u(i,j)=0.0
du(i,j)=0.0
enddo
u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
enddo
! Main computation loop
do it=1,itmax
dumax=0.0
do j=2,jm1
do i=2,im1
du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
dumax=max(dumax,abs(du(i,j)))
enddo
enddo
do j=2,jm1
do i=2,im1
u(i,j)=u(i,j)+du(i,j)
enddo
enddo
write (1,*) it,dumax
enddo
stop
end</code></pre>
<p>To see the C version of this code click on <a href="mlp.lpcache.html" target="_blank">C Code</a>.<p>
<p>Note that the cache-friendly implementation is virtually identical to the vectorized implementation, except that the <code>dumax=max(dumax,abs(du(i,j)))</code> operation takes place in the same loop structure as the calculation of <code>du(i,j)</code>. It is also possible to fold the update of <code>u(i,j)</code> into that loop structure; however, doing that makes the numerical results of the cache-friendly version change, as <code>du(i,j)</code> is then computed with updated values for <code>u(i,j)</code> on two sides. While this is often preferred (it acts as a convergence accelerator), it makes comparison with the vectorized implementation more difficult.
</p></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -