⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 mod1a.f

📁 网络带宽测试工具
💻 F
字号:
      Program mod1a! ----------------------------------------------------------------------! **********************************************************************! *** This program is part of the EuroBen Benchmark                  ***! *** Copyright: EuroBen Group p/o                                   ***! ***            Utrecht University, Computational Physics Dept.     ***! ***            P.O. Box 80.000                                     ***! ***            3508 TA Utrecht                                     ***! ***            The Netherlands                                     ***! ***                                                                ***! *** Authors of this program: Peter P.M. de Rijk and                ***! ***                          Aad J. van der Steen                  ***! *** Contributed:             Autumn 1990                           ***! *** Last change:             Autumn 2003                           ***! **********************************************************************!  Version 4.2!! --- Purpose of this module! --------------------------!  This program times some kernels for some vector lengths and!  calculates their MFlop rate.!!  In addition, n(1/2) and r(infi) are estimated for the same kernels.!!  Both parameters are obtained by a linear least squares fit on the!  function f(n) = alfa + beta*n!!  The following holds: r(infi) = 1/beta  (as long as in primary cache)!                       n(1/2)  = alfa / beta!!-----------------------------------------------------------------------      Use               numerics      Implicit          None      ! --- Universal Constants      Real(l_), Parameter :: zero = 0.0_l_, half = 0.5_l_,      &                       one = 1.0_l_, two = 2.0_l_,      &                       oneneg =-1.0_l_! --- Variables used for measurements      Real(l_), Allocatable :: x1(:), x2(:), x3(:), x4(:), y(:,:) ! --- Parameters used for no. of measuring methods and lower bound of!     time resolution accepted.      Real(l_), Parameter :: timelb=1.0e-12_l_! --- Parameters used in the Repetition Factor for the timings      Integer               :: nfixed, mrep  != 10 * nfixed! --- Variables used to prevent overoptimisation in timings      Integer,  Allocatable :: indj(:)      Integer               :: ndummy! --- Number of kernels to be examined      Integer               :: nkern! --- Character variables      Character             :: symbol*60! --- Stuff to be calculated      Integer               :: nflop, ntrans      Real(l_)              :: time, rmflop, rtrans, cycle, fpc! --- Counters and other stuff      Integer               :: i, k, kn, n, nrep       Integer               :: omp_get_num_threads, nprocs      Real(l_)              :: tbegin, tend! --- Introduce Common block to frustrate overoptimisation.      Real(l_)              :: s(2), ssw      Common /cfake/           s, ssw! --- External functions      Real(l_)              :: wclock, dran1! --- Initialization! ----------------------------------------------------------------------! --- Call module identification routine      Call state ( 'mod1a   ' )! ----------------------------------------------------------------------      tbegin = wclock()!$omp parallel shared(nprocs)      nprocs = omp_get_num_threads()!$omp end parallel      Print 9010, nprocs      Open( 1, File = 'mod1a.in' ) ! Contains vector lengths & rep. fac.      Open( 2, File = 'mod1a.krn' )! Contains flops executed in kernel.      Open( 3, File = 'mod1a.cyc' )! Contains clock frequency of proc.                                   ! (in Hz.)! --- Get clock frequency of processor.      Read( 3, * ) cycle! --- Get number of kernels.      nkern = 0   10 Read( 2, *, End = 20 ) symbol, nflop      nkern = nkern + 1      Go To 10   20 Rewind ( 2 )      Do k = 1, nkern          Read ( 2, * ) symbol, nflop, ntrans, kn          Print 9030, kn, symbol, nflop! --- Get problem size and repetition factor.   30   Read( 1, *, End = 50 ) n, nfixed         mrep = 10*nfixed! --- Initialize index vector to be used against overoptimization.         ndummy = -1999         Allocate ( indj(mrep) )         Do i = 1, mrep   40       Continue            indj(i) = Int( dran1( ndummy ) * mrep ) + 1! --- Avoid problematic cases (just to be sure ...)            If( indj(i) <= 0 .OR. indj(i) > mrep ) Go To 40         End Do! --- Get vector length, allocate, and initialise.	          nrep = Max( 10, Min( Int(nfixed/n) * 100, mrep ) )         Allocate( x1(4*n), x2(4*n), x3(4*n), x4(4*n), y(4*n,2) )         x1 = half         x2 = oneneg         x3 = two         x4 = half         y  = one! --- Time vector operation         Call timer( kn, n, x1, x2, x3, x4, y, 4*n, indj, mrep/2, nrep,     &               time )! --- Average MFlop/s         If(  nflop == 0  ) nflop = 1         rmflop = Real(nflop, l_)*1.0e-6_l_*Real(n, l_) /     &            Max( time, timelb )         rtrans = ntrans*1.0e-9_l_*Real(n, l_)/Max( time, timelb )         fpc    = rmflop*1.0e5_l_/cycle         Print 9040, n, time, rmflop, rtrans, fpc, nrep         Deallocate( x1, x2, x3, x4, y, indj )         Go To 30   50    Print 9020         Rewind( 1 )      End Do                                                    tend = wclock()      Print 9050, tend - tbegin! ----------------------------------------------------------------------  9010 Format(     &' ------------------------------------------------------------'/     &' |      Performance measurement of some basic kernels       |'/     &' | ======================================================== |'/     &' | A measuring method is used that evades overoptimisation  |'/     &' | of the repetition loop that is used for better timing    |'/     &' | accuracy by calling the function JSWITCH.  The call      |'/     &' | of this funtion depends on the iteration counter of      |'/     &' | the repetition loop.                                     |'/     &' ------------------------------------------------------------'//     &' No. of proc.s = ', i4 ) 9020 Format( 1x, 70('-') ) 9030 Format(//,1x, 70('=')/1x, 'Kernel', i3, ': ', a/     &12x,'No. of flops per Iteration =', i3/     &1x, 70('-')/     &1x, ' Loop ', '|   ', 'CPU Time', 2x,'|', 2x, 'Average', 4x, '|',     $'  Average', 4x, '|', ' Flop per ', '|', 2x, 'Repeat |'/     &1x, 'length', '|   ', '  sec   ', 2x,'|', 2x, 'Mflop/s', 4x, '|',     $'    GB/s ', 4x, '|', '   cycle  ', '|',2x, ' count |'/     &1x, 70('-') ) 9040 Format( 1x, i6, '|', 1pg13.5, '|', g13.5, '|', g13.5, '| ',     &        f8.4, ' |', i8, ' |' ) 9050 Format(//1X,'Total execution time:  ',g12.5)! ----------------------------------------------------------------------       End Program mod1a

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -