transp.f
Program transp
! ----------------------------------------------------------------------
! **********************************************************************
! *** This program is part of the EuroBen Efficiency Benchmark      ***
! ***                                                               ***
! *** Copyright: European Benchmark Group p/o                       ***
! ***            Utrecht University, High Perf. Computing Group     ***
! ***            P.O. Box 80195                                     ***
! ***            3508 TD Utrecht                                    ***
! ***            The Netherlands                                    ***
! ***                                                               ***
! *** Author of this program: Aad J. van der Steen                  ***
! ***                  Email: steen@phys.uu.nl                      ***
! ***                   Date: Spring 2005                           ***
! **********************************************************************
! ----------------------------------------------------------------------
! --- Purpose of the program
!     ----------------------
!     This program performs a distributed matrix transpose. The
!     interesting items to be measured are the total speed, the fraction
!     of time spent in communication, and the aggregate bandwidth of
!     the communication.
! ----------------------------------------------------------------------
      Use numerics
      Use dist_module             ! Contains # of proc.s &
                                  ! proc. numbers.
      Implicit None
      Include 'mpif.h'

      Real(l_), Allocatable :: a1(:,:), a2(:,:)
      Real(l_)              :: speed, time, timeg
      Real(l_)              :: mb, mbps, ctime, ctimeg, pctime
      Integer, Allocatable  :: actsiz(:,:), base(:,:)
      Integer               :: hsize, i, ierr, n1, n2, nrep, vsize
      Logical               :: ok, okg
!
! ----------------------------------------------------------------------
      Call csetup
      If ( me == 0 ) Call state( 'transp ' )
      Open( 1, File = 'transp.in' )
      If ( me == 0 ) Print 1000, nodes

   10 Read( 1, *, End = 30 ) n1, n2, nrep
      mb = 1.0e-6_l_*Real( 8*n1*n2, l_ )
      Allocate( actsiz(0:nodes-1,2), base(0:nodes-1,2) )
      Call sizoff( n1, n2, actsiz, base )
      vsize = Maxval( actsiz(:,1) )
      hsize = Maxval( actsiz(:,2) )
      Allocate( a1(n1,hsize), a2(n2,vsize) )
      Call gendat( a1, n1, actsiz(me,2), actsiz, base )

      ctime = 0.0_l_
      time  = MPI_Wtime()
      Do i = 1, nrep
         Call gtrans( a1, a2, n1, n2, vsize, hsize, ctime )
      End Do
      time = MPI_Wtime() - time

      Call MPI_Reduce( time, timeg, 1, MPI_Real8, MPI_Max, 0,   &
                       MPI_Comm_World, ierr )
      Call MPI_Reduce( ctime, ctimeg, 1, MPI_Real8, MPI_Max, 0, &
                       MPI_Comm_World, ierr )
      pctime = 100.0_l_*(ctimeg/timeg)
      timeg  = timeg/Real( nrep, l_ )
      ctimeg = ctimeg/Real( nrep, l_ )
      mbps   = mb/ctimeg*( Real( nodes - 1 )/Real( nodes**2, l_ ) )

      ok = .TRUE.
      Call check( a2, vsize, n1, n2, actsiz, base, ok )
      Call MPI_Reduce( ok, okg, 1, MPI_Logical, MPI_Land, 0, &
                       MPI_Comm_World, ierr )
      If ( me == 0 ) Print 1010, n1, n2, timeg, mbps, ctimeg, pctime, &
                                 okg
      Deallocate( a1, a2, actsiz, base )
      Go To 10

   30 If ( me == 0 ) Print 1020
      Call MPI_Finalize( ierr )
! ----------------------------------------------------------------------
 1000 Format( //, 'Distributed matrix transpose test: No. of procs. = ', &
              i3/ &
              'Bandwidth per link, assuming (nodes**2) links'/ &
              '-------------------------------------------------------', &
              '------------------------'/ &
              ' Order | Exec. Time | Bandwidth | Comm. time |', &
              ' Frac. | Correctness |'/ &
              ' n1 | n2 | (s) | (MB/s) | (s) |', &
              ' (%) | OK |'/ &
              '-------------------------------------------------------', &
              '------------------------' )
 1010 Format( i4,' | ', i4,' |', g12.5,' |', g13.5, '|', g13.5, &
              '| ', f7.3, ' | ', l3, ' |' )
 1020 Format( '-------------------------------------------------------', &
              '------------------------' )
!
! ----------------------------------------------------------------------
      End Program transp
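As a side note on the accounting above: the program converts the matrix volume to MB (`mb = 1.0e-6 * 8*n1*n2`, i.e. 8-byte reals) and divides by the maximum communication time per repetition, scaled by `(nodes - 1)/nodes**2` to get a per-link figure under the `nodes**2`-links assumption stated in the output header. A minimal sketch of that formula (the function name is mine, not part of the benchmark):

```python
def per_link_bandwidth_mbps(n1, n2, comm_time_s, nodes):
    """Per-link bandwidth in MB/s for one transpose of an n1 x n2
    matrix of 8-byte reals, mirroring the program's expression
    mbps = mb/ctimeg * (nodes - 1)/nodes**2."""
    mb = 1.0e-6 * 8 * n1 * n2              # matrix volume in MB
    return mb / comm_time_s * (nodes - 1) / nodes**2

# Example: a 1000 x 1000 double-precision matrix (8 MB) moved in
# 0.01 s of communication time on 4 processes gives about 150 MB/s
# per link (800 MB/s aggregate, scaled by 3/16).
bw = per_link_bandwidth_mbps(1000, 1000, 0.01, 4)
```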