! dddot.f
Program dddot
! ----------------------------------------------------------------------
! **********************************************************************
! ***  This program is part of the EuroBen Efficiency Benchmark      ***
! ***                                                                 ***
! ***  Copyright: European Benchmark Group p/o                        ***
! ***             Utrecht University, High Perf. Computing Group      ***
! ***             P.O. Box 80195                                      ***
! ***             3508 TD Utrecht                                     ***
! ***             The Netherlands                                     ***
! ***                                                                 ***
! ***  Author of this program: Aad J. van der Steen                   ***
! ***                          Email: steen@phys.uu.nl                ***
! ***  Date: Spring 1998, modified Spring 2001                        ***
! **********************************************************************
! ----------------------------------------------------------------------
!
! Purpose of program dddot
! ------------------------
! This program measures the performance of a distributed dot product.
! The computation of the partial sums is not optimal for most
! processors, but its cost is hardly significant with respect to the
! communication.
! Three variants are examined:
! 1) dddot1.f uses a naive scheme, gathering the partial sums on
!    one processor and distributing the global sum to all processors
!    again by plain send and receive calls.
! 2) dddot2.f also uses send and receive calls, but with a tree-
!    structured communication in which the global sub-sums are
!    gathered on the way up.
! 3) dddot3.f uses MPI_Reduce and MPI_Bcast. When these calls are
!    implemented well, this should be the most efficient way.
!
! Remarks
! -------
! EuroBen encapsulation routines are used for setting up the network
! (csetup).
! In addition, the EuroBen status routine 'state' is used.
! ----------------------------------------------------------------------
      Use dist_module
      Use numerics

      Implicit None
      Include 'mpif.h'

      Integer, Parameter :: n = 1000000, nrep = 10
      Integer, Parameter :: maxnodes = 2048
      Integer            :: ranks(maxnodes)
      Integer            :: allgroup, partgroup, partcomm
      Real(l_)           :: sum, x1(n), y1(n)
      Real(l_)           :: dddot1, dddot2, dddot3
      Real(l_)           :: eltim1, eltim2, eltim3, rmfl1, rmfl2, rmfl3
      Real(l_)           :: eltime1, eltime2, eltime3
      Integer            :: i, ierr, k, nlen
      Logical            :: ok1, ok2, ok3
! ----------------------------------------------------------------------
! --- No. of tasks is given in the 'mpirun' command. Set up network.

      Call csetup
! ----------------------------------------------------------------------
! --- Call identification routine and initialize the group of all
!     processes.

      If ( me == 0 ) Call state( 'dddot ' )
      If ( me == 0 ) Print 1000, nodes
      Do k = 1, nodes
         ranks(k) = k - 1
      End Do
      Call MPI_Comm_Group( MPI_Comm_World, allgroup, ierr )
! ----------------------------------------------------------------------
! --- Loop over the cases k = 1,..., nodes. Use partial communicators
!     to mobilise an increasing number of processes.

      Do k = 1, nodes
         nlen = n/k
         ok1 = .TRUE.; ok2 = .TRUE.; ok3 = .TRUE.
! ----------------------------------------------------------------------
! --- Initialize vectors.

         Do i = 1, nlen
            x1(i) = 1.0_l_ + me
            y1(i) = 2.0E-6_l_
         End Do
! ----------------------------------------------------------------------
! --- Initialize partial communicator.

         Call MPI_Group_Incl( allgroup, k, ranks, partgroup, ierr )
         Call MPI_Comm_Create( MPI_Comm_World, partgroup, partcomm, &
                               ierr )
! ----------------------------------------------------------------------
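! --- Note on the timing sections below (added explanatory comment):
!     each variant is synchronised with MPI_Barrier, timed over nrep
!     repetitions with MPI_Wtime, and the largest elapsed time over the
!     k participating processes is selected with MPI_Reduce/MPI_Max.
!     The Mflop/s rate counts one multiplication and one addition per
!     vector element, i.e. 2*n*nrep floating-point operations. With
!     x1 = 1 + me and y1 = 2.0e-6 on every process, the global dot
!     product is k + 1 (up to the truncation in nlen = n/k), which the
!     ok-flags verify to a tolerance of 1.0e-3.
! ----------------------------------------------------------------------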
! --- Time simple implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim1 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot1( k, nlen, x1, y1 )
            End Do
            eltim1 = MPI_Wtime() - eltim1
            Call MPI_Reduce( eltim1, eltime1, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl1 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime1, 1.0e-9_l_ )
            ok1   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Time tree implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim2 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot2( k, nlen, x1, y1 )
            End Do
            eltim2 = MPI_Wtime() - eltim2
            Call MPI_Reduce( eltim2, eltime2, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl2 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime2, 1.0e-9_l_ )
            ok2   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Time MPI_Reduce/Broadcast implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim3 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot3( k, partcomm, nlen, x1, y1 )
            End Do
            eltim3 = MPI_Wtime() - eltim3
            Call MPI_Reduce( eltim3, eltime3, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl3 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime3, 1.0e-9_l_ )
            ok3   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Print results.

         If ( me == 0 ) Print 1010, k, eltime1, rmfl1, ok1, eltime2, &
                                    rmfl2, ok2, eltime3, rmfl3, ok3
         If ( me < k ) Then
            Call MPI_Comm_Free( partcomm, ierr )
            Call MPI_Group_Free( partgroup, ierr )
         End If

      End Do      ! ---------- End loop over 1,..., nodes processors.

      If ( me == 0 ) Print 1020
! ----------------------------------------------------------------------
! --- Exit the network in an orderly fashion.

      Call MPI_Finalize( ierr )
! ----------------------------------------------------------------------
! --- Formats.

 1000 Format( /, 'Test of distributed dot product implementations:', &
              ' No. of procs. = ', i3 / &
              79('-') / &
              '|', 37x, 'Method:', 33x, '|' / &
              79('-') / &
              '|     |        Simple         |', &
              '         Tree          |', &
              '      Bcast/Red.       |' / &
              79('-') / &
              '|Procs|  Time(s)   |Mflop/s|OK|  Time(s)   |Mflop/s|OK|', &
              '  Time(s)   |Mflop/s|OK|' / &
              79('-') )
 1010 Format( i5, ' |', g12.4, '|', f7.1, '|', l2, '|', g12.4, '|', &
              f7.1, '|', l2, '|', g12.4, '|', f7.1, '|', l2, '|' )
 1020 Format( 79('-') )
! ----------------------------------------------------------------------
      End Program dddot
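! ----------------------------------------------------------------------
! --- Illustrative sketches (added; NOT part of the EuroBen sources).
!     The benchmark links the actual implementations from dddot1.f,
!     dddot2.f and dddot3.f, which are not reproduced on this page.
!     The functions below only show how each variant could look. They
!     assume the interfaces used at the call sites above, that
!     dist_module provides the process rank 'me' (as in the main
!     program), and that only ranks 0..k-1 of MPI_Comm_World call
!     dddot1 and dddot2; the shipped sources may differ in detail.
!     Note the expected stage counts: roughly 2*(k-1) sequential
!     message stages for the naive scheme against about 2*log2(k) for
!     the tree, which is why variants 2 and 3 should win for larger k.
! ----------------------------------------------------------------------
      Real(l_) Function dddot1( k, n, x, y )
! --- Variant 1 (sketch): naive scheme. Every process sends its partial
!     sum to process 0, which accumulates the global sum and returns it
!     to every process by plain sends.
      Use dist_module
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, other
      Integer              :: i, ierr, status(MPI_Status_Size)

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
      If ( me == 0 ) Then
         Do i = 1, k-1
            Call MPI_Recv( other, 1, MPI_Real8, i, 0, &
                           MPI_Comm_World, status, ierr )
            locsum = locsum + other
         End Do
         Do i = 1, k-1
            Call MPI_Send( locsum, 1, MPI_Real8, i, 1, &
                           MPI_Comm_World, ierr )
         End Do
      Else
         Call MPI_Send( locsum, 1, MPI_Real8, 0, 0, &
                        MPI_Comm_World, ierr )
         Call MPI_Recv( locsum, 1, MPI_Real8, 0, 1, &
                        MPI_Comm_World, status, ierr )
      End If
      dddot1 = locsum
      End Function dddot1
! ----------------------------------------------------------------------
      Real(l_) Function dddot2( k, n, x, y )
! --- Variant 2 (sketch): tree-structured scheme. Partial sums are
!     folded pairwise towards process 0 (up-sweep) and the global sum
!     is sent back down the same binary tree (down-sweep).
      Use dist_module
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, other
      Integer              :: i, ierr, step, status(MPI_Status_Size)

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
!     Up-sweep: at distance 'step' the odd multiples of 'step' send
!     their sub-sums to the even multiples, which accumulate them.
      step = 1
      Do While ( step < k )
         If ( Mod( me, 2*step ) == step ) Then
            Call MPI_Send( locsum, 1, MPI_Real8, me-step, 0, &
                           MPI_Comm_World, ierr )
         Else If ( Mod( me, 2*step ) == 0 .And. me+step < k ) Then
            Call MPI_Recv( other, 1, MPI_Real8, me+step, 0, &
                           MPI_Comm_World, status, ierr )
            locsum = locsum + other
         End If
         step = 2*step
      End Do
!     Down-sweep: traverse the same tree in reverse order to hand the
!     global sum to every process.
      Do While ( step > 1 )
         step = step/2
         If ( Mod( me, 2*step ) == 0 .And. me+step < k ) Then
            Call MPI_Send( locsum, 1, MPI_Real8, me+step, 1, &
                           MPI_Comm_World, ierr )
         Else If ( Mod( me, 2*step ) == step ) Then
            Call MPI_Recv( locsum, 1, MPI_Real8, me-step, 1, &
                           MPI_Comm_World, status, ierr )
         End If
      End Do
      dddot2 = locsum
      End Function dddot2
! ----------------------------------------------------------------------
      Real(l_) Function dddot3( k, comm, n, x, y )
! --- Variant 3 (sketch): collectives. MPI_Reduce forms the global sum
!     on process 0 of the given communicator and MPI_Bcast returns it
!     to all processes; 'k' is unused here but kept for the interface.
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, comm, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, globsum
      Integer              :: i, ierr

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
      Call MPI_Reduce( locsum, globsum, 1, MPI_Real8, MPI_Sum, 0, &
                       comm, ierr )
      Call MPI_Bcast( globsum, 1, MPI_Real8, 0, comm, ierr )
      dddot3 = globsum
      End Function dddot3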