! dddot.f
Program dddot
! ----------------------------------------------------------------------
! **********************************************************************
! ***  This program is part of the EuroBen Efficiency Benchmark      ***
! ***                                                                 ***
! ***  Copyright: European Benchmark Group p/o                        ***
! ***             Utrecht University, High Perf. Computing Group      ***
! ***             P.O. Box 80195                                      ***
! ***             3508 TD Utrecht                                     ***
! ***             The Netherlands                                     ***
! ***                                                                 ***
! ***  Author of this program: Aad J. van der Steen                   ***
! ***                          Email: steen@phys.uu.nl                ***
! ***  Date: Spring 1998, modified Spring 2001                        ***
! **********************************************************************
! ----------------------------------------------------------------------
!
! Purpose of program dddot
! ------------------------
! This program measures the performance of a distributed dot product.
! The computation of the partial sums is not optimal for most
! processors, but its cost is hardly significant with respect to the
! communication.
! Three variants are examined:
! 1) dddot1.f uses a naive scheme, gathering the partial sums on
!    one processor and distributing the global sum to all processors
!    again by plain send and receive calls.
! 2) dddot2.f also uses send and receive calls, but with a tree-
!    structured communication in which the global sub-sums are
!    gathered on the way up.
! 3) dddot3.f uses MPI_Reduce and MPI_Bcast. When these calls are
!    implemented well, this should be the most efficient way.
!
! Remarks
! -------
! EuroBen encapsulation routines are used for setting up the network
! (csetup).
! In addition, the EuroBen status routine 'state' is used.
! ----------------------------------------------------------------------
      Use dist_module
      Use numerics

      Implicit None
      Include 'mpif.h'

      Integer, Parameter :: n = 1000000, nrep = 10
      Integer, Parameter :: maxnodes = 2048
      Integer            :: ranks(maxnodes)
      Integer            :: allgroup, partgroup, partcomm
      Real(l_)           :: sum, x1(n), y1(n)
      Real(l_)           :: dddot1, dddot2, dddot3
      Real(l_)           :: eltim1, eltim2, eltim3, rmfl1, rmfl2, rmfl3
      Real(l_)           :: eltime1, eltime2, eltime3
      Integer            :: i, ierr, k, nlen
      Logical            :: ok1, ok2, ok3
! ----------------------------------------------------------------------
! --- No. of tasks is given in the 'mpirun' command. Set up network.

      Call csetup
! ----------------------------------------------------------------------
! --- Call identification routine and initialize the group of all
!     processes.

      If ( me == 0 ) Call state( 'dddot ' )
      If ( me == 0 ) Print 1000, nodes
      Do k = 1, nodes
         ranks(k) = k - 1
      End Do
      Call MPI_Comm_Group( MPI_Comm_World, allgroup, ierr )
! ----------------------------------------------------------------------
! --- Loop over the cases k = 1,..., nodes. Use partial communicators
!     to mobilise an increasing number of processes.

      Do k = 1, nodes
         nlen = n/k
         ok1 = .TRUE.; ok2 = .TRUE.; ok3 = .TRUE.
! ----------------------------------------------------------------------
! --- Initialize vectors.

         Do i = 1, nlen
            x1(i) = 1.0_l_ + me
            y1(i) = 2.0E-6_l_
         End Do
! ----------------------------------------------------------------------
! --- Initialize partial communicator.

         Call MPI_Group_Incl( allgroup, k, ranks, partgroup, ierr )
         Call MPI_Comm_Create( MPI_Comm_World, partgroup, partcomm, &
                               ierr )
! ----------------------------------------------------------------------
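! --- Note on the timing sections below (added explanatory comment):
!     each variant is synchronised with MPI_Barrier, timed over nrep
!     repetitions with MPI_Wtime, and the largest elapsed time over the
!     k participating processes is selected with MPI_Reduce/MPI_Max.
!     The Mflop/s rate counts one multiplication and one addition per
!     vector element, i.e. 2*n*nrep floating-point operations. With
!     x1 = 1 + me and y1 = 2.0e-6 on every process, the global dot
!     product is k + 1 (up to the truncation in nlen = n/k), which the
!     ok-flags verify to a tolerance of 1.0e-3.
! ----------------------------------------------------------------------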
! --- Time simple implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim1 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot1( k, nlen, x1, y1 )
            End Do
            eltim1 = MPI_Wtime() - eltim1
            Call MPI_Reduce( eltim1, eltime1, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl1 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime1, 1.0e-9_l_ )
            ok1   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Time tree implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim2 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot2( k, nlen, x1, y1 )
            End Do
            eltim2 = MPI_Wtime() - eltim2
            Call MPI_Reduce( eltim2, eltime2, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl2 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime2, 1.0e-9_l_ )
            ok2   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Time MPI_Reduce/Broadcast implementation.

         If ( me < k ) Then
            Call MPI_Barrier( partcomm, ierr )
            eltim3 = MPI_Wtime()
            Do i = 1, nrep
               sum = dddot3( k, partcomm, nlen, x1, y1 )
            End Do
            eltim3 = MPI_Wtime() - eltim3
            Call MPI_Reduce( eltim3, eltime3, 1, MPI_Real8, &
                             MPI_Max, 0, partcomm, ierr )
            rmfl3 = 2.0_l_*n*nrep*1.0e-6_l_/Max( eltime3, 1.0e-9_l_ )
            ok3   = Abs( sum - ( 1.0_l_ + k ) ) < 1.0e-3_l_
         End If
! ----------------------------------------------------------------------
! --- Print results.

         If ( me == 0 ) Print 1010, k, eltime1, rmfl1, ok1, eltime2, &
                                    rmfl2, ok2, eltime3, rmfl3, ok3
         If ( me < k ) Then
            Call MPI_Comm_Free( partcomm, ierr )
            Call MPI_Group_Free( partgroup, ierr )
         End If

      End Do      ! ---------- End loop over 1,..., nodes processors.

      If ( me == 0 ) Print 1020
! ----------------------------------------------------------------------
! --- Exit the network in an orderly fashion.

      Call MPI_Finalize( ierr )
! ----------------------------------------------------------------------
! --- Formats.

 1000 Format( /, 'Test of distributed dot product implementations:', &
              ' No. of procs. = ', i3 / &
              79('-') / &
              '|', 37x, 'Method:', 33x, '|' / &
              79('-') / &
              '|     |        Simple         |', &
              '         Tree          |', &
              '      Bcast/Red.       |' / &
              79('-') / &
              '|Procs|  Time(s)   |Mflop/s|OK|  Time(s)   |Mflop/s|OK|', &
              '  Time(s)   |Mflop/s|OK|' / &
              79('-') )
 1010 Format( i5, ' |', g12.4, '|', f7.1, '|', l2, '|', g12.4, '|', &
              f7.1, '|', l2, '|', g12.4, '|', f7.1, '|', l2, '|' )
 1020 Format( 79('-') )
! ----------------------------------------------------------------------
      End Program dddot
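! ----------------------------------------------------------------------
! --- Illustrative sketches (added; NOT part of the EuroBen sources).
!     The benchmark links the actual implementations from dddot1.f,
!     dddot2.f and dddot3.f, which are not reproduced on this page.
!     The functions below only show how each variant could look. They
!     assume the interfaces used at the call sites above, that
!     dist_module provides the process rank 'me' (as in the main
!     program), and that only ranks 0..k-1 of MPI_Comm_World call
!     dddot1 and dddot2; the shipped sources may differ in detail.
!     Note the expected stage counts: roughly 2*(k-1) sequential
!     message stages for the naive scheme against about 2*log2(k) for
!     the tree, which is why variants 2 and 3 should win for larger k.
! ----------------------------------------------------------------------
      Real(l_) Function dddot1( k, n, x, y )
! --- Variant 1 (sketch): naive scheme. Every process sends its partial
!     sum to process 0, which accumulates the global sum and returns it
!     to every process by plain sends.
      Use dist_module
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, other
      Integer              :: i, ierr, status(MPI_Status_Size)

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
      If ( me == 0 ) Then
         Do i = 1, k-1
            Call MPI_Recv( other, 1, MPI_Real8, i, 0, &
                           MPI_Comm_World, status, ierr )
            locsum = locsum + other
         End Do
         Do i = 1, k-1
            Call MPI_Send( locsum, 1, MPI_Real8, i, 1, &
                           MPI_Comm_World, ierr )
         End Do
      Else
         Call MPI_Send( locsum, 1, MPI_Real8, 0, 0, &
                        MPI_Comm_World, ierr )
         Call MPI_Recv( locsum, 1, MPI_Real8, 0, 1, &
                        MPI_Comm_World, status, ierr )
      End If
      dddot1 = locsum
      End Function dddot1
! ----------------------------------------------------------------------
      Real(l_) Function dddot2( k, n, x, y )
! --- Variant 2 (sketch): tree-structured scheme. Partial sums are
!     folded pairwise towards process 0 (up-sweep) and the global sum
!     is sent back down the same binary tree (down-sweep).
      Use dist_module
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, other
      Integer              :: i, ierr, step, status(MPI_Status_Size)

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
!     Up-sweep: at distance 'step' the odd multiples of 'step' send
!     their sub-sums to the even multiples, which accumulate them.
      step = 1
      Do While ( step < k )
         If ( Mod( me, 2*step ) == step ) Then
            Call MPI_Send( locsum, 1, MPI_Real8, me-step, 0, &
                           MPI_Comm_World, ierr )
         Else If ( Mod( me, 2*step ) == 0 .And. me+step < k ) Then
            Call MPI_Recv( other, 1, MPI_Real8, me+step, 0, &
                           MPI_Comm_World, status, ierr )
            locsum = locsum + other
         End If
         step = 2*step
      End Do
!     Down-sweep: traverse the same tree in reverse order to hand the
!     global sum to every process.
      Do While ( step > 1 )
         step = step/2
         If ( Mod( me, 2*step ) == 0 .And. me+step < k ) Then
            Call MPI_Send( locsum, 1, MPI_Real8, me+step, 1, &
                           MPI_Comm_World, ierr )
         Else If ( Mod( me, 2*step ) == step ) Then
            Call MPI_Recv( locsum, 1, MPI_Real8, me-step, 1, &
                           MPI_Comm_World, status, ierr )
         End If
      End Do
      dddot2 = locsum
      End Function dddot2
! ----------------------------------------------------------------------
      Real(l_) Function dddot3( k, comm, n, x, y )
! --- Variant 3 (sketch): collectives. MPI_Reduce forms the global sum
!     on process 0 of the given communicator and MPI_Bcast returns it
!     to all processes; 'k' is unused here but kept for the interface.
      Use numerics
      Implicit None
      Include 'mpif.h'
      Integer, Intent(In)  :: k, comm, n
      Real(l_), Intent(In) :: x(n), y(n)
      Real(l_)             :: locsum, globsum
      Integer              :: i, ierr

      locsum = 0.0_l_
      Do i = 1, n
         locsum = locsum + x(i)*y(i)
      End Do
      Call MPI_Reduce( locsum, globsum, 1, MPI_Real8, MPI_Sum, 0, &
                       comm, ierr )
      Call MPI_Bcast( globsum, 1, MPI_Real8, 0, comm, ierr )
      dddot3 = globsum
      End Function dddot3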