How to run the EuroBen Efficiency Benchmark
===========================================

Below we describe how to install and run the EuroBen Efficiency Benchmark.
In case you run into trouble, please mail to:

    Aad van der Steen; steen@phys.uu.nl

-----------------------------------

The EuroBen Efficiency Benchmark has the following structure:

                  |-- Makefile
                  |-- commun/
                  |-- dddot/
                  |-- fft1d/
                  |-- gmxm/
                  |-- gmxv/
    effbench/ ----|-- linful/
                  |-- linspar/
                  |-- ping/
                  |-- pingpong/
                  |-- qsort/
                  |-- smxv/
                  |-- transp/
                  |-- wvlt2d/

================================================================================
 INSTALLATION AND EXECUTION
================================================================================

The master Makefile in effbench/ can be used for installation of the
13 programs:

commun   -- A test for the speed of various communication patterns (MPI).
dddot    -- A test for the speed of a distributed dot product (MPI).
fft1d    -- A test for the speed of a 1-D FFT.
gmxv     -- A test for the speed of the matrix-vector multiply Ab = c.
gmxm     -- A test for the speed of the matrix-matrix multiply AB = C.
linful   -- A test for the solution of a full linear system Ax = b.
linspar  -- A test for the solution of a sparse linear system Ax = b.
ping     -- A very detailed test to assess bandwidth and latency in
            point-to-point communication (1-sided communication, MPI).
pingpong -- A very detailed pingpong test to assess bandwidth and latency in
            point-to-point communication (2-sided communication, MPI).
qsort    -- A test for the speed of Quicksort on integers and 8-byte reals.
smxv     -- A test for the speed of the sparse matrix-vector multiply Ab = c
            in CRS format.
transp   -- A test for the speed of a global distributed matrix transpose (MPI).
wvlt2d   -- A test for the speed of a 2-D Haar Wavelet Transform.

We assume that, at least for the first run, you will want to run all programs
with the same compiler options. You should perform the following steps:

1) cd basics/
   1a - Modify the subroutine 'state.f' such that it reflects the state of the
        system: type of machine, compiler version, compiler options, OS
        version, etc.
   1b - OPTIONAL (non-MPI programs): The program directories for the
        sequential programs contain the timing functions 'wclock.f' and
        'cclock.c'. 'wclock.f' is a Fortran interface routine that calls
        'cclock.c', which in turn relies on the 'gettimeofday' Unix system
        routine. This timer works almost everywhere (except under UNICOS) and
        delivers the wallclock time with a resolution of a few microseconds.
        It is generally better than the Fortran 90 routine 'System_Clock'.
        If, for any reason, you want to use another/better wallclock timer,
        modify the Real*8 function 'wclock.f' in basics/.

2) Go back to effbench/
   2a - Do a 'make state': the 'state.f' routine that you have modified is
        copied to all the program directories.
   2b - OPTIONAL (non-MPI programs): If you have modified 'wclock.f' for the
        sequential programs in basics/, do a 'make clock-seq': 'wclock.f' is
        copied to the relevant program directories.
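For reference, steps 1 and 2 condense to a short shell session like the one
below. The 'vi' calls are only placeholders for whatever editor you prefer,
and it is assumed that basics/ sits directly under effbench/, as the
"go back to effbench/" wording above suggests:

    cd effbench/basics
    vi state.f         # 1a: describe machine, compiler, options, OS, ...
    vi wclock.f        # 1b: OPTIONAL, only if you want a different wallclock timer
    cd ..
    make state         # 2a: copy state.f to all program directories
    make clock-seq     # 2b: OPTIONAL, copy the modified wclock.f as well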
3) cd install/
   3a - In install/ you will find header files with definitions for the 'make'
        utility.
        3a1: Sequential programs: Modify 'Make.Incl-seq' such that it contains
             the correct name for the Fortran 90 compiler, the loader (usually
             the same as the compiler), and the options for the Fortran 90 and
             C compilers. For completeness' sake there are empty definitions
             for libraries (LIBS) and include files (INCS) you might want to
             use, but in normal situations they are not needed for the
             sequential programs.
        3a2: Parallel programs: Modify 'Make.Incl-mpi' such that it contains
             the correct name for the Fortran 90 compiler, the loader (usually
             the same as the compiler), and the options for the Fortran 90
             compiler. The names of the compiler systems for MPI programs may
             be different from those for the sequential programs. For
             completeness' sake there is an empty definition for the include
             file (INCS) you might want to use, but in normal situations this
             is not needed. For libraries (LIBS), fill in the name of the MPI
             library (if necessary).
        3a3: Modify the 'Speed.Incl' file: it contains only one line, starting
             with '++++'. Replace this by the Theoretical Peak Performance of
             your system expressed in Mflop/s per CPU. So for a system with a
             Theoretical Peak Performance of 3.6 Gflop/s per processor:
             ++++ --> 3600.
             NOTE: This should really be per processor (per socket, if you
             will) and NOT per core.

4) Go back to effbench/
   Do a 'make lib'. This builds an object library 'intlib.a' that is used by
   the sequential numerical programs to compute the integral of the
   performance over the appropriate problem-size ranges and to calculate
   latencies for some MPI programs.

5) 5a: Do a 'make make-seq': the Makefiles in the directories of the
       sequential programs are completed according to the specifications you
       made in 'install/Make.Incl-seq'.
   5b: Do a 'make make-mpi': the Makefiles in the directories of the MPI
       programs are completed according to the specifications you made in
       'install/Make.Incl-mpi'.

6) Do a 'make makeall': the executables, each with the name x.<prog>, are
   built in the directories <prog>, where <prog> is 'commun/', 'dddot/',
   'fft1d/', etc. This will take a minute.
   6a - The non-MPI programs can be run by: 'x.<prog>'.
   6b - Run the MPI programs by: 'mpirun -np <p> x.<prog>' or
        'mpiexec -n <p> x.<prog>', where <p> is the desired number of
        processes and x.<prog> the MPI executable (or by any equivalent of
        mpirun if this is not available; also see 8b below).

7) Do a 'make speed': the Theoretical Peak Performance is set to the correct
   value in the relevant directories.

8) 8a: Do a 'make runall': this runs all sequential programs in turn. The
       results are placed in a directory called 'Log.`hostname`', where
       'hostname' is the local name of your system. This will take a few
       minutes. The results have names '<prog>.log', where <prog> is any of
       the programs listed above.
   8b: For the MPI programs, 'make runall' causes the MPI programs to be run
       and the results to be transferred to 'Log.`hostname`'. The programs are
       run with the following numbers of processes:

           mpirun -np  6 x.commun   > commun.log
           mpirun -np 16 x.dddot    > dddot.log
           mpirun -np  2 x.ping     > ping.log
           mpirun -np  2 x.pingpong > pingpong.log
           mpirun -np  8 x.transp   > transp.log

       NOTE: Although improbable, with newer MPI-2 implementations
       'mpirun -np <procs> <x.prog>' may have to be replaced by
       'mpiexec -n <procs> <x.prog>'. This is provided for in the scripts
       'x.all' in the 5 relevant directories: comment out the 'mpirun' line
       and decomment the 'mpiexec' line.
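Putting steps 3 to 8 together, a first full configure-build-run session
typically looks like the sketch below (the 'vi' calls again only stand for
editing the three files in install/ by hand):

    cd install
    vi Make.Incl-seq   # 3a1: compiler, loader and options for the sequential programs
    vi Make.Incl-mpi   # 3a2: compiler, loader, options and MPI library for the MPI programs
    vi Speed.Incl      # 3a3: replace '++++' by the peak performance in Mflop/s per processor
    cd ..
    make lib           # 4:  build intlib.a
    make make-seq      # 5a: complete the Makefiles of the sequential programs
    make make-mpi      # 5b: complete the Makefiles of the MPI programs
    make makeall       # 6:  build all executables x.<prog>
    make speed         # 7:  propagate the peak performance to the program directories
    make runall        # 8:  run everything; results end up in Log.`hostname`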
================================================================================
 CUSTOMISING THE RUNS (OPTIONAL)
================================================================================

You might want to run some of the programs in an alternative setting. This
might include:

- Other compiler options. In that case do the following for any of the
  programs <prog> (see the example session after this list):
  9a) cd <prog>/
      9a1 - Modify the definition of 'FFLAGS' in the Makefile.
      9a2 - Modify the compiler options line in subroutine 'state.f'.
      9a3 - Do a 'make veryclean': this removes all old objects and the
            executable.
      9a4 - Do a 'make'.
      9a5 - Do an 'x.all': this runs the program and writes the result to
            '<prog>.log'.
      9a6 - mv <prog>.log ../Log.`hostname`/
      or,
      9a7 - ALTERNATIVELY, when you have run several programs with altered
            repeat counts:
            a. cd ..
            b. Do a 'make collect': this moves any result file '<prog>.log'
               from the '<prog>/' directories to 'Log.`hostname`/'.

- Substitute library calls or other equivalent code for that of the model
  implementation.
  9b) cd <prog>/
      9b1 - Modify the definition of 'FFLAGS' in the Makefile (if required).
      9b2 - Modify the compiler options line in subroutine 'state.f'
            (if required).
      9b3 - Do a 'make veryclean': this removes all old objects and the
            executable.
      9b4 - Invalidate the routines to be replaced by removing or renaming
            them and, if necessary, modify the Makefile accordingly.
            Specifically:
            A. For the programs 'gmxm' and 'gmxv' it is assumed that you would
               like to replace the given Fortran routines by the routines
               'dgemm' and 'dgemv', respectively. If so, change the zero in
               the first line of 'gmxm.in' and 'gmxv.in' to an integer value
               /= 0 (and invalidate the supplied BLAS routines in the
               respective directories). If you use routines that are different
               from the BLAS routines, still modify the 'gmxm.in' and
               'gmxv.in' files by changing the zero to a non-zero value, but,
               in addition, change the calls to 'dgemm' and 'dgemv' to those
               of your own favourite library routines.
            B. In program 'linful' the factorisation and solution are based on
               the usual LAPACK routines. So, you only have to invalidate the
               source routines present in the directory.
            C. As there is no universally accepted standard for FFTs, there is
               no alternative to modifying the code in 'fft1d.f'. Replace
               lines 81 and 82 by the call(s) to your favourite library
               routine.
      9b5 - Do a 'make'.
      9b6 - Run the program as before.
      9b7 - BE SURE TO REPORT THE SUBSTITUTION(S) IN THE RESULTS!
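As an illustration of the 9a workflow, rebuilding and rerunning a single
program with other compiler options might look like the session below. Here
'fft1d' is used purely as an example name; any <prog> directory works the
same way, and 'vi' again stands for your editor of choice:

    cd fft1d
    vi Makefile        # 9a1: change FFLAGS
    vi state.f         # 9a2: record the new compiler options
    make veryclean     # 9a3: remove old objects and the executable
    make               # 9a4: rebuild x.fft1d
    ./x.all            # 9a5: run; the result is written to fft1d.log
    cd ..
    make collect       # 9a7: move all <prog>.log files to Log.`hostname`/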
================================================================================
 ABOUT THE EFFICIENCY MEASURE
================================================================================

1) The programs 'fft1d', 'linful', 'linspar', and 'wvlt2d' measure an overall
   efficiency (the ratio of the actual performance and the theoretical peak
   performance) by integrating the actual performance over a range of problem
   sizes. For instance, 'linspar' is evaluated in the range N = 1000,...,20000
   with 10 additional problem sizes in between. The problem sizes are given in
   the appropriate '<prog>.in' file, with <prog> any of the four programs
   mentioned. If, for any reason (for instance, because you suspect that the
   curve used for the integration does not capture the performance behaviour
   of your processor adequately), you wish to add measuring points WITHIN the
   range given for each of the programs, you are welcome to do so by adding
   the appropriate line(s) to the '<prog>.in' file(s). Note, however, that it
   is NOT allowed to modify the lower and upper bounds themselves.

2) The four programs show the fraction of the peak performance that is
   required to be attained and also the efficiency measure that is actually
   attained by integrating over the observation range. Obviously, the actual
   fraction of the theoretical peak performance must be greater than or equal
   to the required fraction; equally obviously, the better this fraction, the
   higher the efficiency of the processor. It does not matter whether your
   final result is obtained by using the original code, by optimising it, or
   by using a library, as long as the library is a standard tool and generally
   accessible to the average users of such processors.

================================================================================
 FURTHER REMARKS
================================================================================

1) The program 'ping' measures bandwidth and latency by means of one-sided MPI
   communication (MPI_Get and MPI_Put). At present many MPI implementations
   still do not support one-sided communication as required in MPI-2.
   Consequently, the program 'ping' may not compile on your system, and hence
   you will have no result for it. Because of the slow adoption of full MPI-2
   we presently do not consider this result mandatory, but it certainly adds
   value to the total result of the benchmark.

2) Please run the benchmark FIRST AS-IS, i.e., with the minimal changes needed
   to get it running (probably none are necessary). Then, if you are inclined
   to do so, do the optimisations you have in mind and run again.

================================================================================

Lastly,

    ====================
    | Best of success! |
    ====================