📄 cg2.f90

📁 shpf 1.9一个并行编译器
💻 F90
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页
! {{{  comments
!   PROGRAM  CG2
!   ------------
!   Author:  John Merlin
!            Department of Electronics and Computer Science,
!            University of Southampton, Souhthampton S09 5NH, UK.
!      tel:  (+44) 703 593368
!
!   Email:   jhm@ecs.soton.ac.uk
!  
!
!   VERSION
!   -------
!   HPF Version  (September 1995)
!   Modified by John Merlin
!   Changes from F77 version 0.1 :
!   -- Removed sequence associations
!   -- Converted to F90 array syntax
!   -- Put declarations in INCLUDE files.
!   -- Inserted HPF data mapping directives.
!
!   Version :  0.1  (14 March 1989)
!   Changes from the original version (0.0) :
!   
!   (i) Replace calls to real function RAN  (the random number generator that
!       is implicitly defined in VAX Fortran)  with calls to a user-defined
!       function RAND.
!   
!   (ii) Include definition of  REAL FUNCTION RAND.
!   
!   (iii) Replace all instances of common block  /switches/  by a PARAMETER
!       statement (that defines the values of the names in  /switches/ ).
!       Remove the initialisation of  /switches/  from BLOCK DATA.
!   
!   (iv) In subroutine 'output', replace:
!   
!         PRINT '(X,  F10.3,  3X,  F10.3,  3X,  F11.3,  3X,
!        *            I5,     3X,  F13.6,  4X,  F13.6)',
!        *            KE,          PE,          (KE + PE),
!    by :
!         PRINT '(A,  F10.3,  3X,  F10.3,  3X,  F11.3,  3X,
!        *            I5,     3X,  F13.6,  4X,  F13.6)',
!        *       ' ', KE,          PE,          (KE + PE),
!   
!   
!
!   1. INTRODUCTION
!   ---------------
!        This program provides a benchmark, for a single vector processor,
!   of Conjugate Gradient iteration as applied to SU(2) lattice gauge theory
!   with dynamical Kogut-Susskind fermions.
!    
!        Conjugate Gradient (CG) iteration is used to solve a large, sparse 
!   linear system of equations, and forms the core of several important 
!   algorithms for lattice gauge theory with fermions, eg, the hybrid and 
!   hybrid Monte-Carlo algorithms.  Other parts of the full calculation are 
!   omitted, eg, updating the SU(2) gauge field and making measurements; 
!   however, the CG iteration part typically accounts for around 80% of the 
!   CPU time of the full calculation.  No pre-conditioning is used, as the 
!   optimal form of pre-conditioner is the subject of current research and 
!   seems to be parameter and machine dependent; indeed, it is debatable 
!   whether pre-conditioning is worthwhile for KS fermions.
!    
!        Kogut-Susskind fermions are one of two formulations of fermions
!   that are commonly used in lattice gauge theory.  The other type, Wilson
!   fermions, involve more than an order of magnitude more calculation and,
!   when distributed on a multi-processor machine, have a calculation / 
!   communications ratio that is several times larger.  The KS fermions
!   used here are therefore relatively more demanding on the communications 
!   architecture.
!  
!        Four benchmark programs are provided:
!   
!    
!   cg2simd  --  SU(2) lattice gauge theory }  single processor,
!   cg3simd  --  SU(3)    "      "      "   }  vectorisable version
!            
!   cg2mimd  --  SU(2)    "      "      "   }  multi-processor,
!   cg3mimd  --  SU(3)    "      "      "   }  vectorisable version
!  
!    
!   The following sections apply generally to both the SU(2) and SU(3)
!   versions. 
!     
!     
!     
!   2.  BRIEF DESCRIPTION OF PROGRAM
!   --------------------------------
!        In lattice gauge theory, space and time are represented by a
!   4-dimensional lattice of discrete points.  The lattice has dimensions
!   nt * nx **3,  where 'nt' = the number of points in the time dimension
!   and 'nx' = the number of points in each space dimension.
!    
!       The lattice has 'fermion fields' defined at its sites and 'gauge
!   fields' on its links.  With Kogut-Susskind fermions the fields are:
!   
!   'psi' and 'psidot'
!     
!     -- the 'fermion' fields.  They exist only at EVEN sites (i.e. at 
!        sites (t,x,y,z) such that t+x+y+z is even).   The fields at each
!        site form a complex vector, with 2 components for SU(2) and 3 for 
!        SU(3).   They are stored as follows:
!   
!         REAL  psi (nt/2,  nx, ny, nz, ncolor, ncomplex)
!    
!        and likewise for 'psidot', where:
!   
!        nx = ny = nz  -- since the lattice is cubic in the space dimensions,
!        ncolor   = 2 for SU(2),  3 for SU(3)
!        ncomplex = 2  -- this denotes the complex index (1 = real part, 
!                                                         2 = imaginary part)
!  
!        The site indices come first, and thus vary most rapidly, while the
!        complex index is last.  This ordering facilitates vectorisation.
!   
!   'gauge'
!  
!     -- the 'gauge' fields.  For SU(n) gauge theory they are 'SU(n) matrices'
!       (i.e. complex n * n unitary matrices with determinant 1).  There is
!        one matrix per link.
!   
!        For SU(2)
!        ---------
!        Each SU(2) matrix is parameterised by 4 REAL numbers, a(i), i = 1,4,
!        subject to the constraint  a.a = 1.  They are stored as follows:
!  
!         REAL  gauge (nt/2, nx, ny, nz, ngauge, ndir, 0:1)
!   
!        Again, the sites indices (t,x,y,z) are first.  These refer to the 
!        site at the start of the link on which the matrix is defined.
!        The next index (ngauge = 4) gives the 4 components of the matrix,
!        in terms of the parameterisation described above.
!        The next index (ndir = 4) gives the direction of the link on which
!        the matrix is defined.
!        Finally, the gauge field array is decomposed into 2 parts, for
!        even and odd sites.  The final index refers to this decomposition 
!        (0 = even sites, 1 = odd sites).
!   
!        For SU(3)
!        ---------
!        In the SU(3) case, all of the components of the 3 * 3 complex 
!        matrix are explicitly stored, as a REAL array of dimensions (3,3,2).
!        The gauge field for the whole lattice is stored as follows:
!   
!         REAL  gauge (nt/2, nx, ny, nz, ncolor, ncolor, ncomplex, ndir, 0:1)
!   
!        The indices are the same as for SU(2), except that the single
!        index 'ngauge', for the SU(2) component, is replaced by three
!        indices  'ncolor, ncolor, ncomplex'.  These refer to the row,
!        column and complex part of the SU(3) matrix component.
!   
!   
!        The gauge and fermion fields are coupled through the 'Dirac matrix'
!   [D].   If 'i' and 'j' are lattice sites, then an element [D]ij of the 
!   Dirac matrix is nono-zero only if 'i' and 'j' are at opposite ends of a 
!   single link, in which case  [D]ij  is proportional to the gauge field on 
!   that link  (or to the hermitian conjugate of the gauge field if the link 
!   is in a negative direction).  Note that [D] has both site and colour
!   indices, and that it connects even sites to odd and vice versa.
!   
!        The 'psidot' field is updated by solving the [matrix] * {vector}
!   eqn:
!   
!        [-D * D + msq ] {psidot} = {rhs}                 
!   
!   where {} means a vector over all EVEN sites, and {rhs} is known.  
!   'msq' is the unit matrix times the square of the fermion mass - the
!   latter being a parameter of the theory, which is here set to 0.2.
!   The matrix [-D * D + msq] is hermitian.
!   
!        This equation is solved by Conjugate Gradient iteration.
!   {psidot} is actually the 'velocity' of {psi};  having updated {psidot}, 
!   it is used to update {psi}.  An update of {psidot} and {psi} constitutes
!   one sweep of the program.
!    
!        The program times the CG iteration and works out its average 
!   floating point performance in Mflops/s.  Ten sweeps are performed, 
!   although the performance should be the same for every sweep.
!
!        At the start of a run the fields are randomly initialised.  The
!   performance measurements are independent of the field values, so it is 
!   not necessary to standardise the random number generator or its integer
!   seed value.  The seed value is initialised in BLOCK DATA.
!   
! 
!   
!   3.  VECTORISATION
!   -----------------
!        Each CG iteration involves the following operations:
!  
!   --------------------------------------------------------------------------
!   |                         operation                         | times used |
!   |-----------------------------------------------------------|------------|
!   |   (i)   hermitian scalar product    {u}.{v}               |      2     |
!   |  (ii)   scaled vector addition      {u} +  const * {v}    |      3     |
!   | (iii)   [matrix] * {vector}         [-D * D + msq] {u}    |      1     |
!   --------------------------------------------------------------------------
!   
!        Only operation (i) is not vectorisable.
!   
!        (ii) is vectorisable over all indices (vector length = N * ncolor, 
!   where N = total number of lattice sites).
!   
!        (iii) is calculated by a double operation of [D] on a vector, 
!   followed by scaled vector addition.  The calculation of {v} = [D]{u}
!   is vectorisable over sites (vector length = N/2), as explained in detail
!   in subroutines DVEC and DVECDIR, where the calculation is performed.
!   This is the reason that the site indices come first in the declarations
!   of the {fermion vectors} and the [gauge field] array   (see section 2).
!   In order to vectorise {v} = [D]{u}, one of the vectors {u} or {v} must
!   be shifted by one link,  so that all of the site indices are aligned.
!   This is done by subroutine SHIFTVEC.
!   
!   
!   
!   4.  BENCHMARK PARAMETERS
!   ------------------------
!        The performance depends only on the lattice dimensions (nt & nx).
!   
!   
!   Suggested problem sizes for benchmarking are lattices with 2**n points
!   -----------------------
!   for a range of n, up to the available memory capacity, eg:
!
!        4**4,    8 * 4**3,   16 * 4**3,    
!    4 * 8**3,        8**4,   16 * 8**3, 
!   32 * 8**3,   8 * 16**3,       16**4
!   
!   
!   Other adjustable parameters
!   ---------------------------
!        The user might also wish to change the values of 'nsweep',
!   'mquark' and 'msq'.  They do not affect the measured performance,
!   just the total running time of the program.  They are defined in
!   the following DATA statement in the BLOCK DATA subprogram:
!   
!         DATA     h,  halfh,  mquark,  msq,  nsweep /
!        *      0.01,  0.005,    0.2,  0.04,    10   /
!   
!   'nsweep'  gives the number of sweeps.  The measured performance should 
!             be constant for all of them.
!   'mquark'  gives the quark mass.  The smaller this is, the larger the
!             number of CG iterations required for convergence.
!   'msq' = mquark ** 2.
!   
!   
!   
!   5.  TIMING
!   ----------
!        Two calls are made to a machine-specific timing routine in
!   subroutine SOLVE.  They are represented by calls to a subroutine
!   SECONDS (time), where it assumed the time is returned as a real number
!   of seconds.  Modifications may be necessary for the machine specific
!   timing routines.
!   
!   
!  
!   6.  PROGRAM ANALYSIS
!   --------------------
!   
!   6.1 Floating point operations
!   -----------------------------
!        The following table shows the (flops / CG iteration)
!  
!   --------------------------------------------------------------------
!   |     operation    |   times   |         flops         |vectorised |
!   |                  |    used   |   SU(2)   |   SU(3)   |           |
!   |------------------|-----------|-----------|-----------|-----------|
!   | {u}.{v}          |     2     |     4 N   |     6 N   |     no    |
!   | {u} + const {v}  |     3     |     4 N   |     6 N   |    yes    |
!   | [-D.D + msq]{u}  |     1     |   260 N   |   582 N   |    yes    |
!   |------------------------------|-----------|-----------|-----------|
!   | total                        |   280 N   |   612 N   |           |
!   --------------------------------------------------------------------
!   
!   N = total number of sites (i.e. not just even sites).
!   
!   
!   6.2  Analysis of vectorisation
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -