📄 init.f90
字号:
! {{{ comments
! PROGRAM CG2SIMD
! ----------------
! Author: John Merlin
! Department of Electronics and Computer Science,
! University of Southampton, Souhthampton S09 5NH, UK.
! tel: (+44) 703 593368
!
! Email: jhm@ecs.soton.ac.uk
!
!
! VERSION
! -------
! HPF Version (September 1995)
! Modified by John Merlin
! Changes from F77 version 0.1 :
! -- Removed sequence associations
! -- Converted to F90 array syntax
! -- Put declarations in INCLUDE files.
! -- Inserted HPF data mapping directives.
!
! Version : 0.1 (14 March 1989)
! Changes from the original version (0.0) :
!
! (i) Replace calls to real function RAN (the random number generator that
! is implicitly defined in VAX Fortran) with calls to a user-defined
! function RAND.
!
! (ii) Include definition of REAL FUNCTION RAND.
!
! (iii) Replace all instances of common block /switches/ by a PARAMETER
! statement (that defines the values of the names in /switches/ ).
! Remove the initialisation of /switches/ from BLOCK DATA.
!
! (iv) In subroutine 'output', replace:
!
! PRINT '(X, F10.3, 3X, F10.3, 3X, F11.3, 3X,
! * I5, 3X, F13.6, 4X, F13.6)',
! * KE, PE, (KE + PE),
! by :
! PRINT '(A, F10.3, 3X, F10.3, 3X, F11.3, 3X,
! * I5, 3X, F13.6, 4X, F13.6)',
! * ' ', KE, PE, (KE + PE),
!
!
!
! 1. INTRODUCTION
! ---------------
! This program provides a benchmark, for a single vector processor,
! of Conjugate Gradient iteration as applied to SU(2) lattice gauge theory
! with dynamical Kogut-Susskind fermions.
!
! Conjugate Gradient (CG) iteration is used to solve a large, sparse
! linear system of equations, and forms the core of several important
! algorithms for lattice gauge theory with fermions, eg, the hybrid and
! hybrid Monte-Carlo algorithms. Other parts of the full calculation are
! omitted, eg, updating the SU(2) gauge field and making measurements;
! however, the CG iteration part typically accounts for around 80% of the
! CPU time of the full calculation. No pre-conditioning is used, as the
! optimal form of pre-conditioner is the subject of current research and
! seems to be parameter and machine dependent; indeed, it is debatable
! whether pre-conditioning is worthwhile for KS fermions.
!
! Kogut-Susskind fermions are one of two formulations of fermions
! that are commonly used in lattice gauge theory. The other type, Wilson
! fermions, involve more than an order of magnitude more calculation and,
! when distributed on a multi-processor machine, have a calculation /
! communications ratio that is several times larger. The KS fermions
! used here are therefore relatively more demanding on the communications
! architecture.
!
! Four benchmark programs are provided:
!
!
! cg2simd -- SU(2) lattice gauge theory } single processor,
! cg3simd -- SU(3) " " " } vectorisable version
!
! cg2mimd -- SU(2) " " " } multi-processor,
! cg3mimd -- SU(3) " " " } vectorisable version
!
!
! The following sections apply generally to both the SU(2) and SU(3)
! versions.
!
!
!
! 2. BRIEF DESCRIPTION OF PROGRAM
! --------------------------------
! In lattice gauge theory, space and time are represented by a
! 4-dimensional lattice of discrete points. The lattice has dimensions
! nt * nx **3, where 'nt' = the number of points in the time dimension
! and 'nx' = the number of points in each space dimension.
!
! The lattice has 'fermion fields' defined at its sites and 'gauge
! fields' on its links. With Kogut-Susskind fermions the fields are:
!
! 'psi' and 'psidot'
!
! -- the 'fermion' fields. They exist only at EVEN sites (i.e. at
! sites (t,x,y,z) such that t+x+y+z is even). The fields at each
! site form a complex vector, with 2 components for SU(2) and 3 for
! SU(3). They are stored as follows:
!
! REAL psi (nt/2, nx, ny, nz, ncolor, ncomplex)
!
! and likewise for 'psidot', where:
!
! nx = ny = nz -- since the lattice is cubic in the space dimensions,
! ncolor = 2 for SU(2), 3 for SU(3)
! ncomplex = 2 -- this denotes the complex index (1 = real part,
! 2 = imaginary part)
!
! The site indices come first, and thus vary most rapidly, while the
! complex index is last. This ordering facilitates vectorisation.
!
! 'gauge'
!
! -- the 'gauge' fields. For SU(n) gauge theory they are 'SU(n) matrices'
! (i.e. complex n * n unitary matrices with determinant 1). There is
! one matrix per link.
!
! For SU(2)
! ---------
! Each SU(2) matrix is parameterised by 4 REAL numbers, a(i), i = 1,4,
! subject to the constraint a.a = 1. They are stored as follows:
!
! REAL gauge (nt/2, nx, ny, nz, ngauge, ndir, 0:1)
!
! Again, the sites indices (t,x,y,z) are first. These refer to the
! site at the start of the link on which the matrix is defined.
! The next index (ngauge = 4) gives the 4 components of the matrix,
! in terms of the parameterisation described above.
! The next index (ndir = 4) gives the direction of the link on which
! the matrix is defined.
! Finally, the gauge field array is decomposed into 2 parts, for
! even and odd sites. The final index refers to this decomposition
! (0 = even sites, 1 = odd sites).
!
! For SU(3)
! ---------
! In the SU(3) case, all of the components of the 3 * 3 complex
! matrix are explicitly stored, as a REAL array of dimensions (3,3,2).
! The gauge field for the whole lattice is stored as follows:
!
! REAL gauge (nt/2, nx, ny, nz, ncolor, ncolor, ncomplex, ndir, 0:1)
!
! The indices are the same as for SU(2), except that the single
! index 'ngauge', for the SU(2) component, is replaced by three
! indices 'ncolor, ncolor, ncomplex'. These refer to the row,
! column and complex part of the SU(3) matrix component.
!
!
! The gauge and fermion fields are coupled through the 'Dirac matrix'
! [D]. If 'i' and 'j' are lattice sites, then an element [D]ij of the
! Dirac matrix is nono-zero only if 'i' and 'j' are at opposite ends of a
! single link, in which case [D]ij is proportional to the gauge field on
! that link (or to the hermitian conjugate of the gauge field if the link
! is in a negative direction). Note that [D] has both site and colour
! indices, and that it connects even sites to odd and vice versa.
!
! The 'psidot' field is updated by solving the [matrix] * {vector}
! eqn:
!
! [-D * D + msq ] {psidot} = {rhs}
!
! where {} means a vector over all EVEN sites, and {rhs} is known.
! 'msq' is the unit matrix times the square of the fermion mass - the
! latter being a parameter of the theory, which is here set to 0.2.
! The matrix [-D * D + msq] is hermitian.
!
! This equation is solved by Conjugate Gradient iteration.
! {psidot} is actually the 'velocity' of {psi}; having updated {psidot},
! it is used to update {psi}. An update of {psidot} and {psi} constitutes
! one sweep of the program.
!
! The program times the CG iteration and works out its average
! floating point performance in Mflops/s. Ten sweeps are performed,
! although the performance should be the same for every sweep.
!
! At the start of a run the fields are randomly initialised. The
! performance measurements are independent of the field values, so it is
! not necessary to standardise the random number generator or its integer
! seed value. The seed value is initialised in BLOCK DATA.
!
!
!
! 3. VECTORISATION
! -----------------
! Each CG iteration involves the following operations:
!
! --------------------------------------------------------------------------
! | operation | times used |
! |-----------------------------------------------------------|------------|
! | (i) hermitian scalar product {u}.{v} | 2 |
! | (ii) scaled vector addition {u} + const * {v} | 3 |
! | (iii) [matrix] * {vector} [-D * D + msq] {u} | 1 |
! --------------------------------------------------------------------------
!
! Only operation (i) is not vectorisable.
!
! (ii) is vectorisable over all indices (vector length = N * ncolor,
! where N = total number of lattice sites).
!
! (iii) is calculated by a double operation of [D] on a vector,
! followed by scaled vector addition. The calculation of {v} = [D]{u}
! is vectorisable over sites (vector length = N/2), as explained in detail
! in subroutines DVEC and DVECDIR, where the calculation is performed.
! This is the reason that the site indices come first in the declarations
! of the {fermion vectors} and the [gauge field] array (see section 2).
! In order to vectorise {v} = [D]{u}, one of the vectors {u} or {v} must
! be shifted by one link, so that all of the site indices are aligned.
! This is done by subroutine SHIFTVEC.
!
!
!
! 4. BENCHMARK PARAMETERS
! ------------------------
! The performance depends only on the lattice dimensions (nt & nx).
! These are changed by globally changing one line of a PARAMETER statement
! (which is identical everywhere and so needs just one editor command!) :
!
! PARAMETER ( nt = 4, nx = 4, nsites = nt * nx * nx * nx,
!
! Note 'nt' and 'nx' must be EVEN.
! ----
! Suggested problem sizes for benchmarking are lattices with 2**n points
! -----------------------
! for a range of n, up to the available memory capacity, eg:
!
! 4**4, 8 * 4**3, 16 * 4**3,
! 4 * 8**3, 8**4, 16 * 8**3,
! 32 * 8**3, 8 * 16**3, 16**4
!
!
! Other adjustable parameters
! ---------------------------
! The user might also wish to change the values of 'nsweep',
! 'mquark' and 'msq'. They do not affect the measured performance,
! just the total running time of the program. They are defined in
! the following DATA statement in the BLOCK DATA subprogram:
!
! DATA h, halfh, mquark, msq, nsweep /
! * 0.01, 0.005, 0.2, 0.04, 10 /
!
! 'nsweep' gives the number of sweeps. The measured performance should
! be constant for all of them.
! 'mquark' gives the quark mass. The smaller this is, the larger the
! number of CG iterations required for convergence.
! 'msq' = mquark ** 2.
!
!
!
! 5. TIMING
! ----------
! Two calls are made to a machine-specific timing routine in
! subroutine SOLVE. They are represented by calls to a subroutine
! SECONDS (time), where it assumed the time is returned as a real number
! of seconds. Modifications may be necessary for the machine specific
! timing routines.
!
!
!
! 6. PROGRAM ANALYSIS
! --------------------
!
! 6.1 Floating point operations
! -----------------------------
! The following table shows the (flops / CG iteration)
!
! --------------------------------------------------------------------
! | operation | times | flops |vectorised |
! | | used | SU(2) | SU(3) | |
! |------------------|-----------|-----------|-----------|-----------|
! | {u}.{v} | 2 | 4 N | 6 N | no |
! | {u} + const {v} | 3 | 4 N | 6 N | yes |
! | [-D.D + msq]{u} | 1 | 260 N | 582 N | yes |
! |------------------------------|-----------|-----------|-----------|
! | total | 280 N | 612 N | |
! --------------------------------------------------------------------
!
! N = total number of sites (i.e. not just even sites).
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -