📄 kernf77.txt

📁 The subroutines glkern.f and lokern.f use an efficient and fast algorithm for automatically adapti
💻 TXT
📖 第 1 页 / 共 2 页
字号:
12 下一页
 Programs for Kernel Regression (Fortran) 


This page provides a description of the Fortran F77 subroutines glkern.f and 
lokern.f (version January 1997) with 
general remarks, computational remarks, recommendations, examples, statistical 
remarks, and references.
This software was developed in the group of Theo Gasser by several people, 
mainly Walter K鰄ler, Alois Kneip and Eva Herrmann. A shell archive can be found 
here. It contains the README file with some instructions for installing them, 
the subroutines glkern.f, lokern.f, and the collection of further subroutines 
subs.f. 
Back to the Software-Page. 



General Remarks 
The subroutines glkern.f and lokern.f use an efficient and fast algorithm for 
automatically adaptive nonparametric regression estimation with a kernel method.
Roughly speaking, the method performs a local averaging of the observations when 
estimating the regression function. Analogously, one can estimate derivatives of 
small order of the regression function.
Crucial for the kernel regression estimation used here is the choice of global 
or local bandwidths. Too small ones will lead to a wiggly curve, too large ones 
will smooth away important details.
The subroutine glkern.f calculates an estimator of the regression function or 
derivatives of the regression function with an automatically chosen global 
plugin bandwidth. The subroutine lokern.f calculates such an estimator with an 
automatically chosen bandwidth function. It is also possible to use global and 
local bandwidths, respectively, which are specified by the user.
The main idea of the plugin method is to estimate the optimal bandwidths by 
estimating the asymptotically optimal mean (integrated) squared error optimal 
bandwidths. Therefore, one has to estimate the variance for homoscedastic error 
variables and a functional of a smooth variance function for heteroscedastic 
error variables, respectively. Also, one has to estimate an integral functional 
of the squared k-th derivative of the regression function (k=KORD) for the 
global bandwidth and the squared k-th derivative itself for the local 
bandwidths. Here, a further kernel estimator for this derivative is used with a 
bandwidth which is adapted iteratively to the regression function.
A convolution form of the kernel estimator for the regression function and its 
derivatives is used. Thereby, one can adapt the S-array (which is 
S(I)=(T(I)+T(I+1))/2 in the standard convolution form) to be a smoothed grid 
more suitable for random design, see Herrmann (1996). Using this estimator leads 
to an asymptotically minimax efficient estimator for fixed and random design.
Polynomial kernels and boundary kernels are used with a fast and stable updating 
algorithm for kernel regression estimation.
More details can be found in the papers referred to in the references.
Goto top, computational remarks, recommendations, examples, statistical remarks, 
and references.



Computational Remarks and Usage
Here, we give some details on how to implement the subroutines. They are 
programmed in a way that most of the parameters may be chosen by default values. 
Both driver subroutines glkern.f and lokern.f for nonparametric smoothing and 
differentiation of a regression function need further subroutines which are 
contained in the file subs.f . 
    They can be called by 
      CALL GLKERN(T,X,N,TT,M,IHOM,NUE,KORD,IRND,
     .            ISMO,M1,TL,TU,S,SIG,WN,W1,B,Y)
    and 
      CALL LOKERN(T,X,N,TT,M,IHOM,NUE,KORD,IRND,
     .            ISMO,M1,TL,TU,S,SIG,WN,W1,WM,BAN,Y)
    respectively. 
    Necessary input variables and arrays which have to be chosen for the 
    estimation situation at hand are:
    
      INTEGER N,M
      DOUBLE PRECISION T(N),X(N),TT(M)
    Thereby, (T(1),X(1)),...,(T(N),X(N)) describes the raw data with ordered 
    points T(1)<T(2)<...<T(N) (the design). The array TT(1),...,TT(M) should 
    contain the ordered points where the regression function and/or its 
    derivatives have to be estimated.
    For equidistant points T(1),...,T(N), the TT-Array and the T-Array will 
    often coincide. Another typical situation will occur if the estimated curve 
    should be sent to some graphics software. In this situation, it might be 
    useful to choose TT(1),...,TT(M) as about M=300 equidistant points with 
    nearly the same range as T(1),...,T(N), for example 
    
      M=300
      DO 100 J=1,M
100      TT(J)=T(1)+(T(N)-T(1))*DBLE(J-1)/DBLE(M-1)
    Input variables and arrays which can often been chosen by default values are 
    given by:
    
      INTEGER IHOM,NUE,KORD,IRND,ISMO,M1
      DOUBLE PRECISION TL,TU,S(0:N),SIG
    with defaults 
      PARAMETER (IHOM=0,NUE=0,KORD=NUE+2)      
      PARAMETER (IRND=0,ISMO=0,M1=400)
      DATA TL,TU/1.0,0.0/,S/N*0.0,-1.0/,SIG/-1.0/      
    Now we give some detailed remarks on the effect of these variables.
    ISMO: This variable simply indicates if previously specified local or global 
    bandwidths should be used. Than put ISMO<>0 and specify a global bandwidth 
    by the variable B in glkern.f and a local bandwidth array BAN of length M in 
    lokern.f.
    Data-adaptive plugin bandwidths are calculated for ISMO=0 (default).
    NUE: This variable specifies which derivative of the regression function 
    should be estimated. For automatic bandwidth choice (ISMO=0, default) only 
    NUE=0,1 or 2 are possible, for smoothing with specified bandwidths (ISMO<>0) 
    NUE=0,1,2,3 or 4 are possible. Note that the plugin bandwidths will differ 
    for different NUE. See also the statistical remarks below on derivative 
    estimation.
    KORD: This variable specifies the kernel order which is used by the kernel 
    regression estimator. KORD-NUE has to be an even number, KORD=NUE+2 will be 
    the default choice.
    Only NUE+2<=KORD<=4 is allowed for automatic bandwidth choice (ISMO=0, 
    default) and NUE+2<=KORD<=6 is allowed for smoothing with specified 
    bandwidths (ISMO<>0).
    IHOM, SIG: These variables indicate which variance estimator should be used 
    for IRND=0 or ISMO=0 (default values). They are not used at all for both 
    IRND<>0 and ISMO<>0.
    For IHOM=0 (default) homoscedastic variance is assumed, i.e. that the 
    variance of the error variables equals a constant and thus does not depend 
    on the location or on the regression function. For IHOM=0 and SIG<=0 (both 
    default values), the variance is estimated nonparametrically and can be 
    found as output of SIG as long as IRND=0 or ISMO=0 (default values). For 
    IHOM=0 and SIG>0 the algorithm uses the input value of SIG as variance 
    estimator for IRND=0 or ISMO=0 (default values).
    For IHOM<>0 a smooth variance function is assumed and estimated by the 
    algorithm for IRND=0 or ISMO=0 (default values). The variable SIG is not 
    used than.
    IRND: This variable will specify the weights of the convolution kernel 
    estimator and the calculation of the S-array. The classical Gasser-M黮ler 
    form of the weights is used for IRND<>0. This will be useful for equidistant 
    and nearly equidistant design points T(1),...,T(N). If the distances 
    T(I)-T(I-1) are very variable, e.g. if T(1),...,T(N) are ordered variables 
    from a random sample, such a convolution kernel estimator is inefficient. 
    Than it is useful to perform an easy modification which is done for IRND=0 
    (default). For large sample size N this modification can also handle 
    multiple design points automatically.
    B>M1: This variable gives the number of grid points when approximating 
    integral functionals for plugin bandwidth selection. The default value 
    M1=400 can be used in most situations. Only for very large sample size N and 
    wiggly regression or variance functions it may be increased further.
    TL, TU: These variables specify lower and upper bounds of the region for the 
    global bandwidth estimation step (even in lokern.f). If TL>=TU (default) 
    they are set automatically and exclude a small part of the boundary region.
    S(0:N): This array is of methodological interest mainly. It specifies the 
    weights of the convolution kernel estimator. For S(N)>=S(0) (default) this 
    array is computed automatically according to the IRND-variable. For IRND=0 
    (default) the computation of this array does not depend on the bandwidths, 
    the same array may be used if the subroutines are called for different 
    bandwidths (ISMO<>0). It is very important to perform a new initialization 
    for different design points and useful to do so for different NUE, KORD and 
    variance too.
    For IRND<>0 the S-array does only depend on T(1),...,T(N).
    Output variables beside TL, TU, S(0:N) and SIG which are discussed above are 
    given by 
      DOUBLE PRECISION B,Y(M)
    for glkern.f and by 
      DOUBLE PRECISION BAN(M),Y(M)
    for lokern.f. For ISMO=0 (default) B and BAN(M) will give the plugin 
    bandwidths. Thereby, in the inner part, mainly those data points 
    (T(1),X(1)),...,(T(N),X(N)) are used for estimation of the NUE-th derivative 
    of the regression function at a point TT(J) with a bandwidth B where 
    TT(J)-B<=T(I)<=TT(J)+B. At the left boundary region mainly those with 
    T(1)<=T(I)<=T(1)+2B are used and analogously at the right boundary region. 
    Note that the exact regions are more complicated, especially for IRND=0.
    The estimated NUE-th derivative of the regression function at a point TT(J) 
    can be found in Y(J). 
    Besides glkern.f needs the following work arrays 
 
      Double Precision WN(0:N,5),W1(M1,3)
    and lokern.f needs the work arrays 
 
      Double Precision WN(0:N,5),W1(M1,3),WM(M)
Goto top, general remarks, computational remarks, recommendations, examples, 
statistical remarks, and references.



Some Recommendations
    For estimating derivatives (deriv = 1,2), it is advisable to interpolate the 
    data linearly to an equally spaced, relatively dense grid. This is true in 
    particular if the sample size is small. Note: To estimate the optimal 
    bandwidth the original data needs to be used in a step prior to estimating 
    derivatives. 
    Estimating the optimal bandwidth(s) is accompanied inevitably by quite some 
    variability. If one has a sample of similar curves, one might use the 
    average bandwidth(s) with some advantage. This is particularly true for 
    local bandwidths. An example are growth curves for a sample of children. 
Goto top, general remarks, computational remarks, examples, statistical remarks, 
and references. 



Examples
1. In the first example 75 simulated data points for an equidistant grid on the 
unit interval are given by a data statement. Subroutine glkern.f is used to 
estimate the regression function (a linear function plus an exponential peak) 
and its first and second derivative. Most parameters are chosen by default 
values. Only IRND=1 is used because of the equidistant grid.

      PROGRAM
      INTEGER N,M,IHOM,NUE,KORD,IRND,ISMO,M1,I,J
      PARAMETER(N=75,M=300,IHOM=0,IRND=1,ISMO=0,M1=400)
      DOUBLE PRECISION T(N),X(N),TT(M),TL,TU,S(0:N),SIG
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -