📄 kernf77.txt
字号:
Programs for Kernel Regression (Fortran)
This page provides a description of the Fortran F77 subroutines glkern.f and
lokern.f (version January 1997) with
general remarks, computational remarks, recommendations, examples, statistical
remarks, and references.
This software was developed in the group of Theo Gasser by several people,
mainly Walter K鰄ler, Alois Kneip and Eva Herrmann. A shell archive can be found
here. It contains the README file with some instructions for installing them,
the subroutines glkern.f, lokern.f, and the collection of further subroutines
subs.f.
Back to the Software-Page.
General Remarks
The subroutines glkern.f and lokern.f use an efficient and fast algorithm for
automatically adaptive nonparametric regression estimation with a kernel method.
Roughly speaking, the method performs a local averaging of the observations when
estimating the regression function. Analogously, one can estimate derivatives of
small order of the regression function.
Crucial for the kernel regression estimation used here is the choice of global
or local bandwidths. Too small ones will lead to a wiggly curve, too large ones
will smooth away important details.
The subroutine glkern.f calculates an estimator of the regression function or
derivatives of the regression function with an automatically chosen global
plugin bandwidth. The subroutine lokern.f calculates such an estimator with an
automatically chosen bandwidth function. It is also possible to use global and
local bandwidths, respectively, which are specified by the user.
The main idea of the plugin method is to estimate the optimal bandwidths by
estimating the asymptotically optimal mean (integrated) squared error optimal
bandwidths. Therefore, one has to estimate the variance for homoscedastic error
variables and a functional of a smooth variance function for heteroscedastic
error variables, respectively. Also, one has to estimate an integral functional
of the squared k-th derivative of the regression function (k=KORD) for the
global bandwidth and the squared k-th derivative itself for the local
bandwidths. Here, a further kernel estimator for this derivative is used with a
bandwidth which is adapted iteratively to the regression function.
A convolution form of the kernel estimator for the regression function and its
derivatives is used. Thereby, one can adapt the S-array (which is
S(I)=(T(I)+T(I+1))/2 in the standard convolution form) to be a smoothed grid
more suitable for random design, see Herrmann (1996). Using this estimator leads
to an asymptotically minimax efficient estimator for fixed and random design.
Polynomial kernels and boundary kernels are used with a fast and stable updating
algorithm for kernel regression estimation.
More details can be found in the papers referred to in the references.
Goto top, computational remarks, recommendations, examples, statistical remarks,
and references.
Computational Remarks and Usage
Here, we give some details on how to implement the subroutines. They are
programmed in a way that most of the parameters may be chosen by default values.
Both driver subroutines glkern.f and lokern.f for nonparametric smoothing and
differentiation of a regression function need further subroutines which are
contained in the file subs.f .
They can be called by
CALL GLKERN(T,X,N,TT,M,IHOM,NUE,KORD,IRND,
. ISMO,M1,TL,TU,S,SIG,WN,W1,B,Y)
and
CALL LOKERN(T,X,N,TT,M,IHOM,NUE,KORD,IRND,
. ISMO,M1,TL,TU,S,SIG,WN,W1,WM,BAN,Y)
respectively.
Necessary input variables and arrays which have to be chosen for the
estimation situation at hand are:
INTEGER N,M
DOUBLE PRECISION T(N),X(N),TT(M)
Thereby, (T(1),X(1)),...,(T(N),X(N)) describes the raw data with ordered
points T(1)<T(2)<...<T(N) (the design). The array TT(1),...,TT(M) should
contain the ordered points where the regression function and/or its
derivatives have to be estimated.
For equidistant points T(1),...,T(N), the TT-Array and the T-Array will
often coincide. Another typical situation will occur if the estimated curve
should be sent to some graphics software. In this situation, it might be
useful to choose TT(1),...,TT(M) as about M=300 equidistant points with
nearly the same range as T(1),...,T(N), for example
M=300
DO 100 J=1,M
100 TT(J)=T(1)+(T(N)-T(1))*DBLE(J-1)/DBLE(M-1)
Input variables and arrays which can often been chosen by default values are
given by:
INTEGER IHOM,NUE,KORD,IRND,ISMO,M1
DOUBLE PRECISION TL,TU,S(0:N),SIG
with defaults
PARAMETER (IHOM=0,NUE=0,KORD=NUE+2)
PARAMETER (IRND=0,ISMO=0,M1=400)
DATA TL,TU/1.0,0.0/,S/N*0.0,-1.0/,SIG/-1.0/
Now we give some detailed remarks on the effect of these variables.
ISMO: This variable simply indicates if previously specified local or global
bandwidths should be used. Than put ISMO<>0 and specify a global bandwidth
by the variable B in glkern.f and a local bandwidth array BAN of length M in
lokern.f.
Data-adaptive plugin bandwidths are calculated for ISMO=0 (default).
NUE: This variable specifies which derivative of the regression function
should be estimated. For automatic bandwidth choice (ISMO=0, default) only
NUE=0,1 or 2 are possible, for smoothing with specified bandwidths (ISMO<>0)
NUE=0,1,2,3 or 4 are possible. Note that the plugin bandwidths will differ
for different NUE. See also the statistical remarks below on derivative
estimation.
KORD: This variable specifies the kernel order which is used by the kernel
regression estimator. KORD-NUE has to be an even number, KORD=NUE+2 will be
the default choice.
Only NUE+2<=KORD<=4 is allowed for automatic bandwidth choice (ISMO=0,
default) and NUE+2<=KORD<=6 is allowed for smoothing with specified
bandwidths (ISMO<>0).
IHOM, SIG: These variables indicate which variance estimator should be used
for IRND=0 or ISMO=0 (default values). They are not used at all for both
IRND<>0 and ISMO<>0.
For IHOM=0 (default) homoscedastic variance is assumed, i.e. that the
variance of the error variables equals a constant and thus does not depend
on the location or on the regression function. For IHOM=0 and SIG<=0 (both
default values), the variance is estimated nonparametrically and can be
found as output of SIG as long as IRND=0 or ISMO=0 (default values). For
IHOM=0 and SIG>0 the algorithm uses the input value of SIG as variance
estimator for IRND=0 or ISMO=0 (default values).
For IHOM<>0 a smooth variance function is assumed and estimated by the
algorithm for IRND=0 or ISMO=0 (default values). The variable SIG is not
used than.
IRND: This variable will specify the weights of the convolution kernel
estimator and the calculation of the S-array. The classical Gasser-M黮ler
form of the weights is used for IRND<>0. This will be useful for equidistant
and nearly equidistant design points T(1),...,T(N). If the distances
T(I)-T(I-1) are very variable, e.g. if T(1),...,T(N) are ordered variables
from a random sample, such a convolution kernel estimator is inefficient.
Than it is useful to perform an easy modification which is done for IRND=0
(default). For large sample size N this modification can also handle
multiple design points automatically.
B>M1: This variable gives the number of grid points when approximating
integral functionals for plugin bandwidth selection. The default value
M1=400 can be used in most situations. Only for very large sample size N and
wiggly regression or variance functions it may be increased further.
TL, TU: These variables specify lower and upper bounds of the region for the
global bandwidth estimation step (even in lokern.f). If TL>=TU (default)
they are set automatically and exclude a small part of the boundary region.
S(0:N): This array is of methodological interest mainly. It specifies the
weights of the convolution kernel estimator. For S(N)>=S(0) (default) this
array is computed automatically according to the IRND-variable. For IRND=0
(default) the computation of this array does not depend on the bandwidths,
the same array may be used if the subroutines are called for different
bandwidths (ISMO<>0). It is very important to perform a new initialization
for different design points and useful to do so for different NUE, KORD and
variance too.
For IRND<>0 the S-array does only depend on T(1),...,T(N).
Output variables beside TL, TU, S(0:N) and SIG which are discussed above are
given by
DOUBLE PRECISION B,Y(M)
for glkern.f and by
DOUBLE PRECISION BAN(M),Y(M)
for lokern.f. For ISMO=0 (default) B and BAN(M) will give the plugin
bandwidths. Thereby, in the inner part, mainly those data points
(T(1),X(1)),...,(T(N),X(N)) are used for estimation of the NUE-th derivative
of the regression function at a point TT(J) with a bandwidth B where
TT(J)-B<=T(I)<=TT(J)+B. At the left boundary region mainly those with
T(1)<=T(I)<=T(1)+2B are used and analogously at the right boundary region.
Note that the exact regions are more complicated, especially for IRND=0.
The estimated NUE-th derivative of the regression function at a point TT(J)
can be found in Y(J).
Besides glkern.f needs the following work arrays
Double Precision WN(0:N,5),W1(M1,3)
and lokern.f needs the work arrays
Double Precision WN(0:N,5),W1(M1,3),WM(M)
Goto top, general remarks, computational remarks, recommendations, examples,
statistical remarks, and references.
Some Recommendations
For estimating derivatives (deriv = 1,2), it is advisable to interpolate the
data linearly to an equally spaced, relatively dense grid. This is true in
particular if the sample size is small. Note: To estimate the optimal
bandwidth the original data needs to be used in a step prior to estimating
derivatives.
Estimating the optimal bandwidth(s) is accompanied inevitably by quite some
variability. If one has a sample of similar curves, one might use the
average bandwidth(s) with some advantage. This is particularly true for
local bandwidths. An example are growth curves for a sample of children.
Goto top, general remarks, computational remarks, examples, statistical remarks,
and references.
Examples
1. In the first example 75 simulated data points for an equidistant grid on the
unit interval are given by a data statement. Subroutine glkern.f is used to
estimate the regression function (a linear function plus an exponential peak)
and its first and second derivative. Most parameters are chosen by default
values. Only IRND=1 is used because of the equidistant grid.
PROGRAM
INTEGER N,M,IHOM,NUE,KORD,IRND,ISMO,M1,I,J
PARAMETER(N=75,M=300,IHOM=0,IRND=1,ISMO=0,M1=400)
DOUBLE PRECISION T(N),X(N),TT(M),TL,TU,S(0:N),SIG
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -