⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 regem.m

📁 一个很有用的EM算法程序包
💻 M
📖 第 1 页 / 共 2 页
字号:
function [X, M, C, Xerr] = regem(X, options)%REGEM   Imputation of missing values with regularized EM algorithm.%%    [X, M, C, Xerr] = REGEM(X, OPTIONS) replaces missing values%    (NaNs) in the data matrix X with imputed values. REGEM%    returns%  %       X,    the data matrix with imputed values substituted for NaNs,  %       M,    the estimated mean of X, %       C,    the estimated covariance matrix of X,%       Xerr, an estimated standard error of the imputed values.%  %    Missing values are imputed with a regularized expectation%    maximization (EM) algorithm. In an iteration of the EM algorithm,%    given estimates of the mean and of the covariance matrix are%    revised in three steps. First, for each record X(i,:) with%    missing values, the regression parameters of the variables with%    missing values on the variables with available values are%    computed from the estimates of the mean and of the covariance%    matrix. Second, the missing values in a record X(i,:) are filled%    in with their conditional expectation values given the available%    values and the estimates of the mean and of the covariance%    matrix, the conditional expectation values being the product of%    the available values and the estimated regression%    coefficients. Third, the mean and the covariance matrix are%    re-estimated, the mean as the sample mean of the completed%    dataset and the covariance matrix as the sum of the sample%    covariance matrix of the completed dataset and an estimate of the%    conditional covariance matrix of the imputation error. %%    In the regularized EM algorithm, the parameters of the regression%    models are estimated by a regularized regression method. By%    default, the parameters of the regression models are estimated by%    an individual ridge regression for each missing value in a%    record, with one regularization parameter (ridge parameter) per%    missing value.  Optionally, the parameters of the regression%    models can be estimated by a multiple ridge regression for each%    record with missing values, with one regularization parameter per%    record with missing values. The regularization parameters for the%    ridge regressions are selected as the minimizers of the%    generalized cross-validation (GCV) function. As another option,%    the parameters of the regression models can be estimated by%    truncated total least squares. The truncation parameter, a%    discrete regularization parameter, is fixed and must be given as%    an input argument. The regularized EM algorithm with truncated%    total least squares is faster than the regularized EM algorithm%    with with ridge regression, requiring only one eigendecomposition%    per iteration instead of one eigendecomposition per record and%    iteration. But an adaptive choice of truncation parameter has not%    been implemented for truncated total least squares. So the%    truncated total least squares regressions can be used to compute%    initial values for EM iterations with ridge regressions, in which%    the regularization parameter is chosen adaptively.%  %    As default initial condition for the imputation algorithm, the%    mean of the data is computed from the available values, mean%    values are filled in for missing values, and a covariance matrix%    is estimated as the sample covariance matrix of the completed%    dataset with mean values substituted for missing%    values. Optionally, initial estimates for the missing values and%    for the covariance matrix estimate can be given as input%    arguments.% %    The OPTIONS structure specifies parameters in the algorithm:%%     Field name         Parameter                                  Default%%     OPTIONS.regress    Regression procedure to be used:           'mridge'%                        'mridge': multiple ridge regression%                        'iridge': individual ridge regressions%                        'ttls':   truncated total least squares %                                  regression %  %     OPTIONS.stagtol    Stagnation tolerance: quit when            5e-3 %                        consecutive iterates of the missing%                        values are so close that%                          norm( Xmis(it)-Xmis(it-1) ) %                             <= stagtol * norm( Xmis(it-1) )%  %     OPTIONS.maxit      Maximum number of EM iterations.           30%  %     OPTIONS.inflation  Inflation factor for the residual          1 %                        covariance matrix. Because of the %                        regularization, the residual covariance %                        matrix underestimates the conditional %                        covariance matrix of the imputation %                        error. The inflation factor is to correct %                        this underestimation. The update of the %                        covariance matrix estimate is computed %                        with residual covariance matrices %                        inflated by the factor OPTIONS.inflation,%                        and the estimates of the imputation error%                        are inflated by the same factor. %%     OPTIONS.disp       Diagnostic output of algorithm. Set to     1%                        zero for no diagnostic output.%%     OPTIONS.regpar     Regularization parameter.                  not set %                        For ridge regression, set regpar to %                        sqrt(eps) for mild regularization; leave %                        regpar unset for GCV selection of%                        regularization parameters.%                        For TTLS regression, regpar must be set%                        and is a fixed truncation parameter. %%     OPTIONS.relvar_res Minimum relative variance of residuals.    5e-2%                        From the parameter OPTIONS.relvar_res, a%                        lower bound for the regularization %                        parameter is constructed, in order to %                        prevent GCV from erroneously choosing %                        too small a regularization parameter.%  %     OPTIONS.minvarfrac Minimum fraction of total variation in     0%                        standardized variables that must be %                        retained in the regularization.%                        From the parameter OPTIONS.minvarfrac, %                        an approximate upper bound for the %                        regularization parameter is constructed. %                        The default value OPTIONS.minvarfrac = 0 %                        essentially corresponds to no upper bound %                        for the regularization parameter.   %  %     OPTIONS.Xmis0      Initial imputed values. Xmis0 is a         not set%                        (possibly sparse) matrix of the same %                        size as X with initial guesses in place%                        of the NaNs in X.  %  %     OPTIONS.C0         Initial estimate of covariance matrix.     not set%                        If no initial covariance matrix C0 is %                        given but initial estimates Xmis0 of the %                        missing values are given, the sample %                        covariance matrix of the dataset %                        completed with initial imputed values is %                        taken as an initial estimate of the %                        covariance matrix. %  %     OPTIONS.Xcmp       Display the weighted rms difference        not set%                        between the imputed values and the %                        values given in Xcmp, a matrix of the %                        same size as X but without missing %                        values. By default, REGEM displays %                        the rms difference between the imputed %                        values at consecutive iterations. The %                        option of displaying the difference %                        between the imputed values and reference %                        values exists for testing purposes.%%     OPTIONS.neigs      Number of eigenvalue-eigenvector pairs     not set%                        to be computed for TTLS regression. %                        By default, all nonzero eigenvalues and %                        corresponding eigenvectors are computed. %                        By computing fewer (neigs) eigenvectors, %                        the computations can be accelerated, but %                        the residual covariance matrices become %                        inaccurate. Consequently, the residual %                        covariance matrices underestimate the %                        imputation error conditional covariance %                        matrices more and more as neigs is %                        decreased.  %    References: %    [1] T. Schneider, 2001: Analysis of incomplete climate data:%        Estimation of mean values and covariance matrices and%        imputation of missing values. Journal of Climate, 14,%        853--871.  %    [2] R. J. A. Little and D. B. Rubin, 1987: Statistical%        Analysis with Missing Data. Wiley Series in Probability%        and Mathematical Statistics. (For EM algorithm.) %    [3] P. C. Hansen, 1997: Rank-Deficient and Discrete Ill-Posed%        Problems: Numerical Aspects of Linear Inversion. SIAM%        Monographs on Mathematical Modeling and Computation.%        (For regularization techniques, including the selection of %        regularization parameters.)    error(nargchk(1, 2, nargin))     % check number of input arguments     if ndims(X) > 2,  error('X must be vector or 2-D array.'); end  % if X is a vector, make sure it is a column vector (a single variable)  if length(X)==prod(size(X))          X = X(:);                        end   [n, p]       = size(X);  % number of degrees of freedom for estimation of covariance matrix  dofC         = n - 1;            % use degrees of freedom correction           % ==============           process options        ========================  if nargin ==1 | isempty(options)    fopts      = [];  else    fopts      = fieldnames(options);  end  % initialize options structure for regression modules  optreg       = [];    if strmatch('regress', fopts)    regress    = lower(options.regress);    switch regress     case {'mridge', 'iridge'}      % OK     case {'ttls'}            if ~strmatch('regpar', fopts)	error('Truncation parameter for TTLS regression must be given.')      else	trunc  = min([options.regpar, n-1, p]);      end            if strmatch('neigs', fopts)	neigs  = options.neigs;      else	neigs  = min(n-1, p);      end           otherwise            error(['Unknown regression method ', regress])          end  else    regress    = 'mridge';  end    if strmatch('stagtol', fopts)    stagtol    = options.stagtol;  else    stagtol    = 5e-3;  end  if strmatch('maxit', fopts)    maxit      = options.maxit;  else    maxit      = 30;  end    if strmatch('inflation', fopts)    inflation  = options.inflation;  else    inflation  = 1;  end    if strmatch('relvar_res', fopts)    optreg.relvar_res = options.relvar_res;   else    optreg.relvar_res = 5e-2;   end    if strmatch('minvarfrac', fopts)    optreg.minvarfrac = options.minvarfrac;   else    optreg.minvarfrac = 0;   end  h_given      = 0;  if strmatch('regpar', fopts)    h_given    = 1;    optreg.regpar = options.regpar;    if strmatch(regress, 'iridge')      regress  = 'mridge';

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -