
📄 logda.m

📁 Implementation of linear, quadratic and logistic discriminant analysis, with examples
💻 M
function [f, iter, dev, hess] = logda(X, k, prior, maxit, est)
%LOGDA Logistic Discriminant Analysis.
%   F = LOGDA(X, K, PRIOR) returns a logistic discriminant analysis
%   object F based on the feature matrix X, class indices in K and the
%   prior probabilities in PRIOR, where PRIOR is optional. See the help
%   for LOGDA's parent object CLASSIFIER for information on the input
%   arguments X, K and PRIOR.
%
%   In addition to the fields defined by the CLASSIFIER class, F
%   contains the following field:
%
%   COEFS: a g-1 by p+1 matrix where g is the number of groups in the
%   class index K and p is the number of features or columns in
%   X. These are the coefficients from the multiple logistic
%   regression of the feature matrix on the n by g indicator matrix G
%   of K. The parameters for the first class (indexed by 1) in K are
%   assumed to be all 0 and are not represented in the array. The
%   coefficients for the remaining classes are given in the rows. The
%   first column represents the intercept or bias term and the
%   remaining columns represent the p variables or features.
%
%   LOGDA(X, K, PRIOR, MAXITER) where MAXITER is a positive integer
%   aborts the algorithm after that many iterations. The default value
%   depends on the algorithm used (described below).
%
%   LOGDA(X, K, PRIOR, MAXITER, EST) where EST is one of 'glm' or
%   'slp' uses either a generalised linear model or a single layer
%   perceptron to minimise the residual deviance of the model. For a
%   generalised linear model, the residual deviance is minimised by
%   iterative weighted non-linear regression using either a binomial
%   link and variance function for a two-class problem or the
%   conditional Poisson link and variance function for more than two
%   classes. The single layer perceptron uses a variable metric
%   (quasi-Newton) descent algorithm in which the conditional
%   posterior probabilities are used to determine the partial
%   derivatives (see SOFTMAX). For two categories, the default EST is
%   'glm' unless the Hessian matrix is specifically requested, in
%   which case the default is 'slp'; for more than two classes the
%   default is 'slp'. Usually 'slp' is faster for large problems with
%   more than two classes, but 'glm' may be optimal for small data
%   sets. For data sets with only two classes, 'glm' is always
%   faster. When specifying EST, only the first few disambiguating
%   letters need be given: i.e., 'g' or 's'.
%
%   Either MAXITER or EST may be omitted, in which case default
%   values are assigned.
%
%   LOGDA(X, K, OPTS) allows optional arguments to be passed in the
%   fields of the structure OPTS. Fields that are used by LOGDA are
%   PRIOR, MAXITER and EST.
%
%   [F, ITER, DEV, HESS] = LOGDA(X, K, ...) additionally returns the
%   number of iterations required by the algorithm before convergence
%   in ITER, the residual deviance for the fit in DEV, and the Hessian
%   matrix of the coefficients in HESS. HESS is a square matrix with
%   (g-1)*(p+1) rows and columns. The coefficients are ordered with
%   group categories varying fastest and with the first variate
%   representing the bias or offset term (i.e., as if the matrix
%   F.COEFS were vectorised F.COEFS(:)). The eigenvalues of HESS
%   should all be positive, indicating convergence to a global
%   minimum. In order to return HESS, EST must be 'slp' if given.
%
%   LOGDA(X, G, MAXITER, EST) where G is an n by g matrix of posterior
%   probabilities or counts, models this instead of absolute class
%   memberships. If G represents counts, all of its values must be
%   positive integers. Otherwise the rows of G represent posterior
%   probabilities and must all sum to 1. It is an error to give the
%   argument PRIOR in this case. If G represents posterior
%   probabilities, F.PRIOR will be calculated as the normalised sum of
%   the columns of G and F.COUNTS will be a scalar value representing
%   the number of observations. Otherwise, F.COUNTS will be the sum of
%   the columns and F.PRIOR will represent the observed prior
%   distribution.
%
%   See also CLASSIFIER, LDA, QDA, SOFTMAX.
%
%   Example:
%   % generate artificial data with 3 classes and 2 variates
%   r = randn(2);
%   C = r'*r;                    % random positive definite symmetric matrix
%   M = randn(3, 2)*2;           % random means
%   k = ceil(rand(400, 1)*3);    % random class indices
%   X = randn(400, 2)*chol(C) + M(k, :);
%   f = logda(X, k); disp(f)
%   g = lda(X, k);
%   plotobs(X, k); hold on, plotdr(f)
%   plotdr(g), plotcov(g)
%
%   References:
%   P. McCullagh and J. A. Nelder (1989) Generalized Linear
%   Models. Second Edition. Chapman & Hall.
%   B. D. Ripley (1996) Pattern Recognition and Neural
%   Networks. Cambridge.

%   Copyright (c) 1999 Michael Kiefte.

%   $Log$

error(nargchk(2, 5, nargin))

% Unpack an options structure, or fill in defaults for omitted
% trailing arguments.
if nargin > 2 & isstruct(prior)
  if nargin > 3
    error(sprintf(['Cannot have arguments following option struct:\n' ...
                   '%s'], nargchk(3, 3, 4)))
  end
  [prior maxit est] = parseopt(prior, 'prior', 'maxiter', 'est');
elseif nargin < 5
  est = [];
  if nargin < 4
    maxit = [];
    if nargin < 3
      prior = [];
    end
  end
end

% PRIOR may have been omitted with MAXITER and/or EST given in its
% place: a lone scalar other than 1 is MAXITER and a string is EST.
if nargin >= 3 & isa(prior, 'double') & length(prior) == 1 & prior ~= 1
  est = maxit;
  maxit = prior;
  prior = [];
elseif nargin == 3 & ischar(prior)
  est = prior;
  prior = [];
elseif nargin == 4 & ischar(maxit)
  est = maxit;
  maxit = [];
end

[n p] = size(X);

if prod(size(k)) ~= length(k)
  % K is a matrix G of posterior probabilities or counts.
  if ~isempty(prior)
    error(sprintf(['Cannot give prior probabilities PRIOR with' ...
                   ' incidence matrix\nor posterior probabilities G:\n' ...
                   '%s'], nargchk(2, 3, 4)))
  end
  [h G w] = classifier(X, k);
  g = size(G, 2);
  logG = G;
  logG(find(G)) = log(G(find(G)));
else
  % K is a vector of class indices.
  [h G] = classifier(X, k, prior);
  nj = h.counts;
  g = length(nj);
  w = (nj./(n*h.prior))';
  w = w(k);
  logG = 0;
end

% Resolve the estimation algorithm: 0 is 'glm', 1 is 'slp'.
if isempty(est)
  if nargout == 4
    est = 1;
  elseif g == 2
    est = 0;
  else
    est = 1;
  end
elseif ~ischar(est) | ndims(est) ~= 2 | length(est) ~= size(est, 2) ...
      | size(est, 1) ~= 1
  error('EST must be a string.')
else
  est = find(strncmp(est, {'glm', 'slp'}, length(est)));
  if isempty(est)
    error('EST must be one of ''glm'' or ''slp''.')
  end
  est = est - 1;
  if nargout == 4 & ~est
    error('Must use single layer perceptron to output Hessian.')
  end
end

if isempty(maxit)
  if est
    maxit = 100;
  else
    maxit = 10;
  end
elseif ~isa(maxit, 'double') | length(maxit) ~= 1 | ~isreal(maxit) | ...
      round(maxit) ~= maxit | maxit < 0
  error('Maximum number of iterations MAXITER must be a positive integer.')
end

% Rescale the features to the unit interval for the perceptron; the
% coefficients are mapped back to the original scale at the end.
range = h.range;
if est
  X = (X - range(ones(n,1),:)) * diag(1./diff(range));
end

warn = 'off';      % set to 'on' to trace the deviance at each iteration
trace = ~strcmp(warn, 'off');

% Initialise the design matrix U, the fitted values and the deviance.
if est
  col = sparse(n+1:g*n, repmat(1:g-1, n, 1), 1);
  U = [col [col(:, repmat((1:g-1)', 1, p)) .* ...
            repmat(X(:,repmat(1:p, g-1, 1)), g, 1)]];
  delta = w(:, ones(1, g-1)) .* (1/g - G(:,2:end));
  grad = sum([delta, delta(:,repmat((1:g-1)', 1, p)) .* ...
              X(:, repmat(1:p, g-1, 1))])';
  H = eye((g-1)*(p+1));          % inverse Hessian approximation
  Dold = sum(w' * (G .* (logG + log(g)) - G + 1/g));
  betaold = zeros((g-1)*(p+1), 1);
elseif g == 2
  U = [ones(n, 1) X];
  mu = (G(:,2) + .5)/2;
  eta = log(mu./(1 - mu));       % logit link
  eeta = exp(eta);
  Dold = sum(w'*(G.*(logG + log(2))));
else
  col = sparse(n+1:g*n, repmat(1:g-1, n, 1), 1);
  U = [col [col(:, repmat((1:g-1)', 1, p)) .* ...
            repmat(X(:,repmat(1:p, g-1, 1)), g, 1)], ...
       sparse(1:g*n, repmat((1:n)', 1, g), 1)];
  mu = full(G + .1);
  eta = log(mu);                 % log link for the conditional Poisson model
  Dold = sum(w'*(G.*(logG + log(g)) - G + 1/g));
end

for iter = 1:maxit
  if est
    % Quasi-Newton step with a backtracking line search on the deviance.
    dir = -H*grad;
    Dp = grad' * dir;
    lambda = [1 0]';
    lambdamin = 2*eps*max(abs(dir)./max(abs(betaold), 1));
    while 1
      if lambda(1) < lambdamin
        beta = betaold;
        break
      end
      beta = betaold + lambda(1)*dir;
      eta = reshape(U*beta, n, g);
      mu = exp(eta - repmat(max(eta, [], 2), 1, g));
      mu = mu./repmat(sum(mu, 2), 1, g);    % softmax posteriors
      if any(any(~mu & G))
        D = inf;
      else
        logmu = G;
        logmu(find(G)) = log(mu(find(G)));
        D = sum(w' * (G .* (logG - logmu) - G + mu));
      end
      if D <= Dold + 1e-4*Dp*lambda(1)
        break                               % sufficient decrease
      elseif lambda(1) == 1
        % first backtrack: quadratic interpolation
        lambda = [-Dp/(2*(D - Dold - Dp)); 1];
      else
        % subsequent backtracks: cubic interpolation
        ab = [1 -1; -lambda(2) lambda(1)] * diag(1./lambda.^2) * ...
             ([D; D2] - Dp*lambda - Dold) / diff(lambda);
        lambda(2) = lambda(1);
        if ab(1) == 0
          if ab(2) == 0
            break
          end
          lambda(1) = -Dp/(2*ab(2));
        else
          lambda(1) = (-ab(2) + sqrt(ab(2)^2 - 3*ab(1)*Dp))/(3*ab(1));
        end
      end
      if ~isreal(lambda)
        lambda(1) = .1*lambda(2);
      else
        lambda(1) = max(min(lambda(1), .5*lambda(2)), .1*lambda(2));
      end
      D2 = D;
    end
  elseif g == 2
    % One step of iteratively reweighted least squares: binomial model.
    deta = eeta./(1 + eeta).^2;
    W = sqrt(w) .* deta ./ sqrt(mu .* (1 - mu));
    z = eta + (G(:,2) - mu)./deta;          % working response
    beta = (W(:,ones(p+1,1)) .* U) \ (W .* z);
    eta = U*beta;
    eeta = exp(eta);
    mu = eeta./(1 + eeta);
    logmu = [1-mu, mu];
    if any(any(~logmu & G))
      error('Deviance infinite.')
    end
    logmu(find(G)) = log(logmu(find(G)));
    D = sum(w' * (G .* (logG - logmu)));
  else
    % One step of iteratively reweighted least squares: conditional
    % Poisson model.
    W = sparse(1:n, 1:n, sqrt(w)) * mu ./ sqrt(mu);
    z = eta + (G - mu)./mu;                 % working response
    beta = (sparse(1:g*n, 1:g*n, W) * U) \ reshape((W .* z), n*g, 1);
    eta = reshape(U*beta, n, g);
    mu = exp(eta);
    if any(any(~mu & G))
      error('Deviance infinite.')
    end
    logmu = G;
    logmu(find(G)) = log(mu(find(G)));
    D = sum(w' * (G .* (logG - logmu) - G + mu));
  end

  if trace
    disp(sprintf('Iter: %d; Dev: %g', iter, 2*D))
  end

  if Dold - D < 0
    warning('Deviance diverged.')
    beta = betaold;
    D = Dold;
    break
  elseif Dold - D < D*n*eps
    if trace
      disp('Deviance converged.')
    end
    break
  end

  if est
    % BFGS update of the inverse Hessian approximation.
    grad1 = grad;
    delta = mu(:,2:end) - G(:,2:end);
    grad = [delta, (delta(:,repmat((1:g-1)', 1, p)) .* ...
                    X(:, repmat(1:p, g-1, 1)))]' * w;
    dir = beta - betaold;
    if max(dir./max(beta, 1)) < 4*eps
      if trace
        disp('Gradient converged.')
      end
      break
    end
    dg = grad - grad1;
    pdg = dir'*dg;
    Hdg = H*dg;
    gHg = dg'*Hdg;
    u = dir/pdg - Hdg/gHg;
    H = H + dir*dir'/pdg - Hdg*Hdg'/gHg + gHg*u*u';
  end

  Dold = D;
  betaold = beta;
end

% Discard the nuisance (per-observation) parameters of the conditional
% Poisson model; only the first (g-1)*(p+1) coefficients are kept.
if ~est & g > 2
  coefs = reshape(beta(1:(g-1)*(p+1)), g-1, p+1);
else
  coefs = reshape(beta, g-1, p+1);
end

% Map the perceptron coefficients back to the original feature scale.
if est
  coefs(:,2:p+1) = coefs(:,2:p+1) * diag(1./diff(range));
  coefs(:,1) = coefs(:,1) - coefs(:,2:p+1) * range(1,:)';
end

f = class(struct('coefs', coefs), 'logda', h);

if nargout > 2
  dev = 2*D;
  if nargout > 3
    hess = inv(H);               % H approximates the inverse Hessian
  end
end
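
A short usage sketch follows. It is not part of the original file: it illustrates the two calling conventions from the help text that the built-in example does not cover, the options-struct form LOGDA(X, K, OPTS) and the counts-matrix form LOGDA(X, G). The data and variable names (k, X, opts, G) are made up for illustration, and the sketch assumes the companion toolbox that logda.m depends on (the CLASSIFIER class, PARSEOPT, and the @logda class directory) is on the MATLAB path.

% Usage sketch (illustrative; assumes the companion CLASSIFIER toolbox).
rand('state', 0); randn('state', 0)          % reproducible, MATLAB 5-era syntax
k = ceil(rand(200, 1)*3);                    % 200 observations in 3 classes
X = randn(200, 2) + [k k];                   % class-dependent mean shift
opts = struct('prior', [1 1 1]/3, 'maxiter', 200, 'est', 'slp');
[f, iter, dev] = logda(X, k, opts);          % LOGDA(X, K, OPTS) form
disp(sprintf('%d iterations, residual deviance %g', iter, dev))

G = full(sparse((1:200)', k, 1, 200, 3));    % one-hot counts matrix, n by g
f2 = logda(X, G);                            % counts form: PRIOR must be omitted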
