softmax.m
function [f, iter, dev, hess] = softmax(X, k, prior, varargin)
%SOFTMAX Multinomial feed-forward neural network
% F = SOFTMAX(X, K, PRIOR) returns a SOFTMAX object containing the
% weights of a feed-forward neural network trained to minimise the
% multinomial log-likelihood deviance based on the feature matrix X,
% class indices in K and the prior probabilities in PRIOR where
% PRIOR is optional. See the help for SOFTMAX's parent object class
% CLASSIFIER for information on the input arguments X, K, and
% PRIOR. Traditional neural networks minimise the sum squared error,
% whereas this model assumes that the outputs are a Poisson process
% conditional on their sum and calculates the error as the residual
% deviance.
%
% In addition to the fields defined by the CLASSIFIER class, F
% contains the following field:
%
% WEIGHTS: a sparse matrix representing the optimised connection
% weights where rows represent connections from units that feed
% other units and columns represent connections to units that are
% fed by other units. Each non-zero value in this matrix represents
% a weight connecting unit i to unit j where i is the row and j is
% the column. There are p+1 input units that are not fed by other
% units (i.e., the first p+1 columns are all zeros). The first unit
% always represents the bias, while the following p units represent
% the inputs to the entire network (i.e., from X). In addition there
% are g-1 output units where g represents the number of different
% classes in k. Because output probabilities are normalised over the
% sum of the exponents, it is assumed that the first class receives
% all zero weights and is therefore not explicitly represented in
% the weight matrix. Output units do not feed other units and
% therefore there are g-1 fewer rows than columns (assuming that the
% missing rows are all zero). All other units are referred to as
% hidden units and both feed and are fed by other units.
%
% Because of argument structure ambiguity, PRIOR is not optional
% when using other options. The default can be assigned by giving
% an empty PRIOR = [].
%
% SOFTMAX(X, K, PRIOR, NUNITS, SKIP) where NUNITS is a scalar
% positive integer and SKIP is either 0 or 1 specifies how many
% hidden units are present in a single hidden layer neural
% network. The model is fully connected between adjacent layers. If
% SKIP is 1, input units are additionally connected to output
% units. SKIP must be specified when there is only a single hidden
% layer. If NUNITS is 0, SKIP must also be 0.
%
% SOFTMAX(X, K, PRIOR, NUNITS) where NUNITS is a vector of positive,
% non-zero integers of length n specifies how many units are present
% in each of n hidden layers. All adjacent layers are fully
% connected; however, it is an error to specify SKIP. If skip
% weights are desired, the weight matrix must be given explicitly
% (see below).
%
% SOFTMAX(X, K, PRIOR, WEIGHTS, MASK) where WEIGHTS is a matrix
% similar to F.WEIGHTS described above, uses the connections and
% starting weights specified in the matrix. MASK is optional. If
% given, MASK is a matrix the same size as WEIGHTS consisting of all
% 1's and 0's indicating which weights are to be optimised by the
% training algorithm. This allows the optimisation of weights that
% are initially 0 as well as the ability to keep some non-zero
% weights fixed.
%
% SOFTMAX(X, K, PRIOR, MASK) is equivalent to SOFTMAX(X, K, PRIOR,
% WEIGHTS, MASK) where WEIGHTS are assigned randomly. If the initial
% random weights used by the training algorithm are needed, the MASK
% argument (or the NUNITS plus SKIP arguments) can be used with a
% value of 0 for MAXITER (see below).
%
% By default, SOFTMAX uses no hidden units with skip weights, which
% is functionally equivalent to a logistic discriminant analysis
% (see LOGDA). However, SOFTMAX will be much slower as the
% algorithm has been generalised for hidden units.
%
% SOFTMAX(X, K, PRIOR, ..., DECAY) where DECAY is a positive scalar
% value less than 1 gives the weight decay for the model. The
% default decay is 0. DECAY forces the estimate of the residual
% deviance to be penalised by the magnitude of the estimated
% weights. Typical values range from .01 for a very large DECAY to a
% moderate value of 10e-6. Because SOFTMAX initially normalises the
% inputs, this value is independent of the range of X. (However,
% SOFTMAX rescales the returned weights so that rescaling of input
% values is not necessary when classifying new data.)
%
% SOFTMAX(X, K, PRIOR, ..., DECAY, MAXITER) where MAXITER is a
% positive integer aborts the algorithm after that many
% iterations. The default value is 200. If a value of 0 is given
% as MAXITER the algorithm terminates before optimising the
% connection weights. This is useful for returning a random
% matrix of weights which can be later manipulated before
% optimisation. However, if MAXITER is 0, a DECAY value must be
% given to avoid ambiguity in the arguments.
%
% SOFTMAX(X, K, PRIOR, ..., MAXITER) is otherwise equivalent to
% supplying a DECAY of 0 (unless MAXITER is also 0---see above).
%
% SOFTMAX(X, K, OPTS) allows optional arguments to be passed in the
% fields of the structure OPTS. Fields that are used by SOFTMAX are
% PRIOR, NUNITS, SKIPFLAG, WEIGHTS, MASK, DECAY, and
% MAXITER. However, neither NUNITS nor SKIP may be specified with
% either WEIGHTS or MASK.
%
% [F, NITER, DEV, HESS] = SOFTMAX(X, K, ...) additionally returns
% the number of iterations required by the algorithm before
% convergence in NITER, the residual deviance for the fit in DEV and
% the Hessian matrix of the weights in HESS. HESS is a square matrix
% where each row and column represents a single weight. The weights
% are ordered according to the vectorised weight matrix
% F.WEIGHTS(:).
%
% SOFTMAX(X, G, ...) where G is an n by g matrix of posterior
% probabilities or counts, models this instead of absolute class
% memberships. If G represents counts, all of its values must be
% positive integers. Otherwise the rows of G represent posterior
% probabilities and must all sum to 1. It is an error to give the
% argument PRIOR in this case. If G represents posterior
% probabilities, F.PRIOR will be calculated as the normalised sum of
% the columns of G and F.COUNTS will be a scalar value representing
% the number of observations. Otherwise, F.COUNTS will be the sum of
% the columns and F.PRIOR will represent the observed prior
% distribution.
%
% SOFTMAX(F) where F is an object of class LOGDA returns the
% SOFTMAX equivalent of the logistic discriminant analysis.
%
% See also CLASSIFIER, LDA, QDA, LOGDA.
%
% Notes:
% The argument structure can be rather complicated. The program
% tries to figure out which argument is which heuristically, but
% it's probably easy to defeat it. Arguments that are passed to
% SOFTMAX must be in the order described above although they may
% be entirely omitted allowing defaults to be used instead.
%
% References:
% B. D. Ripley (1996) Pattern Recognition and Neural
% Networks. Cambridge University Press.

% Copyright (c) 1999 Michael Kiefte.

% $Id: softmax.m,v 1.1 1999/06/04 18:50:50 michael Exp $
% $Log: softmax.m,v $
% Revision 1.1 1999/06/04 18:50:50 michael
% Initial revision
%

if isa(X, 'logda')
  error(nargchk(1, 1, nargin))
  weights = [sparse(X.nvar+1, X.nvar+1) X.coefs'];
  f = class(struct('weights', weights), 'softmax', X.classifier);
  return
end

error(nargchk(2, 7, nargin))

if nargin > 2 & isstruct(prior)
  % using option structure
  if nargin > 3
    error(sprintf(['Cannot have arguments following option struct:\n' ...
        '%s'], nargchk(3, 3, 4)))
  end
  [prior nhid skip weights mask decay maxit] = ...
      parseopt(prior, 'prior', 'nunits', 'skip', 'weights', 'mask', ...
      'decay', 'maxiter');
  if (~isempty(nhid) | ~isempty(skip)) & (~isempty(weights) | ...
        ~isempty(mask))
    error(['May not specify NUNITS or SKIPFLAG with either WEIGHTS' ...
        ' or MASK.'])
  end
elseif nargin < 3
  prior = [];
end

[n p] = size(X);

if prod(size(k)) ~= length(k)
  % Multinomial incidence matrix or posterior probabilities
  if length(varargin) > 4
    error(sprintf(['Assuming second argument is an incidence matrix' ...
        ' of multinomial counts\nor posterior probabilities:' ...
        ' %s'], nargchk(0, 4, 5)))
  end
  [h G w] = classifier(X, k);
  g = size(G, 2);
  logG = G;
  logG(find(G)) = log(G(find(G)));
else
  % Vector of class indices
  [h G] = classifier(X, k, prior);
  nj = h.counts;
  g = length(nj);
  w = (nj./(n*h.prior))';
  w = w(k);
  logG = 0;
end

% Normalise inputs between (0, 1)
range = h.range;
X = (X - repmat(range(1,:), n, 1)) * diag(1./diff(range));

trace = ~strcmp(warning, 'off');

% varargin will be in this order:
weights = [];
mask = [];
nhid = [];
skip = [];
decay = [];
maxit = [];

if length(varargin)
  % all arguments are real doubles
  if ~isempty(varargin{1}) & isa(varargin{1}, 'double') & ...
        isreal(varargin{1})
    if prod(size(varargin{1})) ~= length(varargin{1})
      % specify weights as matrix
      if length(varargin) >= 2 & ...
            all(size(varargin{2}) == size(varargin{1}))
        % with mask matrix
        if length(varargin) > 4
          error(sprintf(['Assuming fifth argument is MASK:' ...
              ' %s'], nargchk(2, 4, 5)))
        end
        varargin = [varargin(1:2), repmat({[]}, 1, 2), varargin(3:end)];
      elseif all(nonzeros(varargin{1}) == 1)
        % only mask matrix
        if length(varargin) > 3
          error(sprintf(['Assuming fourth argument is MASK:' ...
              ' %s'], nargchk(1, 3, 4)))
        end
        varargin = [{[]}, varargin(1), repmat({[]}, 1, 2), ...
            varargin(2:end)];
      else
        % without mask matrix
        if length(varargin) > 3
          error(sprintf(['Assuming fourth argument is WEIGHTS:' ...
              ' %s'], nargchk(1, 3, 4)))
        end
        varargin = [varargin(1), repmat({[]}, 1, 3), ...
            varargin(2:end)];
      end
    elseif length(varargin{1}) > 1
      % specify number of units in each hidden layer
      if length(varargin) > 3
        error(sprintf(['Assuming fourth argument is the number of' ...
            ' hidden units\nin each hidden layer:' ...
            ' %s'], nargchk(1, 3, 4)))
      end
      varargin = [repmat({[]}, 1, 2), varargin(1), {[]}, ...
          varargin(2:end)];
    elseif round(varargin{1}) == varargin{1}
      if length(varargin) >= 2 & isa(varargin{2}, 'double') & ...
            isreal(varargin{2}) & length(varargin{2}) == 1 & ...
            (varargin{2} == 1 | varargin{2} == 0)
        % single hidden layer with skip flag
        if length(varargin) > 4
          error(sprintf(['Assuming fifth argument is SKIPFLAG:\n' ...
              ' %s'], nargchk(2, 4, 5)))
        end
        varargin = [repmat({[]}, 1, 2), varargin];
      else
        % third argument is maximum number of iterations
        if length(varargin) > 1
          error(sprintf(['Assuming fourth argument is MAXITER:' ...
              ' %s'], nargchk(1, 1, 2)))
        end
        varargin = [repmat({[]}, 1, 5), varargin];
      end
    else
      % third argument is decay
      if length(varargin) > 2
        error(sprintf('Assuming fourth argument DECAY: %s', ...
            nargchk(1, 2, 3)))
      end
      varargin = [repmat({[]}, 1, 4), varargin];
    end
  else
    error('Can''t figure out what third argument should be.')
  end
  if length(varargin) == 5 & isa(varargin{5}, 'double') & ...
        length(varargin{5}) == 1 & ...
        round(varargin{5}) == varargin{5}
    % maxiter in decay position
    varargin(5:6) = [{[]}, varargin(5)];
  end
  if length(varargin) < 6
    varargin{6} = [];
  end
  [weights mask nhid skip decay maxit] = deal(varargin{:});
end

if isempty(decay)
  decay = 0;
elseif ~isa(decay, 'double') | ~isreal(decay) | length(decay) ~= 1 | ...
      decay < 0 | decay >= 1 | isnan(decay)
  error('DECAY must be a positive scalar less than 1.')
end

if ~isempty(weights) | ~isempty(mask)
  normw = 1;
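The calling conventions documented in the help are easy to mix up, so here is a hedged sketch of a few typical invocations. It assumes softmax.m, its CLASSIFIER parent class and the other referenced files are on the MATLAB path, with X an n-by-p feature matrix and k a vector of class indices; none of these calls appear in the listing itself.

% Hypothetical example calls (not part of softmax.m).
f = softmax(X, k, []);                    % default: no hidden units, skip weights only
f = softmax(X, k, [], 6, 1);              % one hidden layer of 6 units plus skip weights
f = softmax(X, k, [], [8 4]);             % two hidden layers of 8 and 4 units, no skip weights
f = softmax(X, k, [], 6, 1, 1e-4, 500);   % add weight decay and raise the iteration limit
[f, niter, dev] = softmax(X, k, []);      % also return iteration count and residual deviance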
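To make the error criterion concrete, the following stand-alone sketch computes softmax outputs and the residual deviance for the simplest case the help describes: no hidden units, a bias plus p input units feeding the g-1 output units directly, and class 1 as the all-zero reference. It is an illustration only, not code from softmax.m (whose listing above is cut off before the training loop); every name is made up, and the decay term shown is just one common form of the penalisation the help alludes to.

% Minimal stand-alone sketch (hypothetical, not from softmax.m).
n = 100; p = 3; g = 4;                    % observations, inputs, classes
Xn = rand(n, p);                          % inputs already scaled to (0, 1)
k  = ceil(g*rand(n, 1));                  % class indices 1..g
G  = full(sparse((1:n)', k, 1, n, g));    % incidence (indicator) matrix

W = 0.1*randn(p+1, g-1);                  % bias + p inputs feeding g-1 outputs
                                          % (class 1 is the all-zero reference)
eta = [ones(n, 1) Xn] * W;                % linear predictor for classes 2..g
P = exp([zeros(n, 1) eta]);               % unnormalised outputs
P = P ./ repmat(sum(P, 2), 1, g);         % normalise over the sum of exponents

% Residual deviance 2*sum(G.*log(G./P)), taking 0*log(0) as 0; for an
% indicator matrix G this reduces to -2*sum(log P(i, k(i))).
dev = -2*sum(log(P(sub2ind([n g], (1:n)', k))));

% A weight-decay penalty of the general kind DECAY describes might add
% something like this to the quantity being minimised:
decay = 1e-4;
penalised = dev + decay*sum(W(:).^2);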