
📄 treec.m

📁 The pattern recognition MATLAB toolbox
💻 MATLAB
%TREEC Build a decision tree classifier
%
%   W = TREEC(A,CRIT,PRUNE,T)
%
% Computation of a decision tree classifier out of a dataset A using
% a binary splitting criterion CRIT:
%   INFCRIT  -  information gain
%   MAXCRIT  -  purity (default)
%   FISHCRIT -  Fisher criterion
%
% Pruning is defined by PRUNE:
%   PRUNE = -1 pessimistic pruning as defined by Quinlan.
%   PRUNE = -2 testset pruning using the dataset T or, if T is not
%              supplied, an artificially generated testset of 5 x the size
%              of the training set, based on Parzen density estimates.
%              See PARZENML and GENDATP.
%   PRUNE =  0 no pruning (default).
%   PRUNE >  0 early pruning, e.g. PRUNE = 3.
%   PRUNE = 10 causes heavy pruning.
%
% If CRIT or PRUNE are set to NaN they are optimised by REGOPTC.
%
% See also DATASETS, MAPPINGS, TREE_MAP, REGOPTC

% Copyright: R.P.W. Duin, r.p.w.duin@prtools.org
% Faculty EWI, Delft University of Technology
% P.O. Box 5031, 2600 GA Delft, The Netherlands

% $Id: treec.m,v 1.8 2008/01/25 10:13:29 duin Exp $

function w = treec(a,crit,prune,t)

	prtrace(mfilename);

	% When no input data is given, an untrained mapping is returned:
	if nargin == 0 | isempty(a)
		if nargin < 2
			w = mapping('treec');
		elseif nargin < 3, w = mapping('treec',{crit});
		elseif nargin < 4, w = mapping('treec',{crit,prune});
		else, w = mapping('treec',{crit,prune,t});
		end
		w = setname(w,'Decision Tree');
		return
	end

	if nargin < 3, prune = []; end
	if nargin < 2, crit = []; end

	parmin_max = [1,3;-1,10];
	optcrit = inf;
	if isnan(crit) & isnan(prune)   % optimize criterion and pruning, grid search
		global REGOPT_OPTCRIT REGOPT_PARS
		for n = 1:3
			defs = {n,0};
			v = regoptc(a,mfilename,{crit,prune},defs,[2],parmin_max,testc([],'soft'),[0,0]);
			if REGOPT_OPTCRIT < optcrit
				w = v; optcrit = REGOPT_OPTCRIT; regoptpars = REGOPT_PARS;
			end
		end
		REGOPT_PARS = regoptpars;
	elseif isnan(crit)              % optimize criterion
		defs = {1,0};
		w = regoptc(a,mfilename,{crit,prune},defs,[1],parmin_max,testc([],'soft'),[0,0]);
	elseif isnan(prune)             % optimize pruning
		defs = {1,0};
		w = regoptc(a,mfilename,{crit,prune},defs,[2],parmin_max,testc([],'soft'),[0,0]);
	else                            % training for given parameters
		islabtype(a,'crisp');
		isvaldfile(a,1,2);          % at least 1 object per class, 2 classes
		a = testdatasize(a);

		% First get some useful parameters:
		[m,k,c] = getsize(a);
		nlab = getnlab(a);

		% Define the splitting criterion:
		if nargin == 1 | isempty(crit), crit = 2; end
		if ~isstr(crit)
			if crit == 0 | crit == 1, crit = 'infcrit';
			elseif crit == 2, crit = 'maxcrit';
			elseif crit == 3, crit = 'fishcrit';
			else, error('Unknown criterion value');
			end
		end

		% Now the training can really start:
		if (nargin == 1) | (nargin == 2)
			tree = maketree(+a,nlab,c,crit);
		elseif nargin > 2
			% We have to apply a pruning strategy:
			if prune == -1, prune = 'prunep'; end
			if prune == -2, prune = 'prunet'; end
			% The strategy can be prunep/prunet:
			if isstr(prune)
				tree = maketree(+a,nlab,c,crit);
				if strcmp(prune,'prunep')
					tree = prunep(tree,a,nlab);
				elseif strcmp(prune,'prunet')
					if nargin < 4
						t = gendatp(a,5*sum(nlab==1));
					end
					tree = prunet(tree,t);
				else
					error('Unknown pruning option defined');
				end
			else
				% Otherwise the tree is just cut after level 'prune':
				tree = maketree(+a,nlab,c,crit,prune);
			end
		else
			error('Wrong number of parameters')
		end

		% Store the results:
		w = mapping('tree_map','trained',{tree,1},getlablist(a),k,c);
		w = setname(w,'Decision Tree');
		w = setcost(w,a);

	end

	return
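% Example usage (a minimal sketch, not part of the original file; it
% assumes a standard PRTools installation with GENDATH and TESTC on the
% path):
%
%   a = gendath([50 50]);       % Highleyman's classes, 50 objects each
%   w = treec(a,'maxcrit',-1);  % purity splits with pessimistic pruning
%   e = a*w*testc;              % apparent classification error of the tree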
%MAKETREE General tree building algorithm
%
% 	tree = maketree(A,nlab,c,crit,stop)
%
% Constructs a binary decision tree using the criterion function
% specified in the string crit ('maxcrit', 'fishcrit' or 'infcrit'
% (default)) for a set of objects A. stop is an optional argument
% defining early stopping according to the Chi-squared test as
% defined by Quinlan [1]: stop = 0 (default) gives a perfect tree
% (no pruning), stop = 3 gives a pruned version and stop = 10 a
% heavily pruned version.
%
% Definition of the resulting tree:
%
% 	tree(n,1) - feature number to be used in node n
% 	tree(n,2) - threshold t to be used
% 	tree(n,3) - node to be processed if value <= t
% 	tree(n,4) - node to be processed if value > t
% 	tree(n,5:4+c) - a posteriori probabilities for all classes in
% 			node n
%
% If tree(n,3) == 0, stop; the class is given by tree(n,1).
%
% This is a low-level routine called by treec.
%
% See also infstop, infcrit, maxcrit, fishcrit and mapt.

% Authors: Guido te Brake, TWI/SSOR, Delft University of Technology
%          R.P.W. Duin, TN/PH, Delft University of Technology
% Copyright: R.P.W. Duin, duin@ph.tn.tudelft.nl
% Faculty of Applied Physics, Delft University of Technology
% P.O. Box 5046, 2600 GA Delft, The Netherlands

function tree = maketree(a,nlab,c,crit,stop)

	prtrace(mfilename);

	[m,k] = size(a);

	if nargin < 5, stop = 0; end
	if nargin < 4, crit = []; end
	if isempty(crit), crit = 'infcrit'; end

	% Construct the tree:
	% When all objects have the same label, create an end-node:
	if all(nlab == nlab(1))
		% Avoid giving 0-1 probabilities, but 'regularize' them a bit using
		% a 'uniform' Bayesian prior:
		p = ones(1,c)/(m+c); p(nlab(1)) = (m+1)/(m+c);
		tree = [nlab(1),0,0,0,p];
	else
		% Now the tree is recursively constructed further:
		[f,j,t] = feval(crit,+a,nlab);  % use the desired split criterion
		if isempty(t)
			crt = 0;
		else
			crt = infstop(+a,nlab,j,t);   % use the desired early stopping criterion
		end
		p = sum(expandd(nlab),1);
		if length(p) < c, p = [p,zeros(1,c-length(p))]; end
		% When the stop criterion is not reached yet, we recursively split
		% further:
		if crt > stop
			% Make the left branch:
			J = find(a(:,j) <= t);
			tl = maketree(+a(J,:),nlab(J),c,crit,stop);
			% Make the right branch:
			K = find(a(:,j) > t);
			tr = maketree(+a(K,:),nlab(K),c,crit,stop);
			% Fix the node labelings before the branches can be 'glued'
			% together to a big tree (child indices are offset by the
			% position of each branch in the final node list):
			[t1,t2] = size(tl);
			tl = tl + [zeros(t1,2) tl(:,[3 4])>0 zeros(t1,c)];
			[t3,t4] = size(tr);
			tr = tr + (t1+1)*[zeros(t3,2) tr(:,[3 4])>0 zeros(t3,c)];
			% Make the complete tree: the split-node and the branches:
			tree = [[j,t,2,t1+2,(p+1)/(m+c)]; tl; tr];
		else
			% We reached the stop criterion, so make an end-node:
			[mt,cmax] = max(p);
			tree = [cmax,0,0,0,(p+1)/(m+c)];
		end
	end

	return
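% Illustration of the tree encoding (a hypothetical 2-class example, not
% part of the original file):
%
%   tree = [ 3  0.7  2  3  0.50 0.50     % node 1: split feature 3 at t = 0.7
%            1  0    0  0  0.90 0.10     % node 2: end-node, class 1
%            2  0    0  0  0.05 0.95 ];  % node 3: end-node, class 2
%
% An object x is classified by starting at node 1: if x(3) <= 0.7 it is
% passed to node 2, otherwise to node 3. An end-node is recognised by
% tree(n,3) == 0 and assigns the class stored in tree(n,1).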
%MAXCRIT Maximum purity criterion for best feature split.
%
% 	[f,j,t] = maxcrit(A,nlabels)
%
% Computes the value of the maximum purity f for all features over
% the data set A given its numeric labels. j is the optimum feature,
% t its threshold. This is a low-level routine called for constructing
% decision trees.
%
% [1] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone,
%     Classification and regression trees, Wadsworth, California, 1984.

% Copyright: R.P.W. Duin, duin@ph.tn.tudelft.nl
% Faculty of Applied Physics, Delft University of Technology
% P.O. Box 5046, 2600 GA Delft, The Netherlands

function [f,j,t] = maxcrit(a,nlab)

	prtrace(mfilename);

	[m,k] = size(a);
	c = max(nlab);

	% The variable T is a (2c) x k matrix containing:
	%      minimum feature values of class 1
	%      maximum feature values of class 1
	%      minimum feature values of class 2
	%      maximum feature values of class 2
	%            etc.
	% The variable R (same size) contains the number of objects that are:
	%      < the minimum of class 1
	%      > the maximum of class 1
	%      < the minimum of class 2
	%      > the maximum of class 2
	%            etc.
	% These values are collected and computed in the next loop:
	T = zeros(2*c,k); R = zeros(2*c,k);
	for j = 1:c
		L = (nlab == j);
		if sum(L) == 0
			T([2*j-1:2*j],:) = zeros(2,k);
			R([2*j-1:2*j],:) = zeros(2,k);
		else
			T(2*j-1,:) = min(a(L,:),[],1);
			R(2*j-1,:) = sum(a < ones(m,1)*T(2*j-1,:),1);
			T(2*j,:) = max(a(L,:),[],1);
			R(2*j,:) = sum(a > ones(m,1)*T(2*j,:),1);
		end
	end

	% From R the purity index for all features is computed:
	G = R .* (m-R);
	% and the best feature is found:
	[gmax,tmax] = max(G,[],1);
	[f,j] = max(gmax);
	Tmax = tmax(j);
	% Put the threshold halfway between the selected class extremum and the
	% nearest object beyond it (odd rows of T hold minima, even rows maxima):
	if Tmax ~= 2*floor(Tmax/2)
		t = (T(Tmax,j) + max(a(find(a(:,j) < T(Tmax,j)),j)))/2;
	else
		t = (T(Tmax,j) + min(a(find(a(:,j) > T(Tmax,j)),j)))/2;
	end
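% Worked example of the purity criterion above (not part of the original
% file; note that maxcrit is a subfunction here, so from outside this file
% it can only be called if copied to its own maxcrit.m):
%
%   a    = [0 1 2 5 6 7]';      % one feature, two well-separated groups
%   nlab = [1 1 1 2 2 2]';      % numeric labels, m = 6 objects
%   [f,j,t] = maxcrit(a,nlab);
%
% Here T = [0 2 5 7]' (class minima/maxima) and R = [0 3 3 0]' (counts of
% objects below/above these extrema), so G = R.*(m-R) = [0 9 9 0]' and the
% best split is found on feature j = 1 with f = 9 and threshold
% t = (2+5)/2 = 3.5.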
