ex_bic.m
%ex_bic  Example of cluster analysis with a varying number of mixture
% components, using the BIC (Bayesian Information Criterion) to assess
% the fit provided by mixture models of different sizes.
%
% The fun part here is to play with the parameters of the simulated
% data (mixture parameters and number of outliers) and with the
% estimation parameters (initialization procedure and number of
% iterations). It turns out that the decision is somewhat sensitive to
% these (although it usually gives 3)...
%
% Beware: this is not a real demo, so you should take a look at the
% source file before executing it if you want to understand what's
% going on! If using octave --traditional, you should also add the
% subdirectory h2m/octave to your loadpath using the path command.

% H2M Toolbox, Version 2.0
% O. Cappe, 19/05/99 - 24/08/2001
% ENST Dpt. TSI / LTCI (CNRS URA 820), Paris

clear;
% Constant: flag for full covariance matrices
FULL_COV = 0;

% Generate some 2D synthetic data (Gaussian mixture model with 3 components
% and some outliers)
% Total number of observations
T = 200;
% Number of outliers
OUTL_NB = 5;
% Mixture data
% Data dimension
p = 2;
% Number of components
N = 3;
w = [0.4 0.4 0.2];
mu = [1 0; 0 1; -1 0];
% Covariance matrices, stacked vertically (one p-by-p block per component)
Sigma = [0.1 0; 0 0.1; 0.1 -0.03; -0.03 0.1; 0.2 0; 0 0.2];
X = mix_gen(w, mu, Sigma, T-OUTL_NB);
% Append outliers (this of course doesn't look like iid data anymore, but the
% following analysis would be exactly the same if the data points had been
% shuffled randomly)
X = [X; 4*(rand(OUTL_NB,2)-0.5)];

% Cluster analysis
% Maximal size of the mixture model
NC_MAX = 5;
% Values of the BIC criterion
BIC = zeros(1,NC_MAX);
% Number of EM iterations
NIT = 20;

% NC = 1 corresponds to a plain Gaussian model
NC = 1;
w = 1;
mu = mean(X);
Sigma = cov(X);
logl = sum(gauslogv(X, mu, Sigma, 1));
BIC(NC) = 2*logl - (p*NC+p*NC+NC-1)*log(T);
% Plot data fit
clf;
% Data points
subplot(3, 3, 1);
plot(X(:,1), X(:,2), '.');
axis([-2 2 -2 2]);
% Gaussian ellipsoid containing 50% of the data
subplot(3, 3, 2);
title(['BIC = ' sprintf('%.0f', BIC(NC))]);
gauselps(mu, Sigma, '-', 0.5);
axis([-2 2 -2 2]);
drawnow;

for NC = 2:NC_MAX
  NC  % echo the current number of components
  % Here is an example of a simple heuristic for mixture-based hierarchical
  % clustering which may be viewed as an alternative to vector quantization
  % for the initialization of the model (it also has the advantage that one
  % is not restricted to powers of 2 for NC)
  % Find the most represented class
  [tmp, ind] = max(w);
  % Find its main axis
  [V, D] = eig(Sigma(1+(ind-1)*p:ind*p,:));
  [tmp, ind2] = max(diag(D));
  % Duplicate that component, shifting the two copies apart along the main axis
  mu = [mu(1:ind-1,:); mu(ind,:)-sqrt(D(ind2,ind2))*V(:,ind2)'; ...
        mu(ind,:)+sqrt(D(ind2,ind2))*V(:,ind2)'; mu(ind+1:NC-1,:)];
  Sigma = [Sigma(1:ind*p,:); Sigma(1+(ind-1)*p:(NC-1)*p,:)];
  w = [w(1:ind-1) w(ind)/2 w(ind)/2 w(ind+1:NC-1)];
  % EM iterations
  [w, mu, Sigma, logl] = mix(X, w, mu, Sigma, NIT);
  % Compute the log-likelihood
  [gamma, logl] = mix_post(X, w, mu, Sigma, 1);
  % and the BIC criterion
  BIC(NC) = 2*logl - (p*NC+p*NC+NC-1)*log(T);
  % Plot data fit
  subplot(3, 3, 1+NC);
  % Gaussian ellipsoids containing 50% of the data
  title(['BIC = ' sprintf('%.0f', BIC(NC))]);
  gauselps(mu, Sigma, '-', 0.5);
  axis([-2 2 -2 2]);
  drawnow;
end

% When everything is done, report the BIC criterion (the best fit is given
% by the maximal value)
subplot(3,1,3);
axis;
plot(BIC);
title('BIC');
BIC
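Note on the BIC lines above: the parameter count matches a diagonal-covariance mixture, namely p*NC means, p*NC diagonal variances and NC-1 free weights, so BIC = 2*logl - k*log(T) with k = 2*p*NC + NC - 1. A minimal standalone sketch of that computation follows; the helper name bic_diag_gmm is hypothetical and not part of the H2M toolbox.

function bic = bic_diag_gmm(logl, NC, p, T)
%bic_diag_gmm  BIC of a diagonal-covariance Gaussian mixture (hypothetical
% helper, not part of H2M). logl is the maximized log-likelihood, NC the
% number of mixture components, p the data dimension, T the sample size.
% Free parameters: p*NC means + p*NC diagonal variances + (NC-1) weights.
k = p*NC + p*NC + (NC-1);
bic = 2*logl - k*log(T);
end

With such a helper, both occurrences of BIC(NC) = 2*logl-(p*NC+p*NC+NC-1)*log(T) in the script could be written as BIC(NC) = bic_diag_gmm(logl, NC, p, T).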