📄 plsa_em.m
% function [Pw_z,Pd_z,Pz,Li] = pLSA_EM(X,Fixed_Pw_z,K,Learn)
%
% Probabilistic Latent Semantic Analysis (pLSA)
%
% Notation:
% X  ... (m x nd) term-document matrix (observed data)
%        X(i,j) stores number of occurrences of word i in document j
%
% m  ... number of words (vocabulary size)
% nd ... number of documents
% K  ... number of topics
%
% Fixed_Pw_z ... fixed Pw_z density (for recognition only)
%                leave empty in learning
%                N.B. must be of size m by K
%
% Learn ... structure holding all settings
%
% Li   ... likelihood for each iteration
% Pz   ... P(z)
% Pd_z ... P(d|z)
% Pw_z ... P(w|z), corresponds to the beta parameter in LDA
%
% Pz_dw ... P(z|d,w), posterior on z
%
% References:
% [1] Thomas Hofmann: Probabilistic Latent Semantic Analysis,
%     Proc. of the 15th Conf. on Uncertainty in Artificial Intelligence (UAI'99)
% [2] Thomas Hofmann: Unsupervised Learning by Probabilistic Latent Semantic
%     Analysis, Machine Learning Journal, 42(1), 2001, pp. 177-196
%
% Josef Sivic
% josef@robots.ox.ac.uk
% 30/7/2004
%
% Extended by Rob Fergus
% fergus@csail.mit.edu
% 03/10/05

function [Pw_z,Pd_z,Pz,Li] = pLSA_EM(X,Fixed_Pw_z,K,Learn)

% small offset to avoid numerical problems
ZERO_OFFSET = 1e-7;

% default settings (Learn is the 4th argument, so test nargin<4)
if nargin<4
   Learn.Max_Iterations        = 100;
   Learn.Min_Likelihood_Change = 1;
   Learn.Verbosity             = 0;
end;

if Learn.Verbosity
   figure(1); clf;
   title('Log-likelihood');
   xlabel('Iteration'); ylabel('Log-likelihood');
end;

m  = size(X,1); % vocabulary size
nd = size(X,2); % number of documents

% initialize Pz, Pd_z, Pw_z
[Pz,Pd_z,Pw_z] = pLSA_init(m,nd,K);

% if in recognition mode, reset Pw_z to the fixed distribution
% obtained in the learning phase
if isempty(Fixed_Pw_z)
   % learning mode
   FIXED_PW_Z = 0;
else
   % recognition mode
   FIXED_PW_Z = 1;
   % check that the sizes are compatible
   if ((size(Pw_z,1)==size(Fixed_Pw_z,1)) && (size(Pw_z,2)==size(Fixed_Pw_z,2)))
      % overwrite random Pw_z
      Pw_z = Fixed_Pw_z;
   else
      error('Dimensions of fixed Pw_z density do not match VQ.Codebook_Size and Learn.Num_Topics');
   end
end

% allocate memory for the posterior
Pz_dw = zeros(m,nd,K);

Li    = [];
maxit = Learn.Max_Iterations;

% EM algorithm
for it = 1:maxit
   fprintf('Iteration %d ',it);

   % E-step
   Pz_dw = pLSA_Estep(Pw_z,Pd_z,Pz);

   % M-step
   [Pw_z,Pd_z,Pz] = pLSA_Mstep(X,Pz_dw);

   % if in recognition mode, reset Pw_z to the fixed density
   if (FIXED_PW_Z)
      Pw_z = Fixed_Pw_z;
   end

   if Learn.Verbosity>=2
      Pw_z
   end;

   % evaluate data log-likelihood
   Li(it) = pLSA_logL(X,Pw_z,Pz,Pd_z);

   % plot log-likelihood
   if Learn.Verbosity>=3
      figure(1);
      plot(Li,'b.-');
   end;

   % avoid numerical problems
   Pw_z = Pw_z + ZERO_OFFSET;

   % converged?
   dLi = 0;
   if it > 1
      dLi = Li(it) - Li(it-1);
      if dLi < Learn.Min_Likelihood_Change, break; end;
   end;
   fprintf('dLi=%f \n',dLi);
end;
fprintf('\n');
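%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The helper functions below implement the standard pLSA EM updates
% from Hofmann [1] (restated here for reference; not in the original file):
%
%   E-step:  P(z|d,w) = P(z) P(w|z) P(d|z) / sum_z' P(z') P(w|z') P(d|z')
%
%   M-step:  P(w|z) ~ sum_d n(d,w) P(z|d,w)
%            P(d|z) ~ sum_w n(d,w) P(z|d,w)
%            P(z)   ~ sum_{d,w} n(d,w) P(z|d,w)
%
% where n(d,w) = X(w,d) and '~' denotes proportionality: each density is
% renormalized to sum to 1 after the update.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%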
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% initialize conditional probabilities for EM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Pz,Pd_z,Pw_z] = pLSA_init(m,nd,K)
% m  ... number of words (vocabulary size)
% nd ... number of documents
% K  ... number of topics
%
% Pz   ... P(z)
% Pd_z ... P(d|z)
% Pw_z ... P(w|z)

Pz   = ones(K,1)/K;    % uniform prior on topics

% random assignment: document probabilities conditioned on topic
Pd_z = rand(nd,K);
C    = 1./sum(Pd_z,1); % normalize columns to sum to 1
Pd_z = Pd_z * diag(C);

% random assignment: word probabilities conditioned on topic
Pw_z = rand(m,K);
C    = 1./sum(Pw_z,1); % normalize columns to sum to 1
Pw_z = Pw_z * diag(C);
return;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% (1) E-step: compute posterior on z
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Pz_dw = pLSA_Estep(Pw_z,Pd_z,Pz)
   K = size(Pw_z,2);
   for k = 1:K
      Pz_dw(:,:,k) = Pw_z(:,k) * Pd_z(:,k)' * Pz(k);
   end;

   % normalize posterior over topics
   C = sum(Pz_dw,3);
   for k = 1:K
      Pz_dw(:,:,k) = Pz_dw(:,:,k) .* (1./C);
   end;
return;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% (2) M-step: maximize expected complete-data log-likelihood
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Pw_z,Pd_z,Pz] = pLSA_Mstep(X,Pz_dw)
   K = size(Pz_dw,3);

   for k = 1:K
      Pw_z(:,k) = sum(X .* Pz_dw(:,:,k),2);
   end;

   for k = 1:K
      Pd_z(:,k) = sum(X .* Pz_dw(:,:,k),1)';
   end;

   Pz = sum(Pd_z,1);

   % normalize each density to sum to 1
   C    = sum(Pw_z,1);
   Pw_z = Pw_z * diag(1./C);

   C    = sum(Pd_z,1);
   Pd_z = Pd_z * diag(1./C);

   C  = sum(Pz,2);
   Pz = Pz ./ C;
return;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% data log-likelihood
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function L = pLSA_logL(X,Pw_z,Pz,Pd_z)
   L = sum(sum(X .* log(Pw_z * diag(Pz) * Pd_z' + eps)));
return;
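For reference, a minimal usage sketch of pLSA_EM (not part of the original file; the count matrices below are made up purely for illustration):

% learning mode: estimate K topics from a toy term-document count matrix
X = round(5*rand(1000,50));                 % 1000 words x 50 documents, fake counts
K = 10;
Learn.Max_Iterations        = 100;
Learn.Min_Likelihood_Change = 1;
Learn.Verbosity             = 0;
[Pw_z,Pd_z,Pz,Li] = pLSA_EM(X,[],K,Learn);  % Fixed_Pw_z left empty in learning

% recognition mode: fold in new documents with the learned P(w|z) held fixed
Xnew = round(5*rand(1000,20));
[Pw_z2,Pd_z_new,Pz_new] = pLSA_EM(Xnew,Pw_z,K,Learn);

In recognition mode only P(d|z) and P(z) are re-estimated for the new documents; Pw_z is reset to the fixed density after every M-step, exactly as the FIXED_PW_Z branch above does.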