📄 emcpm_singleclass_demo.m
字号:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Demonstration of use of the single class EM-CPM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%clear all; %dbstop if error; more off;%% check that the CPM matlab path has been addedif ~exist('makeBins','file') error('Please make sure that the full CPM directory structure is on your path');end%% check that version of minimize is okcheckMinimizeVersion();%% check that maxSparseMEX.c has been compiledcheckMaxSparseMEX();% load the data (here stored in variable called 'data')% -here each time series has 400 time points, where each point% - is a vector of length 2402, and there are 11 time series% - data is of dimensions [numTimeSeries,numTimePoints,numFeatures]% - see DATA/README.txt for explanation of the data setif exist('eColi_singleClass_11perClass.mat','file') load eColi_singleClass_11perClass.mat;else error('Please add eColi_singleClass_11perClass.mat to your path');end% NOTE: this data has been pre-processed with a single shift and a% single global scaling factor for each observed time series. If% your data has huge variations in these respects, it would be wise% to do a similar pre-processing -- otherwise you risk bad local minima, % and/or slow convergence.% Choose some reduced dimensionality version of the data, if desired% This function spreads out the mass of the features evenly across% a new set of smaller features (i.e. in an LC-MS experiment, it tries% to distribute the ion abundance evenly). One could instead use PCA, for% example.numBins=1; %i.e. the dimensionality of the vector at each time point data= makeBins(data,numBins);%% WARNING: dimensionality of each time point increases the CPU time %% roughly linearly, and one often need not use the full%% dimensionality to get good results. Furthermore, in my%% useage so far, I often get numerical problems for more than%% 24 dimensions (numBins).[numTimeSeries,numTimes,numBins]=size(data);%% sum out the features and take a look if you want%figure,plot(squeeze(sum(data,3))','-'); title('Reduced Dimensionality Pre-ALigned Data');%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% set EM-CPM options %%maxIter=10; % max number of iterations allowed (10 might look pretty good, but % more is probably better). Play around with your problem to % see what you need, or just set it running for 100 iterations % and wait.myThresh=10^-5; % convergence tolerance on log likelihood % (will pursue this difference until maxIterations is reached)USE_SPLINE=0; % if set to 1, uses spline scaling rather than HMM scale statesnumCtrlPts=10; % if USE_SPLINE=1, then this allows the number of spline control % points to be set. If USE_SPLINE=0, this variable is ignored.oneScaleOnly=0; % if USE_SPLINE=0, this allows us to use no HMM scale states % in which case only a single global scaling factor is % applied to each time series.extraPercent=0.05; % the length of the latent trace(s) will be constructed % so that, for observed time series of length N, it will % have length M = 2*(N+extraPercent);lambda=0; % smoothing penalty on latent trace (positive only -- higher % is more smooth)nu=0; % inter-trace penalty for multi-class version %(ignored if only one class specified; positive only -- higher is % tighter correspondance between classes)learnStateTransitions=0; % whether to learn the HMM state transition probabilities % this is very expensive, and provides little benefit.learnGlobalScaleFactor=1; % learn the single global scale factor for each time serieslearnEmissionNoise=1; % learn the HMM emission noise for each time serieslearnLatentTrace=1; % learn the latent trace(s)initLatentTrace=[]; % capability to specify the initial latent trace % if desired -- leave blank for an automatic % initialization. Should be of same dimensions % as that returned by the training algorithm % (i.e. [numLatentTimes numClass numBins] where, % numLatentTimes = 2*G.numRealTimes + 2*ceil(extraPercent*numRealTimes)%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Indicate which indexes of the data set are for training, and for hold out% (test) and which class each belongs to (here we're doing single class)%%%%%%%% **IF** YOU DON'T WANT TO RUN TEST data (i.e. if you just want to%% align all of the data you have) AND you want a SINGLE_CLASS alignment, %% then just use these settings, as they are written here (unmodified)if 0 trainSetInd=1:numTimeSeries; testSetInd=[]; thisClass=1; classesTrain{thisClass}=trainSetInd; classesTest{thisClass}=[];%%%%%%%% **ELSEIF** you want TRAIN AND TEST, SINGLE-CLASS%% alignment, TAILOR THIS TO YOUR LIKING,else trainSetInd = [1 3 5 7 9 11]; %% indexes of 'data' to be used for training %testSetInd = [2 4 6 8 10]; %% indexes of 'data' to be used for testing testSetInd=[]; %% to make demo faster...don't test at all %% now specify the classes of each of these (we're just using one class here) %% (this is relative to their ordering in trainSetInd/testSetInd) thisClass=1; %% do it for each class, if more than 1 classesTrain{thisClass}=1:length(trainSetInd); classesTest{thisClass} =1:length(testSetInd);end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Specify where to dump log files and saved workspaces while training:%% THREE files will be created: %% 1) .mat file containing saved results%% 2) .LOG file containing printout of stuff during training/testing%% 3) .FINISHED file which indicates the method trainTestEMCPM has finished%% Leave empty if don't want to use these%saveDir=[pwd '/']; saveName='eColi'; saveDir = ''; saveName = '';%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% train/test the EM-CPM[allGtrain allGtest logLikesTrain logLikesTest]=trainTestEMCPM(... data,trainSetInd,testSetInd,classesTrain,classesTest,... USE_SPLINE,oneScaleOnly,maxIter,numCtrlPts,extraPercent,... lambda,nu,myThresh, learnStateTransitions,learnGlobalScaleFactor,... learnEmissionNoise,learnLatentTrace,saveDir,saveName,initLatentTrace);% %he parameters over iterations are stored in allGtrain, and allGtest,% with the last item in these cell arrays being the final learned values.% logLikesTrain/logLikesTest give the log likelhood at each iteration.%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Look at the log likelihood over training/testing iterations%%%% (Note that because some parameters are shared between cases,%% the test is not directly comparable to the train likelihood,%% though it is to a large degree, since the data terms dominante%% the calculation).numTrain = length(trainSetInd);numTest = length(testSetInd);figure,semilogy(logLikesTrain/numTrain,'k+-'); xlabel('Iteration'); ylabel('Log Likelihood Per Time Series');if numTest>0 title('Log Likelihood During Training/Testing'); hold on; semilogy(logLikesTest/numTest,'r+-'); legend('Train','Test','Location','SouthEast');else title('Log Likelihood During Training'); legend('Train','Location','SouthEast');endset(gca,'Color','white');%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% look at the learned latent tracefigure,plot(squeeze(allGtrain{end}.z),'-+','MarkerSize',3);set(gca,'Color','white');title('Learned Latent Trace(s)'); xlabel('Latent Time'); ylabel('Signal Magnitude');%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Viterbi alignments:trainData = data(trainSetInd,:,:);testData = data(testSetInd,:,:);clear data;% Get Viterbi alignments of the training data Gtrain=reviveG(allGtrain{end},0,0);scaleAndTimesTrain = viterbiAlign(Gtrain,trainData);% Alternatively, sample from the posterior % (requires a pass of the Forward algorithm inside this, so slow)% if you want to sample many times, re-use 'alphas' for huge speed-up (then% only do Forward the first time around)if 0 %% WARNING: this requires sample_hist.m from Tom Minka's Lightspeed toolbox alphas=''; [encodedStates alphas]= sampleHMMstatesGivenAlphas(trainData,Gtrain,'',alphas); scaleAndTimesTrain=unwrapStates(encodedStates,Gtrain); clear alphas; end%%%% Get Viterbi alignments of the test data, if it exists and you want itif ~isempty(testData) Gtest=reviveG(allGtest{end},0,0); scaleAndTimesTest = viterbiAlign(Gtest,testData);end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Align, interpolate and scale the training data in the ALIGNMENT spacescaleType=2; %% type 'help alignInterpolateScale' to see possible valuesnewData = alignInterpolateScale(trainData,scaleAndTimesTrain,Gtrain,scaleType);%% plot it (collapsing it to a single dimension for viewing purposes)oneD_alignedData=squeeze(sum(newData,3))';figure, subplot(2,1,1),set(gca,'Color','white');plot(oneD_alignedData,'-');title('Aligned and Scaled Data'); xlabel('Latent Time');%% contrast to unalinged/scaled datasubplot(2,1,2)plot(squeeze(sum(trainData,3))','-');set(gca,'Color','white');title('Unaligned and Unscaled Data'); xlabel('Experimental Time');%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Align, interpolate and scale the training data in the ORIGINAL space%% (or really, in whatever space you decide to use here)%% (need to re-load original data because we overwrote it with the reduced%% dimension version)load eColi_singleClass_11perClass.mat;trainData = data(trainSetInd,:,:);scaleType=2; %% type 'help alignInterpolateScale' to see possible valuesnewData = alignInterpolateScale(trainData,scaleAndTimesTrain,Gtrain,scaleType);%% plot it (collapsing it to a single dimension for viewing purposes)%% NOTE: this figure should look the same as the last one, since we have%% collapsed it to a 1D space in both instancesif 0 oneD_alignedData=squeeze(sum(newData,3))'; figure, subplot(2,1,1), plot(oneD_alignedData,'-'); set(gca,'Color','white'); title('Aligned and Scaled Data'); xlabel('Latent Time'); %% contrast to unalinged/scaled data subplot(2,1,2) plot(squeeze(sum(trainData,3))','-'); set(gca,'Color','white'); title('Unaligned and Unscaled Data'); xlabel('Experimental Time');end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% NOTES:%% The figures generated in this script, when run by me, can be%% found in ./demos/figs/ directory.%% The figures are called EM_CPM_singleClass_demo_X.jpg, %% Each training iteration took just under 1 minute when I ran this with%% 6 training time series (for all 6 of these), using the HMM scale states%% (rather than the scaling spline), and not learning the state transition%% probabilities.return;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% You can plug the code below into the script above if you want to%% try using a hold out to find a good value of lambda (smoothing penalty%% on the latent trace) to use, although frequently it turns out that no%% smoothing is just fine. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Choose what values of lambda you want to try (probably more than this)%% -lambda is smoothing penalty on latent trace (positive only -- higher%% is more smooth). Reasonable values are data set dependent.tryLambda=[1e-18 1e-16 1e-14 1e-12];nxt=1;for lambda=tryLambda %% train/test the EM-CPM: only story the test log likelihood each time [allGtrain allGtest logLikesTrain logLikesTest]=trainTestEMCPM(... data,trainSetInd,testSetInd,classesTrain,classesTest,... USE_SPLINE,oneScaleOnly,maxIter,numCtrlPts,extraPercent,... lambda,nu,myThresh, learnStateTransitions,learnGlobalScaleFactor,... learnEmissionNoise,learnLatentTrace,saveDir,saveName,initLatentTrace); holdOutLogLike(nxt)=logLikesTest(end); nxt=nxt+1;end%% Look at the log likelihood over hold out numHoldOut=length(testSetInd);figure, semilogx(tryLambda,holdOutLogLike/numHoldOut,'-+');title('Hold Out Log Likelihood');xlabel('Lambda (smoothing parameter on latent trace)');ylabel('Hold Out Log Likelihood Per Hold Out Case');grid on;set(gca,'Color','white');%%NOTES:%% this data set does not appear to require any smoothing of the latent trace
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -