📄 mass_spec_demo.html
DR(:,i) = interp1(NH_MZ,D(:,i),(1:nts)','linear');
end</pre>
<p>To see the effect of the resampling, compare heat maps of the original spectra and the resampled spectra. Notice that in the original data the M/Z points are much more densely spaced at the low end of the range.</p>
<p>Plot a heat map of the original data.</p>
<pre class="codeinput">figure; imagesc([NH_IN,OC_IN]);
title('Original Data');
xlabel('Group');
ylabel('M/Z');
% display actual M/Z values instead of row indices
set(gca,'yticklabel',NH_MZ(2000:2000:14000));
% set the x tick labels to show the groups
set(gca,'tickdir','out')
set(gca,'xtick',[45 92 172])
set(gca,'xticklabel',{'Control','','Ovarian Cancer'})
colorbar</pre>
<img vspace="5" hspace="5" src="mass_spec_demo_08.png">
<p>Plot a heat map of the resampled data.</p>
<pre class="codeinput">figure
imagesc(DR);
title('Resampled Data');
xlabel('Group');
ylabel('M/Z');
% set the x tick labels to show the groups
set(gca,'tickdir','out')
set(gca,'xtick',[45 92 172])
set(gca,'xticklabel',{'Control','','Ovarian Cancer'})
colorbar</pre>
<img vspace="5" hspace="5" src="mass_spec_demo_09.png">
<h2>Finding significant features<a name="15"></a></h2>
<p>A naive approach to finding significant features is to assume that each M/Z value is independent and to run a two-sample t-test at every point. Because thousands of tests are performed, a very small significance level is used.</p>
<pre class="codeinput">numPoints = numel(NH_MZ);
h = false(numPoints,1);
p = NaN(numPoints,1);
for count = 1:numPoints
    [h(count) p(count)] = ttest2(NH_IN(count,:),OC_IN(count,:),.0001,'both','unequal');
end
% h can be used to extract the significant M/Z values
sig_Masses = NH_MZ(find(h));</pre>
<p>We can plot the p-values over the spectra.</p>
<pre class="codeinput">figure(hFig);
plot(NH_MZ,-log(p),'g')
% notice that there are significant regions at high M/Z values but low
% intensity</pre>
<img vspace="5" hspace="5" src="mass_spec_demo_10.png">
<pre class="codeinput">close all; clear DR</pre>
<p>We could also use the p-values directly to extract significant points.</p>
<pre class="codeinput">sig_Masses = NH_MZ(p < 1e-6);</pre>
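<p>If the Bioinformatics Toolbox is available, the same per-feature two-sample t-tests can be computed without an explicit loop using mattest. This is a minimal sketch, not part of the original demo; the 'VarType' option is assumed to mirror the unequal-variance choice above.</p>
<pre class="codeinput">% sketch: run a two-sample t-test on every row (M/Z value) at once
pv = mattest(NH_IN, OC_IN, 'VarType', 'unequal');
sig_Masses = NH_MZ(pv < 1e-6);   % same thresholding as above</pre>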
<h2>Classification methods<a name="19"></a></h2>
<p>There are many classification tools in MATLAB. The Statistics Toolbox includes classification trees and discriminant analysis. We will try a number of classification techniques.</p>
<pre class="codeinput">% First we calculate some useful values.
D = [NH_IN OC_IN];
ns = size(D,1);     % number of points
nC = size(OC_IN,2); % number of samples with cancer
nH = size(NH_IN,2); % number of healthy samples
tn = size(D,2);     % total number of samples
% make an indicator vector, where 1s correspond to healthy samples and
% 2s to ovarian cancer samples
id = [ones(1,nH) 2*ones(1,nC)];</pre>
<h2>K-Nearest Neighbor classifier<a name="20"></a></h2>
<pre class="codeinput">for j = 1:3 % run random simulation a few times
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Select random training and test sets
    per_train = 0.5;             % percentage of samples for training
    nCt = floor(nC * per_train); % number of cancer samples in training
    nHt = floor(nH * per_train); % number of healthy samples in training
    nt = nCt+nHt;                % total number of training samples
    sel_H = randperm(nH);        % random ordering of the healthy samples
    sel_C = nH + randperm(nC);   % random ordering of the cancer samples
    sel_t = [sel_C(1:nCt) sel_H(1:nHt)];         % samples chosen for training
    sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % knnclassify is in the Bioinformatics Toolbox; implementations are
    % also available from the MATLAB Central File Exchange
    c = knnclassify(D(:,sel_e)',D(:,sel_t)',id(sel_t),3,'corr');
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % How well did we do?
    per_corr(j) = (1-sum(abs(c-id(sel_e)'))/numel(sel_e))*100;
    disp(sprintf('KNN Classifier Step %d: %.2f%% correct\n',j, per_corr(j)))
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end</pre>
<pre class="codeoutput">KNN Classifier Step 1: 98.43% correct
KNN Classifier Step 2: 97.64% correct
KNN Classifier Step 3: 92.13% correct</pre>
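<p>On MATLAB releases where knnclassify has been removed, the same step can be written with fitcknn from the Statistics and Machine Learning Toolbox. A minimal sketch, not part of the original demo; note the distance argument is spelled 'correlation' rather than 'corr':</p>
<pre class="codeinput">% sketch: fit a 3-nearest-neighbor model on the training samples
mdl = fitcknn(D(:,sel_t)', id(sel_t)', 'NumNeighbors', 3, 'Distance', 'correlation');
c = predict(mdl, D(:,sel_e)');   % classify the evaluation samples</pre>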
<h2>Linear Discriminant Analysis<a name="21"></a></h2>
<p>Run a PCA/LDA classifier, a simplified version of the "Q5" algorithm proposed by R. Lilien, H. Farid, and B. R. Donald in "Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum," Journal of Computational Biology, 10(6), 2003, pp. 925-946.</p>
<pre class="codeinput">for j = 1:3 % run random simulation a few times
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Select random training and test sets
    per_train = 0.5;             % percentage of samples for training
    nCt = floor(nC * per_train); % number of cancer samples in training
    nHt = floor(nH * per_train); % number of healthy samples in training
    nt = nCt+nHt;                % total number of training samples
    sel_H = randperm(nH);        % random ordering of the healthy samples
    sel_C = nH + randperm(nC);   % random ordering of the cancer samples
    sel_t = [sel_C(1:nCt) sel_H(1:nHt)];         % samples chosen for training
    sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % select only the significant features
    ndx = find(p < 1e-6);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % PCA to reduce dimensionality
    P = princomp(D(ndx,sel_t)','econ');
    % Project into PCA space
    x = D(ndx,:)' * P(:,1:nt-2);
    % Use a linear classifier
    c = classify(x(sel_e,:),x(sel_t,:),id(sel_t));
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % How well did we do?
    per_corr(j) = (1-sum(abs(c-id(sel_e)'))/numel(sel_e))*100;
    disp(sprintf('PCA/LDA Classifier Step %d: %.2f%% correct\n',j, per_corr(j)))
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end</pre>
<pre class="codeoutput">PCA/LDA Classifier Step 1: 100.00% correct
PCA/LDA Classifier Step 2: 99.21% correct
PCA/LDA Classifier Step 3: 100.00% correct</pre>
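<p>princomp was likewise removed in later releases; pca is its replacement. A minimal sketch of the same projection and linear classification on a recent MATLAB (an update suggestion, not part of the original demo; fitcdiscr is the newer alternative to classify):</p>
<pre class="codeinput">% sketch: economy-size PCA of the significant training features
P = pca(D(ndx,sel_t)', 'Economy', true);          % columns are principal components
x = D(ndx,:)' * P(:,1:nt-2);                      % project all samples into PCA space
c = classify(x(sel_e,:), x(sel_t,:), id(sel_t));  % linear discriminant classifier</pre>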
<h2>Variation on PCA/LDA classifier<a name="22"></a></h2>
<p>Instead of working with the raw intensity values, we can run the PCA/LDA classifier over the gradient of the data.</p>
<pre class="codeinput">DI = diff(D); % first difference approximates the gradient
for j = 1:3 % run random simulation a few times
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Select random training and test sets
    per_train = 0.5;             % percentage of samples for training
    nCt = floor(nC * per_train); % number of cancer samples in training
    nHt = floor(nH * per_train); % number of healthy samples in training
    nt = nCt+nHt;                % total number of training samples
    sel_H = randperm(nH);        % random ordering of the healthy samples
    sel_C = nH + randperm(nC);   % random ordering of the cancer samples
    sel_t = [sel_C(1:nCt) sel_H(1:nHt)];         % samples chosen for training
    sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % This time use a signal-to-noise based data reduction method
    md = mean(DI(:,sel_t(id(sel_t)==2)),2);  % mean of the cancer training samples
    Q = DI - repmat(md,1,tn);                % residuals
    mc = mean(Q(:,sel_t(id(sel_t)==1)),2);   % residual mean of the healthy samples
    sc = std(Q(:,sel_t(id(sel_t)==1)),[],2); % and also their standard deviation
    [dump,sel] = sort(-abs(mc./sc));         % rank points by |mean/std| of the residuals
    sel = sel(1:2000);                       % keep the 2000 highest-ranked points
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % PCA/LDA classifier
    P = princomp(Q(sel,sel_t)','econ');
    x = Q(sel,:)' * P(:,1:nt-3);
    c = classify(x(sel_e,:),x(sel_t,:),id(sel_t));
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % How well did we do?
    per_corr(j) = (1-sum(abs(c-id(sel_e)'))/numel(sel_e))*100;
    disp(sprintf('PCA/LDA Classifier %d: %.2f%% correct\n',j, per_corr(j)))
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end</pre>
<pre class="codeoutput">PCA/LDA Classifier 1: 100.00% correct
PCA/LDA Classifier 2: 100.00% correct
PCA/LDA Classifier 3: 100.00% correct</pre>
<h2>Neural Network Classifier<a name="23"></a></h2>
<p>Try a simple perceptron.</p>
<pre class="codeinput">for j = 1:3 % run random simulation a few times
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Select random training and test sets
    per_train = 0.5;             % percentage of samples for training
    nCt = floor(nC * per_train); % number of cancer samples in training
    nHt = floor(nH * per_train); % number of healthy samples in training
    nt = nCt+nHt;                % total number of training samples
    sel_H = randperm(nH);        % random ordering of the healthy samples
    sel_C = nH + randperm(nC);   % random ordering of the cancer samples
    sel_t = [sel_C(1:nCt) sel_H(1:nHt)];         % samples chosen for training
    sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % select only the significant features
    ndx = find(p < 1e-6);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % PCA to reduce dimensionality
    P = princomp(D(ndx,sel_t)','econ');
    % Project into PCA space
    x = D(ndx,:)' * P(:,1:nt-2);
    % Use a neural network instead of the linear classifier
    % c = classify(x(sel_e,:),x(sel_t,:),id(sel_t));
    idnn = double(id == 1);                    % recode: healthy = 1, cancer = 0
    [norm_data,mindata,maxdata] = premnmx(x'); % normalize inputs to [-1,1]
    range = [mindata,maxdata];
    net = newp(range,1);                       % a single perceptron
    net.trainParam.show = Inf;                 % hide training output
    net = train(net,norm_data(:,sel_t),idnn(sel_t));
    apt = sim(net,norm_data(:,sel_e));
    % [c,ba,ra] = postreg(apt,idnn(sel_e));
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % How well did we do?
    per_corr(j) = (1-sum(abs(apt-idnn(sel_e)))/numel(sel_e))*100;
    disp(sprintf('Perceptron Classifier Step %d: %.2f%% correct\n',j, per_corr(j)))
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end</pre>
<pre class="codeoutput">Perceptron Classifier Step 1: 89.76% correct
Perceptron Classifier Step 2: 88.19% correct
Perceptron Classifier Step 3: 87.40% correct</pre>
<p>Now a more complex network: a feed-forward back-propagation network with three hidden layers.</p>
<pre class="codeinput">for j = 1:3 % run random simulation a few times
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Select random training and test sets
    per_train = 0.5;             % percentage of samples for training
    nCt = floor(nC * per_train); % number of cancer samples in training
    nHt = floor(nH * per_train); % number of healthy samples in training
    nt = nCt+nHt;                % total number of training samples
    sel_H = randperm(nH);        % random ordering of the healthy samples
    sel_C = nH + randperm(nC);   % random ordering of the cancer samples
    sel_t = [sel_C(1:nCt) sel_H(1:nHt)];         % samples chosen for training
    sel_e = [sel_C(nCt+1:end) sel_H(nHt+1:end)]; % samples for evaluation
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % select only the significant features
    ndx = find(p < 1e-6);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % PCA to reduce dimensionality
    P = princomp(D(ndx,sel_t)','econ');
    % Project into PCA space
    x = D(ndx,:)' * P(:,1:nt-2);
    % Use a feed-forward neural network
    [norm_data,mindata,maxdata] = premnmx(x'); % normalize inputs to [-1,1]
    range = [mindata,maxdata];
    ffnet = newff(range,[9 5 5 2],{'tansig','tansig','tansig','tansig'},'traincgb');
    % set a couple of parameters
    ffnet.trainParam.show = inf;       % hide training output
    ffnet.trainParam.goal = 0.001;
    ffnet.trainParam.epochs = 500;
    ffnet.trainParam.min_grad = 1e-10; % can sometimes get stuck at local minima
    % train the network
    % here we use a two-output tan-sigmoid transfer function, so the targets
    % are coded as +/-1 for each class
    idff = [-1 + 2*(id==1); 1 - 2*(id==1)];</pre>
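<p>The original listing is cut off at this point. A plausible completion of the training and evaluation steps, following the pattern of the perceptron loop above, is sketched here; it is not the original code, and the decision rule (taking the larger of the two network outputs) is an assumption consistent with the +/-1 target coding in idff.</p>
<pre class="codeinput">    % sketch: train on the training columns, simulate on the evaluation columns
    ffnet = train(ffnet, norm_data(:,sel_t), idff(:,sel_t));
    aff = sim(ffnet, norm_data(:,sel_e));
    % the row with the larger output gives the class:
    % row 1 = healthy (id 1), row 2 = ovarian cancer (id 2)
    [dump, cff] = max(aff, [], 1);
    % How well did we do?
    per_corr(j) = sum(cff == id(sel_e)) / numel(sel_e) * 100;
    disp(sprintf('FF Network Classifier Step %d: %.2f%% correct\n', j, per_corr(j)))
end</pre>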