<pre>
  n1=80; n2=40;                                 % number of data points from each class
  randn('seed',17)
  S1 = eye(2); S2 = [1 0.95; 0.95 1];           % the two covariance matrices
  m1 = [0.75; 0]; m2 = [-0.75; 0];              % the two means
  x1 = chol(S1)'*randn(2,n1)+repmat(m1,1,n1);   % samples from one class
  x2 = chol(S2)'*randn(2,n2)+repmat(m2,1,n2);   % and from the other
  x = [x1 x2]';                                 % these are the inputs and
  y = [repmat(-1,1,n1) repmat(1,1,n2)]';        % outputs used as training data
</pre>

<p>Below the samples are shown together with the "Bayes Decision Probabilities",
obtained from complete knowledge of the data generating process:</p>

<center><img src="fig2d.gif"></center><br>

<p>Note that the ideal predictive probabilities depend only on the relative
density of the two classes, and not on the absolute density. We would
e.g. expect that the structure in the upper right hand corner of the plot may
be very difficult to recover from the samples, because the data density there
is very low. The contour plot is obtained by:</p>

<pre>
  [t1 t2] = meshgrid(-4:0.1:4,-4:0.1:4);
  t = [t1(:) t2(:)];                            % these are the test inputs
  tt = sum((t-repmat(m1',length(t),1))*inv(S1).*(t-repmat(m1',length(t),1)),2);
  z1 = n1*exp(-tt/2)/sqrt(det(S1));
  tt = sum((t-repmat(m2',length(t),1))*inv(S2).*(t-repmat(m2',length(t),1)),2);
  z2 = n2*exp(-tt/2)/sqrt(det(S2));
  contour(t1,t2,reshape(z2./(z1+z2),size(t1)),[0.1:0.1:0.9]);
  hold on
  plot(x1(1,:),x1(2,:),'b+')
  plot(x2(1,:),x2(2,:),'r+')
</pre>

<p>Now, we will fit a probabilistic Gaussian process classifier to this data,
using an implementation of the Expectation Propagation (EP) algorithm. We must
specify a covariance function. First, we will try the squared exponential
covariance function <tt>covSEiso</tt>. We must specify the parameters of the
covariance function (hyperparameters). For the isotropic squared exponential
covariance function there are two hyperparameters, the lengthscale (kernel
width) and the magnitude. We need to specify values for these hyperparameters
(see below for how to learn them). Initially, we will simply set the log of
these hyperparameters to zero, and see what happens. For the likelihood
function, we use the cumulative Gaussian:</p>

<pre>
  loghyper = [0; 0];
  p2 = binaryEPGP(loghyper, 'covSEiso', x, y, t);
  clf
  contour(t1,t2,reshape(p2,size(t1)),[0.1:0.1:0.9]);
  hold on
  plot(x1(1,:),x1(2,:),'b+')
  plot(x2(1,:),x2(2,:),'r+')
</pre>

<p>to produce predictive probabilities on the grid of test points:</p>

<center><img src="fig2de1.gif"></center><br>

<p>Although the predictive contours in this plot look quite different from the
"Bayes Decision Probabilities" plotted above, note that the predictive
probabilities in regions of high data density are not terribly different from
those of the generating process. Recall that this plot was made using
hyperparameters which we essentially pulled out of thin air. Now, we find the
values of the hyperparameters which maximize the marginal likelihood (or
strictly, the EP approximation of the marginal likelihood):</p>

<pre>
  newloghyper = minimize(loghyper, 'binaryEPGP', -20, 'covSEiso', x, y)
  p3 = binaryEPGP(newloghyper, 'covSEiso', x, y, t);
</pre>

<p>where the argument <tt>-20</tt> tells minimize to evaluate the function at
most <tt>20</tt> times. The new hyperparameters have a fairly similar length
scale, but a much larger magnitude for the latent function.</p>
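<p>As an aside (our own addition, not part of the original example), the learned
hyperparameters can be inspected on their natural scale. The sketch below
assumes the usual <tt>covSEiso</tt> parameterization, loghyper = [log(ell);
log(sf)] with k(x,x') = sf^2 exp(-||x-x'||^2/(2 ell^2)); if your version of the
toolbox orders the hyperparameters differently, adjust accordingly:</p>

<pre>
  % hedged sketch: convert the log hyperparameters back to their natural scale
  ell = exp(newloghyper(1));   % characteristic length-scale (similar to before)
  sf  = exp(newloghyper(2));   % magnitude of the latent function (much larger)
  disp([ell sf])
</pre>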
<p>This leads to more extreme predictive probabilities:</p>

<pre>
  clf
  contour(t1,t2,reshape(p3,size(t1)),[0.1:0.1:0.9]);
  hold on
  plot(x1(1,:),x1(2,:),'b+')
  plot(x2(1,:),x2(2,:),'r+')
</pre>

<p>produces:</p>

<center><img src="fig2de2.gif"></center><br>

<p>Note that this plot still shows that the predictive probabilities revert to
one half when we move away from the data (in stark contrast to the "Bayes
Decision Probabilities" in this example). This may or may not be seen as an
appropriate behaviour, depending on our prior expectations about the data. It
is a direct consequence of the behaviour of the squared exponential covariance
function. We can instead try a neural network covariance function, which has
the ability to saturate at specific latent values as we move away from zero:</p>

<pre>
  newloghyper = minimize(loghyper, 'binaryEPGP', -20, 'covNN', x, y);
  p4 = binaryEPGP(newloghyper, 'covNN', x, y, t);
  clf
  contour(t1,t2,reshape(p4,size(t1)),[0.1:0.1:0.9]);
  hold on
  plot(x1(1,:),x1(2,:),'b+')
  plot(x2(1,:),x2(2,:),'r+')
</pre>

<p>to produce:</p>

<center><img src="fig2de3.gif"></center><br>

<p>which shows extreme probabilities even at the boundaries of the plot.</p>

<h3 id="ep-ex-usps">Example of the Expectation Propagation Algorithm applied to
hand-written digit classification</h3>

<p>You can either follow the example below or run the short
<a href="../gpml/gpml-demo/demo_ep_usps.m">demo_ep_usps.m</a> script. You will
need the code from the
<a href="http://www.gaussianprocess.org/gpml/code/gpml-matlab.tar.gz">gpml-matlab.tar.gz</a>
archive and additionally the data, see below. Consider compiling the mex files
(see the <a href="../gpml/README">README</a> file), although the programs
should work even without.</p>

<p>As an example application we will consider the USPS resampled data which can
be found <a href="http://www.gaussianprocess.org/gpml/data">here</a>.
Specifically, we will need to download <tt>usps_resampled.mat</tt> and unpack
the data from the
<a href="http://www.gaussianprocess.org/gpml/data/usps_resampled.mat.bz2">bz2</a>
(7.0 Mb) or
<a href="http://www.gaussianprocess.org/gpml/data/usps_resampled.mat.zip">zip</a>
(8.3 Mb) files. As an example, we will consider the binary classification task
of separating images of the digit "3" from images of the digit "5".</p>

<pre>
  cd gpml-demo                                      % you need to be in the gpml-demo directory
  cd ..; w=pwd; addpath([w,'/gpml']); cd gpml-demo  % add the path for the code
  [x y xx yy] = loadBinaryUSPS(3,5);
</pre>

<p>which will create the training inputs <tt>x</tt> (of size 767 by 256), binary
(+1/-1) training targets <tt>y</tt> (of length 767), and <tt>xx</tt> and
<tt>yy</tt> containing the test set (of size 773). Use
e.g. <tt>imagesc(reshape(x(3,:),16,16)');</tt> to view an image (here the 3rd
image of the training set).</p>

<p>Now, let us make predictions using the squared exponential covariance
function <tt>covSEiso</tt>. We must specify the parameters of the covariance
function (hyperparameters). For the isotropic squared exponential covariance
function there are two hyperparameters, the lengthscale (kernel width) and the
magnitude. We need to specify values for these hyperparameters (see below for
how to learn them). One strategy, which is perhaps not too unreasonable, would
be to set the characteristic length-scale equal to the average pair-wise
distance between points, which for this data is around 22; the signal variance
can be set to unity, since this is the range at which the sigmoid non-linearity
comes into play. Other strategies for setting initial guesses for the
hyperparameters may be reasonable as well.</p>
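<p>As a small sketch of our own (not part of the original example), the "around
22" figure can be checked directly by computing the average pair-wise Euclidean
distance between the training inputs in plain MATLAB:</p>

<pre>
  % hedged sketch: average pair-wise distance between the 767 training inputs
  n  = size(x,1);
  sq = sum(x.^2,2);                                          % squared norms
  D  = sqrt(max(repmat(sq,1,n)+repmat(sq',n,1)-2*(x*x'),0)); % pairwise distances
  avgdist = sum(D(:))/(n*(n-1))                              % average over distinct pairs
</pre>

<p>If this number comes out very different from 22, the initial guess for the
length-scale should be adjusted accordingly.</p>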
<p>We need to specify the logarithm of the hypers, roughly log(22)=3 and
log(1)=0:</p>

<pre>
  loghyper = [3.0; 0.0];   % set the log hyperparameters
  p = binaryEPGP(loghyper, 'covSEiso', x, y, xx);
</pre>

<p>which may take a few minutes to compute. We can visualize the predictive
probabilities for the test cases using:</p>

<pre>
  plot(p,'.')
  hold on
  plot([1 length(p)],[0.5 0.5],'r')
</pre>

<p>to get:</p>

<br><center><img src="figepp.gif"></center><br>

<p>where we should keep in mind that the test cases are ordered according to
their target class. Notice that there are misclassifications, but there are no
very confident misclassifications. The number of test set errors (out of 773
test cases) when thresholding the predictive probability at 0.5 and the average
amount of information about the test set labels in excess of a 50/50 model in
bits are given by:</p>

<pre>
  sum((p>0.5)~=(yy>0))
  mean((yy==1).*log2(p)+(yy==-1).*log2(1-p))+1
</pre>

<p>which shows 35 test set prediction errors (out of 773 test cases) and an
average amount of information of about 0.72 bits; compare to Figure 3.8(b) and
3.8(d) on page 65.</p>

<p>More interestingly, we can also use the program to find the values for the
hyperparameters which optimize the marginal likelihood, as follows:</p>

<pre>
  [loghyper' binaryEPGP(loghyper, 'covSEiso', x, y)]
  [newloghyper logmarglik] = minimize(loghyper, 'binaryEPGP', -20, 'covSEiso', x, y);
  [newloghyper' logmarglik(end)]
</pre>

<p>where the third argument <tt>-20</tt> to the minimize function tells
minimize to evaluate the function at most 20 times (this is adequate when the
parameter space is only 2-dimensional). Warning: the optimization above may
take about an hour depending on your machine and whether you have compiled the
mex files. If you don't want to wait that long, you can train on a subset of
the data (but note that the cases are ordered according to class).</p>

<p>Above, we first evaluate the negative log marginal likelihood at our initial
guess for <tt>loghyper</tt> (which is about <tt>222</tt>), then minimize the
negative log marginal likelihood w.r.t. the hyperparameters, to obtain new
values of <tt>2.55</tt> for the log lengthscale and <tt>3.96</tt> for the log
magnitude. The negative log marginal likelihood decreases to about <tt>90</tt>
at the minimum, see also Figure 3.7, page 64. The marginal likelihood has thus
increased by a factor of exp(222-90) or roughly 2e+57.</p>

<p>Finally, we can make test set predictions with the new hyperparameters:</p>

<pre>
  pp = binaryEPGP(newloghyper, 'covSEiso', x, y, xx);
  plot(pp,'g.')
</pre>

<p>to obtain:</p>

<br><center><img src="figepp2.gif"></center><br>

<p>We note that the new predictions (in green) are much more extreme than the
old ones (in blue). Evaluating the new test predictions:</p>

<pre>
  sum((pp>0.5)~=(yy>0))
  mean((yy==1).*log2(pp)+(yy==-1).*log2(1-pp))+1
</pre>

<p>reveals that there are now 19 test set misclassifications (as opposed to 35)
and the information about the test set labels has increased from 0.72 bits to
0.89 bits.</p>
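<p>As a final illustration (again our own addition, not part of the original
example), the remaining errors can be inspected directly, reusing the
<tt>imagesc(reshape(...))</tt> trick from above to display the test case on
which the classifier is most confidently wrong:</p>

<pre>
  % hedged sketch: look at the most confident of the remaining test set errors
  err = find((pp>0.5) ~= (yy>0));      % indices of misclassified test cases
  [c,i] = max(abs(pp(err)-0.5));       % the most confident of the mistakes
  imagesc(reshape(xx(err(i),:),16,16)'); colormap gray; axis square
  title(sprintf('true label %+d, predicted p(y=+1|x) = %.2f', yy(err(i)), pp(err(i))))
</pre>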