
  1.5461
  0.1443</pre>

Note that the hyperparameters learned here are close, but not identical, to the hyperparameters 1.0, 1.0 and 0.1 used when generating the data. The discrepancy is partially due to the small training sample size, and partially due to the fact that we only get information about the process in a very limited range of input values. Predicting with the learned hyperparameters produces:<br>
<center><img src="figlf.gif"></center><br>
which is not very different from the plot above (based on the hyperparameters used to generate the data). Repeating the experiment with more training points distributed over a wider range leads to more accurate estimates.</p>

Note that above we have used the same functional form for the covariance function as was used to generate the data. In practice things are seldom so simple, and one may have to try different covariance functions. Here, we explore how a Matern form with a shape parameter of 3/2 (see eq. 4.17) would do.
<pre>
  covfunc2 = {'covSum',{'covMatern3iso','covNoise'}};
  loghyper2 = minimize([-1; -1; -1], 'gpr', -100, covfunc2, x, y);
</pre>
Comparing the values of the marginal likelihood for the two models
<pre>
  -gpr(loghyper, covfunc, x, y)
  -gpr(loghyper2, covfunc2, x, y)
</pre>
with values of <tt>-15.6</tt> for SE and <tt>-18.0</tt> for Matern3 shows that the SE covariance function is about <tt>exp(18.0-15.6)=11</tt> times more probable than the Matern form for these data (in agreement with the data generating process). The predictions from the worse Matern-based model
<pre>
  [mu S2] = gpr(loghyper2, covfunc2, x, y, xstar);
  S2 = S2 - exp(2*loghyper2(3));
</pre>
look like this:<br>
<center><img src="figlm.gif"></center><br>
Notice how the uncertainty grows more rapidly in the vicinity of data-points, reflecting the property that for the Matern class with a shape parameter of 3/2 the stochastic process is not twice mean-square differentiable (and is thus much less smooth than the SE covariance function).

<h3 id="ard">3. GPR demo using gpr.m with multi-dimensional input space and ARD</h3>

This demonstration illustrates the use of Gaussian process regression for a multi-dimensional input, and illustrates the use of automatic relevance determination (ARD). It is based on <a href="#williams-rasmussen-96">Williams and Rasmussen (1996)</a>.</p>

You can either follow the example below or run the script <a href="../gpml-demo/demo_gparm.m">demo_gparm.m</a>.</p>

We initially consider a 2-d nonlinear robot arm mapping problem as defined in <a href="#mackay-92">MacKay (1992)</a>
<pre>
  f(x_1,x_2) = r_1 cos(x_1) + r_2 cos(x_1 + x_2)
</pre>
where x_1 is chosen randomly in [-1.932, -0.453], x_2 is chosen randomly in [0.534, 3.142], and r_1 = 2.0, r_2 = 1.3. The target values are obtained by adding Gaussian noise of variance 0.0025 to f(x_1,x_2). Following <a href="#neal-96">Neal (1996)</a> we add four further inputs, two of which (x_3 and x_4) are copies of x_1 and x_2 corrupted by additive Gaussian noise of standard deviation 0.02, and two of which (x_5 and x_6) are N(0,1) Gaussian noise variables. Our dataset has n=200 training points and nstar=200 test points.</p>

The training and test data is contained in the file <a href="../gpml-demo/data_6darm.mat">data_6darm.mat</a>. The raw training data is in the input matrix <tt>X</tt> (<tt>n=</tt>200 by <tt>D=</tt>6) and the target vector <tt>y</tt> (200 by 1).</p>
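As an aside (not part of the demo, which simply loads the prepared file), here is a minimal sketch of how data of this form could be generated; the variable names are for illustration only:
<pre>
  % Illustrative sketch only -- the demo below just loads data_6darm.mat.
  n  = 200;                                      % number of training cases
  x1 = -1.932 + (-0.453 - (-1.932))*rand(n,1);   % uniform in [-1.932, -0.453]
  x2 =  0.534 + ( 3.142 - 0.534)*rand(n,1);      % uniform in [0.534, 3.142]
  r1 = 2.0; r2 = 1.3;
  f  = r1*cos(x1) + r2*cos(x1 + x2);             % noise-free robot arm mapping
  y  = f + sqrt(0.0025)*randn(n,1);              % add noise of variance 0.0025
  x3 = x1 + 0.02*randn(n,1);                     % noisy copy of x_1 (std dev 0.02)
  x4 = x2 + 0.02*randn(n,1);                     % noisy copy of x_2 (std dev 0.02)
  x5 = randn(n,1);  x6 = randn(n,1);             % pure N(0,1) noise inputs
  X  = [x1 x2 x3 x4 x5 x6];                      % n by D=6 input matrix
</pre>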
Assuming that the current directory is <tt>gpml-demo</tt> we need to add the path of the code, and load the data:
<pre>
  addpath('../gpml');     % add path for the code
  load data_6darm
</pre>
We first check the scaling of these variables using
<pre>
  mean(X), std(X), mean(y), std(y)
</pre>
In particular we might be concerned if the standard deviation is very different for different input dimensions; however, that is not the case here so we do not carry out rescaling for <tt>X</tt>. However, <tt>y</tt> has a non-zero mean which is not appropriate if we assume a zero-mean GP. We could add a constant onto the SE covariance function corresponding to a prior on constant offsets, but here instead we centre <tt>y</tt> by setting:
<pre>
  offset = mean(y);
  y = y - offset;         % centre targets around 0.
</pre>
We use Gaussian process regression with a squared exponential covariance function, and allow a separate lengthscale for each input dimension, as in eqs. 5.1 and 5.2. These lengthscales (and the other hyperparameters sigma_f and sigma_n) are adapted by maximizing the marginal likelihood (eq. 5.8) w.r.t. the hyperparameters. The covariance function is specified by
<pre>
  covfunc = {'covSum', {'covSEard','covNoise'}};
</pre>
We now wish to train the GP by optimizing the hyperparameters. The hyperparameters are stored as <tt>logtheta = [log(ell_1), log(ell_2), ... log(ell_6), log(sigma_f), log(sigma_n)]</tt> (as <tt>D = 6</tt>), and are initialized to <tt>logtheta0 = [0; 0; 0; 0; 0; 0; 0; log(sqrt(0.1))]</tt>. The last value means that the initial noise variance sigma^2_n is set to 0.1. The marginal likelihood is optimized using the <tt>minimize</tt> function.
<pre>
  logtheta0 = [0; 0; 0; 0; 0; 0; 0; log(sqrt(0.1))];
  [logtheta, fvals, iter] = minimize(logtheta0, 'gpr', -100, covfunc, X, y);
  exp(logtheta)
</pre>
By doing <tt>exp(logtheta)</tt> we find that:
<pre>
  ell_1      1.804377
  ell_2      1.963956
  ell_3      8.884361
  ell_4     34.417657
  ell_5   1081.610451
  ell_6    375.445823
  sigma_f    2.379139
  sigma_n    0.050835
</pre>
Notice that the lengthscales ell_1 and ell_2 are short, indicating that inputs x_1 and x_2 are relevant to the task. The noisy inputs x_3 and x_4 have longer lengthscales, indicating they are less relevant, and the pure noise inputs x_5 and x_6 have very long lengthscales, so they are effectively irrelevant to the problem, as indeed we would hope. The process std deviation sigma_f is similar in magnitude to the standard deviation of the data <tt>std(y) = 1.2186</tt>. The learned noise standard deviation sigma_n is almost exactly equal to the generative noise level sqrt(0.0025)=0.05.</p>

We now make predictions on the test points and assess the accuracy of these predictions. The test data is in <tt>Xstar</tt> and <tt>ystar</tt>. Recall that the training targets were centered, thus we must adjust the predictions by <tt>offset</tt>:
<pre>
  [fstar S2] = gpr(logtheta, covfunc, X, y, Xstar);
  fstar = fstar + offset;  % add back offset to get true prediction
</pre>
We compute the residuals, the mean squared error (mse) and the predictive log likelihood (pll). Note that the predictive variance for <tt>ystar</tt> includes the noise variance, as explained on p. 18.
<pre>
  res = fstar - ystar;  % residuals
  mse = mean(res.^2)
  pll = -0.5*mean(log(2*pi*S2) + res.^2./S2)
</pre>
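As an aside (not part of the original demo), the pll is just the average log density of the test targets under the Gaussian predictive distribution with mean <tt>fstar</tt> and variance <tt>S2</tt>. Assuming the Statistics Toolbox function <tt>normpdf</tt> is available, the same number can be computed more transparently as:
<pre>
  % Sketch: mean log Gaussian predictive density of the test targets.
  % normpdf expects the standard deviation, hence sqrt(S2); this should equal pll.
  pll_check = mean(log(normpdf(ystar, fstar, sqrt(S2))))
</pre>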
The mean squared error is 0.002489. Note that this is almost equal to the value 0.0025, as would be obtained from the perfect predictor, due to the added noise with variance 0.0025.</p>

We can also plot the residuals and the predictive (noise-free) variance for each test case. Note that the order along the x-axis is arbitrary.
<pre>
  subplot(211), plot(res,'.'), ylabel('residuals'), xlabel('test case')
  subplot(212), plot(sqrt(S2),'.'), ylabel('predictive std deviation'), xlabel('test case')
</pre>

<h3 id="ref">4. References</h3>
<ul>
<li id=mackay-92>D. J. C. MacKay, <a href="http://www.inference.phy.cam.ac.uk/mackay/backprop.nc.ps.gz">A Practical Bayesian Framework for Backpropagation Networks</a>, Neural Computation 4, 448-472, (1992)</li>
<li id=neal-96>R. M. Neal, <a href="http://www.cs.utoronto.ca/~radford/bnn.book.html">Bayesian Learning for Neural Networks</a>, Springer, (1996)</li>
<li id=williams-rasmussen-96>C. K. I. Williams and C. E. Rasmussen, <a href="http://www.gaussianprocess.org/#williams-rasmussen-96">Gaussian Processes for Regression</a>, Advances in Neural Information Processing Systems 8, pp. 514-520, MIT Press (1996)</li>
</ul>
Go back to the <a href="http://www.gaussianprocess.org/gpml">web page</a> for Gaussian Processes for Machine Learning.
<hr>
Last modified: Mon Mar 27 16:10:14 CEST 2006
</body></html>
