% optimtips_21_36.m
Consider the very simple (linear) regression model: y = a0 + a1*x + error.
The sum of squares surface as a function of the parameters (a0,a1) will
have ellipsoidal contours. Theory tells us this, but it's always nice to
back up theory with an example. Then I'll look at what happens in a
nonlinear regression.
%}
%%
% A simple linear model
n = 20;
x = randn(n,1);
y = 1 + x + randn(n,1);

close
figure
plot(x,y,'o')
title 'Linear data with noise'
xlabel 'x'
ylabel 'y'

%%
% linear regression estimates
M = [ones(n,1),x];
a0a1 = M\y

% look at various possible values for a0 & a1
v = -1:.1:3;
nv = length(v);
[a0,a1] = meshgrid(v);
a0 = a0(:)';
a1 = a1(:)';
m = length(a0);
SS = sum(((ones(n,1)*a0 + x*a1) - repmat(y,1,m)).^2,1);
SS = reshape(SS,nv,nv);

surf(v,v,SS)
title 'Sum of squares error surface'
xlabel 'a0'
ylabel 'a1'

figure
contour(v,v,SS)
hold on
plot(a0a1(1),a0a1(2),'rx')
plot([[-1;3],[1;1]],[[1;1],[-1;3]],'g-')
hold off
title 'Linear model: Sum of squares (elliptical) error contours'
xlabel 'a0'
ylabel 'a1'
axis equal
axis square
grid on

% Note: the min SSE will occur roughly at (1,1), as indicated
% by the green crossed lines. The linear regression estimates
% are marked by the 'x'.
% As predicted, the contours are elliptical.

%%
% next, we will do the same analysis for a nonlinear model
% with two parameters:
%
%  y = a0*exp(a1*x.^2) + error
%
% First, we will look at the error surface

% A simple nonlinear model
n = 20;
x = linspace(-1.5,1.5,n)';
y = 1*exp(1*x.^2) + randn(n,1);

figure
plot(x,y,'o')
title 'Nonlinear data with noise'
xlabel 'x'
ylabel 'y'

%%
% Nonlinear regression estimates. Use pleas (partitioned least squares)
% for this one.
a1_start = 2;
[a1,a0] = pleas({@(a1,xdata) exp(a1*xdata.^2)},a1_start,x,y)
a0a1 = [a0,a1];

%%
% look at various possible values for a0 & a1
v = .5:.01:1.5;
nv = length(v);
[a0,a1] = meshgrid(v);
a0 = a0(:)';
a1 = a1(:)';
m = length(a0);
SS = sum((((ones(n,1)*a0).*exp(x.^2*a1)) - repmat(y,1,m)).^2,1);
SS = reshape(SS,nv,nv);
minSS = min(SS(:));

surf(v,v,SS)
title 'Nonlinear model: Sum of squares error surface'
xlabel 'a0'
ylabel 'a1'

figure
contour(v,v,SS,minSS + (0:.25:2))
hold on
plot(a0a1(1),a0a1(2),'rx')
plot([[-1;3],[1;1]],[[1;1],[-1;3]],'g-')
hold off
title 'Nonlinear model: Sum of squares error contours'
xlabel 'a0'
ylabel 'a1'
axis equal
axis square
grid on

% Note: This time, the sum of squares surface is not as
% neatly elliptical.
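%%
% As a rough cross-check on pleas, the same two-parameter model can be fit
% directly with lsqcurvefit, assuming the Optimization Toolbox is available.
% A minimal sketch; the estimates should land close to a0a1 above.
ab_start = [1 2];   % starting guesses for [a0, a1]
ab = lsqcurvefit(@(ab,xdata) ab(1)*exp(ab(2)*xdata.^2),ab_start,x,y)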
%% 28. Confidence limits on a regression model
%{
There are various things that people think of when the phrase
"confidence limits" arises.

- We can ask for confidence limits on the regression parameters
themselves. This is useful to decide if a model term is statistically
significant; if not, we may choose to drop it from the model.

- We can ask for confidence limits on the model predictions (yhat).

These goals are related, of course. Both are obtained from the
parameter covariance matrix from the regression. (The reference to
look at here is again Draper and Smith.)

If our regression problem is to solve for x, such that A*x = y,
then we can compute the covariance matrix of the parameter vector
x via the simple expression

  V_x = inv(A'*A)*s2

where s2 is the error variance. Typically the error variance is
unknown, so we would use a measure of it from the residuals.

  s2 = sum((y-yhat).^2)/(n-p);

Here n is the number of data points, and p the number of parameters
to be estimated. This presumes little or no lack of fit in the model.
Of course, significant lack of fit would invalidate any confidence
limits of this form.

Often only the diagonal of the covariance matrix is used. This
provides simple variance estimates for each parameter, assuming
independence between the parameters. Large (in absolute value)
off-diagonal terms in the covariance matrix indicate highly
correlated parameters.

In the event that only the diagonal of the covariance matrix is
required, we can compute it without an explicit inverse of A'*A,
in a way that is both computationally efficient and as well
conditioned as possible. Thus, if one solves A*x = y using a QR
factorization (A = Q*R) as

  x = R\(Q'*y)

then recognize that

  inv(A'*A) = inv(R'*R) = inv(R)*inv(R') = inv(R)*inv(R)'

If we have already computed R, this will be more stable
numerically. If A is sparse, the savings will be more dramatic.

There is one more step to take, however. Since we really want
only the diagonal of this matrix, we can get it as

  diag(inv(A'*A)) = sum(inv(R).^2,2)
%}

%%
% Compare the two approaches on a random matrix.
A = rand(10,3);
diag(inv(A'*A))

[Q,R] = qr(A,0);
sum(inv(R).^2,2)

% Note that both gave the same result.

%%
% As a test, run 1000 regressions on random data, and compare how
% many sets of 95% confidence intervals actually contained the
% true slope coefficient.
m = 1000; % # of runs to verify confidence intervals
n = 100;  % # of data points

% ci will contain the lower and upper limits on the slope
% parameter in columns 1 and 2 respectively.
ci = zeros(m,2);
p = 2;
for i = 1:m
  x = randn(n,1);
  y = 1 + 2*x + randn(n,1);

  % solve using \
  M = [ones(n,1),x];
  coef = M\y;

  % the residual vector
  res = M*coef - y;

  % yes, I'll even be lazy and use inv. M is well conditioned
  % so there is no fear of numerical problems, and there are
  % only 2 coefficients to estimate so there is no time issue.
  s2 = sum(res.^2)/(n-p);
  sigma = s2*diag(inv(M'*M))';

  % a quick excursion into the statistics toolbox to get
  % the critical value from tinv for 95% limits:
  % tinv(.975,100)
  % ans =
  %     1.9840
  ci(i,:) = coef(2) + 1.984*sqrt(sigma(2))*[-1 1];
end

% finally, how many of these slope confidence intervals
% actually contained the true value (2)? In a simulation
% with 1000 events, we expect to see roughly 950 cases where
% the bounds contained 2.
sum((2>=ci(:,1)) & (2<=ci(:,2)))

% 950 may not have been too bad a prediction after all.

%%
% Given a complete covariance matrix estimate, we can also compute
% uncertainties around any linear combination of the parameters.
% Since a linear regression model is linear in the parameters,
% confidence limits on the predictions of the model at some point
% or set of points are easily enough obtained, since the prediction
% at any point is merely a linear combination of the parameters.

% An example of confidence limits around the curve of a linear
% regression is simple enough to do.
n = 10;
x = sort(rand(n,1))*2 - 1;
y = 1 + 2*x + 3*x.^2 + randn(n,1);
M = [ones(n,1),x,x.^2];
coef = M\y;

% Covariance matrix of the parameters
yhat = M*coef;
s2 = sum((y - yhat).^2)/(n-3);
sigma = s2*inv(M'*M)

% Predictions at a set of equally spaced points in [-1,1].
x0 = linspace(-1,1,101)';
M0 = [ones(101,1),x0,x0.^2];
y0 = M0*coef;
V = diag(M0*sigma*M0');

% Use a (2-sided) t-statistic for the intervals at a 95% level,
% with n-3 = 7 degrees of freedom here:
% tinv(.975,7)
% ans =
%     2.3646
tcrit = 2.3646;

close all
plot(x,y,'ro')
hold on
plot(x0,y0,'b-')
plot(x0,[y0-tcrit*sqrt(V),y0+tcrit*sqrt(V)],'g-')
hold off
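%%
% The prediction variances can also be had without an explicit inverse of
% M'*M, using the QR identity described above: with M = Q*R,
%   diag(M0*inv(M'*M)*M0') = sum((M0/R).^2,2)
% A quick check; the difference should be at roundoff level.
[Qm,Rm] = qr(M,0);
Vqr = s2*sum((M0/Rm).^2,2);
norm(V - Vqr)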
%%
% Equality constraints are easily dealt with, since they can be
% removed from the problem (see section 25.) However, active
% inequality constraints are a problem! For a confidence limit,
% one cannot assume that an active inequality constraint was truly
% an equality constraint, since it is permissible to fall any tiny
% amount inside the constraint.
%
% If you have this problem, you may wish to consider the jackknife
% or bootstrap procedures.

%% 29. Confidence limits on the parameters in a nonlinear regression
%{
A nonlinear regression is not too different from a linear one. Yes,
it is nonlinear. The (approximate) covariance matrix of the parameters
is normally derived from a first order Taylor series. Thus, if J is
the (n x p) Jacobian matrix at the optimum, where n is the number of
data points and p the number of parameters to estimate, the
covariance matrix is just

  S = s2*inv(J'*J)

where s2 is the error variance. See section 28 for a description of
how one may avoid some of the problems with inv and with forming J'*J,
especially if you only wish to compute the diagonal of this matrix.

What you do not want to use here is the approximate Hessian matrix
returned by an optimizer. These matrices tend to be formed from
updates. As such, they are often only approximations to the true
Hessian at the optimum, at the very least lagging a few iterations
behind. Since they are also built from rank 1 updates to the
Hessian, they may lack the proper behavior in some dimensions.
(For example, a lucky starting value may allow an optimization
to converge in only one iteration. This would also update the
returned Hessian approximation in only 1 dimension, so it is probably
a terrible estimate of the true Hessian at that point.)
%}
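%%
% A minimal sketch of the limits described above, assuming lsqcurvefit
% (Optimization Toolbox) is used to fit the exponential model from the
% error surface example. The Jacobian at the solution is the solver's
% last output; tinv(.975,n-p) is roughly 2.101 for 18 degrees of freedom.
n = 20;
x = linspace(-1.5,1.5,n)';
y = 1*exp(1*x.^2) + randn(n,1);
model = @(ab,xdata) ab(1)*exp(ab(2)*xdata.^2);
[ab,resnorm,res,exitflag,output,lambda,J] = lsqcurvefit(model,[1 2],x,y);
p = 2;
s2 = resnorm/(n-p);        % error variance estimate from the residuals
J = full(J);               % the Jacobian may be returned as sparse
S = s2*inv(J'*J);          % approximate covariance matrix of [a0,a1]
se = sqrt(diag(S));        % standard errors of the parameters
ci = [ab(:)-2.101*se, ab(:)+2.101*se]   % approximate 95% limits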
%% 30. Quadprog example, unrounding a curve
%{
A nice, I'll even claim elegant, example of the use of quadprog
is the problem of unrounding a vector. Code for this in the form of
a function can be found on the file exchange as unround.m

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8719&objectType=FILE

Consider a vector composed of some smooth function evaluated
at consecutive (equally spaced) values of its independent variable.
Then the function values are rounded to the nearest integer. Can
we recover the original smooth function values? Clearly we cannot
do so exactly for an arbitrary function, but we can attempt to find
the smoothest set of function values that is consistent with the
given rounded set.

We will treat each element of the set as an unknown, so a vector
of length n will have n unknowns to solve for.
%}

%%
% A simple function of x
n = 101;
x = linspace(0,1,n);
v0 = (10*x.^2);

% rounded vector
vr = round(v0);

close
plot(x,v0,'r-',x,vr,'bo')
title 'Base functional form, with rounding'
xlabel 'x'
ylabel 'y'

%%
% bound constraints. Since vr is composed of all integers, the
% original, unrounded values must lie within simple bounds
% around our rounded set.
LB = vr - 0.5;
UB = vr + 0.5;

% sparse system representing smoothness. Consider any row of S.
% When multiplied by the vector of unknowns, it computes a
% second order finite difference of the vector around some point.
% This, by itself, cannot be used in quadprog, because S is not
% an appropriate quadratic form.
S = spdiags(repmat([1 -2 1],n-2,1),[0 1 2],n-2,n);

% H is positive semi-definite. Quadprog will be happy.
H = S'*S;
f = zeros(n,1);

% Solve the quadratic programming problem. Make sure we use the
% largescale solver. Note that we have carefully constructed
% H to be sparse. The largescale solver will allow simple bound
% constraints. As important as the constraints, when LargeScale
% is 'on', quadprog can handle quite large problems, numbering
% into many thousands of variables. Note that I have supplied H
% as a sparse matrix. It is tightly banded.
options = optimset('quadprog');
options.LargeScale = 'on';
options.Display = 'final';

vur = quadprog(H,f,[],[],[],[],LB,UB,[],options);

plot(x,v0,'r-',x,vr,'bo',x,vur,'k--')
legend('Base curve','Rounded data','Un-rounded curve','Location','NorthWest')
xlabel 'x'
ylabel 'y'

% Clearly the two curves overlay quite nicely; only at the ends
% do they deviate from each other significantly. That deviation
% is related to the behavior of a natural spline, despite the
% fact that we never explicitly defined a spline model. In effect,
% we minimized the approximate integral of the square of the second
% derivative of our function. It is this integral from which cubic
% splines are derived.

%% 31. R^2
%{
R^2 is a measure of the goodness of fit of a regression model. What
does it mean? We can write R^2 in terms of the vectors y (our data)
and yhat (our model predictions) as

  R^2 = 1 - ( norm(yhat-y) / norm(y-mean(y)) )^2

Intuitively, we might describe the term norm(y-mean(y)) as the total
information content of our data. Likewise, the term norm(yhat-y)
is the amount of signal that remains after removing the predictions
of our model. So when the model describes the data perfectly, this
ratio will be zero. Squared and subtracted from 1, a perfect fit will
yield an R^2 == 1.

At the other end of the spectrum, a fit that describes the data no
better than does the mean will yield a zero value for R^2. This also
suggests that a model with no mean term can produce a negative
value for R^2. The implication in the event of a negative R^2 is that
the model is a poorer predictor of the data than a simple mean would be.

Finally, note that the formula above is not specific to a linear
regression model. R^2 may apply as well to a nonlinear regression
model.

When do we use R^2? In the words of Draper & Smith, R^2 is the first
thing they might look at when perusing the output of a regression
analysis. It is also clearly not the only thing they look at. My point
is that R^2 is only one measure of a model. There is no critical value
of R^2 such that a model is acceptable. The following pair of examples
may help to illustrate this point.
%}

%%
% This curve fit will yield an R^2 of roughly 0.975 from my tests.
% Is it a poor model? It depends upon how much you need to fit the
% subtle curvature in that data.
n = 25;
x = sort(rand(n,1));
y = exp(x/2) + randn(size(x))/30;
M = [ones(n,1),x];
coef = M\y;
yhat = M*coef;
Rsqr = 1 - (norm(yhat-y)/norm(y-mean(y))).^2

close
plot(x,y,'bo',x,yhat,'r-')
title 'R^2 is approximately 0.975'
xlabel x
ylabel y

% Was the linear model chosen adequate? It depends on your data and
% your needs. My personal opinion of the data in this example is
% that if I really needed to know the amount of curvature in this
% functional relationship, then I needed better data or much more
% data.

%%
% How about this data? It yields roughly the same value of R^2 as did
% the data with a linear model above.
n = 25;
x = linspace(0,1,n)';
y = x + ((x-.5).^2)*0.6;
M = [ones(n,1),x];
coef = M\y;
yhat = M*coef;
Rsqr = 1 - (norm(yhat-y)/norm(y-mean(y))).^2

plot(x,y,'bo',x,yhat,'r-')
title 'R^2 is approximately 0.975'
xlabel x
ylabel y

% Is there any doubt that there is lack of fit in this model? You
% may decide that a linear model is not adequate here. That is a
% decision that you must make based on your goals for a model.
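%%
% A residual plot makes the lack of fit explicit: even with R^2 near 0.975,
% the residuals from this last fit trace out the curvature that a straight
% line cannot follow.
figure
plot(x,y - yhat,'bo')
title 'Residuals of the linear fit show systematic curvature'
xlabel x
ylabel 'residual'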
%%
%{
To sum up, the intrepid modeler should look at various statistics
about their model, not just R^2. They should look at plots of the
model, and plots of residuals in various forms. They should consider
whether lack of fit is present. They should filter all of this
information based on their needs and goals for the final model and
their knowledge of the system being modeled.
%}

%% 32. Estimation of the parameters of an implicit function
%{
How can we estimate the parameters of an implicit function? One
simple example might be:

  y = a*(y - b*x)^2

The simplistic approach is to formulate this as a direct least
squares problem for lsqnonlin, with residuals

  y - a*(y - b*x)^2 = 0
%}
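%%
% A minimal sketch of that simplistic lsqnonlin formulation, on hypothetical
% data manufactured from assumed parameters a = 2, b = 1. (The data are built
% by solving the implicit model for x on one branch; the parameter values,
% noise level, and starting guess are purely illustrative.)
a = 2;
b = 1;
y = linspace(.1,2,50)';
x = (y - sqrt(y/a))/b;         % one branch of the model, solved for x
y = y + randn(size(y))/100;    % a little noise in y
resfun = @(ab) y - ab(1)*(y - ab(2)*x).^2;
ab_est = lsqnonlin(resfun,[1 1])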