
📄 optimtips_16_31.m

📁 Small examples using the MATLAB Optimization Toolbox, mainly introducing optimization-related topics; discussion is welcome.
provides simple variance estimates for each parameter, assuming
independence between the parameters. Large (in absolute value)
off-diagonal terms in the covariance matrix will indicate highly
correlated parameters.

In the event that only the diagonal of the covariance matrix is
required, we can compute it without an explicit inverse of A'*A,
and in a way that is both computationally efficient and as well
conditioned as possible. Thus if one solves for the solution to
A*x=y using a qr factorization as

   x = R\(Q'*y)

then recognize that

   inv(A'*A) = inv(R'*R) = inv(R)*inv(R') = inv(R)*inv(R)'

If we have already computed R, this will be more stable
numerically. If A is sparse, the savings will be more dramatic.

There is one more step to take however. Since we really want
only the diagonal of this matrix, we can get it as:

  diag(inv(A'*A)) = sum(inv(R).^2,2)
%}
%%
% Compare the two approaches on a random matrix.
A = rand(10,3);
diag(inv(A'*A))

[Q,R] = qr(A,0);
sum(inv(R).^2,2)
% Note that both gave the same result.
%%
% As a test, run 1000 regressions on random data, and compare how
% many sets of 95% confidence intervals actually contained the
% true slope coefficient.
m = 1000;  % # of runs to verify confidence intervals
n = 100;   % # of data points

% ci will contain the lower and upper limits on the slope
% parameter in columns 1 and 2 respectively.
ci = zeros(m,2);
p = 2;
for i = 1:m
  x = randn(n,1);
  y = 1 + 2*x + randn(n,1);

  % solve using \
  M = [ones(n,1),x];
  coef = M\y;

  % the residual vector
  res = M*coef - y;

  % yes, I'll even be lazy and use inv. M is well conditioned,
  % so there is no fear of numerical problems, and there are
  % only 2 coefficients to estimate, so there is no time issue.
  s2 = sum(res.^2)/(n-p);

  sigma = s2*diag(inv(M'*M))';

  % a quick excursion into the Statistics Toolbox to get
  % the critical value from tinv for 95% limits:
  % tinv(.975,100)
  % ans =
  %    1.9840
  ci(i,:) = coef(2) + 1.984*sqrt(sigma(2))*[-1 1];
end
% Finally, how many of these slope confidence intervals
% actually contained the true value (2)? In a simulation
% with 1000 events, we expect to see roughly 950 cases where
% the bounds contained 2.
sum((2 >= ci(:,1)) & (2 <= ci(:,2)))
% 950 may not have been too bad a prediction after all.
%%
% Given a complete covariance matrix estimate, we can also compute
% uncertainties around any linear combination of the parameters.
% Since a linear regression model is linear in the parameters,
% confidence limits on the predictions of the model at some point
% or set of points are easily enough obtained, since the prediction
% at any point is merely a linear combination of the parameters.
% An example of confidence limits around the curve on a linear
% regression is simple enough to do.
n = 10;
x = sort(rand(n,1))*2 - 1;
y = 1 + 2*x + 3*x.^2 + randn(n,1);
M = [ones(n,1),x,x.^2];
coef = M\y;

% Covariance matrix of the parameters
yhat = M*coef;
s2 = sum((y - yhat).^2)/(n-3);
sigma = s2*inv(M'*M)

% Predictions at a set of equally spaced points in [-1,1].
x0 = linspace(-1,1,101)';
M0 = [ones(101,1),x0,x0.^2];
y0 = M0*coef;
% Prediction variance at each point; the standard errors are sqrt(V).
V = diag(M0*sigma*M0');

% Use a (2-sided) t-statistic for the intervals at a 95% level.
% With n = 10 data points and 3 parameters, there are 7 degrees
% of freedom:
% tinv(.975,7)
% ans =
%    2.3646
tcrit = 2.3646;

close all
plot(x,y,'ro')
hold on
plot(x0,y0,'b-')
plot(x0,[y0-tcrit*sqrt(V),y0+tcrit*sqrt(V)],'g-')
hold off
%%
% Equality constraints are easily dealt with, since they can be
% removed from the problem (see section 25.) However, active
% inequality constraints are a problem! For a confidence limit,
% one cannot assume that an active inequality constraint was truly
% an equality constraint, since it is permissible to fall any tiny
% amount inside the constraint.
%
% If you have this problem, you may wish to consider the jackknife
% or bootstrap procedures.
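%%
% A minimal sketch of one such alternative: a simple percentile
% bootstrap for the slope of a straight-line fit. The data, the
% number of resamples (nboot), and the use of raw percentiles are
% arbitrary choices here, made only to illustrate the idea.
n = 50;
x = randn(n,1);
y = 1 + 2*x + randn(n,1);
M = [ones(n,1),x];

nboot = 2000;              % number of bootstrap resamples
slopes = zeros(nboot,1);
for k = 1:nboot
  % resample the data pairs (rows) with replacement
  ind = ceil(n*rand(n,1));
  coefb = M(ind,:)\y(ind);
  slopes(k) = coefb(2);
end

% a crude 95% percentile interval for the slope
slopes = sort(slopes);
ci_boot = [slopes(round(0.025*nboot)), slopes(round(0.975*nboot))]
% Note that this interval makes no appeal to inv(M'*M) at all,
% which is part of what makes the bootstrap attractive when
% active constraints invalidate the covariance formula.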
%{
---------------------------------------------------------------
28. Confidence limits on the parameters in a nonlinear regression

A nonlinear regression is not too different from a linear one. Yes,
it is nonlinear. The (approximate) covariance matrix of the
parameters is normally derived from a first order Taylor series.
Thus, if J is the (nxp) Jacobian matrix at the optimum, where n is
the number of data points, and p the number of parameters to
estimate, the covariance matrix is just

  S = s2*inv(J'*J)

where s2 is the error variance. See section 27 for a description
of how one may avoid some of the problems with inv and with forming
J'*J, especially if you only wish to compute the diagonals of this
matrix.

What you do not want to use here is the approximate Hessian matrix
returned by an optimizer. These matrices tend to be formed from
updates. As such, they are often only approximations to the true
Hessian at the optimum, at the very least lagging a few iterations
behind. Since they are also built from rank 1 updates to the
Hessian, they may lack the proper behavior in some dimensions.
(For example, a lucky starting value may allow an optimization
to converge in only one iteration. This would also update the
returned Hessian approximation in only 1 dimension, so it is
probably a terrible estimate of the true Hessian at that point.)
%}
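%%
% A minimal sketch of the recipe above, assuming the Optimization
% Toolbox function lsqcurvefit is available; it returns the Jacobian
% at the solution as its seventh output. The exponential model, the
% synthetic data, and the starting values are arbitrary choices made
% only for illustration.
xdata = linspace(0,2,50)';
ydata = 2*exp(-1.5*xdata) + 0.05*randn(50,1);
model = @(b,x) b(1)*exp(b(2)*x);

opts = optimset('Display','off');
[b,resnorm,resid,exitflag,output,lambda,J] = ...
  lsqcurvefit(model,[1 -1],xdata,ydata,[],[],opts);

% error variance: 50 data points, 2 parameters
s2 = resnorm/(50 - 2);

% covariance matrix of the parameters, as above
J = full(J);   % lsqcurvefit may return the Jacobian as a sparse matrix
S = s2*inv(J'*J)

% or, better conditioned, just the diagonal via a qr factorization,
% as described in section 27
[Q,R] = qr(J,0);
diag_S = s2*sum(inv(R).^2,2)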
%%
%{
---------------------------------------------------------------
29. Quadprog example, unrounding a curve

A nice, I'll even claim elegant, example of the use of quadprog
is the problem of unrounding a vector. Code for this in the form of
a function can be found on the File Exchange as unround.m

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8719&objectType=FILE

Consider a vector composed of some smooth function evaluated
at consecutive (equally spaced) values of its independent variable.
Then the function values are rounded to the nearest integer. Can
we recover the original smooth function values? Clearly we cannot
do so exactly for an arbitrary function, but we can attempt to find
the smoothest set of function values that is consistent with the
given rounded set.

We will treat each element of the set as an unknown, so a vector
of length n will have n unknowns to solve for.
%}
%%
% A simple function of x
n = 101;
x = linspace(0,1,n);
v0 = 10*x.^2;
% rounded vector
vr = round(v0);

close
plot(x,v0,'r-',x,vr,'bo')
title 'Base functional form, with rounding'
xlabel 'x'
ylabel 'y'
%%
% Bound constraints: since vr is composed of all integers, the
% original, unrounded values must lie within simple bounds
% around our rounded set.
LB = vr - 0.5;
UB = vr + 0.5;

% Sparse system representing smoothness. Consider any row of S.
% When multiplied by the vector of unknowns, it computes a
% second order finite difference of the vector around some point.
% This, by itself, cannot be used in quadprog, because S is not
% an appropriate quadratic form.
S = spdiags(repmat([1 -2 1],n-2,1),[0 1 2],n-2,n);

% H is positive semi-definite. Quadprog will be happy.
H = S'*S;
f = zeros(n,1);

% Solve the quadratic programming problem. Make sure we use the
% largescale solver. Note that we have carefully constructed H to
% be sparse; it is tightly banded. The largescale solver will allow
% simple bound constraints. Just as important, when LargeScale is
% 'on', quadprog can handle quite large problems, numbering into
% many thousands of variables.
options = optimset('quadprog');
options.LargeScale = 'on';
options.Display = 'final';

vur = quadprog(H,f,[],[],[],[],LB,UB,[],options);

plot(x,v0,'r-',x,vr,'bo',x,vur,'k--')
legend('Base curve','Rounded data','Un-rounded curve','Location','NorthWest')
xlabel 'x'
ylabel 'y'
% Clearly the two curves overlay quite nicely; only at the ends
% do they deviate from each other significantly. That deviation
% is related to the behavior of a natural spline, despite the
% fact that we never explicitly defined a spline model. In effect,
% we minimized the approximate integral of the square of the second
% derivative of our function. It is this integral from which cubic
% splines are derived.
%%
%{
---------------------------------------------------------------
30. R^2

R^2 is a measure of the goodness of fit of a regression model. What
does it mean? We can write R^2 in terms of the vectors y (our data)
and yhat (our model predictions) as

  R^2 = 1 - ( norm(yhat-y) / norm(y-mean(y)) )^2

Intuitively, we might describe the term "norm(y-mean(y))" as the
total information content of our data. Likewise, the term
"norm(yhat-y)" is the amount of signal that remains after removing
the predictions of our model. So when the model describes the data
perfectly, this ratio will be zero. Squared and subtracted from 1,
a perfect fit will yield an R^2 == 1.

At the other end of the spectrum, a fit that describes the data no
better than does the mean will yield a zero value for R^2. This also
suggests that a model with no mean term can produce a negative
value for R^2. The implication in the event of a negative R^2 is
that the model is a poorer predictor of the data than a simple mean
would be.

Finally, note that the formula above is not specific to a linear
regression model. R^2 may apply as well to a nonlinear regression
model.

When do we use R^2? In the words of Draper & Smith, R^2 is the first
thing they might look at when perusing the output of a regression
analysis. It is also clearly not the only thing they look at. My point
is that R^2 is only one measure of a model. There is no critical value
of R^2 such that a model is acceptable. The following pair of examples
may help to illustrate this point.
%}
%%
% This curve fit will yield an R^2 of roughly 0.975 from my tests.
% Is it a poor model? It depends upon how much you need to fit the
% subtle curvature in that data.
n = 25;
x = sort(rand(n,1));
y = exp(x/2) + randn(size(x))/30;

M = [ones(n,1),x];
coef = M\y;
yhat = M*coef;

Rsqr = 1 - (norm(yhat-y)/norm(y-mean(y))).^2

close
plot(x,y,'bo',x,yhat,'r-')
title 'R^2 is approximately 0.975'
xlabel x
ylabel y
% Was the linear model chosen adequate? It depends on your data and
% your needs. My personal opinion of the data in this example is
% that if I really needed to know the amount of curvature in this
% functional relationship, then I needed better data or much more
% data.
%%
% How about this data? It yields roughly the same value of R^2 as did
% the data with a linear model above.
n = 25;
x = linspace(0,1,n)';
y = x + ((x-.5).^2)*0.6;

M = [ones(n,1),x];
coef = M\y;
yhat = M*coef;

Rsqr = 1 - (norm(yhat-y)/norm(y-mean(y))).^2

plot(x,y,'bo',x,yhat,'r-')
title 'R^2 is approximately 0.975'
xlabel x
ylabel y
% Is there any doubt that there is lack of fit in this model? You
% may decide that a linear model is not adequate here. That is a
% decision that you must make based on your goals for a model.
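%%
% A residual plot makes that lack of fit unmistakable. This is a
% minimal sketch that simply re-uses x, y, and yhat from the example
% above; a clear pattern in the residuals is exactly the kind of
% structure that R^2 alone will not reveal.
res = y - yhat;
plot(x,res,'bo',[0 1],[0 0],'k:')
title 'Residuals of the straight-line fit'
xlabel x
ylabel 'y - yhat'
% The residuals trace out a smooth curve rather than scattering
% randomly about zero, the signature of a missing model term.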
%%
%{
To sum up, the intrepid modeler should look at various statistics
about their model, not just R^2. They should look at plots of the
model, and plots of the residuals in various forms. They should
consider whether lack of fit is present. They should filter all of
this information based on their needs and goals for the final model
and their knowledge of the system being modeled.
%}
%{
---------------------------------------------------------------
31. Potential topics to be added or expanded in the (near) future

I've realized I may never finish writing this document. There
will always be segments I'd like to add, expound upon, or clarify.
So periodically I will repost my current version of the document,
with the expectation that I will add the pieces below as the
muse strikes me. My currently planned additions are:

- Scaling problems - identifying and fixing them
- More discussion on singular matrix warnings
- Multi-criteria optimization
- Using output functions
- Examples for section 21 - the actual sequence of function
  evaluations chosen by fminsearch and fminunc on a simple
  function
- List of references
- Jackknife/Bootstrap examples for confidence intervals on a
  regression
%}
close all
