optimtips_21_36.m
%{
Choose values of a and b that drive this expression towards zero
for each data point (x,y).

What could be wrong? The problem is the presumption of additive
(normal) error on y. The value of y that is used inside the parens
is the "measured" value. It is the true value, plus any error and
lack of fit.

We will find that if we do try to use lsqnonlin in the way described,
we should expect poor results - possibly non-convergence, especially
if there are large residuals or your starting values are poor.
%}
%%
% Let's start by building ourselves an implicit function that we
% can deal with easily.
%
%  y = (y - 2*x)^2
%
% Build it from specified values of y.
n = 20;
y = 0.5 + 5*rand(1,n);

% I chose this functional form since I could still find points
% on the curve itself easily.
x = (y - sqrt(y))/2;

% sort it on x for plotting purposes
[x,tags] = sort(x);
y = y(tags);

% Now, let's add some error into the mix. Not too large of an error.
err = randn(1,n)/10;

% added to y
ye = y + err;

% plot the data
close
plot(x,y,'-',x,ye,'o')

%%
% We need to solve for the model now. In this case I'll
% presume the model to be
%
%  y = a*(y - b*x)^2
%
% We wish to estimate a and b. Of course, the true values
% are [1, 2] for a and b respectively.

% First of all, can we just throw lsqnonlin at this implicit
% function as described above?
fun = @(ab) ye - ab(1)*(ye - ab(2)*x).^2;
options = optimset('lsqnonlin');
options.Display = 'iter';
abstart = [1.5 1.5];
lsqnonlin(fun,abstart,[],[],options)

% Recall that the true values of a and b were [1,2], so it
% did work, although not terribly well. Can we do better?

%%
% Our data as we know it is (x,ye). A good test before
% we really start is to see if we can predict the error in y,
% given values of a and b. From my random data, here is the
% first point.
[x(1),y(1),err(1),ye(1)]

% First, let's see if we can solve for err(1), given the true
% values of a and b.
a = 1;
b = 2;
fun = @(e1) (ye(1)-e1 - a*((ye(1)-e1) - b*x(1)).^2)
estart = .1;
e = fzero(fun,estart)

%%
% This worked nicely. We were able to find the first additive
% error to y, given the true values of the parameters.
%
% Had we chosen different nominal values for a and b, e would have
% been different. I'll try it.
a = 1.5;
b = 1.5;
fun = @(e1) (ye(1)-e1 - a*((ye(1)-e1) - b*x(1)).^2)
estart = .1;
e = fzero(fun,estart)

%%
% We are now ready to put this all into the domain of lsqnonlin.
% I won't be able to do this with an anonymous function directly.
% I've supplied the function below:
% ========================================================
% function epred = implicit_obj(ab,x,ye)
% a = ab(1);
% b = ab(2);
% n = length(ye);
% epred = zeros(1,n);
% estart = .1;
% for i = 1:n
%   fun = @(ei) (ye(i)-ei - a*((ye(i)-ei) - b*x(i)).^2);
%   epred(i) = fzero(fun,estart);
% end
% ========================================================

% Having gone this far, I ALWAYS like to evaluate the objective function
% that lsqnonlin will see, especially when it's something complicated.
% Does it make sense? Try it for a couple of parameter sets.
epred = implicit_obj([1.5 1.5],x,ye)
epred = implicit_obj([3 1],x,ye)

%%
% Time to call lsqnonlin now. I'll pass in the vectors x and ye
% via an anonymous call, although there are many ways to do it.
% As always, I use 'iter' for the Display parameter to lsqnonlin,
% at least the first time I call it.
options = optimset('lsqnonlin');
options.Display = 'iter';
ab_start = [1.5 1.5];
ab_est = lsqnonlin(@(ab) implicit_obj(ab,x,ye),ab_start,[],[],options)

% These parameter estimates are clearly much better than our
% earlier attempt was able to achieve.
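%%
% A quick sanity check, added here as a sketch (not part of the original
% text): it assumes x, ye, err, and ab_est from the cells above are still
% in the workspace. At good parameter estimates, the per-point errors
% recovered by implicit_obj should track the true errors we simulated
% when we built the data.
e_hat = implicit_obj(ab_est,x,ye);
% The first norm should be small relative to the second one.
[norm(e_hat - err), norm(err)]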
%% 33. Robust fitting schemes
%{
Certainly a very good tool for robust fitting is robustfit from the
statistics toolbox. I'll try only to give an idea of how such a scheme
works. There are two simple approaches.

- Iteratively re-weighted least squares
- Nonlinear residual transformations

The first of these, iteratively reweighted least squares, is quite
simple. We solve a series of weighted least squares problems, where
at each iteration we compute the weights from the residuals of our
previous fit. Data points with large residuals get a low weight.
This method presumes that points with large residuals are outliers,
so they are given little weight.
%}
%%
% Consider the simple problem
n = 100;
x = 2*rand(n,1) - 1;
y0 = exp(x);

% make some noise with a non-gaussian distribution
noise = randn(n,1)/2;
noise = sign(noise).*abs(noise).^4;
y = y0 + noise;

% We can see that the data is mostly very low noise, but there
% are some serious outliers.
close all
plot(x,y0,'ro',x,y,'b+')

%%
% Fit this curve with a fourth order polynomial model, of the form
%
%  y = a1 + a2*x + a3*x^2 + a4*x^3 + a5*x^4
%
% using this fitting scheme.
%
% As a baseline, use polyfit on the noise-free data.
polyfit(x,y0,4)

% Remember that the Taylor series for exp(x) would have had its
% first few terms as
[1/24 1/6 1/2 1 1]

% So polyfit did reasonably well in its estimate of the truncated
% Taylor series.

%%
% Note how poorly polyfit does on the noisy data.
polyfit(x,y,4)

%%
% How would an iteratively reweighted scheme work?

% initial weights
weights = ones(n,1);

% We will use lscov to solve the weighted regressions
A = [x.^4,x.^3,x.^2,x,ones(size(x))];

% be lazy and just iterate a few times
for i = 1:5
  poly = lscov(A,y,weights)'

  % residuals at the current step
  resid = polyval(poly,x) - y;

  % find the maximum residual to scale the problem
  maxr = max(abs(resid));

  % Compute the weights for the next iteration. I'll
  % just pick a shape that dies off to zero for larger
  % values. There are many shapes to choose from.
  weights = exp(-(3*resid/maxr).^2);
end
% This was much closer to our fit with no noise.

%%
% An alternative is to transform the residuals so as to use a general
% nonlinear optimization scheme, such as fminsearch. I'll transform
% the residuals using a nonlinear transformation that will downweight
% the outliers, then form the sum of squares of the result.

% the transformation function
Wfun = @(R) erf(R);

% fminsearch objective
obj = @(c) sum(Wfun(y - polyval(c,x)).^2);

% Again, this estimation was less sensitive to the very non-normal
% noise structure of our data than the simple linear least squares.
poly = fminsearch(obj,[1 1 1 1 1])
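%%
% For comparison, a sketch using robustfit from the statistics toolbox,
% as mentioned at the top of this section (this assumes the toolbox is
% available; x and y come from the cells above). robustfit expects the
% design matrix WITHOUT a constant column and prepends an intercept
% term, so its coefficients come back in ascending order of powers.
% Flip them to compare with the polyfit and lscov results above.
b = robustfit([x, x.^2, x.^3, x.^4],y);
poly_robust = flipud(b)'   % descending powers, as polyfit returns them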
%% 34. Homotopies
%{
Homotopies are an alternative approach for solving hard nonlinear
problems. This book was a good reference as I recall, though it
appears to be out of print.

Garcia and Zangwill, "Pathways to Solutions, Fixed Points and
Equilibria", Prentice Hall

Think of a homotopy as a continuous transformation from a simple
problem that you know how to solve to a hard problem that you cannot
solve. For example, suppose we did not know how to solve for x, given
a value of y0 here:

  y0 = exp(x)

Yes, I'm sure that with only a few hours of work I can figure out how
to use logs. But let's pretend for the moment that this is really a
hard problem. I'll formulate the homotopy as

  H(x,t) = (y0 - exp(x))*t + (y0 - (1+x))*(1-t)

So when t = 0, the problem reduces to

  H(x,0) = y0 - (1+x)

This is a problem whose root we know very well, since I used a first
order Taylor series approximation to the exponential function. So the
root for t = 0 is simple to find. It occurs at x = y0 - 1.

As we move from t = 0 to t = 1, the problem deforms continuously into
the more nonlinear one of our goal. At each step, the starting value
comes from our last step. If we take reasonably small steps, each
successive problem is quickly solved, even if we did not know of a
good starting value for the hard problem initially.

Let's try it out for our problem, in a rather brute force way.
%}
%%
% Define the homotopy function
H = @(x,t,y0) (y0 - exp(x))*t + (y0 - (1+x))*(1-t);

y0 = 5;
% We hope to converge to the solution at
%
%  log(5) == 1.6094379124341
format long g
x_t_initial = 0;
for t = 0:.1:1
  x_t = fzero(@(x) H(x,t,y0),x_t_initial);
  disp([t,x_t_initial,x_t])
  x_t_initial = x_t;
end
% The final step got us where we wanted to go. Note that each
% individual step was an easy problem to solve. In fact, the first
% step at t = 0 had a purely linear objective function.
log(5)

%%
% Look at how the homotopic family of functions deforms from our
% simple linear one to the more nonlinear one at the end.
close all
figure
for t = 0:.1:1
  fplot(@(x) H(x,t,y0),[0,4])
  hold on
end
grid on
hold off

%%
% This was admittedly a very crude example. You can even formulate
% a homotopy solution that generates the multiple solutions to a
% problem.
%
% Consider the problem
%
%  sin(x) - x/5 = 0
%
% We wish to find all solutions to this problem. Formulate a homotopy
% function as
%
%  H(x,t) = (sin(x) - x/5) - (1-t)*(sin(x0) - x0/5)
%
% where x0 is our starting point. Choose x0 = -10.
%
% Note that at (x,t) = (x0,0), H is trivially zero. Also, at
% t = 1, H(x,t) is zero if x is a solution to our original problem.
%
% We could also have used a different homotopy.
%
%  H(x,t) = t*(sin(x) - x/5) - (1-t)*(x - x0)
x0 = -10;
H = @(x,t) (sin(x) - x/5) - (1-t).*(sin(x0) - x0/5);

% generate a grid in (x,t)
[x,t] = meshgrid(-10:.1:10,0:.01:1);

% The solutions to sin(x) - x/5 = 0 will occur along a path (x(p),t(p))
% wherever t = 1. Solve for that path using a contour plot.
contour(x,t,H(x,t),[0 0])
grid on
xlabel 'x(p)'
ylabel 't(p)'

%%
% One can also formulate a differential equation to be solved
% for the homotopy path.
%
%  H = @(x,t) (sin(x) - x/5) - (1-t).*(sin(x0) - x0/5);
x0 = -10;

% the partial derivatives of H
dHdx = @(x,t) cos(x) - 1/5;
dHdt = @(x,t) sin(x0) - x0/5;

% basic differential equations for the path (see Garcia & Zangwill)
fun = @(p,xt) [dHdt(xt(1),xt(2)); -dHdx(xt(1),xt(2))];

% solve using an ode solver
solution = ode15s(fun,[0,6],[x0;0]);

% This curve is the same as that generated by the contour plot above.
% Note that we can find the roots of our original function by
% interpolating this curve to find where it crosses t == 1.
plot(solution.y(1,:),solution.y(2,:))
hold on
plot([min(solution.y(1,:)),max(solution.y(1,:))],[1 1],'r-')
hold off
grid on
% These examples are just a start on how one can use homotopies to
% solve problems.
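%%
% The interpolation step suggested above, written out as a sketch (it
% reuses the ode15s solution structure from the previous cell). Each
% sign change of (t - 1) along the path brackets one crossing of t == 1,
% and each crossing is a root of sin(x) - x/5.
xpath = solution.y(1,:);
tpath = solution.y(2,:);
k = find(diff(sign(tpath - 1)) ~= 0);   % intervals bracketing t == 1
xroots = zeros(size(k));
for j = 1:length(k)
  idx = [k(j), k(j)+1];
  % linear interpolation in t to locate the crossing
  xroots(j) = interp1(tpath(idx),xpath(idx),1);
end
xroots
% Each of these should be (approximately) a root of the original problem.
sin(xroots) - xroots/5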
%% 35. Orthogonal polynomial regression
%{
Orthogonal polynomials are often used in mathematical modelling.
Can they be used for regression models? (Of course.) Are they of
value? (Of course.) This section will concentrate on one narrow
aspect of these polynomials - orthogonality.

Consider the first few Legendre orthogonal polynomials. We can
find a list of them in books like Abramowitz and Stegun, "Handbook
of Mathematical Functions", Table 22.9.

  P0(x) = 1
  P1(x) = x
  P2(x) = 3/2*x^2 - 1/2
  P3(x) = 5/2*x^3 - 3/2*x
  P4(x) = 4.375*x^4 - 3.75*x^2 + 3/8
  P5(x) = 7.875*x^5 - 8.75*x^3 + 1.875*x
  ...

These polynomials have the property that over the interval [-1,1],
they are orthogonal. Thus, the integral over that interval of the
product of two such polynomials Pi and Pj will be zero whenever i
and j are not equal. Whenever i == j, the corresponding integral
evaluates to 2/(2*i+1), so as listed the polynomials are orthogonal
but not yet orthonormal.

However, the mere use of orthogonal polynomials on discrete data
will not generally result in orthogonality in the linear algebra.
For example ...
%}
% We can ortho-normalize the polynomials above by multiplying
% the i'th polynomial by sqrt((2*i+1)/2)
P0 = @(x) ones(size(x))*sqrt(1/2);
P1 = @(x) x*sqrt(3/2);
P2 = @(x) (3/2*x.^2 - 1/2)*sqrt(5/2);
P3 = @(x) (5/2*x.^3 - 3/2*x)*sqrt(7/2);
P4 = @(x) (3/8 - 3.75*x.^2 + 4.375*x.^4)*sqrt(9/2);
P5 = @(x) (1.875*x - 8.75*x.^3 + 7.875*x.^5)*sqrt(11/2);

% Thus if we integrate over the proper domain, this one should
% return zero (to within the tolerance set by quad):
quad(@(x) P1(x).*P4(x),-1,1)

%%
% But this integral should return unity:
quad(@(x) P3(x).*P3(x),-1,1)

%%
% Now, let's look at what will happen in a regression context.
% Consider an equally spaced vector of points
x0 = (-1:.001:1)';

% Do these polynomials behave as orthogonal functions when
% evaluated over a discrete set of points? No.
A = [P0(x0),P1(x0),P2(x0),P3(x0),P4(x0),P5(x0)];

% The matrix A'*A would be a multiple of an identity matrix if the
% polynomials were orthogonal on this discrete data. Here A'*A is
% close to a multiple of the identity, but the off-diagonal terms
% are not exactly zero.
A'*A

%%
% An interesting thing to try is this same operation on a carefully
% selected set of points (see Engels, "Numerical Quadrature and
% Cubature", Academic Press, 1980, Table 2.4.3, page 58.)
x = [0.16790618421480394; 0.52876178305787999; ...
     0.60101865538023807; 0.91158930772843447];
x = [-x;0;x];
A = [P0(x),P1(x),P2(x),P3(x),P4(x),P5(x)];

% This time, A'*A has all (essentially) zero off-diagonal terms,
% reflecting the use of a set of quadrature nodes for the sample points.
A'*A
% The point to take from this excursion is that orthogonal polynomials
% are not always orthogonal when sampled at discrete points.

%% 36. Potential topics to be added or expanded in the (near) future
%{
I've realized I may never finish writing this document. There
will always be segments I'd like to add, expound upon, or clarify.
So periodically I will repost my current version of the doc,
with the expectation that I will add the pieces below as the
muse strikes me. My current planned additions are:

- Scaling problems - identifying and fixing them
- More discussion on singular matrix warnings
- Multi-criteria optimization
- Using output functions
- List of references
- Jackknife/Bootstrap examples for confidence intervals on a regression
%}
close all
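%%
% A closing aside on section 35, added as a sketch: it reuses the
% quadrature nodes x and the matrix A from that section, and picks
% y = exp(x) purely as illustrative sample data. When A'*A is
% (essentially) diagonal, the normal equations decouple, so each least
% squares coefficient is just an inner product scaled by the matching
% diagonal term - no linear system need be solved.
y = exp(x);
d = diag(A'*A);      % essentially the only nonzero terms of A'*A
coef = (A'*y)./d     % one coefficient at a time
% Compare with the usual backslash solution; the two should agree
% closely, since the off-diagonal terms are essentially zero.
[coef, A\y]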