% optimtips.m
There are tricks one can use to choose good starting values. Section 5
in this text discusses the idea of a linearization by logging a model.
This will probably give reasonable starting estimates. Other
linearizations might not be so helpful however. For example, a simple
first order Taylor series approximation to an objective will show no
real gain when used to choose starting values for a Newton derived
method. This is because methods like Gauss-Newton simply use that same
truncated Taylor series approximation in their iterations, so the
Taylor series approximation was nothing more than an initial iteration
of Newton's method. Starting values derived in this fashion will
probably gain no more than a single iteration of the optimizer.

See the section on global optimization and basins of attraction
for more on these topics.
%}

%{
---------------------------------------------------------------
8. Before you have a problem

There are many options for the different solvers in the optimization
toolbox. These are all available through the function optimset.
The very first (and most important) option that one should learn
to use is the 'Display' option. I make it a point that virtually
EVERY optimization I do runs initially with this property set to
'iter'. Only when I'm comfortable that the code is running properly
do I revert this setting to one less verbose.

Another useful tool provided in the optimset options is the
DerivativeCheck property. When set to 'on', it computes a finite
difference approximation to a user supplied gradient, then compares
the two and reports significant differences.
%}

%{
---------------------------------------------------------------
9. Tolerances & stopping criteria

A very common question posed by beginners is "What do the different
tolerance parameters mean?" I'll offer an explanation, but sometimes
doing is better than telling, so I'll build some examples too. Of
course, the meaning of these parameters can sometimes be subtly
different as they apply to the different optimizers.

There are many ways to stop an optimization. You can control the
total number of function evaluations allowed, or the total number
of iterations. These two parameters are really a check on things
getting out of control. Usually when your optimization stops
because one of these limits is exceeded, it is because there is a
problem of some sort. Perhaps the objective function is particularly
nasty, or the optimization is diverging to infinity on an unbounded
problem. If your problem exceeds these limits, first check to see
if there is a reason before just increasing the limits.

The other main stopping criteria are on the parameter values and
on the objective function values. We can choose to stop if the
parameter values are unchanging, or if the objective function
is unchanging. (In the case of a root finding problem, this would
be a test to see if the objective was close enough to zero.)
%}

%%
% MaxIter versus MaxFunEvals:
% The Rosenbrock function is a nice test of an optimizer. It has
% a global minimizer at (x,y) = (1,1), and a curved valley.
rosen = @(xy) (1-xy(1)).^2 + 105*(xy(2)-xy(1).^2).^2;
xystart = [2 3];
xyfinal = fminunc(rosen,xystart,optimset('disp','final'))

% Note the warning message: since we did not provide a gradient,
% the default trust region method will not run. We could have
% preempted this warning by turning the large scale method off
% in the first place.
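%%
% Before moving on, here is a minimal sketch of the DerivativeCheck
% option recommended in section 8 above. This cell is an added
% illustration, not part of the original notes: the hand-derived
% gradient of the Rosenbrock function and the deal-based wrapper
% are my own. With 'GradObj' on, DerivativeCheck compares the
% supplied gradient against a finite difference estimate.
rosenwithgrad = @(xy) deal( ...
    (1-xy(1)).^2 + 105*(xy(2)-xy(1).^2).^2, ...      % objective
    [-2*(1-xy(1)) - 420*xy(1)*(xy(2)-xy(1).^2); ...  % df/dx
     210*(xy(2)-xy(1).^2)]);                         % df/dy
options = optimset('disp','final','LargeScale','off', ...
    'GradObj','on','DerivativeCheck','on');
xyfinal = fminunc(rosenwithgrad,xystart,options)
% Had the analytic gradient disagreed significantly with the finite
% difference approximation, fminunc would have reported it here.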
%%
% This time I forced the use of the older style line search method
% in fminunc. Let's look at the output of fminunc to understand what
% is happening. I've intentionally turned the output on full to see
% the meaning of these parameters.
xyfinal = fminunc(rosen,xystart,optimset('disp','iter','LargeScale','off'))

% Line search methods are fairly simple. At each iteration they
% estimate the gradient using finite differences (a forward finite
% difference, costing n additional function evaluations in n
% dimensions.) Then the code decides which direction to search
% along - a line search. Finally, the code (approximately) minimizes
% the objective function along this linear path through the parameter
% space. I.e., it approximately minimizes the function with respect
% to alpha: f(xk + alpha*d), where xk is the current iterate, d is
% the search direction, and alpha is the stepsize.
%
% One nicety is that this line search need not be perfect. As long as
% the objective is sufficiently improved along the search direction
% before each iteration terminates, the overall optimization will
% still converge.
%
% So each iteration of the optimizer will often require roughly n
% function evaluations to compute the gradient, plus a few more to
% do the line search. As the optimization proceeds, the optimizer
% begins to "learn" the local shape of the surface. Once enough such
% learning has taken place, its first guess at the length of the step
% to take along the line is often a very good one. Note that in our
% example, the first step taken by the optimizer was very short, but
% after that first iteration, most step lengths were 1.0.

%%
% Eventually fminunc terminated on this function when the norm of
% the gradient was smaller than indicated by TolFun. It had taken
% 24 iterations, and 93 function evaluations to find the solution
% on my computer. (Your exact numbers may vary on different computers
% or versions of MATLAB or the optimization toolbox.)
%
% Had we lowered either MaxIter or MaxFunEvals, the optimization
% would have terminated early with a warning to that effect.
options = optimset('disp','iter','LargeScale','off','MaxIter',10);
xyfinal = fminunc(rosen,xystart,options)

% Note that this solution may be inadequate for your purposes.
%
% A good clue to the existence of a problem is the norm of the
% gradient, as reported in the last column of the output. Since
% a minimizer of a smooth objective function must have zero gradient
% (unless there are constraints involved) a large gradient norm
% may indicate that the optimization has terminated too soon.
% (This value is available to the user in output.firstorderopt
% from all nonlinear solvers.)
%
% This is not always true for optimizations which hit the limit
% on maximum iterations, but I would always consider whether the
% iteration limit may have been set too low on any problem which
% hits this boundary.
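%%
% A quick way to inspect that first order optimality measure is to
% request the output structure from fminunc. This cell is my own
% added illustration; it reruns the iteration-limited problem above
% with the display turned off.
options = optimset('disp','off','LargeScale','off','MaxIter',10);
[xyfinal,fval,exitflag,output] = fminunc(rosen,xystart,options);
exitflag               % 0 indicates an iteration or eval limit was hit
output.firstorderopt   % a large gradient norm: we stopped too soon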
%%
% TolX: The default value for TolX is 1.e-6. We can change TolX
% to see what happens.
options = optimset('disp','iter','LargeScale','off','TolX',.001);
xyfinal = fminunc(rosen,xystart,options)

% Remember that the true minimizer is [1,1]. By greatly widening
% the tolerance TolX to 1.e-3, the optimization has stopped very
% early. The message that we see indicates that fminunc is stopping
% for this reason, although it is NOT true that fminunc is confident
% that we are within 1.e-3 of the solution.

%%
% TolFun: fminunc treats TolFun as a tolerance on the gradient.
% Its default value is also 1e-6. Now widen TolFun to 1e-3.
options = optimset('disp','iter','LargeScale','off','TolFun',.001);
xyfinal = fminunc(rosen,xystart,options)

% Again, the optimization terminates early and far away from the
% true solution.

%%
% However, there is NO exact relationship between these tolerances
% and the final error in your result.
%
% In fact, tolerances are the cause of many errors of understanding
% among new users of optimization tools. A parameter like TolX does
% not ensure that the final estimated solution is within TolX of the
% true value, or even within TolX of the global optimum.
%
% Likewise, reducing the value of TolFun need not reduce the error
% of the fit. If an optimizer has converged to its global optimum,
% reducing these tolerances cannot produce a better fit. Blood cannot
% be obtained from a rock, no matter how hard one squeezes. The rock
% may become bloody, but the blood came from your own hand.
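%%
% To make that point concrete, here is an added sketch (not in the
% original text): tighten TolX dramatically and compare the actual
% error against it. The error in the solution is governed by the
% problem and the other stopping criteria, not by TolX alone.
options = optimset('disp','off','LargeScale','off','TolX',1e-12);
xyfinal = fminunc(rosen,xystart,options);
err = norm(xyfinal - [1 1])   % need not be anywhere near 1e-12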
%{
---------------------------------------------------------------
10. Common optimization problems & mistakes

Internal discretization

Optimizations that have one or more of their variables as discrete
entities cannot be done using solvers (such as fminsearch, fminunc,
fmincon, etc.) that assume continuous variables. Sometimes however
(imaging applications are especially likely to do this) there is
also an internal discretization in the model. Anything that
introduces such step discontinuities into an objective function
will also invalidate the use of those same optimizers. You may
be limited to algorithms (genetic algorithms, simulated annealing,
etc.) which are tolerant of such behaviors.

Discontinuities, derivative discontinuities

As with the case of internal discretization, even the use of an
absolute value function inside an objective function can cause
a slope discontinuity of the objective function. Remember that an
optimizer treats its objective like a black box. It can interrogate
the box with various sets of parameters to see the result, but it
does not truly understand anything about its objective function,
and it does not know about or expect non-smooth behavior in that
objective.

Penalty function methods are another source of such problems, if
the objective is set to an arbitrarily large value when the
parameters lie beyond some boundary.

Problems with unavoidable discontinuities are best left to the
domain of "simpler" tools, such as fminsearch, or even to the
probabilistic tools such as genetic algorithms or simulated
annealing.

Complex objectives

Optimizing over variables in the complex plane is best done by
splitting the variables into their real and imaginary parts.
Remember that a complex variable is really a vector quantity.
In the case of complex objectives, the codes in the optimization
toolbox all assume a real valued objective.

In some cases an objective function suddenly becomes unintentionally
complex, almost always due to a variable exceeding some natural
mathematical boundary. log(x) or sqrt(x), where x is unconstrained,
are common causes of this problem. The solution here is to employ
constraints on the optimizer to prevent overrunning that boundary.

A nice tool for debugging the occurrence of Inf, NaN, or complex
numbers in an objective is the FunValCheck property. When set to
'on', it will stop the optimization with an error message indicating
what happened. Since fminsearch and fminbnd can handle a value of
Inf, FunValCheck will not catch Inf for those optimizers.

Singular matrices

Most of the optimizers in the optimization toolbox use linear
algebra for their inner workings. The main exception of course
is fminsearch, which uses a Nelder-Mead polytope algorithm.
While that algorithm will never generate a singular matrix
warning, it can still be harmed by the same flaws that induce
these warnings in other methods.

The singular matrix or rank deficiency warnings that appear
from MATLAB optimizers arise for a variety of reasons. Usually
such a warning is a sign of a potential problem.

(More depth to come on this subject.)
%}

%{
---------------------------------------------------------------
11. Partitioned least squares estimation

You can find a description of partitioned or separable least
squares in the book by Seber and Wild, "Nonlinear Regression".
This url also describes the technique:

http://64.233.161.104/search?q=cache:KmOn6r0gmn0J:www.statsci.org/smyth/pubs/eoe_nr.pdf+watts+partitioned+least+squares&hl=en&client=safari

Let's return to that favorite model of ours - the exponential.
We'll be creative and add in a constant term.

  y = a + b*exp(c*t) + noise

There are three parameters to estimate in this model, but they
are of two distinct fundamental classes. The parameters a and
b are what we will call intrinsically linear parameters. They
enter into the model in a linear fashion; in fact, this section
will show you how to use linear regression to estimate them.
The third parameter, c, would be called intrinsically nonlinear.
It enters into the model in a nonlinear fashion. Let's make up
some more data.
%}

%%
t = sort(rand(100,1)*2-1);
y = 2 - 1*exp(-1.5*t) + randn(size(t))/10;
close
plot(t,y,'o')
title 'Variable partitioning data'
xlabel 't'
ylabel 'y'

%%
% and fit with lsqcurvefit
fun = @(coef,t) coef(1) + coef(2)*exp(coef(3)*t)
start = [1 2 3];
coef0 = lsqcurvefit(fun,start,t,y)
yhat = fun(coef0,t);
plot(t,y,'bo',t,yhat,'r-')
title 'Convergence to a bad spot, due to a poor choice of starting values'
xlabel 't'
ylabel 'y'

%%
% We talked about poor starting values before. Here is a good
% example. When we started off with the wrong sign for coef(3)
% above, lsqcurvefit did not converge to the correct solution.
% Let's try it again, just changing the sign on that third
% coefficient. I'll set the display option to 'iter', so we
% can see the number of iterations and function evals.
start = [1 2 -3];
options = optimset('disp','iter');
coef0 = lsqcurvefit(fun,start,t,y,[],[],options)
yhat = fun(coef0,t);
plot(t,y,'bo',t,yhat,'r-')
title 'A good fit, due to a better choice of starting values'
xlabel 't'
ylabel 'y'
% This time it should have worked nicely. Getting the sign of a
% parameter wrong is a common problem, especially in the case of
% exponential "rate" parameters.

%%
% Suppose however, I told you the value of the intrinsically
% nonlinear parameter that minimized the overall sums of squares
% of the residuals. If c is known, then all it takes is a simple
% linear regression to compute the values of a and b, GIVEN the
% "known" value of c. In essence, we would compute a and b
% conditionally on the value of c. We will let an optimizer
% choose the value of c for us, but our objective function will
% compute a and b. You should recognize that since the linear
% regression computes a sum of squares minimizer itself, when
% the optimization has converged for the intrinsically nonlinear
% parameter, we will also have found the overall solution. The
% sketch below shows this inner linear regression step in
% isolation.
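%%
% As promised above, here is the heart of partitioned least squares
% in isolation. This cell is my own added illustration: pretend the
% nonlinear rate c is known, and recover the conditionally optimal
% a and b with one linear regression on the t,y data from above.
c = -1.5;                       % pretend c is known
A = [ones(size(t)), exp(c*t)];  % design matrix for y = a + b*exp(c*t)
ab = A\y                        % conditionally optimal [a;b]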
% The function pleas is a wrapper around the lsqnonlin function.
% It takes a list of functions, one for each intrinsically
% linear parameter in the model. This model is of the form
%
%   y = a*1 + b*exp(c*t)
%
% So there are two linear parameters (a,b), and one nonlinear
% parameter (c). This means there will be a pair of functions
% in funlist: the scalar constant 1, and exp(c*t).
funlist = {1, @(coef,t) exp(coef*t)};
NLPstart = -3;
options = optimset('disp','iter');
[INLP,ILP] = pleas(funlist,NLPstart,t,y,options)

% Note how much more rapidly pleas/lsqnonlin has converged
% (with only 1 nonlinear parameter to estimate) compared to
% lsqcurvefit (with 3 parameters it had to estimate.)
%
% Also note that the final sum of squares for pleas was slightly
% lower than we saw for lsqcurvefit. This may be due to the use
% of a linear regression to estimate better values for the
% intrinsically linear parameters.

% plot the curve
yhat = fun([ILP;INLP],t);
plot(t,y,'bo',t,yhat,'r-')
title 'Pleas is faster, and robust against poor starting values'
xlabel 't'
ylabel 'y'

%%
% We can verify that pleas is more robust to poor starting
% values too. Start it with the rate parameter given the
% wrong sign.
NLPstart = 3;
[INLP,ILP] = pleas(funlist,NLPstart,t,y,options)

%%
%{
---------------------------------------------------------------
12. Errors in variables regression

Knowing your data is important in any modeling process. Where
does the error arise? The traditional regression model assumes
that the independent variable is known with no error, and that
any measurement variability lies on the dependent variable.
We might call this scenario "errors in y". Our model is of the
form

  y = f(x,theta) + noise

where theta is the parameter (or parameters) to be estimated.
In other cases, the independent variable is the one with noise
in it.

  y = g(x + noise,theta)