% optimtips_21_36.m

%% 21. Mixed integer/discrete problems
%{
Many problems are inherently discrete. I won't get into that class
of problems at all. However, there are also problems of mixed class.
Here one or more of the variables in the optimization are limited
to a discrete set of values, while the rest live in a continuous
domain. There are mixed solvers available, both on MATLAB Central
and from other sources.

What do you do when there are only a few possible discrete states
to investigate? Probably best here is to simply fix the discrete
variables at each possible state, then solve for the continuous
variables. Then choose the solution which is best overall. (A short
sketch of this enumeration approach follows below.)
%}
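%%
% A minimal sketch of that enumeration idea (my own example, not part
% of the original text): suppose the objective depends on one
% continuous variable x and one discrete variable k, with k restricted
% to the set {1,2,3,4}. Fix k at each allowed value, let fminsearch
% handle x, and keep the best of the resulting solutions.
fun = @(x,k) (x - k/2).^2 + cos(k*x);  % any smooth function of x for fixed k
kset = 1:4;           % the few discrete states to investigate
bestf = inf;
bestx = NaN;
bestk = NaN;
for k = kset
  % continuous minimization with the discrete variable held fixed
  [xk,fk] = fminsearch(@(x) fun(x,k),0);
  if fk < bestf
    bestf = fk;
    bestx = xk;
    bestk = k;
  end
end
% the overall best solution over all discrete states
[bestx, bestk, bestf]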
%% 22. Understanding how they work
%{
While I will not even try to give a full description of how any
optimizer works, a little bit of understanding is worth a tremendous
amount when there are problems. True understanding can only come
from study in some depth of an algorithm. I can't offer that here.
Instead, I'll try to show the broad differences between some of the
methods, and suggest when one might use one method over another.

Previously in this text I've referred to optimizers as tools that
operate on a black box, or compared them to a blindfolded individual
on a hillside. These are useful analogies, but they don't really
tell enough to visualize how the tools work. We'll start by talking
about the method used in fminsearch, since it may well be one of
the tools which is used most often.

Fminsearch is a Nelder/Mead polytope algorithm. Some call it a
simplex method, but this may confuse things with linear programming.
It works by evaluating the objective function over a polytope of
points in the parameter space. In two dimensions this will be a
triangle; in n dimensions the polytope (or simplex) will be
composed of n+1 points. The basic algorithm is simple. Compare
these n+1 points, and choose the WORST one to delete. Replace this
bad point with its reflection through the remaining points in
the polytope. You can visualize this method as flopping a triangle
around the parameter space until it finds the optimum. Where
appropriate, the polytope can shrink or grow in size. (A bare-bones
sketch of a single reflection step appears below.)

I'll note that this basic Nelder/Mead code is simple, it requires
no gradient information at all, and can even survive an occasional
slope or function discontinuity. The downside of a polytope method
is how slowly it will converge, especially for higher dimensional
problems. I will happily use fminsearch for 2 or 3 variable problems,
especially if it's something I will need to use infrequently. I will
rarely if ever consider fminsearch for more than 5 or 6 variables.
(Feel free to disagree. You may have more patience than I do.)

The other point to remember about a polytope method is the stopping
criterion. These methods are generally allowed to terminate when
all the function values over the polytope have the same value
within the function tolerance. (I have also seen the standard
deviation of the function values over the polytope used to compare
to the function tolerance.)
%}
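%%
% A bare-bones sketch of one Nelder/Mead reflection step (my own
% illustrative code, not the algorithm fminsearch actually ships;
% the real method also handles expansion, contraction and shrinkage).
% The simplex is stored as n+1 rows of n parameters.
banana = @(X) (1-X(1)).^2 + 105*(X(2)-X(1).^2).^2;
S = [-6 4; -5.5 4; -6 4.5];     % an initial simplex: 3 points in 2-d
f = [banana(S(1,:)); banana(S(2,:)); banana(S(3,:))];

% identify the worst vertex, then reflect it through the centroid
% of the remaining vertices
[fworst,iworst] = max(f);
ikeep = setdiff(1:size(S,1),iworst);
centroid = mean(S(ikeep,:),1);
reflected = centroid + (centroid - S(iworst,:));

% accept the reflected point if it improves on the worst vertex
if banana(reflected) < fworst
  S(iworst,:) = reflected;
  f(iworst) = banana(reflected);
end

% the stopping test described above: are all function values over
% the simplex essentially the same?
done = (max(f) - min(f)) < 1e-8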
%%
%{
We can contrast the polytope algorithm to the more traditional
Newton-like family of methods, combined with a line search. These
methods include many of the optimizers in the Optimization Toolbox,
such as fminunc, lsqcurvefit, fsolve, etc. Whereas a polytope method
lays down a polytope on the objective function surface, then moves
the polytope around, these line-search methods attempt to be more
intelligent in their operation. The basic idea is to form a locally
linear approximation to the problem at hand, then solve the linearized
problem exactly. This determines a direction to search along, plus
a predicted step length in that direction from your current point
in the parameter space. (I already discussed line searches in section
9.) This is why these tools require a gradient (or Jacobian as
appropriate) of the objective. This derivative information is used
to compute the necessary locally linear approximation.

The other feature of interest about these methods is how they "learn"
about the surface in question. The very first iteration taken might
typically be a steepest descent step. After the first iteration, the
optimizer can gradually form an adaptive approximation to the local
Hessian matrix of the objective. This, in effect, tells the optimizer
what the local curvature of the surface looks like.
%}
%%
% As a comparison, we will look again at the Rosenbrock function.
rosen = @(X) (1-X(:,1)).^2 + 105*(X(:,2)-X(:,1).^2).^2;

% first, generate a contour surface
[xc,yc] = meshgrid(-7:.05:7);
zc = rosen([xc(:),yc(:)]);
zc = reshape(zc,size(xc));
close all
contour(xc,yc,zc,[.1 1 4 16 64 256 1024 4096])
hold on

% now do an optimization using optimplot to plot the points
opts = optimset('fminsearch');
opts.OutputFcn = @optimplot;
opts.Display = 'iter';
Xfinal = fminsearch(rosen,[-6,4],opts);
hold off

%%
% try the same optimization using fminunc
close
contour(xc,yc,zc,[.1 1 4 16 64 256 1024 4096])
hold on

% now do an optimization using optimplot to plot the points
opts = optimset('fminunc');
opts.OutputFcn = @optimplot;
opts.Display = 'iter';
Xfinal = fminunc(rosen,[-6,4],opts);
hold off

% Note the difference between these two optimizers. It's most obvious
% at the start, where fminsearch started out with a relatively large
% simplex, which flopped down to the valley more slowly than did
% fminunc. Fminsearch also clearly overshoots the valley at first,
% then recovers. Also note that once it did hit the bottom of the
% valley, fminunc was able to take at least a few large initial steps,
% until the valley became so curved and so flat that it too was
% forced into short steps.

%% 23. Wrapping an optimizer around quad
%{
Nested optimizations, or nesting an integration inside an
optimization, are similar problems. Both are really just a problem
of managing parameter spaces. I.e., you must ensure that each
objective function or integrand sees the appropriate set of
parameters.

In this example, I'll formulate an optimization which has an
integration at its heart.
%}
%%
% Choose some simple function to integrate.
%
%   z = (coef(1)^2)*x^2 + (coef(2)^2)*x^4
%
% I picked this one since its integral over the
% interval [-1 1] is clearly minimized at coef = [0,0].
K = @(x,coef) coef(1).^2*x.^2 + coef(2).^2*x.^4;

% As far as the integration is concerned, coef is a constant
% vector. It is invariant. And the optimization really does
% not want to understand what x is; as far as the optimizer
% is concerned, its objective is a function of only the
% parameters in coef.

% We can integrate K easily enough (for a given set of
% coefficients coef) via
coef = [2 3];
quad(K,-1,1,1e-13,[],coef)

% We could also have embedded coef directly into the
% anonymous function K, as I do below.

%%
% As always, I like to make sure that any objective function
% produces results that I'd expect. Here we know what to
% expect, but it never hurts to test our knowledge. I'll
% change the function K this time to embed coef inside it.
% If we try it at [0 0], we see the expected result, i.e., 0.
coef = [0 0];
K = @(x) coef(1).^2*x.^2 + coef(2).^2*x.^4;
quad(K,-1,1,1e-13)

%%
% I used a tight integration tolerance because we will
% put this inside an optimizer, and we want to minimize
% any later problems with the gradient estimation.

% Now, can we minimize this integral? That's easy enough.
% Here is an objective function, to be used with a
% minimizer. As far as the optimizer is concerned, obj is
% a function only of coef.
obj = @(coef) quad(@(x) coef(1).^2*x.^2+coef(2).^2*x.^4,-1,1,1e-13);

% Call fminsearch, or fminunc
fminsearch(obj,[2 3],optimset('disp','iter'))

%%
% or
fminunc(obj,[2 3],optimset('disp','iter','largescale','off'))

% Both will return a solution near [0 0].
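%%
% A side note, not part of the original text: in newer MATLAB releases
% (R2012a and later) quad has been superseded by integral, which
% expects a vectorized integrand. Assuming such a release is
% available, the same nested optimization might be written as below.
obj2 = @(coef) integral(@(x) coef(1).^2*x.^2 + coef(2).^2*x.^4, ...
  -1,1,'AbsTol',1e-13);
fminsearch(obj2,[2 3],optimset('disp','iter'))
% Again, the solution should be near [0 0].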
%% 24. Graphical tools for understanding sets of nonlinear equations
%{
Pick some simple set of nonlinear equations. Perhaps this pair
will do as well as any others.

  x*y^3 = 2
  y - x^2 = 1

We will look for solutions in the first quadrant of the x-y plane,
so x>=0 and y>=0.
%}
%%
% Now consider each equation independently from the other. In the x-y
% plane, these equations can be thought of as implicit functions, thus
% ezplot can come to our rescue.
close all
ezplot('x*y.^3 - 2',[0 5])
hold on
ezplot('y - x.^2 - 1',[0 5])
hold off
grid on
title 'Graphical (ezplot) solution to a nonlinear system of equations'
xlabel 'x'
ylabel 'y'

% The intersection of the two curves is our solution; zooming in
% will find it quite nicely.

%%
% An alternative approach is to think of the problem in terms of
% level sets. Given a function of several variables, z(x,y), a
% level set of that function is the set of (x,y) pairs such that
% z is constant at a given level.

% In MATLAB, there is a nice tool for viewing level sets: contour.
% See how it allows us to solve our system of equations graphically.

% Form a lattice over the region of interest in the x-y plane
[x,y] = meshgrid(0:.1:5);

% Build our functions on that lattice. Be careful to use .*, .^ and
% ./ as appropriate.
z1 = x.*y.^3;
z2 = y - x.^2;

% Display the level sets
contour(x,y,z1,[2 2],'r')
hold on
contour(x,y,z2,[1 1],'g')
hold off
grid on
title 'Graphical (contour) solution to a nonlinear system of equations'
xlabel 'x'
ylabel 'y'

%% 25. Optimizing non-smooth or stochastic functions
%{
A property of the tools in the Optimization Toolbox is that they
all assume that their objective function is a continuous,
differentiable function of its parameters. (With the exception
of bintprog.)

If your function does not have this property, then look
elsewhere for optimization. One place might be the Genetic
Algorithms and Direct Search toolbox.

http://www.mathworks.com/products/gads/description4.html

Simulated annealing is another tool for these problems. You
will find a few options on the file exchange.
%}

%% 26. Linear equality constraints
%{
This section is designed to help the reader understand how one
solves linear least squares problems subject to linear equality
constraints. It's written for the reader who wants to understand
how to solve this problem in terms of linear algebra. (Some may
ask where these problems arise: i.e., linear least squares with
many variables and possibly many linear constraints. Spline models
are a common example, functions of one or several dimensions.
Under some circumstances there may also be linear inequality
constraints.)

We'll start off with a simple example. Assume a model of

  y = a + b*x1 + c*x2

with the simple equality constraint

  c - b = 1

How would we solve this problem on paper? (Yes, lsqlin solves
these problems in its sleep. We'll do it the hard way instead.)

Simplest is to eliminate one of the unknowns using the constraint.
Replace c with 1+b in the model.

  c = 1 + b

So

  y = a + b*x1 + (1+b)*x2

Rearrange terms to get this

  (y - x2) = a + b*(x1+x2)

Estimate the unknowns (a,b), then recover c once b is known.

A point to note is that I suggested we eliminate c from the model.
As easily, we could have eliminated b, but we could not have
chosen to eliminate a, because a did not enter into our
constraint at all. Another thing to think about is if the
constraint had some coefficients with values very near zero.
Elimination of those variables would then cause significant
instabilities in the computations. It's similar to the reason
why pivoting is useful in the solution of systems of equations.

Let's try it on some data:
%}
%%
n = 500;
x1 = rand(n,1);
x2 = rand(n,1);

% yes, I know that these coefficients do not actually satisfy the
% constraint above. They are close though.
y = 1 + 2*x1 + pi*x2 + randn(n,1)/5;

% solve the reduced problem above
ab = [ones(n,1), x1+x2]\(y - x2)

% recover c
c = 1 + ab(2)

% Note that the coefficients were originally (1,2,pi), but those
% values only approximately satisfy the constraint c - b = 1, so the
% constrained estimates are pulled slightly away from them.

%%
% Had we never applied any constraint, the estimates would be closer
% to the (1,2,pi) values than the constrained solution is. This is
% easy to verify.
[ones(n,1),x1,x2]\y

%%
% We may use lsqlin to verify our constrained solution
lsqlin([ones(n,1),x1,x2],y,[],[],[0 -1 1],1,[],[],[],optimset('largescale','off'))

% As expected, the two constrained solutions agree.

%%
% The real question in this section is not how to solve a linearly
% constrained problem, but how to solve it programmatically, and
% how to solve it for possibly multiple constraints.

% Start with a completely random least squares problem.
n = 20; % number of data points
p = 7;  % number of parameters to estimate
A = rand(n,p);

% Even generate random coefficients for our ground truth.
coef0 = 1 + randn(p,1)
y = A*coef0 + randn(n,1);

% Finally, choose some totally random constraints
m = 3;  % The number of constraints in the model
C = randn(m,p);
D = C*coef0;

%%
% Again, compute the simple (unconstrained) linear regression
% estimates for this model
coef1 = A\y

% verify that the unconstrained model fails to satisfy the
% constraints, but that the original (ground truth) coefficients
% do satisfy them. D - C*coef should be zero.
[D-C*coef0, D-C*coef1]

%%
% How does one solve the constrained problem? There are at least
% two ways to do so (if we choose not to resort to lsqlin). For
% those devotees of pinv and the singular value decomposition,
% one such approach would involve a splitting of the solution to
% A*x = y into two components: x = x_u + x_c. Here x_c must lie
% in the row space of the matrix C, while x_u lies in its null
% space. The only flaw with this approach is that it will fail for
% sparse constraint matrices, since it would rely on the singular
% value decomposition. (A quick sketch of this splitting follows
% below.)
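%%
% Before the QR route, here is a quick sketch of that null space
% splitting (my own illustration of the idea just described, using
% pinv and null; as noted, it would not extend to sparse C).
x_particular = pinv(C)*D;   % minimum norm solution of C*x = D,
                            % lying in the row space of C
N = null(C);                % orthonormal basis for the null space of C

% any feasible x has the form x_particular + N*t, so solve the
% now-unconstrained least squares problem for t
t = (A*N) \ (y - A*x_particular);
xsplit = x_particular + N*t

% the constraints are satisfied to within floating point noise
C*xsplit - D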
%%
% I'll discuss an approach that is based on the qr factorization
% of our constraint matrix C. It is also nicely numerically stable,
% and it offers the potential for use on large sparse constraint
% matrices.
[Q,R,E] = qr(C,0)

% First, we will ignore the case where C is rank deficient (high
% quality numerical code would not ignore that case, and the QR
% allows us to identify and deal with that event. It is merely a
% distraction in this discussion however.)
%
% We transform the constraint system C*x = D by left multiplying
% by the inverse of Q, i.e., its transpose. Thus, with the pivoting
% applied to x, the constraints become
%
%  R*x(E) = Q'*D
%
% In effect, we wanted to compute the Q-less QR factorization,
% with pivoting.
%
% Why did we need pivoting? As I suggested above, numerical
% instabilities may result otherwise.
%
% We will reduce the constraints further by splitting R into
% two fragments. Assuming that C had fewer rows than columns,
% then R can be broken into two pieces:
%
%  R = [R_c, R_u]
R_c = R(:,1:m);
R_u = R(:,(m+1):end);

% Here R_c is an mxm, upper triangular matrix, with non-zero
% diagonals. The non-zero diagonals are ensured by the use of
% pivoting. In effect, column pivoting provides the means by
% which we choose those variables to eliminate from the regression
% model.
%
% The pivoting operation has effectively split x into two pieces,
% x_c and x_u. The variables x_c will correspond to the first m
% pivots identified in the vector E.
%
% This split can be mirrored by breaking the matrices into pieces
%
%  R_c*x_c + R_u*x_u = Q'*D
%
% We will use this version of our constraint system to eliminate
% the variables x_c from the least squares problem. Break A into
% pieces also, mirroring the qr pivoting:
A_c = A(:,E(1:m));
A_u = A(:,E((m+1):end));

%%
% So the least squares problem, split in terms of the variables
% as we have reordered them, is:
%
%  A_c*x_c + A_u*x_u = y
%
% We can now eliminate the appropriate variables from the linear
% least squares.
%
%  A_c*inv(R_c)*(Q'*D - R_u*x_u) + A_u*x_u = y
%
% Expand and combine terms. Remember, we will not use inv()
% in the actual code, but instead use \. The \ operator, when
% applied to an upper triangular matrix, is very efficient
% compared to inv().
%
%  (A_u - A_c*(R_c\R_u))*x_u = y - A_c*(R_c\(Q'*D))
x_u = (A_u - A_c*(R_c\R_u)) \ (y - A_c*(R_c\(Q'*D)))

%%
% Finally, we recover x_c from the constraint equations
x_c = R_c\(Q'*D - R_u*x_u)

%%
% And we put it all together in the unpivoted solution vector x:
xfinal = zeros(p,1);
xfinal(E(1:m)) = x_c;
xfinal(E((m+1):end)) = x_u

%%
% Were we successful? How does this result compare to lsqlin?
% The two are identical (as usual, only to within floating
% point precision irregularities.)
lsqlin(A,y,[],[],C,D,[],[],[],optimset('largescale','off'))

%%
% Verify that the equality constraints are satisfied to within
% floating point tolerances.
C*xfinal - D

%% 27. Sums of squares surfaces and the geometry of a regression
%{
I'll admit in advance that this section has no tips or tricks
to look for. But it does have some pretty pictures, and I hope
it leads into the next two sections, where I talk about confidence
limits and error estimates for parameters.
