Optimization.java
/*
 *    This program is free software; you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation; either version 2 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program; if not, write to the Free Software
 *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 *    Optimization.java
 *    Copyright (C) 2003 Xin Xu
 */

package weka.core;

import java.util.*;
import java.io.*;

/**
 * Implementation of the active-sets method with BFGS update to solve an
 * optimization problem with bound constraints only, in multiple dimensions.
 * This implementation considers both lower and upper bound constraints. <p>
 *
 * Here is a sketch of the search strategy; a detailed description of the
 * algorithm can be found in the Appendix of Xin Xu's MSc thesis:<p>
 * Initialize everything, incl. initial value, direction, etc.<p>
 * LOOP (main algorithm):<br>
 *
 * 1. Perform the line search using the directions for the free variables<br>
 * 1.1 Check all the bounds that are not "active" (i.e. not currently binding
 * any variable) and compute the feasible step length to each bound<br>
 * 1.2 Pick the smallest feasible step length, say \alpha, and set it as the
 * upper bound of the current step length, i.e. 0<\lambda<=\alpha (a sketch
 * of this computation follows the loop description below)<br>
 * 1.3 Search for a step length <=\alpha that achieves "sufficient function
 * decrease" (\alpha condition) AND a "positive definite inverse Hessian"
 * (\beta condition), if possible, using SAFEGUARDED polynomial
 * interpolation. This step length is "safe" and thus is used to compute the
 * next value of the free variables.<br>
 * 1.4 Fix the variable(s) that are newly bound to their constraint(s).<p>
 *
 * 2. Check whether all variables or their gradients have converged. If so,
 * check whether any of the currently fixed variables can be released from
 * their bounds, based on the "reliable" second-order Lagrange multipliers
 * if available: if such a multiplier is available and negative for a
 * variable, release it; if not available, use the first-order Lagrange
 * multiplier to test for release. If no variable can be released, STOP the
 * loop. Otherwise update the inverse Hessian matrix and the gradient for
 * the newly released variables and CONTINUE the LOOP.<p>
 *
 * 3. Use the BFGS formula to update the inverse Hessian matrix. Note that
 * the already-fixed variables must have zeros in the corresponding entries
 * of the inverse Hessian.<p>
 *
 * 4. Compute the new (Newton) search direction d=H^{-1}*g, where H^{-1} is
 * the inverse Hessian and g is the gradient. Note again that the already-
 * fixed variables have zero direction.<p>
 *
 * ENDLOOP<p>
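 *
 * To illustrate steps 1.1-1.2: along direction d, a free variable x_i hits
 * its bound b_i after a step of (b_i - x_i)/d_i, and \alpha is the smallest
 * such step over the free variables. A minimal sketch, mirroring the bound
 * check in <code>lnsrch()</code> below (the names <code>lower</code>,
 * <code>upper</code>, <code>isFixed</code>, <code>x</code> and
 * <code>direct</code> are local to this example): <p>
 * <code>
 * double alpha = Double.POSITIVE_INFINITY;
 * for(int i = 0; i < x.length; i++){
 *     if(isFixed[i] || direct[i] == 0.0) continue; // only moving free variables
 *     // Moving down hits the lower bound, moving up hits the upper bound
 *     double bound = (direct[i] < 0.0) ? lower[i] : upper[i];
 *     double alpi = (bound - x[i]) / direct[i];    // step length to the bound
 *     if(alpi < alpha) alpha = alpi;               // keep the smallest one
 * }
 * </code>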
 *
 * A typical usage of this class is to create your own subclass of this
 * class and provide the objective function and gradients as follows:
 * <code>
 * class MyOpt extends Optimization{
 *     // Provide the objective function
 *     protected double objectiveFunction(double[] x){
 *         // How to calculate your objective function...
 *         // ...
 *     }
 *
 *     // Provide the first derivatives
 *     protected double[] evaluateGradient(double[] x){
 *         // How to calculate the gradient of the objective function...
 *         // ...
 *     }
 *
 *     // If possible, provide the index^{th} row of the Hessian matrix
 *     protected double[] evaluateHessian(double[] x, int index){
 *         // How to calculate the index^th variable's second derivatives
 *         // ...
 *     }
 * }
 *
 * // When it's time to use it, in some routine(s) of another class...
 * MyOpt opt = new MyOpt();
 *
 * // Set up the initial variable values and bound constraints
 * double[] x = new double[numVariables];
 * // Lower and upper bounds: 1st row is the lower bounds, 2nd the upper
 * double[][] constraints = new double[2][numVariables];
 * ...
 *
 * // Find the minimum, 200 iterations by default
 * x = opt.findArgmin(x, constraints);
 * while(x == null){  // 200 iterations were not enough
 *     x = opt.getVarbValues();  // Try another 200 iterations
 *     x = opt.findArgmin(x, constraints);
 * }
 *
 * // The minimal function value
 * double minFunction = opt.getMinFunction();
 * ...
 * </code>
 * It is recommended that Hessian values be provided so that the
 * second-order Lagrange multiplier estimate can be calculated. However, if
 * they are not provided, there is no need to override the
 * <code>evaluateHessian()</code> function.<p>
 *
 * REFERENCES:<br>
 * The overall algorithm is adapted from Chapter 5 and other related
 * chapters of Gill, Murray and Wright (1981) "Practical Optimization",
 * Academic Press, and Gill and Murray (1976) "Minimization Subject to
 * Bounds on the Variables", NPL Report NAC72, while Chong and Zak (1996)
 * "An Introduction to Optimization", John Wiley & Sons, Inc. provides a
 * brief but helpful introduction to the method. <p>
 *
 * Dennis and Schnabel (1983) "Numerical Methods for Unconstrained
 * Optimization and Nonlinear Equations", Prentice-Hall Inc., and Press et
 * al. (1992) "Numerical Recipes in C", Second Edition, Cambridge University
 * Press, were consulted for the polynomial interpolation used in the line
 * search implementation. <p>
 *
 * The Hessian modification in the BFGS update uses Cholesky factorization
 * and two rank-one modifications:<br>
 * Bk+1 = Bk + (Gk*Gk')/(Gk'*Dk) + (dGk*(dGk)')/[alpha*(dGk)'*Dk], <br>
 * where Gk is the gradient vector, dGk is the change in the gradient, Dk is
 * the direction vector and alpha is the step rate. <br>
 * This method is due to Gill, Golub, Murray and Saunders (1974) "Methods
 * for Modifying Matrix Factorizations", Mathematics of Computation,
 * Vol. 28, No. 126, pp. 505-535.
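 * <p>
 * For illustration only, the two rank-one corrections above can be written
 * out directly for a dense matrix as follows. This is a minimal sketch, not
 * the Cholesky-based update this class performs internally, and the names
 * <code>B</code>, <code>g</code>, <code>dg</code>, <code>d</code> and
 * <code>alpha</code> are local to this example: <p>
 * <code>
 * // B: n x n Hessian approximation (updated in place); g: gradient;
 * // dg: gradient change (new minus old); d: direction; alpha: step rate
 * static void bfgsUpdate(double[][] B, double[] g, double[] dg,
 *                        double[] d, double alpha){
 *     int n = g.length;
 *     double gd = 0.0, dgd = 0.0;  // Gk'*Dk and (dGk)'*Dk
 *     for(int i = 0; i < n; i++){
 *         gd  += g[i]*d[i];
 *         dgd += dg[i]*d[i];
 *     }
 *     for(int i = 0; i < n; i++)
 *         for(int j = 0; j < n; j++)
 *             B[i][j] += g[i]*g[j]/gd + dg[i]*dg[j]/(alpha*dgd);
 * }
 * </code>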
 * <p>
 *
 * @author Xin Xu (xx5@cs.waikato.ac.nz)
 * @version $Revision: 1.1.1.1 $
 */
public abstract class Optimization{

    protected double m_ALF = 1.0e-4;

    protected double m_BETA = 0.9;

    protected double m_TOLX = 1.0e-6;

    protected double m_STPMX = 100.0;

    protected int m_MAXITS = 200;

    protected static boolean m_Debug = false;

    // Function value
    protected double m_f;

    // G'*p
    private double m_Slope;

    // Test if zero step in lnsrch
    private boolean m_IsZeroStep = false;

    // Used when iteration overflow occurs
    private double[] m_X;

    // Compute machine precision
    protected static double m_Epsilon, m_Zero;
    static {
        m_Epsilon = 1.0;
        while(1.0 + m_Epsilon > 1.0){
            m_Epsilon /= 2.0;
        }
        m_Epsilon *= 2.0;
        m_Zero = Math.sqrt(m_Epsilon);
        if (m_Debug)
            System.err.print("Machine precision is " + m_Epsilon +
                             " and zero set to " + m_Zero);
    }

    /**
     * Subclasses should implement this method to evaluate the objective
     * function to be minimized.
     *
     * @param x the variable values
     * @return the objective function value
     */
    protected abstract double objectiveFunction(double[] x);

    /**
     * Subclasses should implement this method to evaluate the gradient
     * of the objective function.
     *
     * @param x the variable values
     * @return the gradient vector
     */
    protected abstract double[] evaluateGradient(double[] x);

    /**
     * Subclasses are recommended to override this method to evaluate one
     * row of the Hessian (the second-order gradient) of the objective
     * function. If it is not provided, it returns null.
     *
     * @param x the variables
     * @param index the row index in the Hessian matrix
     * @return one row (the row #index) of the Hessian matrix, null by default
     */
    protected double[] evaluateHessian(double[] x, int index){
        return null;
    }

    /**
     * Get the minimal function value.
     *
     * @return minimal function value found
     */
    public double getMinFunction(){ return m_f; }

    /**
     * Set the maximal number of iterations in searching (default 200).
     *
     * @param it the maximal number of iterations
     */
    public void setMaxIteration(int it){ m_MAXITS = it; }

    /**
     * Set whether to run in debug mode.
     *
     * @param db whether to output debug information
     */
    public void setDebug(boolean db){ m_Debug = db; }

    /**
     * Get the variable values. Only needed when the number of iterations
     * exceeds the maximum threshold.
     *
     * @return the current variable values
     */
    public double[] getVarbValues(){ return m_X; }
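    // For illustration, a minimal concrete subclass (hypothetical, not part
    // of weka.core) that minimizes the separable quadratic
    // f(x) = sum_i (x_i - 1)^2 could implement the two abstract methods
    // above like this:
    //
    //   class QuadOpt extends Optimization{
    //       protected double objectiveFunction(double[] x){
    //           double f = 0.0;
    //           for(int i = 0; i < x.length; i++)
    //               f += (x[i] - 1.0)*(x[i] - 1.0);   // (x_i - 1)^2
    //           return f;
    //       }
    //       protected double[] evaluateGradient(double[] x){
    //           double[] g = new double[x.length];
    //           for(int i = 0; i < x.length; i++)
    //               g[i] = 2.0*(x[i] - 1.0);          // df/dx_i
    //           return g;
    //       }
    //   }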
    /**
     * Find a new point x in the direction p from a point xold at which the
     * value of the function has decreased sufficiently, the positive
     * definiteness of the B matrix (approximation of the inverse of the
     * Hessian) is preserved and no bound constraints are violated. For
     * details see "Numerical Methods for Unconstrained Optimization and
     * Nonlinear Equations". "Numerical Recipes in C" was also consulted.
     *
     * @param xold old x value
     * @param gradient gradient at that point
     * @param direct direction vector
     * @param stpmax maximum step length
     * @param isFixed indicating whether a variable has been fixed
     * @param nwsBounds non-working-set bounds. Means these variables are
     *        free and subject to the bound constraints in this step
     * @param wsBdsIndx index of variables that have working-set bounds.
     *        Means these variables are already fixed and no longer subject
     *        to the constraints
     * @return new value along direction p from xold, null if no step was taken
     * @exception Exception if an error occurs
     */
    public double[] lnsrch(double[] xold, double[] gradient, double[] direct,
                           double stpmax, boolean[] isFixed,
                           double[][] nwsBounds, FastVector wsBdsIndx)
        throws Exception {

        int i, j, k, len = xold.length,
            fixedOne = -1;                  // Idx of variable to be fixed
        double alam, alamin;                // Lambda to be found, and its lower bound

        // For convergence and bound test
        double temp, test, alpha = Double.POSITIVE_INFINITY, fold = m_f, sum;

        // For cubic interpolation
        double a, alam2 = 0, b, disc = 0, maxalam = 1.0, rhs1, rhs2, tmplam;

        double[] x = new double[len];       // New variable values

        // Scale the step
        for (sum = 0.0, i = 0; i < len; i++){
            if(!isFixed[i])                 // For fixed variables, direction = 0
                sum += direct[i]*direct[i];
        }
        sum = Math.sqrt(sum);

        if (m_Debug)
            System.err.println("fold:   " + Utils.doubleToString(fold,10,7) + "\n" +
                               "sum:    " + Utils.doubleToString(sum,10,7) + "\n" +
                               "stpmax: " + Utils.doubleToString(stpmax,10,7));
        if (sum > stpmax){
            for (i = 0; i < len; i++)
                if(!isFixed[i])
                    direct[i] *= stpmax / sum;
        }
        else
            maxalam = stpmax / sum;

        // Compute the initial rate of decrease, g'*d
        m_Slope = 0.0;
        for (i = 0; i < len; i++){
            x[i] = xold[i];
            if(!isFixed[i])
                m_Slope += gradient[i]*direct[i];
        }

        if (m_Debug)
            System.err.print("slope:  " + Utils.doubleToString(m_Slope,10,7) + "\n");

        // Slope too small
        if(Math.abs(m_Slope) <= m_Zero){
            if (m_Debug)
                System.err.println("Gradient and direction orthogonal -- " +
                                   "Min. found with current fixed variables" +
                                   " (or all variables fixed). Try to release" +
                                   " some variables now.");
            return x;
        }

        // Err: slope > 0
        if(m_Slope > m_Zero){
            if(m_Debug)
                for(int h = 0; h < x.length; h++)
                    System.err.println(h + ": isFixed=" + isFixed[h] +
                                       ", x=" + x[h] +
                                       ", grad=" + gradient[h] +
                                       ", direct=" + direct[h]);
            throw new Exception("g'*p positive! -- Try to debug from here: line 263.");
        }

        // Compute LAMBDAmin and the upper bound of lambda -- alpha
        test = 0.0;
        for(i = 0; i < len; i++){
            if(!isFixed[i]){                // No need for fixed variables
                temp = Math.abs(direct[i]) / Math.max(Math.abs(x[i]), 1.0);
                if (temp > test) test = temp;
            }
        }

        if(test > m_Zero)                   // Not converged
            alamin = m_TOLX / test;
        else{
            if (m_Debug)
                System.err.println("Zero directions for all free variables -- " +
                                   "Min. found with current fixed variables" +
                                   " (or all variables fixed). Try to release" +
                                   " some variables now.");
            return x;
        }

        // Check whether any non-working-set bounds are "binding"
        for(i = 0; i < len; i++){
            if(!isFixed[i]){                // No need for fixed variables
                double alpi;
                if((direct[i] < -m_Epsilon) && !Double.isNaN(nwsBounds[0][i])){ // Not feasible
                    alpi = (nwsBounds[0][i] - xold[i]) / direct[i];
                    if(alpi <= m_Zero){     // Zero
                        if (m_Debug)
                            System.err.println("Fix variable " + i +
                                               " to lower bound " + nwsBounds[0][i] +
                                               " from value " + xold[i]);
                        x[i] = nwsBounds[0][i];
                        isFixed[i] = true;  // Fix this variable
                        alpha = 0.0;
                        nwsBounds[0][i] = Double.NaN; // Add cons. to working set
                        wsBdsIndx.addElement(new Integer(i));
                    }
                    else if(alpha > alpi){  // Fix one variable in one iteration
                        alpha = alpi;
                        fixedOne = i;
                    }
                }
                else if((direct[i] > m_Epsilon) && !Double.isNaN(nwsBounds[1][i])){ // Not feasible
                    alpi = (nwsBounds[1][i] - xold[i]) / direct[i];
                    if(alpi <= m_Zero){     // Zero
                        if (m_Debug)
                            System.err.println("Fix variable " + i +
                                               " to upper bound " + nwsBounds[1][i] +
                                               " from value " + xold[i]);
                        x[i] = nwsBounds[1][i];
                        isFixed[i] = true;  // Fix this variable
                        alpha = 0.0;
                        nwsBounds[1][i] = Double.NaN; // Add cons. to working set
                        wsBdsIndx.addElement(new Integer(i));
                    }
                    else if(alpha > alpi){
                        alpha = alpi;
                        fixedOne = i;
                    }
                }
            }
        }

        if (m_Debug){
            System.err.println("alamin: " + Utils.doubleToString(alamin,10,7));