/*
 *    This program is free software; you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation; either version 2 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program; if not, write to the Free Software
 *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 *    Optimization.java
 *    Copyright (C) 2003 University of Waikato, Hamilton, New Zealand
 *
 */

package weka.core;

import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;

/**
 * Implementation of the active-sets method with BFGS update to solve an
 * optimization problem with bound constraints only, in multiple dimensions.
 * This implementation considers both lower and upper bound constraints. <p/>
 *
 * Here is a sketch of the search strategy; a detailed description of the
 * algorithm can be found in the Appendix of Xin Xu's MSc thesis:<p/>
 * Initialize everything, incl. initial value, direction, etc.<p/>
 *
 * LOOP (main algorithm):<br/>
 *
 * 1. Perform the line search using the directions for free variables<br/>
 * 1.1 Check all the bounds that are not "active" (i.e. binding variables)
 *     and compute the feasible step length to the bound for each of them<br/>
 * 1.2 Pick the smallest of these feasible step lengths, say \alpha, and set
 *     it as the upper bound of the current step length, i.e.
 *     0 < \lambda <= \alpha<br/>
 * 1.3 Search for a step length <= \alpha that yields "sufficient function
 *     decrease" (\alpha condition) AND a "positive definite inverse Hessian"
 *     (\beta condition), if possible, using SAFEGUARDED polynomial
 *     interpolation. This step length is "safe" and thus is used to compute
 *     the next value of the free variables.<br/>
 * 1.4 Fix the variable(s) that are newly bound to their constraint(s).<p/>
 *
 * 2. Check whether all variables or their gradients have converged. If they
 *    have, check whether any of the current bindings of variables to their
 *    bounds can be released, based on the "reliable" second-order Lagrange
 *    multipliers where available: if such a multiplier is available and
 *    negative for a variable, release that variable; if not available, use
 *    the first-order Lagrange multiplier to test for release. If no variable
 *    is released, STOP the loop. Otherwise update the inverse Hessian matrix
 *    and the gradient for the newly released variables and CONTINUE the
 *    LOOP.<p/>
 *
 * 3. Use the BFGS formula to update the inverse Hessian matrix. Note that the
 *    already-fixed variables must have zeros in the corresponding entries of
 *    the inverse Hessian.<p/>
 *
 * 4. Compute the new (Newton) search direction d = H^{-1}*g, where H^{-1} is
 *    the inverse Hessian and g is the gradient. Note that, again, the
 *    already-fixed variables have zero direction.<p/>
 *
 * ENDLOOP<p/>
 *
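 * In schematic form the main loop can be pictured as follows. This outline is
 * only a sketch of the description above, not the actual implementation, and
 * the helper names in it are illustrative rather than methods of this
 * class:<p/>
 * <pre>
 * while (true) {
 *   // 1. Line search for the free variables, capped by the smallest
 *   //    feasible step to a non-working-set bound (steps 1.1 to 1.3)
 *   alpha  = smallestFeasibleStepToBounds(x, d);
 *   lambda = safeguardedLineSearch(x, d, alpha);
 *   x      = x + lambda * d;       // fixed variables have d = 0
 *   fixNewlyBoundVariables(x);     // step 1.4
 *
 *   // 2. On convergence, try to release bound variables via Lagrange
 *   //    multiplier estimates; stop if nothing can be released
 *   if (converged(x, g) && !releasedAnyVariable()) break;
 *
 *   // 3. BFGS update of the inverse Hessian H
 *   //    (zero entries for the fixed variables)
 *   updateInverseHessian(H);
 *
 *   // 4. New Newton direction d = H^{-1}*g (zero for fixed variables)
 *   d = newtonDirection(H, g);
 * }
 * </pre>
 *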
 * A typical usage of this class is to create your own subclass of it and
 * provide the objective function and gradients as follows:<p/>
 * <pre>
 * class MyOpt extends Optimization {
 *   // Provide the objective function
 *   protected double objectiveFunction(double[] x) {
 *     // How to calculate your objective function...
 *     // ...
 *   }
 *
 *   // Provide the first derivatives
 *   protected double[] evaluateGradient(double[] x) {
 *     // How to calculate the gradient of the objective function...
 *     // ...
 *   }
 *
 *   // If possible, provide the index^{th} row of the Hessian matrix
 *   protected double[] evaluateHessian(double[] x, int index) {
 *     // How to calculate the index^th variable's second derivative
 *     // ...
 *   }
 * }
 * </pre>
 *
 * When it is time to use it, in some routine(s) of another class...
 * <pre>
 * MyOpt opt = new MyOpt();
 *
 * // Set up initial variable values and bound constraints
 * double[] x = new double[numVariables];
 * // Lower and upper bounds: 1st row is lower bounds, 2nd is upper
 * double[][] constraints = new double[2][numVariables];
 * ...
 *
 * // Find the minimum, 200 iterations as default
 * x = opt.findArgmin(x, constraints);
 * while (x == null) {          // 200 iterations are not enough
 *   x = opt.getVarbValues();   // Try another 200 iterations
 *   x = opt.findArgmin(x, constraints);
 * }
 *
 * // The minimal function value
 * double minFunction = opt.getMinFunction();
 * ...
 * </pre>
 *
 * It is recommended that Hessian values be provided so that the second-order
 * Lagrange multiplier estimate can be calculated. However, if they are not
 * provided, there is no need to override the <code>evaluateHessian()</code>
 * function.<p/>
 *
 * REFERENCES (see also the <code>getTechnicalInformation()</code> method):<br/>
 * The whole model algorithm is adapted from Chapter 5 and other related
 * chapters in Gill, Murray and Wright (1981) "Practical Optimization",
 * Academic Press, and Gill and Murray (1976) "Minimization Subject to Bounds
 * on the Variables", NPL Report NAC72, while Chong and Zak (1996) "An
 * Introduction to Optimization", John Wiley & Sons, Inc. provides a brief but
 * helpful introduction to the method.<p/>
 *
 * Dennis and Schnabel (1983) "Numerical Methods for Unconstrained Optimization
 * and Nonlinear Equations", Prentice-Hall Inc. and Press et al. (1992)
 * "Numerical Recipes in C", Second Edition, Cambridge University Press, were
 * consulted for the polynomial interpolation used in the line search
 * implementation.<p/>
 *
 * The Hessian modification in the BFGS update uses Cholesky factorization and
 * two rank-one modifications:<br/>
 * Bk+1 = Bk + (Gk*Gk')/(Gk'*Dk) + (dGk*(dGk)')/[alpha*(dGk)'*Dk],<br/>
 * where Gk is the gradient vector, Dk is the direction vector and alpha is
 * the step rate.<br/>
 * This method is due to Gill, Golub, Murray and Saunders (1974) ``Methods for
 * Modifying Matrix Factorizations'', Mathematics of Computation, Vol. 28,
 * No. 126, pp. 505-535.
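 * <p/>
 * As a concrete sketch (illustrative only; <code>QuadOpt</code> is a
 * hypothetical name, not a class in this package), a subclass minimizing the
 * separable quadratic f(x) = sum_i (x_i - 2)^2 over the box 0 <= x_i <= 1
 * could look like:
 * <pre>
 * class QuadOpt extends Optimization {
 *   // f(x) = sum_i (x_i - 2)^2
 *   protected double objectiveFunction(double[] x) {
 *     double f = 0.0;
 *     for (int i = 0; i < x.length; i++)
 *       f += (x[i] - 2.0) * (x[i] - 2.0);
 *     return f;
 *   }
 *
 *   // df/dx_i = 2*(x_i - 2)
 *   protected double[] evaluateGradient(double[] x) {
 *     double[] g = new double[x.length];
 *     for (int i = 0; i < x.length; i++)
 *       g[i] = 2.0 * (x[i] - 2.0);
 *     return g;
 *   }
 *
 *   // The Hessian is the constant diagonal matrix 2*I, so row #index
 *   // has a single non-zero entry
 *   protected double[] evaluateHessian(double[] x, int index) {
 *     double[] row = new double[x.length];
 *     row[index] = 2.0;
 *     return row;
 *   }
 * }
 * </pre>
 * Since the unconstrained minimum x_i = 2 lies outside the box, every upper
 * bound should become active, and findArgmin should return x_i = 1 for all i.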
 * <p/>
 *
 * @author Xin Xu (xx5@cs.waikato.ac.nz)
 * @version $Revision: 1.8 $
 * @see #getTechnicalInformation()
 */
public abstract class Optimization implements TechnicalInformationHandler {

  protected double m_ALF = 1.0e-4;

  protected double m_BETA = 0.9;

  protected double m_TOLX = 1.0e-6;

  protected double m_STPMX = 100.0;

  protected int m_MAXITS = 200;

  protected static boolean m_Debug = false;

  /** function value */
  protected double m_f;

  /** G'*p */
  private double m_Slope;

  /** Test if zero step in lnsrch */
  private boolean m_IsZeroStep = false;

  /** Used when iteration overflow occurs */
  private double[] m_X;

  /** Compute machine precision */
  protected static double m_Epsilon, m_Zero;
  static {
    m_Epsilon = 1.0;
    while (1.0 + m_Epsilon > 1.0) {
      m_Epsilon /= 2.0;
    }
    m_Epsilon *= 2.0;
    m_Zero = Math.sqrt(m_Epsilon);
    if (m_Debug)
      System.err.print("Machine precision is " + m_Epsilon +
                       " and zero set to " + m_Zero);
  }
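
  // On IEEE-754 doubles with round-to-nearest-even arithmetic, the static
  // block above stops once m_Epsilon reaches 2^-53 (adding it no longer
  // changes 1.0), so after the final doubling m_Epsilon == 2^-52 (about
  // 2.22e-16) and m_Zero == 2^-26 (about 1.49e-8). An illustrative check,
  // not part of the original class and assuming Java 5+ for Math.ulp:
  private static boolean epsilonEqualsUlpOfOne() {
    return m_Epsilon == Math.ulp(1.0); // both are 2^-52 on IEEE-754 doubles
  }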

  /**
   * Returns an instance of a TechnicalInformation object, containing
   * detailed information about the technical background of this class,
   * e.g., paper reference or book this class is based on.
   *
   * @return the technical information about this class
   */
  public TechnicalInformation getTechnicalInformation() {
    TechnicalInformation result;
    TechnicalInformation additional;

    result = new TechnicalInformation(Type.MASTERSTHESIS);
    result.setValue(Field.AUTHOR, "Xin Xu");
    result.setValue(Field.YEAR, "2003");
    result.setValue(Field.TITLE, "Statistical learning in multiple instance problem");
    result.setValue(Field.SCHOOL, "University of Waikato");
    result.setValue(Field.ADDRESS, "Hamilton, NZ");
    result.setValue(Field.NOTE, "0657.594");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "P. E. Gill and W. Murray and M. H. Wright");
    additional.setValue(Field.YEAR, "1981");
    additional.setValue(Field.TITLE, "Practical Optimization");
    additional.setValue(Field.PUBLISHER, "Academic Press");
    additional.setValue(Field.ADDRESS, "London and New York");

    additional = result.add(Type.TECHREPORT);
    additional.setValue(Field.AUTHOR, "P. E. Gill and W. Murray");
    additional.setValue(Field.YEAR, "1976");
    additional.setValue(Field.TITLE, "Minimization subject to bounds on the variables");
    additional.setValue(Field.INSTITUTION, "National Physical Laboratory");
    additional.setValue(Field.NUMBER, "NAC 72");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "E. K. P. Chong and S. H. Zak");
    additional.setValue(Field.YEAR, "1996");
    additional.setValue(Field.TITLE, "An Introduction to Optimization");
    additional.setValue(Field.PUBLISHER, "John Wiley and Sons");
    additional.setValue(Field.ADDRESS, "New York");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "J. E. Dennis and R. B. Schnabel");
    additional.setValue(Field.YEAR, "1983");
    additional.setValue(Field.TITLE, "Numerical Methods for Unconstrained Optimization and Nonlinear Equations");
    additional.setValue(Field.PUBLISHER, "Prentice-Hall");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "W. H. Press and B. P. Flannery and S. A. Teukolsky and W. T. Vetterling");
    additional.setValue(Field.YEAR, "1992");
    additional.setValue(Field.TITLE, "Numerical Recipes in C");
    additional.setValue(Field.PUBLISHER, "Cambridge University Press");
    additional.setValue(Field.EDITION, "Second");

    additional = result.add(Type.ARTICLE);
    additional.setValue(Field.AUTHOR, "P. E. Gill and G. H. Golub and W. Murray and M. A. Saunders");
    additional.setValue(Field.YEAR, "1974");
    additional.setValue(Field.TITLE, "Methods for modifying matrix factorizations");
    additional.setValue(Field.JOURNAL, "Mathematics of Computation");
    additional.setValue(Field.VOLUME, "28");
    additional.setValue(Field.NUMBER, "126");
    additional.setValue(Field.PAGES, "505-535");

    return result;
  }

  /**
   * Subclasses should implement this procedure to evaluate the objective
   * function to be minimized.
   *
   * @param x the variable values
   * @return the objective function value
   * @throws Exception if something goes wrong
   */
  protected abstract double objectiveFunction(double[] x) throws Exception;

  /**
   * Subclasses should implement this procedure to evaluate the gradient
   * of the objective function.
   *
   * @param x the variable values
   * @return the gradient vector
   * @throws Exception if something goes wrong
   */
  protected abstract double[] evaluateGradient(double[] x) throws Exception;

  /**
   * Subclasses are recommended to override this procedure to evaluate the
   * second-order gradient of the objective function. If it is not provided,
   * it returns null.
   *
   * @param x the variables
   * @param index the row index in the Hessian matrix
   * @return one row (row #index) of the Hessian matrix, null by default
   * @throws Exception if something goes wrong
   */
  protected double[] evaluateHessian(double[] x, int index) throws Exception {
    return null;
  }

  /**
   * Get the minimal function value.
   *
   * @return minimal function value found
   */
  public double getMinFunction() {
    return m_f;
  }

  /**
   * Set the maximal number of iterations in searching (default 200).
   *
   * @param it the maximal number of iterations
   */
  public void setMaxIteration(int it) {
    m_MAXITS = it;
  }

  /**
   * Set whether to run in debug mode.
   *
   * @param db whether to output debug information
   */
  public void setDebug(boolean db) {
    m_Debug = db;
  }

  /**
   * Get the variable values. Only needed when the number of iterations
   * exceeds the maximum threshold.
   *
   * @return the current variable values
   */
  public double[] getVarbValues() {
    return m_X;
  }
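
  // The line search below caps the step length by the distance to the nearest
  // non-working-set bound (steps 1.1 and 1.2 of the algorithm sketch). This
  // illustrative helper (hypothetical; lnsrch inlines the same logic rather
  // than calling it) computes that feasible step for one free variable,
  // returning infinity when no bound lies in the search direction:
  private static double feasibleStepToBound(double xi, double di,
                                            double lower, double upper) {
    if ((di < -m_Epsilon) && !Double.isNaN(lower))
      return (lower - xi) / di;        // heading down towards the lower bound
    if ((di > m_Epsilon) && !Double.isNaN(upper))
      return (upper - xi) / di;        // heading up towards the upper bound
    return Double.POSITIVE_INFINITY;   // no bound restricts this direction
  }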

  /**
   * Find a new point x in the direction p from a point xold at which the
   * value of the function has decreased sufficiently, the positive
   * definiteness of the B matrix (approximation of the inverse of the
   * Hessian) is preserved and no bound constraints are violated. For details
   * see "Numerical Methods for Unconstrained Optimization and Nonlinear
   * Equations". "Numerical Recipes in C" was also consulted.
   *
   * @param xold old x value
   * @param gradient gradient at that point
   * @param direct direction vector
   * @param stpmax maximum step length
   * @param isFixed indicates whether a variable has been fixed
   * @param nwsBounds non-working-set bounds; these variables are free and
   *        subject to the bound constraints in this step
   * @param wsBdsIndx indices of variables that have working-set bounds; these
   *        variables are already fixed and no longer subject to the
   *        constraints
   * @return new value along direction p from xold, null if no step was taken
   * @throws Exception if an error occurs
   */
  public double[] lnsrch(double[] xold, double[] gradient, double[] direct,
                         double stpmax, boolean[] isFixed,
                         double[][] nwsBounds, DynamicIntArray wsBdsIndx)
    throws Exception {

    int i, k, len = xold.length,
        fixedOne = -1;       // Index of the variable to be fixed
    double alam, alamin;     // Lambda to be found, and its lower bound

    // For convergence and bound test
    double temp, test, alpha = Double.POSITIVE_INFINITY, fold = m_f, sum;

    // For cubic interpolation
    double a, alam2 = 0, b, disc = 0, maxalam = 1.0, rhs1, rhs2, tmplam;

    double[] x = new double[len]; // New variable values

    // Scale the step
    for (sum = 0.0, i = 0; i < len; i++) {
      if (!isFixed[i]) // For fixed variables, direction = 0
        sum += direct[i] * direct[i];
    }
    sum = Math.sqrt(sum);

    if (m_Debug)
      System.err.println("fold: " + Utils.doubleToString(fold, 10, 7) + "\n" +
                         "sum: " + Utils.doubleToString(sum, 10, 7) + "\n" +
                         "stpmax: " + Utils.doubleToString(stpmax, 10, 7));
    if (sum > stpmax) {
      for (i = 0; i < len; i++)
        if (!isFixed[i])
          direct[i] *= stpmax / sum;
    } else
      maxalam = stpmax / sum;

    // Compute initial rate of decrease, g'*d
    m_Slope = 0.0;
    for (i = 0; i < len; i++) {
      x[i] = xold[i];
      if (!isFixed[i])
        m_Slope += gradient[i] * direct[i];
    }

    if (m_Debug)
      System.err.print("slope: " + Utils.doubleToString(m_Slope, 10, 7) + "\n");

    // Slope too small
    if (Math.abs(m_Slope) <= m_Zero) {
      if (m_Debug)
        System.err.println("Gradient and direction orthogonal -- " +
                           "Min. found with current fixed variables" +
                           " (or all variables fixed). Try to release" +
                           " some variables now.");
      return x;
    }

    // Err: slope > 0
    if (m_Slope > m_Zero) {
      if (m_Debug)
        for (int h = 0; h < x.length; h++)
          System.err.println(h + ": isFixed=" + isFixed[h] + ", x=" + x[h] +
                             ", grad=" + gradient[h] + ", direct=" + direct[h]);
      throw new Exception("g'*p positive! -- Try to debug from here: line 327.");
    }

    // Compute LAMBDAmin and upper bound of lambda -- alpha
    test = 0.0;
    for (i = 0; i < len; i++) {
      if (!isFixed[i]) { // No need for fixed variables
        temp = Math.abs(direct[i]) / Math.max(Math.abs(x[i]), 1.0);
        if (temp > test)
          test = temp;
      }
    }

    if (test > m_Zero) // Not converged
      alamin = m_TOLX / test;
    else {
      if (m_Debug)
        System.err.println("Zero directions for all free variables -- " +
                           "Min. found with current fixed variables" +
                           " (or all variables fixed). Try to release" +
                           " some variables now.");
      return x;
    }

    // Check whether any non-working-set bounds are "binding"
    for (i = 0; i < len; i++) {
      if (!isFixed[i]) { // No need for fixed variables
        double alpi;
        if ((direct[i] < -m_Epsilon) && !Double.isNaN(nwsBounds[0][i])) { // Not feasible
          alpi = (nwsBounds[0][i] - xold[i]) / direct[i];
          if (alpi <= m_Zero) { // Zero
            if (m_Debug)
              System.err.println("Fix variable " + i +
                                 " to lower bound " + nwsBounds[0][i] +
                                 " from value " + xold[i]);
            x[i] = nwsBounds[0][i];
            isFixed[i] = true; // Fix this variable
            alpha = 0.0;
            nwsBounds[0][i] = Double.NaN; // Add constraint to working set
            wsBdsIndx.addElement(i);
          } else if (alpha > alpi) { // Fix one variable in one iteration
            alpha = alpi;
            fixedOne = i;
          }
        } else if ((direct[i] > m_Epsilon) && !Double.isNaN(nwsBounds[1][i])) { // Not feasible
          alpi = (nwsBounds[1][i] - xold[i]) / direct[i];
          if (alpi <= m_Zero) { // Zero
            if (m_Debug)
              System.err.println("Fix variable " + i +
                                 " to upper bound " + nwsBounds[1][i] +
                                 " from value " + xold[i]);
            x[i] = nwsBounds[1][i];