/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/*
* Optimization.java
* Copyright (C) 2003 Xin Xu
*
*/
package weka.core;
/**
* Implementation of the active-set method with BFGS updates
* to solve a multi-dimensional optimization problem subject to
* bound constraints only. In this implementation both the lower and upper
* bound constraints are considered. <p>
*
* Here is a sketch of the search strategy (a compact pseudocode rendering
* follows the loop description below); a detailed description of the
* algorithm can be found in the Appendix of Xin Xu's MSc thesis:<p>
* Initialize everything, incl. initial value, direction, etc.<p>
* LOOP (main algorithm):<br>
*
* 1. Perform the line search using the directions for the free variables<br>
* 1.1 Check all the bounds that are not "active" (i.e. not currently binding
* a variable) and compute the feasible step length to the bound for each
* of them<br>
* 1.2 Pick the smallest of these feasible step lengths, say \alpha, and set
* it as the upper bound of the current step length, i.e. 0<\lambda<=\alpha<br>
* 1.3 Search for a step length<=\alpha that results in the "sufficient
* function decrease" (\alpha condition) AND a "positive definite inverse
* Hessian" (\beta condition), if possible, using SAFEGUARDED polynomial
* interpolation. This step length is "safe" and thus
* is used to compute the next values of the free variables.<br>
* 1.4 Fix the variable(s) that are newly bound to their constraint(s).<p>
*
* 2. Check whether all the variables or their gradients have converged.
* If they have, check the possibility of releasing any current bindings of
* the fixed variables to their bounds, based on the "reliable" second-order
* Lagrange multipliers if available. If such a multiplier is available and
* negative for a variable, release that variable. If not available, use the
* first-order Lagrange multiplier to test for release. If no variable can be
* released, STOP the loop. Otherwise update the inverse Hessian matrix and
* the gradient for the newly released variables and CONTINUE the LOOP.<p>
*
* 3. Use the BFGS formula to update the inverse Hessian matrix. Note that the
* already-fixed variables must have zeros in the corresponding entries
* of the inverse Hessian.<p>
*
* 4. Compute the new (Newton) search direction d=-H^{-1}*g, where H^{-1} is the
* inverse Hessian and g is the gradient. Note that, again, the already-
* fixed variables will have zero direction.<p>
*
* ENDLOOP<p>
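*
* In compact pseudocode (an illustrative sketch only; these helper names are
* hypothetical and are not members of this class), the loop above is roughly:<p>
* <code>
* while(true){<br>
* alpha = feasibleStepToNearestBound(x, d); // steps 1.1-1.2<br>
* lambda = safeguardedLineSearch(x, d, alpha); // step 1.3<br>
* fixNewlyBoundVariables(x); // step 1.4<br>
* if(converged(x, g) && !releaseAnyBinding()) break; // step 2<br>
* H = bfgsUpdate(H, deltaX, deltaG); // step 3<br>
* d = timesNegative(H, g); // step 4: d = -H^{-1}*g<br>
* }<br>
* </code><p>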
*
* A typical usage of this class is to create a subclass of it that provides
* the objective function and gradients as follows:<p>
* <code>
* class MyOpt extends Optimization{ <br>
* // Provide the objective function <br>
* protected double objectiveFunction(double[] x){<br>
* // How to calculate your objective function...<br>
* // ...<br>
* }<p>
*
* // Provide the first derivatives<br>
* protected double[] evaluateGradient(double[] x){<br>
* // How to calculate the gradient of the objective function...<br>
* // ...<br>
* } <p>
*
* // If possible, provide the index^{th} row of the Hessian matrix<br>
* protected double[] evaluateHessian(double[] x, int index){<br>
* // How to calculate the index^th variable's second derivative<br>
* // ... <br>
* }<br>
* } <p>
*
* // When it is time to use it, in some routine(s) of another class...<br>
* MyOpt opt = new MyOpt();<p>
*
* // Set up initial variable values and bound constraints<br>
* double[] x = new double[numVariables];<br>
* // Lower and upper bounds: 1st row is lower bounds, 2nd is upper<br>
* double[][] constraints = new double[2][numVariables];<br>
* ...<p>
*
* // Find the minimum; runs up to 200 iterations by default<br>
* x = opt.findArgmin(x, constraints); <br>
* while(x == null){ // 200 iterations were not enough<br>
* x = opt.getVarbValues(); // Try another 200 iterations<br>
* x = opt.findArgmin(x, constraints);<br>
* }<p>
*
* // The minimal function value<br>
* double minFunction = opt.getMinFunction();<br>
* ...<p>
* </code>
* It is recommended that Hessian values be provided so that the second-order
* Lagrange multiplier estimate can be calculated. If the Hessian is not
* available, simply do not override the <code>evaluateHessian()</code>
* function; its default implementation returns null.<p>
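*
* For instance, a minimal concrete subclass (an illustrative sketch; the class
* name, objective function and bounds below are made up for this example) that
* minimizes f(x) = (x[0]-1)^2 + (x[1]-2)^2 subject to 0<=x[i]<=5 could be:<p>
* <code>
* class QuadOpt extends Optimization{<br>
* protected double objectiveFunction(double[] x){<br>
* return (x[0]-1.0)*(x[0]-1.0) + (x[1]-2.0)*(x[1]-2.0);<br>
* }<br>
* protected double[] evaluateGradient(double[] x){<br>
* return new double[]{2.0*(x[0]-1.0), 2.0*(x[1]-2.0)};<br>
* }<br>
* }<br>
* ...<br>
* QuadOpt opt = new QuadOpt();<br>
* double[] x = opt.findArgmin(new double[]{0.0, 0.0},<br>
* new double[][]{{0.0, 0.0}, {5.0, 5.0}});<br>
* // x should now be close to {1.0, 2.0}<br>
* </code><p>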
*
* REFERENCES:<br>
* The overall algorithm is adapted from Chapter 5 and other related chapters of
* Gill, Murray and Wright (1981) "Practical Optimization", Academic Press,
* and Gill and Murray (1976) "Minimization Subject to Bounds on the Variables", NPL
* Report NAC72, while Chong and Zak (1996) "An Introduction to Optimization",
* John Wiley & Sons, Inc. provides a brief but helpful introduction to the method. <p>
*
* Dennis and Schnabel (1983) "Numerical Methods for Unconstrained Optimization and
* Nonlinear Equations", Prentice-Hall Inc. and Press et al. (1992) "Numerical
* Recipes in C", Second Edition, Cambridge University Press were consulted for the
* polynomial interpolation used in the line search implementation. <p>
*
* The Hessian modification in the BFGS update uses Cholesky factorization and two
* rank-one modifications:<br>
* Bk+1 = Bk + (Gk*Gk')/(Gk'*Dk) + (dGk*(dGk)')/[alpha*(dGk)'*Dk], <br>
* where Gk is the gradient vector, dGk is the change in the gradient, Dk is the
* direction vector and alpha is the step rate. <br>
* This method is due to Gill, Golub, Murray and Saunders (1974) ``Methods for Modifying
* Matrix Factorizations'', Mathematics of Computation, Vol. 28, No. 126, pp. 505-535.
* <p>
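*
* For illustration only, the dense (unfactored) equivalent of these two
* rank-one corrections is sketched below; the actual implementation works on
* the Cholesky factors instead, and the array names (B, g, dg, d) are
* hypothetical:<p>
* <code>
* double gd = 0.0, dgd = 0.0;<br>
* for(int i = 0; i < n; i++){ gd += g[i]*d[i]; dgd += dg[i]*d[i]; }<br>
* for(int r = 0; r < n; r++)<br>
* for(int c = 0; c < n; c++)<br>
* B[r][c] += g[r]*g[c]/gd + dg[r]*dg[c]/(alpha*dgd);<br>
* </code><p>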
*
* @author Xin Xu (xx5@cs.waikato.ac.nz)
* @version $Revision$
*/
public abstract class Optimization{
protected double m_ALF = 1.0e-4;   // Constant for the sufficient function decrease (alpha) condition
protected double m_BETA = 0.9;     // Constant for the positive definite inverse Hessian (beta) condition
protected double m_TOLX = 1.0e-6;  // Convergence tolerance on the variable values
protected double m_STPMX = 100.0;  // Maximum step length in the line search
protected int m_MAXITS = 200;      // Maximum number of iterations (default 200)
// Whether to output debugging information
protected static boolean m_Debug = false;
// function value
protected double m_f;
// G'*p
private double m_Slope;
// Test if zero step in lnsrch
private boolean m_IsZeroStep = false;
// Holds the variable values when the iteration limit is exceeded
private double[] m_X;
// Compute machine precision
protected static double m_Epsilon, m_Zero;
static {
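// Halve m_Epsilon until 1.0+m_Epsilon is indistinguishable from 1.0, then
// back off one step: m_Epsilon is the machine epsilon of double arithmetic,
// and m_Zero (its square root) is used as the effective zero threshold.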
m_Epsilon=1.0;
while(1.0+m_Epsilon > 1.0){
m_Epsilon /= 2.0;
}
m_Epsilon *= 2.0;
m_Zero = Math.sqrt(m_Epsilon);
if (m_Debug)
System.err.print("Machine precision is "+m_Epsilon+
" and zero set to "+m_Zero);
}
/**
* Subclasses should implement this method to evaluate the objective
* function to be minimized.
*
* @param x the variable values
* @return the objective function value
* @exception Exception if an error occurs
*/
protected abstract double objectiveFunction(double[] x) throws Exception;
/**
* Subclasses should implement this method to evaluate the gradient
* of the objective function.
*
* @param x the variable values
* @return the gradient vector
* @exception Exception if an error occurs
*/
protected abstract double[] evaluateGradient(double[] x) throws Exception;
/**
* Subclasses are recommended to override this method to evaluate the
* second-order derivatives (one row of the Hessian matrix) of the objective
* function. If it is not overridden, it returns null.
*
* @param x the variables
* @param index the row index in the Hessian matrix
* @return one row (row #index) of the Hessian matrix, null by default
* @exception Exception if an error occurs
*/
protected double[] evaluateHessian(double[] x, int index) throws Exception{
return null;
}
/**
* Get the minimal function value
*
* @return minimal function value found
*/
public double getMinFunction(){ return m_f;}
/**
* Set the maximum number of iterations in the search (default 200)
*
* @param it the maximum number of iterations
*/
public void setMaxIteration(int it){ m_MAXITS=it; }
/**
* Set whether debug mode is on
*
* @param db whether to print debugging information
*/
public void setDebug(boolean db){ m_Debug = db; }
/**
* Get the variable values. Only needed when the number of iterations
* exceeds the maximum.
*
* @return the current variable values
*/
public double[] getVarbValues(){ return m_X; }
/**
* Find a new point x in the direction p from a point xold at which the
* value of the function has decreased sufficiently, the positive
* definiteness of the B matrix (the approximation of the inverse Hessian)
* is preserved, and no bound constraints are violated. For details see
* "Numerical Methods for Unconstrained Optimization and Nonlinear
* Equations". "Numerical Recipes in C" was also consulted.
*
* @param xold old x value
* @param gradient gradient at that point
* @param direct direction vector
* @param stpmax maximum step length
* @param isFixed indicates whether each variable has been fixed
* @param nwsBounds non-working-set bounds; these variables are free and
* subject to the bound constraints in this step
* @param wsBdsIndx indices of the variables that have working-set bounds; these
* variables are already fixed and no longer subject to
* the constraints
* @return new value along direction p from xold, null if no step was taken
* @exception Exception if an error occurs
*/
public double[] lnsrch(double[] xold, double[] gradient,
double[] direct, double stpmax,
boolean[] isFixed, double[][] nwsBounds,
FastVector wsBdsIndx)
throws Exception {
int i, j, k,len=xold.length,
fixedOne=-1; // idx of variable to be fixed
double alam, alamin; // lambda to be found, and its lower bound
// For convergence and bound test
double temp,test,alpha=Double.POSITIVE_INFINITY,fold=m_f,sum;
// For cubic interpolation
double a,alam2=0,b,disc=0,maxalam=1.0,rhs1,rhs2,tmplam;
double[] x = new double[len]; // New variable values
// Scale the step
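// If the proposed step is longer than stpmax, shrink the free components of
// the direction so its norm equals stpmax; otherwise record how far the step
// may be extended (maxalam = stpmax/||direct||).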
for (sum=0.0,i=0;i<len;i++){
if(!isFixed[i]) // For fixed variables, direction = 0
sum += direct[i]*direct[i];
}
sum = Math.sqrt(sum);
if (m_Debug)
System.err.println("fold: "+Utils.doubleToString(fold,10,7)+"\n"+
"sum: "+Utils.doubleToString(sum,10,7)+"\n"+
"stpmax: "+Utils.doubleToString(stpmax,10,7));
if (sum > stpmax){
for (i=0;i<len;i++)
if(!isFixed[i])
direct[i] *= stpmax/sum;
}
else
maxalam = stpmax/sum;
// Compute initial rate of decrease, g'*d
m_Slope=0.0;
for (i=0;i<len;i++){
x[i] = xold[i];
if(!isFixed[i])
m_Slope += gradient[i]*direct[i];
}
if (m_Debug)
System.err.print("slope: " + Utils.doubleToString(m_Slope,10,7)+ "\n");
// Slope too small
if(Math.abs(m_Slope)<=m_Zero){
if (m_Debug)
System.err.println("Gradient and direction orthogonal -- "+
"Min. found with current fixed variables"+
" (or all variables fixed). Try to release"+
" some variables now.");
return x;
}
// Err: slope > 0
if(m_Slope > m_Zero){
if(m_Debug)
for(int h=0; h<x.length; h++)
System.err.println(h+": isFixed="+isFixed[h]+", x="+
x[h]+", grad="+gradient[h]+", direct="+
direct[h]);
throw new Exception("g'*p positive! -- Try to debug from here.");
}
// Compute LAMBDAmin and upper bound of lambda--alpha
test=0.0;
for(i=0;i<len;i++){
if(!isFixed[i]){// No need for fixed variables
temp=Math.abs(direct[i])/Math.max(Math.abs(x[i]),1.0);
if (temp > test) test=temp;
}
}
if(test>m_Zero) // Not converged
alamin = m_TOLX/test;
else{
if (m_Debug)
System.err.println("Zero directions for all free variables -- "+
"Min. found with current fixed variables"+
" (or all variables fixed). Try to release"+
" some variables now.");
return x;
}
// Check whether any non-working-set bounds are "binding"
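// For each free variable moving toward a finite bound, alpi is the step
// length at which that bound would be hit; the smallest such alpi caps the
// step (alpha), and a variable already at its bound is fixed immediately.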
for(i=0;i<len;i++){
if(!isFixed[i]){// No need for fixed variables
double alpi;
if((direct[i]<-m_Epsilon) && !Double.isNaN(nwsBounds[0][i])){//Not feasible
alpi = (nwsBounds[0][i]-xold[i])/direct[i];
if(alpi <= m_Zero){ // Zero
if (m_Debug)
System.err.println("Fix variable "+i+
" to lower bound "+ nwsBounds[0][i]+
" from value "+ xold[i]);
x[i] = nwsBounds[0][i];
isFixed[i]=true; // Fix this variable
alpha = 0.0;
nwsBounds[0][i]=Double.NaN; //Add cons. to working set
wsBdsIndx.addElement(new Integer(i));
}
else if(alpha > alpi){ // Fix one variable in one iteration
alpha = alpi;
fixedOne = i;
}
}
else if((direct[i]>m_Epsilon) && !Double.isNaN(nwsBounds[1][i])){//Not feasible
alpi = (nwsBounds[1][i]-xold[i])/direct[i];
if(alpi <= m_Zero){ // Zero
if (m_Debug)
System.err.println("Fix variable "+i+
" to upper bound "+ nwsBounds[1][i]+
" from value "+ xold[i]);
x[i] = nwsBounds[1][i];
isFixed[i]=true; // Fix this variable
alpha = 0.0;
nwsBounds[1][i]=Double.NaN; //Add cons. to working set
wsBdsIndx.addElement(new Integer(i));
}
else if(alpha > alpi){
alpha = alpi;
fixedOne = i;
}
}
}
}
if (m_Debug){
System.err.println("alamin: " + Utils.doubleToString(alamin,10,7));
System.err.println("alpha: " + Utils.doubleToString(alpha,10,7));
}
if(alpha <= m_Zero){ // Zero
m_IsZeroStep = true;
if (m_Debug)
System.err.println("Alpha too small, try again");
return x;
}
alam = alpha; // Always try full feasible newton step
if(alam > 1.0)
alam = 1.0; // Clamp the initial trial step to the full Newton step
// ...