📄 lbfgs.c
字号:
/*# proc: lbfgs - Limited Memory BFGS Method for Large Scale Optimization by# proc: Jorge Nocedal# proc: lb1 - Prints monitoring information w.r.t. lbfgs optimization.# proc: mcsrch - Finds a step which satisfies a sufficient decrease condition# proc: and a curvature condition.# proc: mcstep - Computes a safeguarded step for a linesearch and to update an# proc: interval of uncertainty for a minimizer of the function.*//* This is a C version of Nocedal's LBFGS routine, Limited-Memory BFGSoptimizer, and its supporting routines LB1, MCSRCH, and MCSTEP. Itwas translated by gtc from a provided Fortran version (which hadalready been modified by jlb), by a combination of editing, andrunning the f2c Fortran-to-C converter program.(The parm EPS which was in the Fortran version of LBFGS() is notpresent in this version. It was removed because, in the Fortranversion (as modified by jlb), it is not used.) */#include <stdio.h>#include <math.h>#include <mlp/defs.h>#include <mlp/macros.h>#include <mlp/blas.h>/* To activate surveying, uncomment following line: *//* #define SURVEY *//*******************************************************************/ /* -------------- function lbfgs -------------- Limited Memory BFGS Method for Large Scale Optimization Jorge Nocedal *** July 1990 *** This function solves the unconstrained minimization problem min f(x), x = (x[0],x[1],...,x[n-1]), using the limited memory BFGS method. The routine is especially effective on problems involving a large number of variables. In a typical iteration of this method an approximation Hk to the inverse of the Hessian is obtained by applying m BFGS updates to a diagonal matrix Hk0, using information from the previous m steps. The user specifies the number m, which determines the amount of storage required by the routine. The user may also provide the diagonal matrices Hk0 if not satisfied with the default choice. The algorithm is described in "On the limited memory BFGS method for large scale optimization", by D. Liu and J. Nocedal, Mathematical Programming B 45 (1989) 503-528. The user is required to calculate the function value f and its gradient g. In order to allow the user complete control over these computations, reverse communication is used. The routine must be called repeatedly under the control of the parameter iflag. The steplength is determined at each iteration by means of the line search routine mcsrch, which is a slight modification of the routine CSRCH written by More' and Thuente. The calling statement is: lbfgs(n, m, x, f, g, diagco, diag, iprint, xtol, work, iflag, info, fp_mon, fp_err, gtol, stpmin, stpmax, itopt, iter, ierr, stp); where: [(input) int] n is an int variable that must be set by the user to the number of variables. It is not altered by the routine. Restriction: n > 0. [(input) int] m is an int variable that must be set by the user to the number of corrections used in the bfgs update. It is not altered by the routine. Values of m less than 3 are not recommended; large values of m will result in excessive computing time. 3 <= m <= 7 is recommended. Restriction: m > 0. [float vector] x is a float array of length n. On initial entry it must be set by the user to the values of the initial estimate of the solution vector. On exit with *iflag = 0, it contains the values of the variables at the best point found (usually a solution). [(input) float] f is a float variable. Before initial entry and on a re-entry with *iflag = 1, it must be set by the user to contain the value of the function f at the point x. [float vector] g is a float array of length n. Before initial entry and on a re-entry with *iflag = 1, it must be set by the user to contain the components of the gradient at the point x. [(input) int] diagco is an int variable that must be set to TRUE if the user wishes to provide the diagonal matrix Hk0 at each iteration. Otherwise it should be set to FALSE, in which case lbfgs will use a default value described below. If diagco is set to TRUE the routine will return at each iteration of the algorithm with *iflag = 2, and the diagonal matrix Hk0 must be provided in the array diag. [float vector] diag is a float array of length n. If diagco is TRUE, then on initial entry or on re-entry with *iflag = 2, diag must be set by the user to contain the values of the diagonal matrix Hk0. Restriction: all elements of diag must be positive. [int vector] iprint is an int array of length two which must be set by the user as follows: iprint[0] specifies the frequency of the output: iprint[0] < 0 : no output is generated. iprint[0] = 0 : output only at first and last iteration. iprint[0] > 0 : output every iprint[0] iterations. iprint[1] specifies the type of output generated: iprint[1] = 0 : iteration count, number of function evaluations, function value, norm of the gradient, and steplength. iprint[1] = 1 : same as iprint[1] = 0, plus vector of variables and gradient vector at the initial point. iprint[1] = 2 : same as iprint[1] = 1, plus vector of variables. iprint[1] = 3 : same as iprint[1] = 2, plus gradient vector. [(input) float] xtol is a positive float variable that must be set by the user to an estimate of the machine precision (e.g. 10^(-7) on a SUN station 3/60). The line search routine will terminate if the relative width of the interval of uncertainty is less than xtol. [float vector] work is a float array of length n(2m+1)+2m used as workspace for lbfgs. This array must not be altered by the user. [(input/output) int-pointer] iflag is an int-pointer, used for input and output of an int defined in the calling routine; *iflag must be 0 on initial entry to lbfgs. A return with *iflag < 0 indicates an error, and *iflag = 0 indicates that the routine has terminated without detecting errors. On a return with *iflag = 1, the user must compute the function value f and gradient g. On a return with *iflag = 2, the user must provide the diagonal matrix Hk0. The following negative values of *iflag, detecting an error, are possible: *iflag = -1 The line search routine mcsrch failed. The output parameter "info" provides more detailed information (see also the documentation of mcsrch): *info = 0 Improper input parameters. *info = 2 Relative width of the interval of uncertainty is at most xtol. *info = 3 More than maxfev function evaluations were required at the present iteration. *info = 4 The step is too small. *info = 5 The step is too large. *info = 6 Rounding errors prevent further progress. There may not be a step which satisfies the sufficient decrease and curvature conditions. Tolerances may be too small. *iflag = -2 The i-th diagonal element of the diagonal inverse Hessian approximation, given in diag, is not positive. *iflag = -3 Improper input parameters for lbfgs (n or m is not positive). [Parms info through stpmax added by gtc: info was always returned by mcsrch but was not returned by lbfgs, and fp_mon, fp_err, gtol, stpmin, and stpmax were global variables in the LB3 common block in the Fortran version.] [(output) int-pointer] info returns the info value returned by mcsrch. This is of interest only if *iflag = -1. [(input) FILE-pointer] fp_mon will be used for the writing of monitoring information, as controlled by iprint. [(input) FILE-pointer] fp_err will be used for the writing of error messages. This writing may be suppressed by setting fp_err to (FILE *)NULL. [(input) float] gtol controls the accuracy of the line search routine mcsrch. If the function and gradient evaluations are inexpensive with respect to the cost of the iteration (which is sometimes the case when solving very large problems) it may be advantageous to set gtol to a small value. A typical small value is 0.1. Restriction: gtol should be greater than 1.e-04. [(input) float] stpmin and stpmax are nonnegative variables which specify lower and upper bounds for the step in the line search. The recommended values (in original Fortran version, default values) are 1.e-20 and 1.e+20. These values need not be modified unless the exponents are too large for the machine being used, or unless the problem is extremely badly scaled (in which case the exponents should be increased). [Remaining parms added by jlb:] [int-pointer] itopt is ? [This is just passed to optchk as its second arg, which is supposed to be how many iterations the optimization has used so far. Should eventually be changed to an int, not int-pointer.] [int-pointer] iter is ? [(output) int-pointer] ierr returns the ierr value returned by optchk(). [Note that depending on *ierr, *iflag can (?) be set differently than indicated in its comment above, which therefore should be modified.] [float-pointer] stp is ? Machine Dependencies: The only variables that are machine-dependent are xtol, stpmin and stpmax. General Information: Other routines called directly: saxpy (BLAS), sdot (BLAS), lb1 (code in this file), mcsrch (code in this file). Input/Output: no input; diagnostic messages to fp_mon and error messages to fp_err. */voidlbfgs(n, m, x, f, g, diagco, diag, iprint, xtol, work, iflag, info, fp_mon, fp_err, gtol, stpmin, stpmax, itopt, iter, ierr, stp)int n, m, diagco, *iprint, *iflag, *info, *itopt, *iter, *ierr;FILE *fp_mon, *fp_err;float *x, f, *g, *diag, xtol, *work, gtol, stpmin, stpmax, *stp;{#ifdef SURVEY char str[50]; void fsaso(), survey();#endif static int inmc, iscn, nfev, iycn, nfun, ispt, iypt, i, bound, point, cp, finish, maxfev, npt, one = 1; float r1; static float beta, ftol, gnorm, xnorm, sq, yr, ys, yy, pctmin_junk, stp1; void mcsrch(), optchk(), lb1(); /* parameter adjustments */ --work; --iprint; --diag; --g; --x; /* Initialize ---------- */ if(*iflag == 0) goto L10; switch ((int)(*iflag)) { case 1: goto L172; case 2: goto L100; } L10: *iter = 0; if(n <= 0 || m <= 0) goto L196; if(gtol <= 1.e-4) { if(fp_err != (FILE *)NULL) fprintf(fp_err, "\n gtol is less than or equal to 1.e-04 ;\n\ it has been reset to 9.e-01 .\n"); gtol = .9; } nfun = 1; point = 0; finish = FALSE; if(diagco) { for(i = 1; i <= n; i++) if(diag[i] <= 0.) goto L195; } else for(i = 1; i <= n; i++) diag[i] = 1.; /* The work vector w is divided as follows: ---------------------------------------- Locations 1,...,n are used to store the gradient and other temporary information. Locations n+1,...,n+m store the scalars rho. Locations n+m+1,...,n+2m store the numbers alpha used in the formula that computes h*g. Locations n+2m+1,...,n+2m+nm store the last m search steps. Locations n+2m+nm+1,...,n+2m+2nm store the last m gradient differences. The search steps and gradient differences are stored in a circular order controlled by the parameter point. */ ispt = n + (m << 1); iypt = ispt + n * m; for(i = 1; i <= n; i++) work[ispt + i] = -g[i] * diag[i]; gnorm = snrm2_(&n, &g[1], &one); stp1 = 1. / gnorm; /* parameters for line search routine */ /* ftol = 1.e-4; */ /* maxfev = 20; */ /* jlb: */ ftol = 1.e-10; maxfev = 5; if(iprint[1] >= 0) lb1(&iprint[1], *iter, nfun, gnorm, n, m, &x[1], f, &g[1], *stp, finish, fp_mon); /* ------------------- Main iteration loop ------------------- */ L80: (*iter)++; *info = 0; bound = *iter - 1; if(*iter == 1) goto L165; if(*iter > m) bound = m; ys = sdot_(&n, &work[iypt + npt + 1], &one, &work[ispt + npt + 1], &one); if(!diagco) { yy = sdot_(&n, &work[iypt + npt + 1], &one, &work[iypt + npt + 1], &one); for(i = 1; i <= n; i++) diag[i] = ys / yy; } else { *iflag = 2; return; } L100: if(diagco) for(i = 1; i <= n; i++) if(diag[i] <= 0.) goto L195; /* ----------------------------------------------------------- Compute -H*G using the formula given in: Nocedal, J. 1980, "Updating quasi-Newton matrices with limited storage", Mathematics of Computation, Vol. 24, No. 151, pp. 773-782. ----------------------------------------------------------- */ cp = point; if(point == 0) cp = m; work[n + cp] = 1. / ys; for(i = 1; i <= n; i++) work[i] = -g[i]; cp = point; for(i = 1; i <= bound; i++) { --cp; if(cp == -1) cp = m - 1; sq = sdot_(&n, &work[ispt + cp * n + 1], &one, &work[1], &one); inmc = n + m + cp + 1; iycn = iypt + cp * n; work[inmc] = work[n + cp + 1] * sq;
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -