/* macopt.c */
/* macopt library release 1.1    gradient-based optimizer

   Copyright (c) 2002 David J.C. MacKay

   This library is free software; you can redistribute it and/or modify
   it under the terms of the GNU Lesser General Public License as
   published by the Free Software Foundation; either version 2.1 of the
   License, or (at your option) any later version.

   This library is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this library; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

   GNU licenses are here:
       http://www.gnu.org/licenses/licenses.html

   Author contact details are here:
       http://www.inference.phy.cam.ac.uk/mackay/c/macopt.html
       mackay@mrao.cam.ac.uk

   If you find macopt useful, please feel free to make a donation to
   support David MacKay's research group.
*/

/* #include <stdio.h> */
/* #include <math.h>  */
#include "../newansi/r.h"
/* #include "../ansi/nrutil.h"   */
/* #include "../newansi/mynr.h"  */
#include "../newansi/macopt.h"

/* Please do not use macopt without understanding a little about how it
   works; there are some control parameters which the user MUST set!

   This optimizer should only be used for functions whose first
   derivative is continuous.

   David MacKay's optimizer, based on conjugate gradient ideas, but
   using bracketing of the zero of the inner product

       (gradient) . (line_search_direction)

   to do the line minimization.  Only derivative calculations are
   required.

   The length of the first step in the line search (often set to "1.0"
   in other code) is adapted here so that, if 0.00001 is a better step
   size, it soon cottons on to that and saves ~log(10000) bracketing
   operations.  The result is that (with 'rich' set to 0) the program
   can use as few as 2 derivatives per line search.
   (If rich is set to 1, it does an extra derivative calculation at the
   beginning of each line search, making a minimum of 3 per line search.
   Set rich=0 if you think that the surface is locally quite quadratic.)
   If the program does average 2 derivatives per line search then it
   must be superior to most cg methods, including use of Rbackprop
   (which costs 2 derivatives straight off).

   A possible modification: where the function can be returned at the
   same time as the dfunction --- there is nothing clever to do with the
   value, but it could be used as a sanity check and a convergence
   criterion.

   See http://131.111.48.24/mackay/c/macopt.html for further discussion.

   NB: The value of "tol" is totally arbitrary and must be set by you to
   a value that works well for your problem.  It depends completely on
   the typical value of the gradient / step size.  Tol specifies either
   a magnitude of gradient, or a step size, at which a halt is called.

   This program MINIMIZES a function.

   ================ NEW ======================
   ============== macoptIIc ==================
   ===========================================

   99 02 05: macoptIIc is the new version, which is a 'covariant'
   optimizer - covariant with respect to some reparameterizations.

   Traditional conjugate gradients assumes a metric that is I
   (identity).  In macoptIIc, the user gets to specify this metric.
   For speed, the metric is still forced to be diagonal.  But in some
   problems, where we expect the gradient with respect to some
   parameters to be much larger than others, say, a wise user will be
   able to make good use of this feature.  The metric allows you to
   anticipate a good rescaling of the gradients to define a sensible
   line search direction.
   For an example of the covariance this produces, see:

       test_macIIc < example10c
       test_macIIc < example

   and contrast with

       test_macIIc < example10

   This example doesn't show that anything particularly bad happens
   (though the first line search makes very little progress in the 2
   direction because the gradient wrt 1 is so big) --- it just allows
   you to confirm that when you use the covariant routine with the
   right metric, you can change variables and the optimizer behaves
   exactly the same if you start from the same initial conditions.
*/

/* macoptIIc returns a status variable:
    1/2 = normal end states
    -1  = overran itmax
*/
int macoptIIc
( double *p,            /* starting vector                                   */
  int    n,             /* number of dimensions                              */
  void   (*dfunc)(double *, double *, void *),
                        /* evaluates the gradient of the optimized function  */
  void   *dfunc_arg,    /* arguments that get passed to dfunc                */
  macopt_args *a        /* structure in which optimizer arguments are stored */
)
/* Note, (*func)(double *, void *) is not used */
{
  int j , actual_itmax ;
  double gg , gam , dgg ;
  double *g , *h , *xi , *mg , *m ;
  int end_if_small_grad = 1 - a->end_if_small_step ;
  double step , tmpd ;

  /* p is provided when the optimizer is called.
     pt is used by the line minimizer as the temporary vector;
       this could be cut out with minor rewriting, using p alone.
     g, h and xi are used by the cg method as in NR -
       could one of these be cut out?
     mg is the natural gradient for the covariant optimizer.
     m is the metric (set by the user).
     The line minimizer uses an extra gx and gy to evaluate two
     gradients. */

  if ( !(a->metric) ) {
    fprintf(stderr,
            "warning, macoptIIc requires metric, continuing, assuming it is there\n");
    a->metric = 1 ;
  }
  macopt_allocate ( a , n ) ;
  mg = a->mg ;
  m  = a->m ;
  g  = a->g ;
  h  = a->h ;
  xi = a->xi ;

  (*dfunc)( p , xi , dfunc_arg );
  for ( j = 1 ; j <= n ; j ++ )
    mg[j] = - m[j] * xi[j] ;                            /* macoptIIc */

  macopt_restart ( a , 1 ) ;
  actual_itmax = ( n > 1 ) ?
    a->itmax : 1 ;
  for ( a->its = 1 ; a->its <= actual_itmax ; a->its ++ ) {

    for ( gg = 0.0 , j = 1 ; j <= n ; j ++ )
      /* g is minus the gradient, so is mg */
      gg += g[j]*mg[j];        /* find the magnitude of the old gradient */
    a->gtyp = sqrt ( gg / (double)(n) ) ;
    if ( a->verbose > 0 )
      printf ( "mac_it %d of %d : gg = %6.3g tol = %6.3g: ",
               a->its , a->itmax , gg , a->tol ) ;
    if ( ( end_if_small_grad && ( gg <= a->tol ) )
         || ( gg <= a->grad_tol_tiny ) ) {
      macopt_free ( a ) ;
      if ( a->verbose > 0 ) printf ("\n");
      return (1) ;             /* normal end caused by small grad */
    }

    /* The following option allows you to call a subroutine which
       records or reports the current value of the parameter vector.
       This is the beginning of a new line search */
    if ( a->do_newitfunc ) {
      (*(a->newitfunc))(p, a->its, a->newitfuncarg);
    }

    step = maclinminII ( p , dfunc , dfunc_arg , a ) ;

    if ( a->restart == 0 ) {
      if ( a->verbose > 1 ) printf (" (step %9.5g)",step);
      if ( a->verbose > 0 ) printf ("\n");
      if ( ( a->end_if_small_step && ( step <= a->tol ) )
           || ( step <= a->step_tol_tiny ) ) {
        macopt_free ( a ) ;
        return (2) ;           /* normal end caused by small step */
      }
    }

    /* if we are feeling rich, evaluate the gradient at the new
       `minimum'.  Alternatively, linmin has already estimated this
       gradient by linear combination of the last two evaluations and
       left it in xi. */
    if ( a->its < actual_itmax ) { /* i.e. if it is worth thinking any more.... */
      if ( a->rich || a->restart ) {
        (*dfunc)( p , xi , dfunc_arg ) ;
        for ( j = 1 ; j <= n ; j ++ )
          mg[j] = - m[j] * xi[j] ;                      /* macoptIIc */
      }
      if ( a->restart ) {
        fprintf(stderr,"Restarting macopt (1)\n" ) ;
        macopt_restart ( a , 0 ) ;
        /* this is not quite right; should distinguish whether there was
           an overrun indicating that the value of lastx needs to be
           bigger / smaller, in which case resetting lastx to the
           default value may be a bad idea, giving an endless loop of
           resets */
      }
      else {
        dgg=0.0;
        for ( j = 1 ; j <= n ; j ++ ) {
          dgg -= ( xi[j] + g[j] ) * mg[j] ;  /* sign uncertainty!!
                                              */
        }
        gam = dgg / gg ;
        for ( tmpd = 0.0 , j = 1 ; j <= n ; j ++ ) {
          g[j] = -xi[j];       /* g stores (-) the most recent gradient */
          xi[j] = h[j] = mg[j] + gam * h[j] ;
                               /* h stores xi, the current line direction */
          /* check that the inner product of gradient and line search is < 0 */
          tmpd -= xi[j] * g[j] ;
        }
        if ( tmpd > 0.0 || a->verbose > 2 ) {
          fprintf(stderr,"new line search has inner prod %9.4g\n", tmpd ) ;
        }
        if ( tmpd > 0.0 ) {
          if ( a->rich == 0 ) {
            fprintf (stderr, "macoptIIc - Setting rich to 1; " ) ;
            a->rich = 1 ;
          }
          a->restart = 2 ;     /* signifies that g[j] = -xi[j] is already done */
          fprintf(stderr,"Restarting macopt (2)\n" ) ;
          /* is mg correct here? */
          macopt_restart ( a , 0 ) ;
        }
      }
    }
  }
  if ( actual_itmax > 1 )
    fprintf(stderr,
            "Reached iteration limit (%d) in macopt; continuing.\n", a->itmax);
  macopt_free ( a ) ;
  return ( actual_itmax > 1 ) ? (-1) : (0) ;
  /* abnormal end caused by itn limit - BAD, unless n=1 */
}
/* NB this leaves the best value of p in the p vector, but the function
   has not been evaluated there if rich=0 */

void macoptII
( double *p,            /* starting vector                                   */
  int    n,             /* number of dimensions                              */
  void   (*dfunc)(double *, double *, void *),
                        /* evaluates the gradient of the optimized function  */
  void   *dfunc_arg,    /* arguments that get passed to dfunc                */
  macopt_args *a        /* structure in which optimizer arguments are stored */
)
/* Note, (*func)(double *, void *) is not used */
{
  int j , actual_itmax ;
  double gg , gam , dgg ;
  double *g , *h , *xi ;
  int end_if_small_grad = 1 - a->end_if_small_step ;
  double step , tmpd ;

  /* A total of 7 double * 1..n are used by this optimizer.
     p is provided when the optimizer is called.
     pt is used by the line minimizer as the temporary vector;
       this could be cut out with minor rewriting, using p alone.
     g, h and xi are used by the cg method as in NR -
       could one of these be cut out?
     The line minimizer uses an extra gx and gy to evaluate two
     gradients.
  */
  macopt_allocate ( a , n ) ;
  g  = a->g ;
  h  = a->h ;
  xi = a->xi ;

  (*dfunc)( p , xi , dfunc_arg );

  macopt_restart ( a , 1 ) ;
  actual_itmax = ( n > 1 ) ? a->itmax : 1 ;
  for ( a->its = 1 ; a->its <= actual_itmax ; a->its ++ ) {

    for ( gg = 0.0 , j = 1 ; j <= n ; j ++ )
      gg += g[j]*g[j];         /* find the magnitude of the old gradient */
    a->gtyp = sqrt ( gg / (double)(n) ) ;
    if ( a->verbose > 0 )
      printf ( "mac_it %d of %d : gg = %6.3g tol = %6.3g: ",
               a->its , a->itmax , gg , a->tol ) ;
    if ( ( end_if_small_grad && ( gg <= a->tol ) )
         || ( gg <= a->grad_tol_tiny ) ) {
      macopt_free ( a ) ;
      if ( a->verbose > 0 ) printf ("\n");
      return;                  /* (1) ; normal end caused by small grad */
    }

    /* The following option allows you to call a subroutine which
       records or reports the current value of the parameter vector.
       This is the beginning of a new line search */
    if ( a->do_newitfunc ) {
      (*(a->newitfunc))(p, a->its, a->newitfuncarg);
    }

    step = maclinminII ( p , dfunc , dfunc_arg , a ) ;

    if ( a->restart == 0 ) {
      if ( a->verbose > 1 ) printf (" (step %9.5g)",step);
      if ( a->verbose > 0 ) printf ("\n");
      if ( ( a->end_if_small_step && ( step <= a->tol ) )
           || ( step <= a->step_tol_tiny ) ) {
        macopt_free ( a ) ;
        return;                /* (2) ; normal end caused by small step */
      }
    }

    /* if we are feeling rich, evaluate the gradient at the new
       `minimum'.  Alternatively, linmin has already estimated this
       gradient by linear combination of the last two evaluations and
       left it in xi. */
    if ( a->its < actual_itmax ) { /* i.e. if it is worth thinking any more....
                                    */
      if ( a->rich || a->restart ) {
        (*dfunc)( p , xi , dfunc_arg ) ;
      }
      if ( a->restart ) {
        fprintf(stderr,"Restarting macoptII (1)\n" ) ;
        macopt_restart ( a , 0 ) ;
        /* this is not quite right; should distinguish whether there was
           an overrun indicating that the value of lastx needs to be
           bigger / smaller, in which case resetting lastx to the
           default value may be a bad idea, giving an endless loop of
           resets */
      }
      else {
        dgg=0.0;
        for ( j = 1 ; j <= n ; j ++ ) {
          dgg += ( xi[j] + g[j] ) * xi[j] ;
        }
        gam = dgg / gg ;
        for ( tmpd = 0.0 , j = 1 ; j <= n ; j ++ ) {
          g[j] = -xi[j];       /* g stores (-) the most recent gradient */
          xi[j] = h[j] = g[j] + gam * h[j] ;
                               /* h stores xi, the current line direction */
          /* check that the inner product of gradient and line search is < 0 */
          tmpd -= xi[j] * g[j] ;
        }
        if ( tmpd > 0.0 || a->verbose > 2 ) {
          fprintf(stderr,"new line search has inner prod %9.4g\n", tmpd ) ;
        }
        if ( tmpd > 0.0 ) {
          if ( a->rich == 0 ) {
            fprintf (stderr, "macoptII - Setting rich to 1; " ) ;
            a->rich = 1 ;
          }
          a->restart = 2 ;     /* signifies that g[j] = -xi[j] is already done */
          fprintf(stderr,"Restarting macoptII (2)\n" ) ;
          macopt_restart ( a , 0 ) ;
        }