the user_cost_function(). This can be very useful when a partial
calculation of the cost function suffices to apply an acceptance
criterion.
You can define an alternative to the ASA generating function (or
whatever algorithm is required) using USER_GENERATING_FUNCTION. For
example, mild modifications to the ASA distribution can be useful,
e.g., slowing down the annealing schedule by taking a fractional root
of the current temperature.
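For illustration only, here is a minimal standalone C sketch of the ASA
generating distribution with the annealing slowed by a fractional root of
the current temperature. This is not the actual USER_GENERATING_FUNCTION
prototype declared in asa.h (see asa_usr.c and the ASA-README for the real
hook), and the constant frac_root is a hypothetical tuning value.
  #include <math.h>
  #include <stdlib.h>
  /* Sketch only: ASA generating distribution, with the annealing slowed
   * by taking a fractional root of the current parameter temperature. */
  double slowed_generating_value (double param, double param_min,
                                  double param_max, double temperature)
  {
    double frac_root = 0.5;                  /* hypothetical: use sqrt(T) instead of T */
    double T = pow (temperature, frac_root); /* slowed effective temperature */
    double u = (double) rand () / RAND_MAX;  /* uniform deviate in [0,1] */
    double sign = (u < 0.5) ? -1.0 : 1.0;
    double y = sign * T * (pow (1.0 + 1.0 / T, fabs (2.0 * u - 1.0)) - 1.0);
    double new_param = param + y * (param_max - param_min);
    /* clamp to the user's bounds, for simplicity in this sketch */
    if (new_param < param_min) new_param = param_min;
    if (new_param > param_max) new_param = param_max;
    return new_param;
  }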
In the current implementation, only one current optimal point is kept
at a time. I do not see the utility of keeping more than one optimal
point at a time. For example, some people have asked if starting with
several random seeds would help the efficiency of the code. I do not
think so: The fat tail of the ASA distribution results in a fairly
high generated to accepted ratio, and in practice this accounts for a
fairly robust sampling. However, there is much merit in considering
calculating blocks of generating points. As I mention in
asa92_mnn.pdf, this can be extremely helpful in a parallelized
version. There are ASA_PARALLEL hooks in the present code to do this,
as explained in the ASA-README.
I do not think the popular argument that SA theory has no practical
value, given finite machine precision, is very relevant. Most complex
systems exist at multiple scales; in fact, for most physical systems,
the very concept of "noise" is really the introduction of new variables
(e.g., in appropriate Ito or Stratonovich representations, etc.) to
represent some statistical
aggregation over "fast" variables. In this context, sampling a cost
function/system should be understood by the researcher as typically
being appropriate to some scale(s). For many systems, machine
precision suffices at the appropriate scale(s) to effectively sample
the parameter space, and here I think it relevant that SA techniques
can offer a better guide (_especially_ in the absence of other info
about the system, e.g., the existence of convexity, etc.) to
effectively sample the space (if indeed that is required) than other
algorithms.
========================================================================
@@Parameter-Temperature Scales
If indeed you need to have many generated states, then increasing
OPTIONS->Temperature_Ratio_Scale (by making the exponent of its power
of 10 less negative) is a good idea, as this lowers
m = -log(OPTIONS->Temperature_Ratio_Scale)
which lowers
c_i = m_i exp(-n_i/D)
which permits slower annealing via
T_i(k_i) = T_0i exp(-c_i k_i^(1/D))
so that you still might have some moderate (not too small) temperatures
out at high numbers of generated states. If
RATIO_TEMPERATURE_SCALES is set to TRUE, then
m_i = m OPTIONS->User_Temperature_Ratio[i].
Also note that
n_i = log(OPTIONS->Temperature_Anneal_Scale)
and it is not really necessary to have an OPTIONS to set these
independently as one can always use m_i for this purpose.
Of course, both m_i and n could have been aggregated into one OPTIONS
c_i. However, as explained in the first VFSR paper and as outlined in
the ASA-README, there is a good rationale for keeping m_i and n, with their
different effects on c_i, as they usefully model the approximate
"expected" ratio of final to initial temperatures and the associated
numbers of generated states amassed during annealing, respectively.
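For illustration, here is a small self-contained C sketch of how these
OPTIONS translate into the schedule quoted above. The numerical values
(D = 4, T_0i = 1, and the two scales shown) are example values only;
check asa_opt and the ASA-README for the defaults in your version.
  #include <math.h>
  #include <stdio.h>
  int main (void)
  {
    double ratio_scale  = 1.0e-5;  /* OPTIONS->Temperature_Ratio_Scale (example)  */
    double anneal_scale = 100.0;   /* OPTIONS->Temperature_Anneal_Scale (example) */
    double D   = 4.0;              /* number of parameters (example)              */
    double T_0 = 1.0;              /* initial parameter temperature (example)     */
    double m = -log (ratio_scale); /* m = -log(Temperature_Ratio_Scale)           */
    double n = log (anneal_scale); /* n = log(Temperature_Anneal_Scale)           */
    double c = m * exp (-n / D);   /* c_i = m_i exp(-n_i/D)                       */
    long k;
    for (k = 1; k <= 1000000; k *= 10)   /* index k_i of generated states */
      printf ("k = %8ld  T = %g\n", k,
              T_0 * exp (-c * pow ((double) k, 1.0 / D)));
    return 0;
  }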
========================================================================
@@Equality Constraints
13 Dec 94
If you have equality constraints that can only be enforced as actual
equations in the code (e.g., you can't numerically use them to
substitute in other expressions), you will have problems. This is
simply because you are constraining the search on the surface of some
volume, and the entire volume is being sampled. This will be the case
when using any true sampling algorithm.
For example, if you have a cost function with n parameters,
C(p_1, p_2, ..., p_n), and an equality constraint between parameters
p_n and p_n-1, then solve this equation for p_n, numerically or
algebraically, redefining your cost function to one with n-1
parameters, C'. If the solution to this equation, or perhaps a set of
m such equality constraints to reduce the number of parameters actually
processed by ASA to n-m, is not simply written down, then you will of
course have to solve such constraints with other algorithms within your
cost function. If the solution of these equality constraints is so
difficult that by themselves they cannot be approached with
quasi-Newton algorithms, then you could use the recursive properties of
ASA to solve these equations, appropriately defined by another cost
function within your original cost function.
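As a concrete illustration of the n to n-1 reduction, here is a minimal
C sketch. The quadratic cost and the constraint p_2 + p_3 = 1 are
invented for this example, and the prototype is simplified; it is not
the user_cost_function() prototype from asa_usr.c.
  /* original 3-parameter cost C(p_1, p_2, p_3), invented for illustration */
  static double full_cost (double p1, double p2, double p3)
  {
    return p1 * p1 + (p2 - 0.5) * (p2 - 0.5) + p3 * p3;
  }
  /* reduced 2-parameter cost C'(p_1, p_2) actually handed to ASA;
   * the equality constraint p_2 + p_3 = 1 is solved for p_3 inside */
  double reduced_cost (double *x)
  {
    double p1 = x[0];
    double p2 = x[1];
    double p3 = 1.0 - p2;       /* constraint solved for the last parameter */
    return full_cost (p1, p2, p3);
  }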
However, if there are branches of multiple solutions of these equality
constraints, then you could use these as a discrete or continuous set
of parameter values within your cost function, instead of reducing the
parameter dimension processed by ASA, e.g., perhaps using
OPTIONS->Sequential_Parameters to delay generating a choice among the
roots of the equality constraints until the other independent
parameters are given new generated values; see the ASA-README for the use
of parameter_minimum[] and parameter_maximum[] which may be required
for such cases.
========================================================================
@@Number of Generated States Required
The question often arises as to how to estimate the time required
to get the global minimum, which likely is best measured by the
required number of generated states. While there are quite a few
papers published on this important topic, in general it is quite
difficult to give a categorical answer, basically because (a) nonlinear
systems typically are quite different, (b) many nonlinear systems have
different "terrain" at different scales -- essentially being different
systems at these different scales, and (c) results can be very
dependent on the global optimization algorithm used.
Of course, if ASA already has given you an optimal state, this can be
considered a tentative bound, and then you can explore possibilities to
get this same optimal state with fewer generated states, e.g., using
SELF_OPTIMIZE if you do not have enough information about your system
to make some educated guesses for further tuning of the OPTIONS.
Otherwise, you can use plots of generated states versus the current
best cost_function value, mentioned above in General Comments, to
extrapolate the number of generated states required to achieve future
values of the cost_function. Since experience has shown that many
systems exhibit at least three different regions with quite different
shapes -- (1) a quasi-linear region during the initial broad search,
(2) a quasi-linear region during the final search, and (3) a quite
nonlinear region between (1) and (2) -- you would have to be fairly
certain that you are in region (2) in order to consider any such
extrapolation even a crude guess at the number of required generated
states. Furthermore, you can perform such plots with several values of
selected OPTIONS, to help extrapolate the required number of generated
states as a function of these OPTIONS.
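For illustration, a small C sketch of such an extrapolation is given
below. It assumes you are already in the quasi-linear end region (2),
that the best cost is positive so its log is defined, and that the
(generated states, best cost) pairs have been read from asa_out or an
ASA_PIPE_FILE output; none of this is part of the ASA code itself.
  #include <math.h>
  typedef struct { double generated; double best_cost; } sample_pt;
  /* fit a line through (log10 generated, log10 best_cost) and solve for
   * the number of generated states at which the line reaches target_cost */
  double extrapolate_generated (const sample_pt *pts, int n, double target_cost)
  {
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    double slope, icept;
    int i;
    for (i = 0; i < n; ++i) {
      double x = log10 (pts[i].generated);
      double y = log10 (pts[i].best_cost);
      sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    icept = (sy - slope * sx) / n;
    return pow (10.0, (log10 (target_cost) - icept) / slope);
  }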
Note that some of my previous ASA publications illustrate comparisons
of such log-log plots with other global optimization algorithms. Even
the general shape, not just the end result, of the plots can differ
depending on the algorithm used. This is further evidence that using
general theoretical guides, as mentioned in (c) above, can be quite
misleading.
========================================================================
========================================================================
@@TUNING FOR SOME SYSTEMS
========================================================================
@@Tuning
Nonlinear systems are typically not typical, and so it is difficult if
not impossible to give guidelines for ASA defaults similar to what you
might expect for "canned" quasi-linear systems. I have tried to
prepare the ASA-README to give some guidelines, and if all else fails you
could experiment a bit using a logical approach with the SELF_OPTIMIZE
OPTIONS. I still advise some experimentation that might yield a bit of
insight about a particular system. In many cases, the best approach is
probably a "blend": Make a guess or two, then fine-tune the guesses
with SELF_OPTIMIZE in some rather finer range of the parameter(s). The
reason this is slow is because ASA does what you expect it to do: It
truly samples the space. When SELF_OPTIMIZE is turned on, for each
set of top-level ASA OPTIONS/parameters selected, the "inner" shell of
your system's parameters is optimized, and this whole process is
repeated while optimizing the "outer" top-level shell of ASA
parameters. If you
find that indeed this is a necessary and valuable approach to your
problem, then one possible short cut might be to turn on Quenching for
the outer shell.
The ASA proof of statistical convergence to a global optimal point
gives sufficient, _not_ necessary, conditions. This still is a pretty
strong statement since one can only importance-sample a large space in
a finite time. Note that some spaces would easily require CPU times
much greater than the lifetime of the universe to sample all points.
If you "tucked away" a "pathological" singular optimal point in an
otherwise "smooth" space, indeed ASA might have to run "forever." If
the problem isn't quite so pathological, you might have to slow down
the annealing, to permit ASA to spend more time at each scale to
investigate the finer scales; then, you would have to explore some
other OPTIONS. This could be required if your problem looks different
at different scales, for then you can often get trapped in local
optima, and thus ASA could fail just as any "greedy" quasi-Newton
algorithm would.
Because of its exponential annealing schedule, ASA does converge at the
end stages of runs quite well, so if you start with your setup akin to
this stage, you will search for a very long time (possibly beyond your
machine's precision to generate temperatures) to get out. Or, if you
start with too broad a search, you will spin your wheels at first
before settling down to explore multiple local optima.
ASA has demonstrated many times that it is more efficient and gets the
global point better than other importance-sampling techniques, but this
still can require "tuning" some ASA OPTIONS. E.g., as mentioned in the
ASA-README, a quasi-Newton algorithm should be much more efficient than ASA
for a parabolic system.
========================================================================
@@Some Tuning Guidelines
21 Jan 00
Here are some crude guidelines that typically have been useful to tune
many systems. At least ASA has a formal proof of convergence to the
global minimum of your system. However, no sampling proof is general
enough for all systems to guarantee this will take place within your
lifetime. This is where the true power of ASA comes into play as the
code provides many tuning OPTIONS, most of which can be applied adaptively
at any time in the run, to give you tools to tune your system to provide
reasonably efficient optimizations. Depending on your system, this may
be easy or hard, possibly taxing anyone's intuitive and analytic capabilities.
In general, respect the optimization process as a simulation in
parameter space. The behavior of a system in this space typically is
quite different from its behavior in the space of its other variables.
(a) Three Stages of Optimization
It is useful to think of the optimization process as having three main
stages: initial, middle and end. In the initial stage you want to be sure
that ASA is jumping around a lot, visiting all regions of the parameter
space within the bounds you have set. In the end stage you want to be
sure that the cost function is in the region of the global minimum, and
that the cost function as well as the parameter values are being honed to
as many significant figures as required. The middle stage typically can
require the most tuning, to be sure it smoothly takes the optimization
from the initial to the end stage, permitting plenty of excursions to
regularly sample alternative regions/scales of the parameter space.
(b) Tuning Information
Keep ASA_PRINT_MORE set to TRUE during the tuning process to gather
information in asa_out whenever a new accepted state is encountered.
If you have ASA_PIPE and/or ASA_PIPE_FILE set to TRUE, additional
information (in relatively larger files) is gathered especially for
purposes of graphing key information during the run. Graphical aids
can be indispensable for gaining some intuition about your system.
If ASA_SAVE_OPT is set to TRUE then you have the ability to restart runs
from intermediate accepted states, without having to reproduce a lot of
the original run each time you wish to adaptively change some OPTIONS
after a given number of accepted or generated states.
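For example (check the Makefile shipped with your version of ASA for
the exact variable name and syntax), these flags typically are turned
on as pre-processor DEFINE_OPTIONS at compile time, e.g.,
  DEFINE_OPTIONS = -DASA_PRINT_MORE=TRUE -DASA_PIPE_FILE=TRUE -DASA_SAVE_OPT=TRUE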
(c) Parameter Temperatures
As discussed above in the section Parameter-Temperature Scales,
the temperature schedule is determined by {T_0i, c_i, k_i, Q_i, D}.
The default is to have all these the same for each parameter temperature.
See below for a discussion on sensitivities with respect to dimension D.
Note that the sensitivity of the default parameter distributions to
the parameter temperatures is logarithmic. Therefore, middle stage
temperatures of 10^-6 or 10^-7 still permit very large excursions from the
last local minimum to visit new generated states. Typically (of course
depending on your system), values of 10^-10 are appropriate for the end
stage of optimization.
It is advisable to start by changing the c_i to get a reasonable
temperature schedule throughout the run. If it becomes difficult to
do this across the 3 stages, work with the Q_i QUENCH_PARAMETERS as
these provide different sensitivities at different stages. Generally,
it is convenient to use the c_i to tune the middle stage, then add in
Q_i modifications for the end stage. As long as the sum of the Q_i is
<= 1, the sampling proof is intact. However, once you are sure of the region
of the global minima, it can be convenient to turn on actual quenching
wherein sum Q_i > 1.
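As outlined in the ASA-README, a quenching index Q_i modifies the
schedule quoted above in Parameter-Temperature Scales to
  c_i = m_i exp(-n_i Q_i/D)
  T_i(k_i) = T_0i exp(-c_i k_i^(Q_i/D))
so that the Q_i effectively rescale the dimension D seen by the
annealing schedule.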
Turning on Reanneal_Parameters can be very useful for some systems to
adaptively adjust the temperatures to different scales of the system.
(d) Cost Temperature
Note that the sensitivity of the default cost distribution to the cost
temperatures is exponential.
In general, you would like to see the cost temperatures throughout
the run be on the scale of the difference between the best and last
generated states, where the last generated state in the run is at the
last local minimum from which new states are explored. Therefore, pay