careful attention to these values.  Note that the last generated state
is set to the most recently accepted state, and if the recently accepted
state also is the current best state then the last generated state will
be so reported.  Therefore, this sensitivity to the last generated state
works best during parts of the run where the code is sampling alternate
multiple minima.

The default is to baseline the cost temperature scale to the default
parameter temperature scale, using Cost_Parameter_Scale_Ratio (default
= 1).  It is advisable to first tune your parameter temperature schedule
using Temperature_Ratio_Scale, then to tune your cost temperature schedule
using Cost_Parameter_Scale_Ratio.  If it becomes difficult to do this across
the 3 stages, work with the quenching exponent Q via QUENCH_COST, as this
provides a different sensitivity at a different stage.  Generally, it is
convenient to use
the c scale via Cost_Parameter_Scale_Ratio to tune the middle stage,
then add in Q modifications for the end stage.

Turning on Reanneal_Cost can be very useful for some systems to adaptively
adjust the temperature to different scales of the system.

(e) Large Parameter Dimensions
As the number of parameter dimensions D increases, you may see that your
temperatures are changing more than you would like with respect to D.
The default is to keep the parameter exponents of the k_i summed to 1
with each exponent set to 1/D.

The default exponential decay of the temperatures goes as exp(-c k^(Q/D)),
so larger D gives slower decay for the same values of c, k and Q.
Modifications to this behavior of
the parameter and cost temperatures are easily made by altering the Q_i
and Q, resp., as Q_i, Q and D enter the code as Q_i/D and Q/D, resp.

The scales c are set as
   c = -log(Temperature_Ratio_Scale) exp[-log(Temperature_Anneal_Scale) Q/D]
Therefore, the sensitivity of c to D can be controlled by modifying
Temperature_Anneal_Scale or Q.

========================================================================
	@@Quenching

If you have a large parameter space, and if a "smart" quasi-local
optimization code won't work for you, then _any_ true global
optimization code will be faced with the "curse of dimensionality."
I.e., global optimization algorithms must sample the entire space, and
even an efficient code like ASA must do this.  As mentioned in the
ASA-README, there are some features to explore that might work for your
system.

Simulated "quenching" (SQ) techniques like genetic algorithms (GA)
obviously are important and are crucial to solving many systems in time
periods much shorter than might be obtained by standard SA.  In ASA, if
annealing is forsaken, and Quenching turned on, voiding the proof of
sampling, remarkable increases of speed can be obtained, apparently
sometimes even greater than other "greedy" algorithms.

In large D space, this can be especially useful if the parameters are
relatively independent of each other, by noting that the arguments
of the exponential temperature schedules are proportional to k^(Q/D).
Then, you might do better thinking of changing Q/D in fractional moves,
instead of only small deviations of Q from 1.

For example, in asa92_saga.pdf in the archive, along with 5 GA test
problems from the UCSD GA archive, another harder problem (the ASA_TEST
problem that comes with the ASA code) was used.  In asa93_sapvt.pdf in
this archive, Quenching was applied to this harder problem.  The resulting
SQ code was shown to speed up the search by as much as a factor of 86
(without even attempting to see if this could be increased further
with more extreme quenching).  In the asa_examples.txt file, even
more dramatic efficiencies were obtained.  This is a simple change of
one number in the code, turning it into a variant of SQ, and is not
equivalent to "tuning" any of the other many ASA options, e.g., like
SELF_OPTIMIZE, USER_COST_SCHEDULE, etc.  Note that SQ will not suffice
for all systems; several users of ASA reported that Quenching did not
find the global optimal point that otherwise was found using the
"correct" SA algorithm.

As mentioned in the ASA-README, note that you also can use the Quenching
OPTIONS quite differently, to slow down the annealing process by
setting USER_OPTIONS->User_Quench_Param_Scale[] to values less than 1.
This can be useful in problems where the global optimal point is at a
quite different scale from other local optima, masking its presence.
This likely would be most useful for low dimensional problems where the
CPU time incurred by slower annealing might not be a major
consideration.

Once you decide you can quench, there are many more alternative
algorithms you might wish to choose for your system, e.g., creating a
hybrid global-local adaptive quenching search algorithm using
USER_REANNEAL_PARAMETERS.  Note that just using the quenching OPTIONS
provided with ASA can be quite powerful, as demonstrated in the
asa_examples.txt file.

========================================================================
	@@Options for Large Spaces

5 Oct 94

For very large parameter-space dimensions, the following guide is
useful if you desire to speed up the search:

		Pre-Compile Options
add -DUSER_REANNEAL_PARAMETERS=TRUE to DEFINE_OPTIONS
add -DUSER_COST_SCHEDULE=TRUE to DEFINE_OPTIONS
add -DASA_PRINT_INTERMED=FALSE to DEFINE_OPTIONS
SMALL_FLOAT may have to be decreased
set QUENCH_PARAMETERS to TRUE [a risk that negates proper sampling if Q > 1]
set QUENCH_COST to TRUE
Perhaps set QUENCH_PARAMETERS_SCALE and QUENCH_COST_SCALE to FALSE

		Program Options
set Curvature_0 to TRUE
decrease Temperature_Ratio_Scale
increase Cost_Parameter_Scale_Ratio
increase Maximum_Cost_Repeat
decrease Acceptance_Frequency_Modulus
decrease Generated_Frequency_Modulus

		run time
use `nice -19 asa_run ...` as runs can be time- and CPU-intensive
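
As one hedged sketch of applying these settings together (the
DEFINE_OPTIONS variable and the asa_run name follow the standard ASA
distribution, but check your local Makefile before relying on them):

```shell
# Hypothetical build and run; verify flag and target names against
# your own ASA Makefile before use.
make DEFINE_OPTIONS="-DUSER_REANNEAL_PARAMETERS=TRUE \
  -DUSER_COST_SCHEDULE=TRUE \
  -DASA_PRINT_INTERMED=FALSE \
  -DQUENCH_PARAMETERS=TRUE \
  -DQUENCH_COST=TRUE"

# long runs can be time- and CPU-intensive; lower the priority
nice -19 ./asa_run > asa_run.log 2>&1 &
```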

If the parameter space dimension, D, is huge, e.g., 256x256=65536,
then the exponential of the generating or acceptance index to the 1/D
power hardly changes over even a few million cycles.  True annealing in
such huge spaces can become prohibitively slow as the temperatures will
hardly be diminished over these cycles.  This "curse of dimensionality"
will face any algorithm seeking to explore an unknown space.  Then,
the QUENCH_PARAMETERS and QUENCH_COST DEFINE_OPTIONS should be tried.

However, note that slowing down annealing sometimes can speed up the
search by avoiding spending too much time in some local optimal
regions.
========================================================================
	@@Shunting to Local Codes

I have always maintained in e-mails and in VFSR/ASA publications since
1987, that SA techniques are best suited for approaching complex
systems for which little or no information is available.  When the
range of a global optimum is discovered, it may indeed be best to then
turn to another algorithm.  I have done this myself in several papers,
shunting over to a quasi-local search, the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, to "polish" off the
last 2 or 3 decimals of precision, after I had determined just what
final level of precision was acceptable.  In the problems where I
shunted to BFGS, I simply used something like the value of Cost_Precision
or Limit_Acceptances (which were pretty well correlated in some problems)
to decide when to shunt over.  (I got terrible results if I shunted
over too quickly.)  However, that was before the days I added OPTIONS
like USER_COST_SCHEDULE and USER_ACCEPTANCE_TEST, and if and when I
redo some of those calcs I will first experiment adaptively using these
to account for different behaviors of my systems at different scales.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
29 Nov 97

When FITLOC is set to TRUE, three subroutines become active to perform
a local fit after leaving asa ().

========================================================================
	@@Judging Importance-Sampling

22 Nov 96

If the cost function is plotted simply as a function of decreasing
temperature(s), often the parameter space does appear to be continually
sampled in such a plot, but the plot is misleading.  That is, there
really is importance sampling taking place, and the proof of this is to
do a log-log plot of the cost function versus the number of generated
states.  Then you can see that if the temperature schedule is not
enforced you will have a poor search, if quenching is turned on you
will get a faster search (though you may miss the global minimum),
etc.  You can test these effects using quenching and "reverse
quenching" (slowing down the annealing); it likely would be helpful to
set QUENCH_COST and QUENCH_PARAMETERS to TRUE, QUENCH_PARAMETERS_SCALE
and QUENCH_COST_SCALE to FALSE, and perhaps NO_PARAM_TEMP_TEST and
NO_COST_TEMP_TEST to TRUE.
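
One hedged way to prepare such a log-log plot (assuming you have extracted
two columns, generated-state index and best cost so far, into a file named
here cost_vs_generated.dat; the file name and plotting tool are
illustrative, not part of ASA):

```shell
# Convert (generated-states, cost) pairs to log10-log10 columns
awk '{ print log($1)/log(10), log($2)/log(10) }' \
  cost_vs_generated.dat > loglog.dat

# then plot, e.g., with gnuplot:
#   set logscale xy
#   plot 'cost_vs_generated.dat' using 1:2 with lines
```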

The point is that the ASA distribution is very fat-tailed, and the
"effective" widths of the parameters being searched change very slowly with
decreasing parameter temperatures; the trade-off is that the parameter
temperatures may decrease exponentially and still obey the sampling
proof.  Thus, the experience is that ASA finds the global minimum when
other sampling techniques fail, and it typically finds the global
minimum faster than other sampling techniques as well.

Furthermore, the independence of cost and parameter temperatures
permits additional tuning of ASA in many difficult problems.  While the
decreasing parameter temperatures change the way the parameter states
are generated, the decreasing cost temperature changes the way the
generated states are accepted.  The sensitivity of the acceptance
criteria to the cost temperature schedule can be very important in many
systems.  An examination of a few runs using ASA_PRINT_MORE set to TRUE
can reveal premature holding onto a local minimum or not enough holding
time, etc., requiring tuning of some ASA OPTIONS.
========================================================================
========================================================================
@@SPECIAL COMPILATIONS/CODE

========================================================================
	@@Tsallis Statistics

26 Feb 95

A recent paper claimed that Tsallis statistics, whose parameterization
permits an asymptotic approximation to the exponential function used in
the Boltzmann form of the standard SA acceptance test, is superior to
the Boltzmann test, and an example was given comparing standard SA to
this new algorithm on the traveling salesman problem (TSP).
	%A T.J.P. Penna
	%T Traveling salesman problem and Tsallis statistics
	%J Phys. Rev. E
	%V 50
	%N 6
	%P R1-R3
	%D 1994
There are two issues here, (a) the value of the Tsallis test vs the
Boltzmann test, and (b) the use of TSP for the confirmation of (a).

It seems very reasonable that the Tsallis test should be better than
the Boltzmann test for the SA acceptance test.  For example, if the
Boltzmann statistics did well on a given cost function C, then it
might be the case that for the cost function C' = exp(C) a
more moderate test, such as obtained for some parameterizations of the
Tsallis statistics, would be more appropriate to avoid getting stuck in
local minima of C'.  In fact, from its first inception VFSR and
ASA have included parameters to effect similar alternatives, and the
latest versions of ASA now have the Tsallis statistics as another
alternative that can be commented out.  I have not yet experienced any
advantages of this over the Boltzmann test when other ASA alternatives
are permitted to be used, but it seems likely that there do exist some
problems that might benefit by its use.
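
For reference, the q-exponential form standard in the Tsallis-statistics
literature (q is the Tsallis parameter; this sketch is not a transcription
of the ASA source) replaces the Boltzmann exponential in the acceptance
test:

   exp_q(x) = [1 + (1 - q) x]^[1/(1 - q)]  ->  exp(x)  as  q -> 1

so the acceptance probability becomes exp_q(-delta_C/T_cost) in place of
the Boltzmann exp(-delta_C/T_cost).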

The use of TSP as a test for comparisons among SA techniques seems
quite inappropriate.  To quote another source,
	%A D.H. Wolpert
	%A W.G. Macready
	%T No free lunch theorems for search
	%R Report
	%I Santa Fe Institute
	%C Santa Fe, NM
	%D 1995
"As an example of this, it is well known that generic methods (like
simulated annealing and genetic algorithms) are unable to compete with
carefully hand-crafted solutions for specific search problems.  The
Traveling Salesman (TSP) Problem is an excellent example of such a
situation; the best search algorithms for the TSP problem are
hand-tailored for it."

========================================================================
	@@Dynamic Hill Climbing (DHC)

26 Feb 95

Michael de la Maza posted notices to public electronic bulletin boards,
e.g., as summarized in a public mailing list GA-List@AIC.NRL.NAVY.MIL,
that his new algorithm, dynamic hill climbing (DHC), clearly
outperformed genetic algorithms and ASA.  His code is available by
sending e-mail to dhc@ai.mit.edu.  Since DHC is a variant of a "greedy"
algorithm, it seemed appropriate to permit ASA to also enter its
quenching (SQ) domain.  The following excerpt is the reply posting in
the above bulletin board, also included above in the Quenching section.

"SQ techniques like GA obviously are important and are crucial to
solving many systems in time periods much shorter than might be
obtained by SA.  In ASA, if annealing is forsaken, and Quenching turned
on, voiding the proof of sampling, remarkable increases of speed can be
obtained, apparently sometimes even greater than other "greedy"
algorithms.  For example, in asa92_saga.pdf, along with 5 GA test
problems from the UCSD GA archive, another harder problem (the ASA_TEST
problem that comes with the ASA code) was used.  In asa93_sapvt.pdf,
Quenching was applied to this harder problem.  The resulting SQ code
was shown to speed up the search by as much as a factor of 86 (without
even attempting to see if this could be increased further with more
