      \hbox{\z\j\l\j\l\j\l\j\l\j\l\j\l\j\l\j\l\j\l\j\l\j\I\L\Z\I\L\Z\I\L\Z\I\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\j\l\j\l\j\l\j\l\j\l\j\l}
      \hbox{\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\n\O\Z\O\Z\O\I\Z\Z\O\Z\Z\O\Z\Z\O\Z\Z\O\Z\Z\O\z\z\o\z\z\o\z\z\o\z\z\o\z\z\o\j\z\o\z\o\z\o\z\o\z\o\z}
      \hbox{\n\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\o\I\I\L\I\I\L\I\Z\I\L\O\L\I\Z\I\L\O\L\I\Z\I\l\o\l\j\z\j\l\o\l\j\z\j\l\o\l\j\j\l\j\j\j\l\j\j\j\l\j}
      \hbox{\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\Z\I\L\O\Z\I\Z\O\L\O\Z\I\Z\O\I\Z\O\I\Z\O\L\O\z\j\z\o\j\z\o\j\z\o\l\o\z\j\z\o\l\o\z\j\l\o\z\j\l\o\z}
      \hbox{\z\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\L\Z\Z\I\L\O\I\L\Z\I\L\Z\I\I\L\I\L\I\I\L\Z\I\l\z\j\j\l\j\l\j\j\l\z\j\l\z\j\j\z\j\l\z\z\j\l\o\z\j\l}
      \hbox{\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\Z\Z\O\I\L\O\Z\Z\Z\O\Z\I\n\O\Z\O\I\Z\Z\O\z\z\o\j\z\o\z\o\l\o\z\z\z\o\z\j\j\z\z\o\z\z\o\j\j\z\z}
      \hbox{\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\L\O\L\I\Z\I\L\O\Z\I\L\O\n\L\I\I\L\I\Z\I\l\o\l\j\j\l\j\j\z\j\l\o\z\j\l\o\l\j\z\j\l\o\l\j\l\j\z}
      \hbox{\z\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\Z\I\Z\O\I\Z\O\I\I\Z\O\I\L\O\Z\I\Z\O\L\O\z\j\z\o\l\o\z\j\j\z\o\j\j\z\o\j\z\o\l\o\z\j\z\o\z\o\l}
      \hbox{\z\j\l\z\j\j\j\l\z\j\l\z\j\l\j\l\z\j\L\Z\I\I\L\I\L\I\L\I\L\I\Z\I\L\Z\I\L\Z\I\l\z\j\l\z\j\l\o\l\j\l\j\l\j\l\j\j\l\z\j\l\z\j\l\j\l\z}
      \hbox{\n\z\z\o\z\j\l\o\z\z\o\z\z\o\Z\Z\O\Z\Z\O\Z\I\Z\O\Z\O\Z\O\Z\O\L\O\Z\Z\O\Z\Z\O\z\z\o\z\z\o\z\j\z\o\z\o\z\o\z\o\l\o\z\z\o\z\z\o\z\z\o}
      \hbox{\z\j\z\j\l\o\z\j\l\o\l\j\z\j\L\O\L\I\Z\I\L\O\I\L\I\I\I\L\I\I\Z\I\L\O\L\I\Z\I\l\o\l\j\z\j\l\o\j\l\j\j\j\l\j\j\z\j\l\o\l\j\z\j\l\o\l}
      \hbox{\z\o\j\z\o\j\j\z\o\j\z\o\l\o\Z\I\Z\O\I\Z\O\I\L\O\Z\I\L\O\Z\I\I\Z\O\I\Z\O\L\O\z\j\z\o\j\z\o\j\l\o\z\j\l\o\z\j\j\z\o\j\z\o\l\o\z\j\z}
      \hbox{\n\j\l\j\l\j\l\j\l\j\j\l\z\j\L\Z\I\I\L\I\L\I\Z\I\L\Z\Z\I\L\O\L\I\L\I\I\L\Z\I\l\z\j\j\l\j\l\j\z\j\l\z\z\j\l\o\l\j\l\j\j\l\z\j\l\z\j}
      \hbox{\n\j\z\o\z\o\z\o\z\o\j\z\z\o\Z\Z\O\I\Z\O\Z\O\I\Z\Z\O\Z\Z\O\I\Z\O\Z\O\I\Z\Z\O\z\z\o\j\z\o\z\o\j\z\z\o\z\z\o\j\z\o\z\o\j\z\z\o\z\z\o}
      \hbox{\z\j\j\l\j\j\j\l\j\j\l\j\z\j\L\O\L\I\I\L\I\I\L\I\Z\I\L\O\L\I\I\L\I\I\L\I\Z\I\l\o\l\j\j\l\j\j\l\j\z\j\l\o\l\j\j\l\j\j\l\j\z\j\l\o\l}
      \hbox{\z\o\l\o\z\j\l\o\z\j\z\o\l\o\Z\I\Z\O\L\O\Z\I\Z\O\L\O\Z\I\Z\O\L\O\Z\I\Z\O\L\O\z\j\z\o\l\o\z\j\z\o\l\o\z\j\z\o\l\o\z\j\z\o\l\o\z\j\z}
      \hbox{\n\l\z\j\l\z\z\j\l\z\j\l\z\j\L\Z\I\L\Z\I\L\Z\I\L\Z\I\L\Z\I\L\Z\I\L\Z\I\L\Z\I\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j\l\z\j}
      \hbox{\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z\o\z}
      \hbox{\n\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j\j\j\l\j}
      \hbox{\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z\j\l\o\z}
      \hbox{\z\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l\z\z\j\l\o\z\j\l}
      \hbox{\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z\o\z\z\o\j\j\z\z}
      \hbox{\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z\j\l\o\l\j\l\j\z}
      \hbox{\z\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l\o\z\j\z\o\z\o\l}
      \hbox{\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n\j\j\n\j\j\j\j\n}
     }
\hskip -177.6pt \vbox{\hbox{$\bigcirc$ \hskip -11.5pt ${\scriptstyle R}$} \vskip116.2pt \hbox{ } }
\hskip  -86.4pt \vbox{\hbox{$\bigcirc$ \hskip -11.0pt ${\scriptstyle L}$} \vskip100.0pt \hbox{ } }
\hskip   58.9pt \vbox{\hbox{$\bigcirc$ \hskip -11.7pt ${\scriptstyle X}$} \vskip 80.2pt \hbox{ } }
\hskip  172pt
}  
$$

\vskip 1cm
$$
\vbox{
      \hbox{\tt ..o..oo....oo.oo.....oo.ooo....ooooooo..ooooo...ooo.o...o...o....o...}
      \hbox{
            \hskip 45.4pt    \hbox to 16.0pt{\leftarrowfill} $\!\!\mid\!\!$ \hbox to 16.0pt{\rightarrowfill}
            \hskip 30.3pt    \hbox to 16.0pt{\leftarrowfill} $\!\!\mid\!\!$ \hbox to 16.0pt{\rightarrowfill}
            \hskip 20.0pt    \hbox to 10.5pt{\leftarrowfill} $\!\!\mid\!\!$ \hbox to 10.5pt{\rightarrowfill}
            \hskip 15.0pt    \hbox to 10.5pt{\leftarrowfill} $\!\!\mid\!\!$ \hbox to 10.5pt{\rightarrowfill}
            \hskip 19.8pt    \hbox to 31.5pt{\leftarrowfill} $\!\!\mid\!\!$ \hbox to 31.5pt{\rightarrowfill}}
     }
$$

\vfill\eject
\footline={\tenrm\qquad John Skilling, Kenmare, Ireland\hfil\quad February 2004\qquad}

\centerline{\bigger BayeSys and MassInf}
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
{\bigger
\noindent To my intellectual ancestors the late Edwin T. Jaynes, and Steve Gull,  to my descendant Sibusiso Sibisi, 
and the many colleagues and friends over the past quarter-century who have inspired and encouraged the development of these ideas.}
\vskip 13cm
\noindent ${\rm BayeSys}^{TM}$ and ${\rm MassInf}^{TM}$ are trademarks of Maximum Entropy Data Consultants Ltd, 114c Milton Road, Cambridge, England.
Copyright of this manual and the accompanying source code modules listed in section 16 is assigned to Maximum Entropy Data Consultants Ltd.
These program modules are distributed in the public domain under the terms of the GNU Lesser General Public License (version 2.1) available from the 
Free Software Foundation Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

\vfill\eject

\footline={\hss\tenrm\folio\hss}
\pageno=1
\centerline{CONTENTS}
\halign{    # \hfil& # \dotfill$\ldots\ldots\ldots\ldots\ldots\ $          &  # \cr
         PART 1.   &\ 1. Introduction                                      &\ 2 \cr
         OVERVIEW: &\ 2. Overview of Inference                             &\ 4 \cr
                   &\qquad 2.1. Inversion                                  &\ 4 \cr
                   &\qquad 2.2. Regularisation                             &\ 4 \cr
                   &\qquad 2.3. Probabilities                              &\ 5 \cr
                   &\qquad 2.4. Prior probabilities                        &\ 5 \cr
                   &\qquad\qquad 2.4.1. Atomic priors                      &\ 6 \cr
                   &\qquad\qquad 2.4.2. Coordinates                        &\ 7 \cr
                   &\qquad 2.5. Sampling                                   &\ 8 \cr
         PART 2.   &\ 3. Markov chain Monte Carlo (MCMC)                   &\ 9 \cr
         THEORY:   &\qquad 3.1. The number of atoms                        &\ 9 \cr
                   &\qquad 3.2. Coordinates                                & 10 \cr
                   &\qquad 3.3. R\^ole of likelihood                       & 12 \cr
                   &\qquad 3.4. Binary slice sampling                      & 13 \cr
                   &\ 4. Annealing                                         & 15 \cr
                   &\qquad 4.1. Selective annealing                        & 17 \cr
                   &\qquad\qquad 4.1.1. Imperfections                      & 18 \cr
                   &\qquad\qquad 4.1.2. Properties                         & 18 \cr
                   &\qquad 4.2. Comparison with statistical thermodynamics & 19 \cr
                   &\ 5. The BayeSys program                               & 21 \cr
                   &\qquad 5.1. The BayeSys prior                          & 22 \cr
                   &\qquad\qquad 5.1.1. Number of atoms                    & 22 \cr
                   &\qquad\qquad 5.1.2. Coordinates                        & 23 \cr
                   &\qquad 5.2. The BayeSys engines                        & 26 \cr
                   &\qquad\qquad 5.2.1. LifeStory1                         & 27 \cr
                   &\qquad\qquad 5.2.2. LifeStory2                         & 29 \cr
                   &\qquad\qquad 5.2.3. GuidedWalk                         & 30 \cr
                   &\qquad\qquad 5.2.4. Leapfrog1 and Leapfrog2            & 32 \cr
                   &\qquad\qquad 5.2.5. Chameleon1                         & 33 \cr
                   &\qquad\qquad 5.2.6. Chameleon2                         & 34 \cr
                   &\ 6. Massive Inference (MassInf)                       & 35 \cr
                   &\qquad 6.1. MassInf priors                             & 35 \cr
                   &\qquad 6.2. MassInf likelihood                         & 36 \cr
                   &\qquad\qquad 6.2.1.  Gaussian data                     & 37 \cr
                   &\qquad\qquad 6.2.2.  Poisson data                      & 37 \cr
                   &\qquad 6.3. MassInf flux unit                          & 37 \cr
                   &\qquad 6.4. MassInf fluxes                             & 38 \cr
                   &\ 7. Display of results                                & 39 \cr
         PART 3.   &\ 8. BayeSys prior parameters                          & 40 \cr
         PRACTICE: &\ 9. BayeSys algorithm parameters                      & 41 \cr
                   & 10. BayeSys structures                                & 42 \cr
                   & 11. User procedures                                   & 44 \cr
                   & 12. UserMonitor                                       & 45 \cr
                   & 13. MassInf prior parameters                          & 46 \cr
                   & 14. MassInf likelihood settings                       & 47 \cr
                   &\qquad 14.1. Gaussian data                             & 48 \cr
                   &\qquad 14.2. Poisson data                              & 48 \cr
                   &\qquad 14.3. MassInf with BayeSys                      & 48 \cr
                   & 15. Using BayeSys                                     & 49 \cr
                   & 16. Program files                                     & 50 \cr
                   & REFERENCES                                            & 51 \cr
                   & INDEX                                                 & 53 \cr
       }
\vfill\eject

\centerline{\bigger PART 1. OVERVIEW}
\bigskip
\noindent{$\underline{\hbox{\bf{Section 1. Introduction}}}$}
\bigskip

The second half of the $20^{\rm th}$ century saw a revolution in methods of inference. 
On the theoretical side was the rise of ``Bayesian'' probabilistic analysis,
and on the practical side was the development of computer hardware, along with exploration algorithms to use it.
At times in those decades, the theoretical arguments became heated to a degree more often associated with the darker side of organised religion,
with the guardians of doctrinal orthodoxy ranged against the Bayesian rationalists to the bemusement of ordinary scientists who simply wanted to analyse their data.
The dispute ought to have been settled by the paper of Richard Cox (1946), 
which proved that straightforward probabilities are the only allowable method of consistent inference.
But it was not.
Edwin Jaynes (2003) presents the Cox proof in compelling detail in chapter 2 of his book (posthumously edited by Larry Bretthorst),
and also gives the history (chapter 16) and pathology (chapter~17) of orthodox statistics, with the stylistic flair of the wartime correspondent that, in a way, he was.
He who engages with irrationality should expect an unreasoning response, 
and sadly that is what Jaynes and like-minded colleagues all too often received for their efforts.

Yet Nature has a way of educating us.
In inference as in engineering, practical power comes from obeying the laws, and working in sympathy with them.
Null hypotheses, confidence intervals and the like cannot cope with the complexity of modern problems.
Probabilistic analysis demonstrably does.
It is this wide-ranging power, more than logical argument, that is convincing scientists at large that the Bayesian approach should be used.
Sivia (1996) gives a good introduction to Bayesian data analysis from a scientist's perspective.
Though, to be fair to the orthodox school, it wasn't much use knowing what one should do if one didn't know how.
The practical development of Bayesian methods can be conveniently dated to Metropolis {\it et al.} (1953), 
who presented a Monte Carlo exploration algorithm having a real prospect of dealing with large problems.
Hastings (1970) extended and generalised this work, and nowadays most serious inference algorithms follow the Metropolis-Hastings approach.
The manufacture of ever-larger computers has allowed application to ever-harder problems, 
which have catalysed a bewildering array of algorithmic developments.
Brooks (1998) gives a clearly written review, and MacKay (2003) gives a professional introduction to Monte Carlo methods from a wider perspective.
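The Metropolis--Hastings recipe that most of these algorithms follow fits in a few lines. The sketch below (illustrative Python, not BayeSys code; all names are my own) samples an unnormalised density by proposing a symmetric random step and accepting it with probability $\min(1, p_{\rm new}/p_{\rm old})$:

```python
import math
import random

def metropolis(log_density, x0, steps, step_size=1.0, seed=0):
    """Toy random-walk Metropolis sampler (illustrative sketch only).

    log_density: log of the unnormalised target density.
    Returns the list of visited states, one per step (rejections repeat
    the current state, as the algorithm requires).
    """
    rng = random.Random(seed)
    x = x0
    logp = log_density(x)
    chain = []
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # symmetric proposal
        logp_new = log_density(x_new)
        # Accept with probability min(1, p_new / p_old), done in log space.
        if math.log(rng.random()) < logp_new - logp:
            x, logp = x_new, logp_new
        chain.append(x)
    return chain

# Example: a standard Gaussian target, whose log-density is -x^2/2.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, steps=20000)
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
```

The long-run histogram of `chain` approximates the target; serious implementations differ in how the proposal is built, which is where the engine designs discussed later come in.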

The defining feature of large problems is that they have a large number of parameters, viewed geometrically as a dimensionality.
Large dimensionality has several awkward properties: most directions are nearly orthogonal to the one you want, 
the interesting domain you seek is usually exponentially tiny,
and it is likely to have a peculiar shape as well.
It is a lot harder to program in $n$ dimensions than in 1.
Yet $n$ dimensions can be mapped to 1 by using a space-filling curve.
Specifically, the Hilbert curve (1891) uniformly covers the interior of an $n$-dimensional cube, whilst preserving a useful degree of locality.
Using it, $n$ coordinates can be encoded into a single number, albeit one with extended precision so that accuracy is preserved.
Curves like this have been considered something of a curiosity, 
and have attracted only a few quirky applications (Abend, Hartley \& Kanal 1965, Bially 1969, Stevens, Lehar \& Preston 1983, Song \& Roussopoulos 2002).
Indeed, to anyone classically trained in differential calculus and continuum mathematics, space-filling curves do look rather odd.
However, from a computational point of view, using a Hilbert curve merely amounts to shuffling and re-defining the bits representing coordinate values, 
which is not particularly peculiar.
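The bit-shuffling view can be made concrete. The following Python sketch is the standard textbook two-dimensional construction (not the encoding used inside BayeSys): it converts between a position $d$ along the curve and $(x,y)$ coordinates on an $n\times n$ grid, $n$ a power of two, by peeling off two bits per level and rotating the quadrant.

```python
def d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n-by-n grid."""
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate this quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def xy2d(n, x, y):
    """Inverse map: (x, y) on an n-by-n grid to position d along the curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Consecutive values of $d$ always land on adjacent grid cells, which is the locality property mentioned above; and encoding fine grids does require extended precision in $d$, exactly as the text says.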

Of course, there is a price to pay.
A function that is smooth and straightforward in two, three or more dimensions will look sharp and jagged when stretched out along a Hilbert curve.
On the other hand, a function that is twisted and torn in several dimensions doesn't look appreciably worse.
So, if we can work in one dimension at all (and we can), we may hope to be able to work with a wide class of functions that have traditionally been regarded as difficult.
Indeed, I suggest that the ability to alter the dimensionality of a problem at will is a powerful and under-appreciated tool, 
which we should be able to use with advantage when exploring spaces of high dimension.
Such an approach implies a change of focus, away from the lines and curvatures of geometry, and towards the connectedness of topology.
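The smooth-becomes-jagged point is easy to see numerically. Using the same standard two-dimensional Hilbert construction (again an illustrative Python sketch, not BayeSys code), take the perfectly smooth linear function $f(x,y)=x+y$ and read it off cell by cell along the curve: every step moves to an adjacent cell, so $f$ changes by exactly $\pm1$, but the sign of the change keeps flipping, so the one-dimensional trace oscillates even though the two-dimensional function is monotone along any fixed direction.

```python
def d2xy(n, d):
    """Standard 2-D Hilbert curve: position d -> (x, y) on an n-by-n grid."""
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate this quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

n = 16
f = lambda x, y: x + y                   # smooth (linear) in 2-D

# Stretch f out along the Hilbert curve.
trace = [f(*d2xy(n, d)) for d in range(n * n)]
steps = [b - a for a, b in zip(trace, trace[1:])]
sign_flips = sum(1 for a, b in zip(steps, steps[1:]) if a != b)
```

Every entry of `steps` is $\pm1$ (the curve never jumps), yet `sign_flips` is large: the trace is jagged. A function already twisted and torn in 2-D would give a trace no worse than this, which is the point made above.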

The BayeSys program (an acronym for {\bf Baye}sian {\bf Sys}tem, pronounced ``basis'') is built around this basic idea.
Its aim is to give you ``typical'' samples of your entire object of interest, 
which the program locates by using the object's probabilistic fit to your data, known as its likelihood.
It also calculates the numerical ``evidence'' that quantifies how well your modelling of objects managed to predict your data.
BayeSys supplies locations, you supply their likelihoods: it's just about as simple as that.
The program includes a variety of exploration procedures which I call ``engines'', all of which exploit Hilbert coding to reduce the dimensionality.
Using a space-filling curve as a central theme is unconventional and distinctive: 
