%
% This file is a part of the Bayes Blocks library
%
% Copyright (C) 2001-2003 Markus Harva, Antti Honkela, Alexander
% Ilin, Tapani Raiko, Harri Valpola and Tomas Östman.
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2, or (at your option)
% any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
% GNU General Public License (included in file License.txt in the
% program package) for more details.
%
% $Id: Documentation.tex 4 2006-10-26 07:23:55Z ah $
%

\documentclass[a4paper]{article}
\usepackage{url}
\usepackage{tabularx}
\usepackage{longtable}
\usepackage{amsmath}
\usepackage{graphicx}

\newcommand{\PBS}[1]{\let\temp=\\#1\let\\=\temp}
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\vects}[1]{\boldsymbol{#1}}

\title{Researcher's guide to Bayes Blocks library}
\author{Markus Harva, Antti Honkela, Alexander Ilin, \\
  Tapani Raiko, Harri Valpola and Tomas \"Ostman}

\begin{document}
\maketitle
\thispagestyle{empty}
\clearpage
\pagenumbering{arabic}
\tableofcontents
\clearpage

\parskip 1ex
\parindent 0cm

\section{Introduction}
This package is a flexible tool for building a wide variety of latent
variable models by combining simple building blocks.  After the
connections have been defined, the computations are automatic.  The
theory is explained in \cite{valpola_raiko}.

At the moment, Gaussian, rectified Gaussian, and mixture-of-Gaussian
variables, summation, addition, two types of nonlinearity and
discrete variables have been implemented.

For computational efficiency there are vectorised nodes in addition to
scalar nodes.  A delay node supports connections between
different time instances of vectorised nodes.
There is also support for on-line learning.

\section{Restrictions on the structure}
In general, the network has to be a directed acyclic graph (DAG).  The
delay nodes are an exception because the past values of any node can
be the parents of any other nodes.  The violation is not real in the
sense that if the vectors were split into scalar nodes, the resulting
network would be a DAG.

For computational efficiency each node assumes its inputs to be
independent.  This means that a latent variable cannot propagate its
value to another variable (observed or latent) through more than one
path.  If there were multiple paths, the values through these paths
would be dependent.  This restriction can be overcome by adding
intermediary variable nodes.

Self-delays for vectorised latent variable nodes are forbidden for
computational efficiency.  There are special nodes with built-in
self-delays for overcoming this restriction.

\section{Description of the nodes}
\label{sec:nodes}
Currently all nodes except Discrete and DiscreteDirichlet have
continuous values.  The following nodes are in use:
\begin{itemize}
\item  Constant, ConstantV,
\item  Evidence, EvidenceV,
\item  Prod, ProdV, Sum2, Sum2V, SumN, SumNV, Rectification,
  RectificationV, DelayV, Relay
%  (computational nodes)
\item  Gaussian, GaussianV, DelayGaussV, RectifiedGaussian,
  RectifiedGaussianV, MoG, MoGV, GaussNonlin, GaussNonlinV, GaussRect,
  GaussRectV, Discrete and DiscreteV, DiscreteDirichlet,
  DiscreteDirichletV, Dirichlet
%  (variable nodes)
\item  Memory, OLDelayS, OLDelayD (for on-line learning)
\item  Proxy (building temporal structures that would seemingly violate
  the DAG property)
\end{itemize}

\subsection{Constant nodes}
Constant has no parents.  The output value is a scalar.  Posterior
variance is zero.  The value is a double given to the constructor.

ConstantV is similar but the value is a vector.

\subsection{Evidence nodes}
Evidence has one parent and no children.
It is meant for providing
fading clamps for initialising the parameters of the model.  The node
must be clamped with two values, one for the mean and one for the
``variance'' of the observation.  The clamp is decayed by increasing
the variance parameter after subsequent iterations.  EvidenceV is
similar but the value is a vector.

\subsection{Computational nodes}

\subsubsection{Prod}
Prod has two scalar parents which are given to the constructor.  The
output value is scalar and is the product of the parents.

\subsubsection{Sum2}
Sum2 has two scalar parents which are given to the constructor.  The
output value is scalar and is the sum of the parents.

\subsubsection{SumN}
SumN is like Sum2 but has $N$ scalar parents which are given one by one by
calling the \texttt{AddParent} method of the node.

\subsubsection{Rectification}
This node is used in conjunction with GaussRect, which is its
only parent and is given in the constructor.
See section \ref{sec:gaussrect}.

\subsubsection{DelayV}
DelayV delays a vector.  The first value is a special case and is
given separately as the first parent of the node.  It is a scalar
value.  The other parent is the vector to be delayed.  To give it as a
parameter to the constructor, you will probably want to use a
\texttt{Proxy} node to create a structure that would seemingly create
a cycle in the network.

\subsubsection{Relay}
Relay passes the value on as it is.  The purpose of this node is to
make the role of a parent identifiable when a node is used for
more than one purpose.

Example: node X has mean Y and variance Y.  Node Y is now used
for two purposes and the computation of the gradient does not work
because the role of Y cannot be identified.  Y calls X twice, and X
should assume that the mean is calling the first time and the variance
the second time.  Instead, X assumes the mean is calling both times.
This can be fixed by adding a relay node Z = Y and then telling X that
its mean is Y and its variance is Z.
This could be automated later on: a relay is needed if a node has
identical parents for different roles.

Currently Relay can be used for scalar nodes only, but an extension to
vectors would be straightforward.  To be added as soon as needed.

\subsubsection{ProdV, Sum2V, SumNV, RectificationV}
These nodes are like their counterparts without V but the output value
is a vector.  Parents can be either scalars or vectors.

\subsection{Variable nodes}
Any variable node can be latent (hidden, unknown).  Some variable
nodes can be observed, in which case the observations can be set by
clamping.\footnote{The term clamping is used in biological
  neurosciences: clamping a potential of a cell means keeping the
  potential fixed by injecting a suitable current.}
When direct clamping is not supported (it doesn't make sense in
some cases), a similar effect can be achieved using evidence nodes.

\subsubsection{Gaussian}
Gaussian is a variable with a Gaussian prior distribution.  The mean
and variance parents are given to the constructor (mean first,
variance second).  The variance is parametrised by
$\operatorname{Var} = \exp(-v)$, where $v$ is the value of the second
parent.  The output is a scalar.

\subsubsection{RectifiedGaussian}
RectifiedGaussian is a Gaussian with zero probability mass
on the negative axis and with the positive axis scaled appropriately.

\subsubsection{MoG}
MoG is a variable consisting of a mixture of $K$ Gaussians.
It has $2K + 1$ parents: one discrete variable, given
in the constructor, and $K$ pairs of mean and variance parents.
The component means and variances are given to the node
after construction using the \texttt{AddComponent} method.

\subsubsection{GaussNonlin}
GaussNonlin is like Gaussian, but the output is a nonlinear function
$\exp(-s^2)$ of the variable's internal value $s$.
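As a short worked expectation for this nonlinearity, assume that the
posterior of $s$ is Gaussian with mean $\bar{s}$ and variance
$\tilde{s}$ (these symbols are introduced here for illustration only
and are not identifiers of the library).  Under that assumption the
expected output has the closed form
\[
  \operatorname{E}\left[\exp(-s^2)\right]
  = \frac{1}{\sqrt{2\tilde{s} + 1}}
    \exp\left(-\frac{\bar{s}^2}{2\tilde{s} + 1}\right),
\]
which is an even function of $\bar{s}$, so its derivative with respect
to $\bar{s}$ vanishes at $\bar{s} = 0$.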
Warning: if the
prior mean is zero, there is a local minimum where the internal mean
is zero.

\subsubsection{GaussRect}
\label{sec:gaussrect}
GaussRect is like Gaussian but it can be followed
by a rectification nonlinearity $f(s) = \max(s, 0)$.
The node should have at least one child, the Rectification
node, which is followed by the children that receive
the rectified values of the Gaussian variable.

GaussRect can also have children that receive the values of the Gaussian
variable without rectification.  In that case the GaussRect node is directly
used as the parent of the children.  This is especially useful
when building dynamical models.

\subsubsection{Discrete}
Discrete is a variable with a discrete distribution over small integers.
Its prior is given by a soft-max distribution of Gaussians $c_i$:
$p(d = i) = \exp(c_i) / \sum_j \exp(c_j)$.

\subsubsection{DiscreteDirichlet}
DiscreteDirichlet is a discrete variable like Discrete, but with a
Dirichlet prior for its weights.  It has the Dirichlet variable
as its only parent.  When possible, this node should be preferred over
Discrete, since the posterior approximations are more accurate.

\subsubsection{Dirichlet}
The Dirichlet variable is used as the parent of DiscreteDirichlet
%. It takes a ConstantV as its parent
which specifies the prior observation counts.

\subsubsection{GaussianV, RectifiedGaussianV, MoGV, GaussNonlinV,
GaussRectV, DiscreteV and DiscreteDirichletV}
These nodes are like their counterparts without V but their values
are vectors.  All parents can be scalars or vectors.

\subsubsection{DelayGaussV}
DelayGaussV is like GaussianV but the node includes a self-delay.  The
parents are ($m$, $v$, $a$, $m_0$, $v_0$).  If the value of the node
is $s(t)$, the mean is $m(t) + a(t) s(t-1)$ and the variance is
$\exp(-v(t))$ for $t > 1$.
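Written as a conditional density, with
$\mathcal{N}(x;\, \mu, \sigma^2)$ denoting a Gaussian with mean $\mu$
and variance $\sigma^2$ (a notation assumed here for illustration),
this description corresponds to
\[
  p\bigl(s(t) \mid s(t-1), m(t), v(t), a(t)\bigr)
  = \mathcal{N}\bigl(s(t);\, m(t) + a(t)\, s(t-1),\, \exp(-v(t))\bigr),
  \qquad t > 1.
\]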
The prior for the first value ($t = 1$) is
given by $m_0$ and $v_0$.

\subsection{Special nodes for on-line learning}

\subsubsection{Memory nodes}
In on-line learning, some of the nodes are time-dependent while others
are time-independent.  The Memory node acts as an adapter between these
two.  It stores the gradients from time-dependent nodes.

Note that time-dependent and time-independent nodes must not be
connected together without a memory node between them: the
time-independent node will be silently converted into a time-dependent
node and the conversion propagates to the neighbours of the newly
converted node.

\subsubsection{OLDelayS, OLDelayD}
The delay nodes are used to implement temporal models with on-line
learning.  They act as normal unit delays, \texttt{OLDelayS} for
scalars and \texttt{OLDelayD} for discrete values.

The delay nodes have two parents.  The first gives the ``initial''
value of the delayed value and is only used when the node is
initially constructed.  The second parent is the normal delayed value.
The two parents of the delay node may also both be the same node.  You
will probably want to use a \texttt{Proxy} node as the second parent
of a delay node.

\subsection{Proxy node}
\texttt{Proxy} is a special node that is allowed to seemingly violate
the DAG structure of the network.  Naturally these violations must not
be real, but rather have a \emph{delay} node somewhere in the loop.
The proxy works as a placeholder parent for other (mainly delay) nodes
and only connects to the true parent after the whole structure has
been created.

When a \texttt{Proxy} is created, in addition to its own label, it
only takes the label of its future parent as a parameter.

Before a proxy is connected to the true parent, it blindly accepts all
requests for different values but keeps a record of what it has
promised.
Once the proxy connects to the parent, usually as a result
of a call to \texttt{ConnectProxies} in the \texttt{Net}, it checks
whether the true parent can provide all the values it has promised and
complains if this is not the case.  After this, the proxy simply
forwards all the requests to its true parent.

\section{C++ library}
The library provides the basic operations such as the creation of
nodes, updates, (un)clamps, evaluations of the cost function, queries
of the values of nodes, saving (and loading in the future), pruning
etc.

Each node belongs to a network (class \texttt{Net}) which maintains
the nodes.  Each node has a string label which can be used to get the
pointer to the node.

\subsection{Class \texttt{Net}}
This is the class which contains the nodes and provides functions for
updating the net, evaluating the cost functions etc.  The constructor
is given the dimension of the vector nodes used in the network.  Note
that all vectors in one network need to be of equal length.

Net takes care of deleting the nodes.  When the net is deleted, it
deletes all the nodes it contains.  See section \ref{sec:die} for
information about removing nodes.
