(double scalar set), \texttt{DV} (double vector), \texttt{DD} (discrete distribution), \texttt{VDD} (vectorised discrete distribution), \texttt{DVH} (double vector handle), \texttt{VDDH} (vectorised discrete distribution handle) and \texttt{DFlags} (double flags).

\subsubsection{Class \texttt{DSSet}}
\label{sec:DSSet}

This is a set of scalar expectations. Currently there are the members \texttt{DSSet.mean}, \texttt{DSSet.var} and \texttt{DSSet.ex}, which stand for the mean, the variance and the exponential. \texttt{DSSet} is used for querying the scalar values of nodes. It is also the structure used for the gradients of scalar values.

\subsubsection{Class \texttt{DV}}

This is the basic data structure for double vectors. The user mainly needs it for clamping the values of vectorised continuous-valued variables.

\subsubsection{Class \texttt{DVSet}}
\label{sec:DVSet}

This is the vectorised equivalent of \texttt{DSSet}. It is used for the gradients of scalar vectors and is usually not needed by the user.

\subsubsection{Class \texttt{DD}}
\label{sec:DD}

The user needs this class for clamping discrete nodes. For now, \texttt{DD} is just a wrapper for a \texttt{DV} of suitable size. There are plans to add support for sparse distributions, but the exact way of doing this is still unclear. The most important operation provided by \texttt{DD} is indexing via the operator ``[]''.

\subsubsection{Class \texttt{VDD}}
\label{sec:VDD}

This is the vectorised equivalent of \texttt{DD}. It is needed for clamping vectorised discrete nodes.

\subsubsection{Class \texttt{DVH}}
\label{sec:DVH}

This is a structure which can hold either a \texttt{DSSet} or a \texttt{DVSet}. It is used for querying the value of vector nodes. This data structure makes it possible for vector nodes to have either vector or scalar parents, since scalar nodes can also be asked for a vector. \texttt{DVH} keeps track of whether the value it contains is a vector or a scalar.
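The scalar-or-vector behaviour of such a handle can be sketched in a few lines of Python. This is purely illustrative (the real \texttt{DVH} is a C++ class, and the names below are invented for the sketch):

```python
# Hypothetical sketch of a DVH-like handle (illustrative names, not
# the library's actual API).  The handle stores either one scalar
# expectation or a vector of them, and remembers which kind it holds.

class ScalarOrVectorHandle:
    """Holds either a single scalar value or a vector of values."""

    def __init__(self, value):
        self.value = value
        self.is_scalar = not isinstance(value, (list, tuple))

    def mean(self, i):
        # A scalar parent answers every index with the same value, so
        # a vector child need not care which kind of parent it has.
        if self.is_scalar:
            return self.value
        return self.value[i]

scalar = ScalarOrVectorHandle(0.5)
vector = ScalarOrVectorHandle([0.1, 0.2, 0.3])
print(scalar.mean(2))   # 0.5 regardless of the index
print(vector.mean(2))   # 0.3, element number 2
```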
The value can be accessed by the indexing methods \texttt{DVH.Mean(i)}, \texttt{DVH.Var(i)} and \texttt{DVH.Exp(i)}. If the value of the \texttt{DVH} is scalar, these methods return the scalar; otherwise they return the appropriate element of the vector. This way the child does not need to know whether the parent was a vector or a scalar.

\subsubsection{Class \texttt{VDDH}}
\label{sec:VDDH}

The relation between \texttt{VDDH} and \texttt{VDD} is mostly the same as that between \texttt{DVH} and \texttt{DV}. In \texttt{VDDH} the scalar value is also a pointer to a \texttt{DD}. Neither of these structures must be deallocated, as they are internal to the node whose values they represent.

\subsubsection{Class \texttt{DFlags}}
\label{sec:DFlags}

This is a set of boolean attributes, currently \texttt{DFlags.mean}, \texttt{DFlags.var} and \texttt{DFlags.ex}. It is used to indicate which expectations are requested when querying the value of a continuous-valued node.

\section{Python interface}

The Python interface is generated directly from the C++ header files by SWIG (Simplified Wrapper and Interface Generator)\footnote{For more information, see \url{http://www.swig.org}}. The basic idea is that all (or rather most) of the C++ classes have been wrapped so that they appear as standard Python classes and can be accessed as such.

\subsection{Class \texttt{PyNet}}

This class is derived from the C++ class \texttt{Net}, with some useful functions added.

\subsubsection{Sum trees}

The methods \texttt{BuildSum2Tree} and \texttt{BuildSum2VTree} build a sum tree from the list of nodes given as an argument. The other arguments form the base for the labels.
The labelling works so that the root node gets exactly the label given to the function, and the other nodes get labels of the form \texttt{base\_leaf(i0, i1, ..., in)}.

\subsubsection{Pruning}

The method \texttt{TryPruning} tests whether pruning a variable is beneficial and removes it if the cost function is predicted to decrease. The function can also handle a list of nodes to test.

\subsubsection{Accessing nodes}

The methods \texttt{GetNodes} and \texttt{GetVariables} return lists of nodes or variables which match the regular expression given as the second argument. The first argument is the net.

\subsubsection{Advanced updates}
\label{sec:advanced-updates}

The method \texttt{UpdateAllDebug} does the same updates as the standard \texttt{UpdateAll} but also checks that none of the node updates increases the value of the cost function. This is useful for debugging the update methods of the nodes.

The method \texttt{UpdateAllHookeJeeves} does one round of optimisation according to the Hooke--Jeeves algorithm. It first does the standard updates for all variables, and then takes another step in the direction defined by the individual updates. The length of this additional step is determined by performing a line search on the cost function values in that direction. The method uses the helper functions \texttt{RepeatAllSteps} and \texttt{FindOptimalStep}.

\subsubsection{Miscellaneous}

The method \texttt{GetConst0} returns a constant node with value zero. Its label is \texttt{const0} and it is created if it does not already exist.

\subsection{\texttt{Helpers.py}}

This file contains useful helper functions.

\subsubsection{Sum trees}

The functions \texttt{AddSum2} and \texttt{AddSum2V} add a new entry to an existing sum tree or create a new tree. The argument list is: net, root node, parent index, new parent, label. The sum tree is the parent number \texttt{parind} of the node \texttt{rootnode}. \texttt{newpar} is the new node in the sum tree, and \texttt{label} is the label for the \texttt{Sum2} or \texttt{Sum2V} node to be created.
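To make the labelling scheme above concrete, here is a small stand-alone Python sketch; the function is purely illustrative and not part of the library:

```python
# Illustrative sketch of the sum-tree labelling scheme: the root node
# keeps the base label given to the build function, and each leaf gets
# a label of the form "base_leaf(i0, i1, ..., in)".

def leaf_label(base, indices):
    """Build a leaf label such as 'w_leaf(0, 1)' from a base label."""
    return "%s_leaf(%s)" % (base, ", ".join(str(i) for i in indices))

print(leaf_label("w", [0]))      # w_leaf(0)
print(leaf_label("w", [1, 2]))   # w_leaf(1, 2)
```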
The function \texttt{FindLeaf} is used for finding the correct entry point. Caution: there are currently no checks for the DAG property.

\subsubsection{Pruning}

The function \texttt{CostDifference} returns a prediction of the change in the cost function if a given variable node is removed from the network. A negative value means that the cost function will decrease and thus the removal of the variable is beneficial.

\subsubsection{Adding a weighted connection}

The function \texttt{AddWeight} creates a new weighted connection from one node and adds it to the sum tree of another vector node. It then tests whether the newly created connection is useful using \texttt{TryPruning}, and returns the new weight if it was found useful. The arguments are: net, input node, output node, label of the weight, mean of the weight, variance of the weight and, optionally, an additional cost. It is often useful to use a negative additional cost, because the importance of a new weight is easily underestimated.

\subsubsection{Loading data}

The function \texttt{LoadAsciiData} returns a list of lists of the numbers in the given data file.

\subsubsection{Accessing nodes}

Lists of the parents and children of a node can be obtained with the functions \texttt{ParentList} and \texttt{ChildList}. The values can be queried using the functions \texttt{GetMean}, \texttt{GetVar} and \texttt{GetExp}. They can also handle lists of nodes.

\subsection{Data types}

SWIG does not understand operator overloading, and therefore some of the indexing methods in the data types of the C++ library do not always work. Indexing methods have been added to the classes \texttt{DV} and \texttt{DD}.
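As a self-contained illustration of the kind of ``[]'' indexing the wrappers are meant to expose, consider the following sketch. The class below is invented for the example and is not the library's actual \texttt{DD}:

```python
# Hypothetical sketch of a DD-like discrete distribution: a thin
# wrapper around a plain vector of probabilities that supports "[]"
# indexing.  Purely illustrative; the real DD is a SWIG-wrapped C++
# class.

class DiscreteDistribution:
    def __init__(self, probs):
        self.probs = list(probs)

    def __getitem__(self, i):
        return self.probs[i]

    def __setitem__(self, i, p):
        self.probs[i] = p

d = DiscreteDistribution([0.25, 0.25, 0.5])
print(d[2])   # 0.5
d[2] = 0.4    # elementwise assignment through the wrapper
```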
Indexing of a \texttt{VDD} is not yet supported due to difficulties in multidimensional indexing.

%% \subsection{Clamping}
%%
%% Python does not understand function overloading and therefore the C++
%% library method \texttt{Clamp} comes in five flavours: \texttt{Clamp}
%% (for doubles), \texttt{ClampDV}, \texttt{ClampDD}, \texttt{ClampVDD}
%% and \texttt{ClampInt}.

\section{Examples}

There are two small examples of the usage of the library in the main directory:
\begin{description}
\item[\texttt{main.C}:] Factor analysis with a dynamic model for the factors
\item[\texttt{main.py}:] Factor analysis with pruning
\end{description}

\section{Technicalities}

\subsection{Delayed variable check using Proxy nodes}

In principle, all nodes check that their parents are of the required type when they are created. This means that the parents must be created before their children. As the network is a DAG, this should in theory always be possible. Unfortunately, the situation is not as simple in practice, as vectorised nodes can cause problems in temporal models. In such cases one element of a vector affects the model of the next element, and the vector needs a delayed version of itself as a parent.

Such cases can be handled with the Proxy node. A proxy can be created without an explicit reference to its parent, using only the label of the future parent. When the proxy is further used as a parent of other nodes, it stores information on what it has been asked for. When the construction of the network is complete, all the proxies can be connected to their actual parents by calling the method \texttt{ConnectProxies} of the corresponding network.
As part of this process, the proxies also check that their parents really can return the values they are required to.

Using a proxy creates a loop in the otherwise acyclic network graph. The method \texttt{SortNodes} in \texttt{Net} handles this by ignoring the parents of proxies.

\subsection{Persist}
\label{sec:persist}

The variable \texttt{persist} holds flags which indicate the cases in which the node should die. Setting a flag allows dying. Dying can occur only when the node gets a notification of the death of another node and that node turns out to be either a parent or a child. The flags have the following meanings:
\begin{enumerate}
\item[1] Die if a parent died.
\item[2] Die if there are no parents (all died or otherwise).
\item[4] Die if there are no children (all died or otherwise).
\item[8] Die if there is only one parent left, and replace yourself with that parent in your children.
\end{enumerate}
By default, constants have \texttt{persist = 0}, which means they never die. Functions have the value 5 (= 1 + 4), which means that they need at least one child and all the parents which they initially had. Variables have the value 1, which means that none of their parents may die. It may be useful to set the value to 5 for latent variables. Exceptions are the \texttt{Switch} function nodes, which never die (value 0), and the \texttt{Sum2} and \texttt{Sum2V} function nodes, which have the value 12 (= 4 + 8). They need at least one child and cut themselves off if one of their parents dies. Cutting off means that before dying, the node replaces the pointers to itself in its children by pointers to its remaining parent. In other words, A + B $\rightarrow$ C will be replaced by A $\rightarrow$ C if B is killed.

\subsection{Evidence node}

The evidence node creates a virtual clamp by adding an artificial extra term to the cost function and providing gradients to its parent accordingly.
The extra term is essentially the same as that resulting from an observed Gaussian with a single observation for the clamped value, mean given by the parent and variance given by the other value clamped to the node. As the network is updated and the decay increases the clamped variance, the clamp becomes weaker and gives smaller gradients. After the clamp has decayed completely, the evidence node kills itself. The time over which the variance decays can be controlled by hooking the node to the appropriate decay hook, as described in Sec.~\ref{sec:decay-hooks}.

\subsection{Memory node}

\subsection{SWIG}
\label{sec:swig}

SWIG operates by taking an interface file (e.g.\ \texttt{Net.i}) which resembles a standard C or C++ header file but contains some additional directives. The ``compiler'' turns this file into a C/C++ source file (e.g.\ \texttt{Net\_wrap.C}) and a corresponding Python source file (e.g.\ \texttt{Net.py}) containing the definitions of the wrappers and the shadow classes.

\section{Error messages}

Most messages should be self-explanatory, such as ``Double clamp not allowed'' or ``Expected a vector parent in DelayGaussV'', although sometimes the nodes just complain ``Wrong type of parents in GaussNonlinV''.

When \texttt{PyNet.UpdateAllDebug} is used, it will complain if the updates of some of the nodes result in an increase of the cost function.

Sometimes there can be a cryptic error message like the following (assume we are updating a Gaussian node s(29)):
\begin{verbatim}
s(29) var: M=2.43445; V=0.02442; GEX=3.43523; GVA=34.24; GME=-43: diff = 3.424
\end{verbatim}
It means that for some reason the minimisation of the cost function failed in the Gaussian node, but the node is aware of it.
The reason is an ill-behaved gradient, which is typically caused by
\begin{enumerate}
\item a degenerate solution (e.g.\ variances have become very small),
\item invalid connections (e.g.\ multiple paths from a latent variable to another variable), or
\item a bug in the gradient computation.
\end{enumerate}
If options 1 and 2 can be ruled out, please send a bug report. The optimisation in Gaussian nodes is not fool-proof, so it is possible that in some rare cases the computations genuinely fail, but so far there has always been some other reason for the error message. If you encounter one of these rare cases, we may improve the minimisation. The reason why it has not been improved so far is that this error message is often a useful indication of something else being suspicious.

\begin{thebibliography}{10}

\bibitem{valpola_raiko}
H.~Valpola, T.~Raiko, and J.~Karhunen, ``Building blocks for hierarchical latent variable models,'' in {\em Proc. Int.\ Conf.\ on Independent Component Analysis and Signal Separation (ICA2001)}, (San Diego, USA), pp.~710--715, 2001.

\end{thebibliography}

\end{document}