\subsubsection{Connecting the proxies}

The method \texttt{ConnectProxies} connects all the proxies to their
true parents. It can safely be called several times as it does not
affect previously connected proxies.

\subsubsection{Updates}

The method \texttt{UpdateAll} updates all variables of the net. A
single variable node can be updated by the method \texttt{Update} (see
\ref{sec:nodeup}).

For memory nodes the method \texttt{Update} propagates time.
Therefore \texttt{UpdateAll} should not be used in on-line learning.
See \ref{sec:online} for more details.

\subsubsection{Decay hooks}
\label{sec:decay-hooks}

The network supports so-called decay hooks, which allow fine control
of the decay of the variance of the evidence nodes. The hooks are
called by name, which is a string. \texttt{Decayer}s, which so far
means just \texttt{Evidence} and \texttt{EvidenceV}, can be attached
to the desired hook with the method \texttt{RegisterDecay}.

There are standard hooks for \texttt{UpdateAll},
\texttt{UpdateTimeDep}, \texttt{UpdateTimeInd} and \texttt{StepTime},
which are processed at the end of the corresponding functions.
Additionally, the user can specify his or her own hooks simply by
giving a new string as the name of the hook. In this case, the hook
must be activated by hand by calling \texttt{ProcessDecayHook} with
the name of the hook as an argument. Items can be removed from the
hooks by the methods \texttt{UnregisterDecayFromHook}, for a single
hook, and \texttt{UnregisterDecay}, for removing an item from all the
hooks, but this is handled automatically when the evidence nodes die.

\subsubsection{Evaluation of the cost function}

The method \texttt{Cost} evaluates the cost function. Each node has a
method which gives the cost for that node only.

\subsubsection{Finding nodes and variables}

The methods \texttt{GetNode} and \texttt{GetVariable} take a label as
an argument and return a pointer to the node or variable with the
matching label.

The methods \texttt{GetNodeByIndex} and \texttt{GetVariableByIndex}
take an index as an argument and return a pointer to the
corresponding node or variable. The number of nodes and variables can
be obtained by the methods \texttt{NodeCount} and
\texttt{VariableCount}.

\subsubsection{Saving and loading}

The method \texttt{SaveToXMLFile} saves the network in XML text format
while \texttt{SaveToMatFile} uses Matlab format (the second argument
is the name of the variable to be saved in the file). A single node
can be saved using \texttt{SaveNodeToXMLFile}. This can be useful for
debugging purposes. In addition to these, the function
\texttt{SaveToPyObject} saves the network to a Python object. This
can later be saved to a file using Pickle.

\begin{verbatim}
  void SaveToXMLFile(string fname);
  void SaveToMatFile(string, string);
  void SaveNodeToXMLFile(string fname, Node *node);
  PyObject *SaveToPyObject();
\end{verbatim}

The library supports loading the network from a Matlab file or a
Python object. The function \texttt{LoadFromMatFile} loads the
network from a given Matlab file (the second string argument giving
the name of the network) while \texttt{CreateNetFromPyObject}
``loads'' the network from a given Python object.

The functions using Python objects can also be used for cloning the
network:

\begin{verbatim}
temp = orignet.SaveToPyObject()
copynet = CreateNetFromPyObject(temp)
\end{verbatim}
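For concreteness, here is a minimal C++ sketch of a save/load round
trip. Only the signatures listed above are taken from the library;
\texttt{net} is assumed to be a pointer to the \texttt{Net}, the file
and variable names are arbitrary, and \texttt{LoadFromMatFile} is
assumed to return a pointer to a newly created \texttt{Net}:

\begin{verbatim}
// Save the network both as XML text and in Matlab format.
net->SaveToXMLFile("mynet.xml");
net->SaveToMatFile("mynet.mat", "mynet");  // file name, variable name

// Reload it from the Matlab file. The second argument names the
// network inside the file; the return type Net* is an assumption.
Net *reloaded = LoadFromMatFile("mynet.mat", "mynet");
\end{verbatim}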
\subsubsection{Pruning}
\label{sec:cleanup}

When nodes are killed they need to be explicitly removed using the
method \texttt{CleanUp}. See \ref{sec:die}.

\subsubsection{Miscellaneous}

The method \texttt{Time} returns the length of the vectors (usually
time is the vectorised dimension). The methods \texttt{NotifyDeath},
\texttt{Save}, \texttt{AddNode} and \texttt{AddVariable} are mainly
for internal use. The methods \texttt{SetDebugLevel} and
\texttt{GetDebugLevel} can be used to set the level of debugging
messages required. Higher levels mean more messages. The default
level is 0.

\subsection{Class \texttt{NodeFactory}}

The \texttt{NodeFactory} is used to create new nodes for the network.
The constructors of the different node classes can only be invoked by
a \texttt{NodeFactory}, which makes sure that the ownership of the
nodes belongs to the network the nodes are part of. The
\texttt{NodeFactory} has methods of the form \texttt{GetXXX} for
creating a node of type XXX. They generally take the label of the
node and references to its parents as arguments and return a pointer
to a newly created node. The constructor of \texttt{NodeFactory}
takes the \texttt{Net} it will create nodes for as a parameter.

\subsection{Class \texttt{Node}}

\subsubsection{Creation}

A node is created by calling the appropriate \texttt{GetXXX} method
of a \texttt{NodeFactory}. The arguments are usually the label and
the parents of the node. \texttt{Constant} and \texttt{ConstantV}
nodes do not have parents; instead, their value is given directly.
The \texttt{Proxy} node is given only the label of its future parent,
which is connected later by calling the method
\texttt{ConnectProxies} of \texttt{Net}.

\subsubsection{Clamping and unclamping}

The value of a Gaussian or discrete node can be set by the method
\texttt{Clamp}. This tells the network that the node is observed.
The node can again be made unobserved (a latent, hidden, unknown
variable) by the method \texttt{Unclamp}.

In factor-analysis-type latent variable models, for instance, it is
useful to initialise the linear mapping with random values in order
to make the learning start. This can be done using \texttt{Clamp}.
Once the factors have some nonzero values, the linear mapping can be
\texttt{Unclamp}ed. Alternatively, the initialisation can be done
using evidence nodes. If learning were started without
initialisation, the factors and the linear mapping would all be zero
and none of them would adapt, because the whole system would be
trapped in the local minimum where everything is zero.
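As an illustration of this initialisation trick, consider the hedged
C++ sketch below. The node label is hypothetical, \texttt{net} is
assumed to be a pointer to the \texttt{Net}, and \texttt{Clamp} is
assumed to take the value to be set (a random nonzero value would be
used in practice):

\begin{verbatim}
// Hypothetical sketch: clamp one weight of the linear mapping to a
// small nonzero value, let the factors adapt, then release it.
Node *w = net->GetNode("A(0,0)");  // label is an assumption
w->Clamp(0.1);                     // assumed: Clamp takes the value
net->UpdateAll();                  // factors move away from zero
w->Unclamp();                      // the weight is latent again
net->UpdateAll();
\end{verbatim}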
\subsubsection{Expectations}
\label{sec:expectations}

Each continuous node can be asked for its value using the method
\texttt{GetReal} for scalars and \texttt{GetRealV} for vectors. The
value is returned in the first argument, which is a \texttt{DSSet}
for \texttt{GetReal} and a \texttt{DVH} for \texttt{GetRealV} (see
\ref{sec:DSSet} and \ref{sec:DVH}). The second argument is of class
\texttt{DFlags} and indicates which expectations are requested (see
\ref{sec:DFlags}).

These two methods are typically used by the children of a node to
inquire about the expectations required by the computation of the
gradients and the cost function.

NOTA BENE: the expectations which are not specifically requested (no
flags set in \texttt{DFlags}) may be left at their original values.

The corresponding methods for discrete nodes are \texttt{GetDiscrete}
and \texttt{GetDiscreteV}. They take only one argument, as there are
no parameters to the request. The value is returned in the first
argument, which is a pointer to a \texttt{DD} for
\texttt{GetDiscrete} and a \texttt{VDDH} for \texttt{GetDiscreteV}
(see \ref{sec:DD} and \ref{sec:VDDH}).

NOTA BENE: the structure returned by \texttt{GetDiscrete} \emph{must
not} be modified or destroyed by the caller, as it is shared with the
node returning it.

All the \texttt{Get*} methods return a boolean value indicating
whether the requested expectation could be returned.

\subsubsection{Gradients}
\label{sec:gradients}

A parent of a node asks for the gradient w.r.t.\ the continuous
expectations using the method \texttt{GradReal} or
\texttt{GradRealV}, depending on whether the parent is a scalar or a
vector. Both methods take two arguments. The first one is a
\texttt{DSSet} for \texttt{GradReal} and a \texttt{DVSet} for
\texttt{GradRealV} (see \ref{sec:DSSet} and \ref{sec:DVSet}). The
second argument is the pointer of the parent which is asking for the
gradient. It is used for identifying the role of the parent.

The gradients of the children of a node can be asked for by the
methods \texttt{ChildGradReal} and \texttt{ChildGradRealV}. They lack
the second argument but are otherwise similar to the above methods.

NOTA BENE: the methods which give gradients do not initialise the
first argument, which is used for returning the value. Instead, the
gradient is added to the previous value. If the vector-valued
gradient requests do not touch a particular gradient w.r.t.\ an
expectation, the vector for this gradient will be uninitialised.

The corresponding functions for the components of a discrete
distribution are \texttt{GradDiscrete} and \texttt{GradDiscreteV}.
The first argument is a \texttt{DD} for \texttt{GradDiscrete} and a
\texttt{VDDH} for \texttt{GradDiscreteV}, and the second argument is
the pointer of the parent asking for the gradient.

If a node cannot provide a gradient of some sort, it should just
return silently. This applies, for instance, to continuous nodes when
they are asked for \texttt{GradDiscrete}.

\subsubsection{Removing nodes}
\label{sec:die}

The method \texttt{Die} kills a node. This may cause a chain reaction
where other nodes are killed if they are left orphaned or without
children. This behaviour is controlled by the \texttt{persist}
variable, which can be accessed by the methods \texttt{GetPersist}
and \texttt{SetPersist} (see \ref{sec:persist}).

The nodes are not actually removed from the network before
\texttt{CleanUp} is called (see \ref{sec:cleanup}).

Do not try to \texttt{delete} a node: the net has to know which nodes
exist, and it also takes care of the deletion.
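A short sketch of the removal protocol follows; \texttt{parent} and
\texttt{child} are hypothetical node pointers, and the integer flag
passed to \texttt{SetPersist} is an assumed encoding (see
\ref{sec:persist} for the actual semantics of \texttt{persist}):

\begin{verbatim}
// Keep the parent alive even if removals leave it childless
// (the flag value is an assumption, see the persist section).
parent->SetPersist(1);
child->Die();    // may cascade to orphaned nodes
net->CleanUp();  // only now are the dead nodes actually removed
\end{verbatim}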
\subsubsection{Updates}
\label{sec:nodeup}

A (variable) node can be updated by \texttt{Update}. A node can be
outdated by \texttt{Outdate}. This propagates to the children in
computational nodes and outdates the cost function in variable nodes.

Another way to update a node is to use the set of functions
\texttt{SaveState}, \texttt{SaveStep} and \texttt{RepeatStep}. These
are used by the \texttt{UpdateAllHookeJeeves} method of
\texttt{PyNet} (see Sec.~\ref{sec:advanced-updates}).
\texttt{SaveState} saves the current value of the variable to be used
as a base point for future changes. \texttt{SaveStep} saves the
current change from the state saved by \texttt{SaveState}.
\texttt{RepeatStep} can then later be used to repeat the same step
multiplied by a constant, starting from the original state saved by
\texttt{SaveState}. These operations are at the moment only supported
by the \texttt{Gaussian} and \texttt{GaussianV} nodes.

\subsubsection{Miscellaneous}
\label{sec:misc}

The label and type of a node can be queried by the methods
\texttt{GetLabel} and \texttt{GetType}. The parents and children can
be obtained by the methods \texttt{GetParent} and \texttt{GetChild}.
They take an index as an argument and return a null pointer if the
index is out of range.

\subsection{On-line learning}
\label{sec:online}

On-line learning works but is still experimental.

\subsubsection{Time type}

Each node has one of the following time types:
\begin{description}
\item[0:] time independent,
\item[1:] time dependent,
\item[2:] memory node, or
\item[3:] delay node.
\end{description}
The time type of a node can be queried by the \texttt{TimeType}
method.

Time dependent nodes are assumed to have different values at each
time instant, while time independent nodes have only one value which
does not depend on time. Memory nodes connect type 0 and type 1 nodes
and gather the gradients when time is stepped.

\subsubsection{Updates}

Type 0 nodes are updated by the \texttt{UpdateTimeInd} method of the
\texttt{Net} class. Type 1 nodes are updated by the
\texttt{UpdateTimeDep} method, and memory nodes are instructed to
save the gradients by the \texttt{StepTime} method. This also
instructs the net to memorise the cost function from the type 1
nodes. The memorised cost is automatically added to the cost when
queried by the \texttt{Cost} method. It can be accessed directly by
the \texttt{GetOldCost} and \texttt{SetOldCost} methods.

Do not use the \texttt{UpdateAll} method in on-line learning: it
updates all nodes regardless of their type.

\subsubsection{Controlling time type}

Do not forget to use memory nodes. If two nodes are connected and one
of them has type 1, then the other node is also converted to type 1.
The conversion propagates so that the nodes connected to the newly
converted node are also converted. A memory node blocks the
conversion.

It is not necessary to give an explicit command about the time type,
although the method \texttt{NotifyTimeType} could in principle be
used. Children of memory nodes will automatically be set to type 1,
and any node they are connected to will be set to type 1 as explained
above.

\subsubsection{Decay}

In on-line learning, it is impossible to change the inferences made
about past data. This means that the network could get stuck in false
interpretations made early in the learning. To prevent this, old data
is gradually forgotten. The decay rate is governed by a ratio
parameter which is 0.5 by default. This means that effectively the
model uses only half of the data it has been given. Currently the
decay rate is common to every memory node, but this may change in the
future. The decay is controlled by a \texttt{DecayCounter} object
maintained by the \texttt{Net} class. The latest decay factor can be
queried by the \texttt{Decay} method of \texttt{Net}. Currently the
ratio parameter (and other relevant parameters) of the
\texttt{DecayCounter} need to be set by directly accessing the field
\texttt{Net.dc.ratio}.

\subsubsection{Usual operation}

\begin{verbatim}
Connect the network
Iterate
  Clamp time dependent nodes
  Iterate
    UpdateTimeDep
    Sometimes UpdateTimeInd
  StepTime
\end{verbatim}
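The same schedule written out as a hedged C++ sketch: the clamping of
the observations, the loop bounds \texttt{T} and \texttt{NITER}, and
the schedule for \texttt{UpdateTimeInd} are placeholders, while the
\texttt{Net} methods are the ones described above:

\begin{verbatim}
net->ConnectProxies();
for (int t = 0; t < T; t++) {        // T time instants (placeholder)
  // Clamp the time dependent (type 1) nodes to the new observations,
  // e.g. x->Clamp(data[t]) -- application specific.
  for (int i = 0; i < NITER; i++) {  // NITER sweeps (placeholder)
    net->UpdateTimeDep();            // update type 1 nodes
    if (i % 5 == 0)
      net->UpdateTimeInd();          // sometimes update type 0 nodes
  }
  net->StepTime();                   // memory nodes save gradients and
}                                    // the old cost is memorised
\end{verbatim}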
\subsection{Data structures}

The network passes distributions from parents to children. For real
values the distributions are summarised by a set of expectations
(currently the mean, the variance and the expectation of the
exponential are possible). The distributions of discrete variables
are passed as lists of probabilities for each possible value, that
is, there are no summaries. The gradients of the quantities passed
from parents to children are passed in the opposite direction, from
children to parents. Usually the user does not need to worry about
the gradients.

The basic data structures needed by the user are \texttt{DSSet}