that particular data. The error on this particular data set will become very small, but when new data is presented to the network, the error is much larger. Thus, the network is trained in memorizing the training data rather than in being a good estimator of $\vf(\vx)$ for all $\vx$. This is known as over-fitting.

A common technique to minimize over-fitting is \emph{early stopping} \cite{fon93}. In early stopping, the available data set is divided into a training set and a validation set. In each iteration of the training process, the network parameters are adjusted in order to minimize the error on the training set. In the first iterations, the error on the validation set will also decrease; however, when the network starts to over-fit, the validation error will increase. Training is stopped when the validation error increases for a specified number of iterations.

Dividing the data set into a training set and a validation set can be done in several ways. The simplest procedure is probably to put half of the patterns in the training set, leaving the other half for the validation set. In another procedure, called \emph{bootstrapping} \cite{efr93}, the training set consists of $\MU$ patterns, which are drawn \emph{with replacement} from the original data set of $\MU$ patterns. Thus, some of the data patterns will occur more than once in the training set. The patterns which do not occur in the training set make up the validation set. For large $\MU$, the probability that a pattern becomes part of the validation set is $(1 - 1/\MU)^{\MU} \approx 1/e \approx 0.37$.
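As an illustration, the following plain-\matlab\ sketch builds such a bootstrap split for a data set with inputs \verb|P| and targets \verb|T| stored one pattern per column. It uses no toolbox code, and the variable names are ours:
\begin{example}
MU = size(P, 2);                % number of patterns
draw = ceil(MU * rand(1, MU));  % MU indices drawn with replacement
Ptrain = P(:, draw);            % bootstrap training set
Ttrain = T(:, draw);
left = setdiff(1:MU, draw);     % patterns that were never drawn ...
Pval = P(:, left);              % ... make up the validation set
Tval = T(:, left);
\end{example}
For large \verb|MU|, about 37\% of the patterns end up in the validation set, in line with the $1/e$ estimate above.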
\section{Ensembles}

Instead of training only one network, you can also perform the training procedure more than once. This gives you a number of networks, each providing an estimator of the regression. When you use different subdivisions of the data and different initial conditions, each estimator will be different. It turns out that combining these estimators can give you a new, more robust estimator.

There are several ways to combine these estimators. The simplest one is to just use the network with the smallest error on the complete data set; this is called \emph{bumping} \cite{tib95}. For more elaborate techniques the new estimate depends on the cost function. For the \tbcmd{wcf_snn} cost function, which computes a weighted average over output error functions (see section \ref{wcf}), a technique called \emph{bagging} \cite{bre94} can be used, in which the new estimate depends on the output error functions. This technique weights all networks equally. With sum squared output error functions, the new estimate is just the average over the outputs of the networks in the ensemble. Different error functions yield slightly different averaging procedures.

The \emph{balancing} \cite{hes96b} technique is an extension of bagging, in which a different weight is given to each network in the ensemble. The weight of each network is computed from the errors in the estimates.
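For the sum squared error case, the sketch below shows both combination rules in plain \matlab. It assumes the outputs of the $K$ ensemble networks on the same patterns have already been collected in a three-dimensional array \verb|Y| (how the networks are evaluated is toolbox-specific and left out; the variable names and array layout are ours):
\begin{example}
% Y: (output variables) x (patterns) x K array of ensemble outputs
K = size(Y, 3);
Ybag = mean(Y, 3);            % bagging: all networks weighted equally
w = ones(K, 1) / K;           % balancing would replace these equal
Ybal = zeros(size(Ybag));     % weights by weights computed from the
for k = 1:K                   % errors in the estimates
  Ybal = Ybal + w(k) * Y(:, :, k);
end
\end{example}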
\section{Confidence intervals}

Confidence intervals provide a way to quantify our confidence in the estimate $\vy$ of the regression $\vf$ \cite{hah91}. Assuming some probability distribution $p(\vf|\vy)$ for the true regression $\vf$ given our estimate $\vy$, the confidence interval is defined as the interval around $\vy$ that contains $\vf$ in a fraction $1 - \alpha$ of the cases, where $\alpha$ is the error level. For the standard error, $\alpha \approx 0.33$.

To estimate the confidence interval, we estimate the maximum allowed output error given $\alpha$ and invert this error around $\vy$ to find upper and lower limits. The maximum allowed output error is estimated to be $c_{\mbox{\scriptsize confidence}} \sigma(\vx)$. Here, the average error $\sigma(\vx)$ is estimated using the variation in the outputs of the networks in the ensemble. The value of the factor $c_{\mbox{\scriptsize confidence}}$ is estimated from $\alpha$ and the training data \cite{hes96c}.

\section{Prediction intervals}

Confidence intervals deal with the accuracy of the prediction of the regression, i.e.\ of the mean of the target probability distribution. Prediction intervals consider the accuracy with which we can predict the targets themselves, i.e.\ they are based on estimates of the distribution $p(\vt|\vy)$ \cite{hah91}. Their definition is analogous to that of the confidence interval.

The estimated prediction interval is again the inverse of the output error function, with as maximum allowed error $c_{\mbox{\scriptsize prediction}} s(\vx)$, where $s(\vx)$ is an estimated prediction error, which is calculated using a specially trained neural network that predicts the noise \cite{nix94, hes96c}. This network is created from an averaged network of the networks in the ensemble. To train this network to predict errors, only the connection weights and biases of the last layer in the network are adjusted. Thus we create a network which is similar to the regression networks in the input-to-hidden weights (and the hidden-to-hidden weights), but different in its hidden-to-output weights. Both this network and $c_{\mbox{\scriptsize prediction}}$ are computed using the training data and the desired $\alpha$.

\section{Input relevance}

To determine the relevance of each input to your network, you can look at the explained variance in the outputs of networks with and without each input. With $n$ inputs, you can make $2^n$ combinations of inputs for your network. Clearly, training networks for all these combinations to find the combination with the largest explained variance is only feasible for small $n$. For large $n$, our approach is to start with a network with all inputs and iteratively remove the input that adds the least to the explained variance. After removal, we adjust the remaining network parameters $\vw$ (partial retraining) \cite{laa97,laa98b} to obtain better parameters for the situation without the removed input. This adjustment is a one-step update procedure and therefore not a complete retraining (hence the name partial retraining). After that, we again search for the input that adds the least to the explained variance, and so on. Thus we obtain a series of inputs ordered from least to most relevant.

%---------------------------------------------------------------------
\chapter{Training Neural networks with \toolboxname}
\label{netpacktoolbox}
%---------------------------------------------------------------------

The \toolboxname\ toolbox provides various functions for training neural networks.

\section{Before you begin}

\subsection{Initialize}

To use the toolbox in \matlab, you need to initialize it first. This is done by running the command \tbcmd{init_netpack_snn}. To do this, start \matlab\ and at the \matlab\ prompt type
\begin{example}
run(fullfile('NETPACKROOT', 'netpack', 'init_netpack_snn'));
\end{example}
where for NETPACKROOT you substitute the installation path of the \toolboxname\ toolbox (e.g.\ \file{/home/user/netpack-toolbox-1.1} or \file{C:$\backslash$netpack-toolbox-1.1}). You can also add this command to your \matlab\ startup file (e.g.\ \file{\$HOME/matlab/startup.m}) to initialize automatically at startup.

\subsection{Help}

For (almost) all functions in the toolbox, online help is available in \matlab\ by typing:
\begin{example}
help <function name>
\end{example}
where \verb|<function name>| is the name of the function you want information on.

\section{Create network structure}

When you want to train a feedforward neural network, the first thing to do is to decide on the architecture of your network, i.e.\ the number of hidden layers, the number of hidden units in each layer, and the transfer function for each layer. You must also decide which cost function you will use, and which training algorithm you want to use to minimize this cost function.

The toolbox stores all this information in a structure that can be created with the command \tbcmd{net_struct_snn}, which takes four arguments:
\begin{enumerate}
\item An array of integers with the number of inputs and the number of units in each network layer.
\item A cell array with transfer function names for each layer.
\item The name of a training algorithm to minimize the cost function.
\item The name of the cost function.
\end{enumerate}
The last two arguments are optional; the default is to use \tbcmd{trainlm_snn} as training algorithm and \tbcmd{wcf_snn} as cost function.

For example,
\begin{example}
net = net_struct_snn([1 8 2], {'tansigtf_snn' 'lintf_snn'})
\end{example}
creates a network with an input layer with one input, one hidden layer with 8 units, and an output layer with two outputs. The hidden layer has \tbcmd{tansigtf_snn} (a hyperbolic tangent sigmoid) as transfer function and the output layer has \tbcmd{lintf_snn} (linear transfer).

Possible transfer functions are linear transfer \tbcmd{lintf_snn}, tan sigmoid transfer \tbcmd{tansigtf_snn}, log sigmoid transfer \tbcmd{logsigtf_snn}, exponential transfer \tbcmd{exptf_snn} and radial basis transfer \tbcmd{radbastf_snn}.

As training algorithms you can choose from gradient descent \tbcmd{traingd_snn}, conjugate gradient with Polak-Ribi\'ere update \tbcmd{traincgp_snn}, and Levenberg-Marquardt \tbcmd{trainlm_snn}. For small architectures (say, up to 50 weights), we advise Levenberg-Marquardt. For larger networks, conjugate gradient gives better results with respect to training times.
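To set the training algorithm and cost function explicitly, pass them as the optional third and fourth arguments. For instance (the architecture here is only an illustration):
\begin{example}
net = net_struct_snn([4 20 1], {'tansigtf_snn' 'lintf_snn'}, ...
                     'traincgp_snn', 'wcf_snn')
\end{example}
creates a network with four inputs, twenty hidden units and one output, to be trained with conjugate gradient on the \tbcmd{wcf_snn} cost function.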
\section{The wcf\_snn cost function}
\label{wcf}

The \toolboxname\ toolbox is designed to allow you to write your own cost functions, but it is probably easier to use the default cost function \tbcmd{wcf_snn}, which is very flexible.

The target output of a network for each pattern can be either a scalar or a vector of multiple output variables. With $\vtmu$ we denote a vector target output for pattern $\mu$. With $t_{i\mu}$ we denote the value of the $i$th variable of this target output. The cost function \tbcmd{wcf_snn} computes for each combination of $i$ and $\mu$ an output variable error function $e(y_{i\mu}, t_{i\mu})$, which is a function that depends on the network output and the target output variable $i$ of pattern $\mu$. The function \tbcmd{wcf_snn} returns a weighted average over $i$ and $\mu$ of the output error functions.

For each $i$ you can specify a different output error function. The weights in the average are the product of an output variable weight $a_i$ and a pattern weight $g_{\mu}$, which can also be specified. Additionally, for each combination of $i$ and $\mu$, you can specify not to include the output error function in the average.

Mathematically, the \tbcmd{wcf_snn} cost function is:
\begin{equation}
E(\vecay, \vecat) = \frac{1}{Z} \sum_{\{i, \mu | \Delta_{i\mu} = 1\}} a_i g_{\mu} e^i(y_{i\mu}, t_{i\mu})
\end{equation}
with $Z$ a normalization factor:
\begin{equation}
Z = \sum_{\{i, \mu | \Delta_{i\mu} = 1\}} a_i g_{\mu}
\end{equation}

\begin{tabular}{c l}
$i$ & output variable index \\
$\mu$ & pattern index \\
$a_i$ & output variable weight \\
$g_{\mu}$ & pattern weight \\
$e^i$ & output error function for variable $i$ \\
$y_{i\mu}$ & network output for variable $i$, pattern $\mu$ \\
$t_{i\mu}$ & target output for variable $i$, pattern $\mu$ \\
$\Delta_{i\mu}$ & mask controlling which combinations of \\
& output variables and patterns are included in the average \\
\end{tabular}

\subsection{Create data structure}

To use \tbcmd{wcf_snn} (or any other cost function), you must store your data in a structure that the cost function expects. For \tbcmd{wcf_snn} there are two commands to create such a structure: \tbcmd{wcfdata_struct_snn} and \tbcmd{import_ascii_data_snn}.

\subsubsection{import\_ascii\_data\_snn}

Training data can be read from file using the command \tbcmd{import_ascii_data_snn}, which reads an ASCII file. This file must contain columns with input and target values. The command \tbcmd{import_ascii_data_snn} takes four arguments:
\begin{enumerate}
\item The name of the ASCII file.
\item Column indices for the inputs.
\item Column indices for the targets.
\item (Optional) Column index for the pattern weights.
\end{enumerate}
For example, the command
\begin{example}
wcfdata = import_ascii_data_snn('file.asc', 1, [2 3])
\end{example}
reads training patterns from the file \file{file.asc}, with inputs from the first column of the file, and targets from the second and third columns.

\subsubsection{wcfdata\_struct\_snn}

When the data is already in \matlab 's workspace, you can create a data structure for \tbcmd{wcf_snn} with \tbcmd{wcfdata_struct_snn}. This command takes four arguments:
\begin{enumerate}
\item Matrix of inputs $\vxmu$. Each column represents a pattern, each row an input variable.
\item Matrix of targets $\vtmu$. Each column represents a pattern, each row an output variable.
\item (Optional) Row matrix of pattern weights $g_{\mu}$.
\item (Optional) Matrix with mask $\Delta$ for the weighted average.
\end{enumerate}
For example,
\begin{example}
P = rand(2,5); T = rand(1,5); gmu = [1 2 2 1 1];
wcfdata = wcfdata_struct_snn(P, T, gmu)
\end{example}

\subsection{Set wcf\_snn parameters}

When \tbcmd{wcf_snn} is computed, the values of $a_i$, $g_{\mu}$, $e^i$ and $\Delta_{i\mu}$ are evaluated. Of these parameters, we consider $g_{\mu}$ and $\Delta_{i\mu}$ part of the training data, and $a_i$ and $e^i$ part of the cost function of the network (although you may argue otherwise). Thus, $g_{\mu}$ and $\Delta_{i\mu}$ are part of the data structure, and $a_i$ and $e^i$ are part of the network structure.

\subsubsection{Output variable weights $a_i$}

By default, the $a_i$'s are not set (empty), which \tbcmd{wcf_snn} interprets as weighing all outputs equally. To set different weights, you can use \tbcmd{set_wcfnet_a_snn} on a network structure with \tbcmd{wcf_snn} as cost function.
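To make the weighted average concrete, the following plain-\matlab\ sketch evaluates the cost of section \ref{wcf} for the squared error $e^i(y, t) = (y - t)^2$ for all $i$. It calls no toolbox code; the variable names are ours:
\begin{example}
% Y, T: outputs and targets, one row per variable, one column per pattern
% a: column vector of output variable weights a_i
% g: row vector of pattern weights g_mu
% D: 0/1 mask Delta selecting the included (i, mu) combinations
W = (a * g) .* D;                             % combined weights a_i * g_mu
E = sum(sum(W .* (Y - T).^2)) / sum(sum(W));  % weighted average cost E
\end{example}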