         0. Exit</PRE>

<P><H3><A NAME="SECTION00036100000000000000">Setting the Bound on Alphas</A></H3>

<P>This option sets the upper bound <I>C</I> on the support vector
coefficients (alphas). This is the free parameter which controls the
trade-off between minimizing the loss function (satisfying the constraints)
and minimizing over the regularizer. The lower the value of <I>C</I>, the
more weight is given to the regularizer.

<P>If <I>C</I> is set to infinity, all the constraints must be satisfied.
Typing 0 is equivalent to setting <I>C</I> to infinity. In the pattern
recognition case this means that the training vectors must be classified
correctly (they must be linearly separable in feature space).

<P>Choosing the value of <I>C</I> needs care. Even if your data can be
separated without error, you may obtain better results by choosing simpler
decision functions (to avoid over-fitting) by lowering the value of
<I>C</I>, although this is generally problem specific and dependent on the
amount of noise in your data.

<P>A good rule of thumb is to choose a value of <I>C</I> that is slightly
lower than the largest coefficient or alpha value attained from training
with <I>C</I> = &infin;. Choosing a value higher than the largest
coefficient will obviously have no effect, as the box constraint will never
be violated. Choosing a value of <I>C</I> that is too low (say, close to 0)
will constrain your solution too much and you will end up with too simple a
decision function.

<P>Plotting a graph of error rate on the testing set against the choice of
parameter <I>C</I> will typically give a bowl shape, where the best value of
<I>C</I> is somewhere in the middle. Inexperienced users who wish to get an
intuitive grasp of how to choose <I>C</I> should try playing with this value
on toy problems using the RHUL SV applet at
<TT>``http://svm.cs.rhbnc.ac.uk''</TT>.
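<P>To make the bowl shape concrete, here is a minimal sketch of such a
sweep. It is not part of this package: scikit-learn's <TT>SVC</TT> stands in
for the <TT>sv</TT> program, and the toy data is made up, so treat it only
as an illustration of the procedure.

<P><PRE>
# Sketch: sweep C and watch the test error trace out a bowl shape.
import numpy as np
from sklearn.svm import SVC

# Hypothetical noisy, nearly separable toy problem.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = np.where(X_train[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)
X_test = rng.normal(size=(50, 2))
y_test = np.where(X_test[:, 0] + 0.3 * rng.normal(size=50) > 0, 1, -1)

for C in (0.01, 0.1, 1, 10, 100, 1000):
    clf = SVC(C=C, kernel="poly", degree=2).fit(X_train, y_train)
    error = 1.0 - clf.score(X_test, y_test)   # fraction misclassified
    print("C =", C, " test error =", round(error, 3))
</PRE>

<P>Very small <I>C</I> over-constrains the solution, very large <I>C</I>
over-fits noisy points; the minimum of the printed error is typically
somewhere in between.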
<P><H3><A NAME="SECTION00036200000000000000">Scaling</A></H3>

<P>Included in the support vector engine is a convenience function which
pre-scales your data before training. The programs automatically scale your
data back again for output and error measures, providing a quick way to
pre-process your data and ensure that the dot products (the results of your
chosen kernel function) give reasonable values. For serious problems it is
recommended that you do your own pre-processing, but this function is still
a useful tool.

<P>Scaling can be done either globally (i.e. all values are scaled by the
same factor) or locally (each individual attribute is scaled by an
independent factor).

<P>As a guideline, you may wish to think of it this way: if the attributes
are all of the same type (e.g. pixel values) then scale globally; if they
are of different types (e.g. age, height, weight) then scale locally. When
you select the scaling option, the program first asks if you want to scale
the data, then it asks if all attributes are to be scaled with the same
factor. Answering Y corresponds to global scaling and N corresponds to
local scaling. You are then asked to specify the lower and upper bounds for
the scaled data, e.g. -1 and 1, or 0 and 1.

<P>The scaling of your data is important! Incorrect scaling can make the
program appear not to be working, when in fact the training is suffering
from a lack of precision in the values of the dot products in feature
space. Secondly, certain kernels require their parameters to be within
certain ranges; for example, the linear spline kernel requires that all
attributes are positive, and the weak mode regularized Fourier kernel
requires that the attributes lie within a restricted range. For a full
description of the requirements of each kernel function see the appendix.
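<P>As an illustration of the two modes, the following sketch (plain NumPy,
not the package's own code; the array and bounds are hypothetical) performs
global versus local min-max scaling into a target interval:

<P><PRE>
import numpy as np

def minmax_scale(X, lower=-1.0, upper=1.0, local=True):
    """Scale X into [lower, upper].

    local=True  : each attribute (column) gets its own scaling factor.
    local=False : one global min/max is applied to every value.
    """
    X = np.asarray(X, dtype=float)
    if local:
        lo, hi = X.min(axis=0), X.max(axis=0)   # per-attribute factors
    else:
        lo, hi = X.min(), X.max()               # one factor for all values
    return lower + (X - lo) * (upper - lower) / (hi - lo)

# age, height (cm), weight (kg): attributes of different types,
# so local scaling is the appropriate choice here.
X = np.array([[23, 180, 75.0],
              [35, 165, 60.0],
              [51, 172, 82.0]])
print(minmax_scale(X, 0.0, 1.0, local=True))
</PRE>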
<P><H3><A NAME="SECTION00036300000000000000">Chunking</A></H3>

<P>This option chooses the optimizer training strategy. Note that the
choice of strategy should not affect the learning ability of the SVM, but
rather the speed of training. If no chunking is selected, the optimizer is
invoked with all training points. Only use the 'no chunking' option if the
number of training points is small (less than 1000 points).

<P>The optimizer requires half of an <I>n</I> by <I>n</I> matrix, where
<I>n</I> is the number of training points, so if the number of points is
large (say, more than a few thousand) you will probably just run out of
memory, and even if you don't, training will be very slow.

<P>If you have a large number of data points, training should treat the
optimization problem as a sequence of sub-problems - which we call
chunking. There are two types of chunking method implemented, posh chunking
and sporty chunking (out of respect to the Spice Girls), which follow the
algorithms described in the papers
[<A HREF="reference.html#edgar1">OFG97b</A>] and
[<A HREF="reference.html#edgar2">OFG97a</A>] respectively.

<P>Sporty chunking requires that you enter the chunk size. This represents
the number of training points that are added to the chunk per iteration. A
typical value for this parameter is 500.

<P>Posh chunking requires that you enter the working set size and the
pivoting size. The working set size is the number of vectors in each
sub-problem, which is fixed (in sporty chunking it is variable). A typical
value is 700. The pivoting size is the maximum number of vectors that can
be moved out of the sub-problem and replaced with fixed vectors. A typical
value is 300.

<P><H3><A NAME="SECTION00036400000000000000">Setting &epsilon;</A></H3>

<P>&epsilon; defines the &epsilon; insensitive loss function. When
<I>C</I> = &infin;, this stipulates how far the training examples are
allowed to deviate from the learnt function. As <I>C</I> tends to zero, the
constraints become soft.
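<P>The &epsilon; insensitive loss ignores deviations smaller than &epsilon;
and penalizes larger ones linearly. A minimal NumPy illustration (not the
package's code; the sample values are made up):

<P><PRE>
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """max(0, |y - f(x)| - eps): deviations inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))  # [0.  0.4  0.9]
</PRE>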
<P><H3><A NAME="SECTION00036500000000000000">Setting the Multiclass Method</A></H3>

<P>This selects the multi-class method to use. If we have <I>n</I> classes,
method 0 trains <I>n</I> machines, each classifying one class against the
rest. Method 1 trains <I>n</I>(<I>n</I>-1)/2 machines, each classifying one
class against one other class. For each machine there is a voting scheme
that is explained in [<A HREF="reference.html#multiclasspaper">Knoen</A>].

<P><H3><A NAME="SECTION00036600000000000000">Setting Multiclass Continuous Classes</A></H3>

<P>This setting is designed to speed up training in the multi-class SVM. If
you know your classes are a sequence of continuous integers, like 2,3,4,
then you can enter 1 here to speed things up. If you choose this option and
this is not the case, the machine's behaviour is undefined. So if in doubt,
leave this setting at 0.

<P><H2><A NAME="SECTION00037000000000000000">Setting Kernel Specific Parameters</A></H2>

<P>This menu option allows you to enter the free parameters of the specific
kernel you have chosen. If the kernel has no free parameters, you will not
be prompted to enter anything; the program will just go back to the main
parameter menu.

<P><H2><A NAME="SECTION00038000000000000000">Setting the Expert Parameters</A></H2>

<P>The expert parameter menu has the following options:

<P><PRE>        Expert parameters
        =================
        Usually these are ok!
         1. Optimizer (1=MINOS, 2=LOQO, 3=BOTTOU)       3
         2. SV zero threshold                           1e-16
         3. SV Margin threshold                         0.1
         4. Objective function zero tolerance           1e-07
         0. Exit</PRE>

<P>If you are an inexperienced user, you are advised not to alter these
values.

<P><H3><A NAME="SECTION00038100000000000000">Optimizer</A></H3>

<P>There are three optimizers that can currently be used with the RHUL SV
package. These are used to solve the optimization problems required to
learn decision functions. They are:

<P><UL>
<LI> MINOS - a commercial optimization package written by the Department of
Operations Research, Stanford University.
<LI> LOQO - an implementation of an interior point method based on the LOQO
paper [<A HREF="reference.html#loqo">Van</A>], written by Alex J. Smola,
GMD, Berlin.
<LI> BOTTOU - an implementation of the conjugate gradient method written by
Leon Bottou, AT&amp;T research labs.
</UL>

<P>Only the LOQO and BOTTOU optimizers are provided in the distribution of
this package, as the first is a commercial package. However, stubs are
provided for MINOS; should you acquire a license for MINOS, or if you
already have it, all you have to do is place the MINOS Fortran code in
<TT>minos.f</TT>, change the MINOS setting in <TT>Makefile.include</TT> and
re-make.

<P>The LOQO optimizer is not currently implemented for regression
estimation problems.

<P><H3><A NAME="SECTION00038200000000000000">SV zero threshold</A></H3>

<P>This value indicates the cut-off point below which a double precision
Lagrange multiplier is considered zero - in other words, which numbers are
not counted as support vectors. In theory, support vectors are all vectors
with a non-zero coefficient; in practice, however, optimizers only deal
with numbers to some precision, and by default values below 1e-16 are
considered zero. Note that for different optimizers and different problems
this can change. An SV zero threshold that is too low can result in a large
number of support vectors and increased training time.

<P><H3><A NAME="SECTION00038300000000000000">SV Margin threshold</A></H3>

<P>This value represents the ``virtual'' margin used in the posh chunking
algorithm. The idea is that training vectors are not added to the chunk
unless they are on the wrong side of the virtual margin, rather than the
real margin, where the virtual margin is at distance 1-<I>value</I>
(default 1-0.1) from the decision hyperplane. This is used to remove the
problem of slight losses in precision that can cause vectors to cycle
between being correctly and incorrectly classified in the chunking
algorithm.

<P><H3><A NAME="SECTION00038400000000000000">Objective function zero tolerance</A></H3>

<P>Both chunking algorithms terminate when the objective function does not
improve after solving an optimization sub-problem. To prevent the objective
continually being improved by extremely small amounts (caused by precision
problems with the optimizer) and the algorithm never terminating, an
improvement has to be larger than this value to be relevant.

<P><H1><A NAME="SECTION00040000000000000000"><TT>sv</TT></A></H1>
<A NAME="sv">&#160;</A>

<P>The SVM is run from the command line, and has the following syntax:

<P><TT>sv &lt;Training File&gt; &lt;Test File&gt; [&lt;Parameter File&gt;
[&lt;sv machine file&gt;]]</TT>

<P>For a description of the file format see the appendix, or for a simple
introduction see ``<TT>sv/docs/intro/sv_user.tex</TT>''.

<P>Specifying a parameter file is optional. If no parameter file is
specified, the user will be presented with a set of menus which allow the
user to define a set of parameters to be used. These menus are exactly the
same as those used to enter parameters with the <TT>paragen</TT> program
(see section <A HREF="reference.html#paragen">3</A>).

<P>If a parameter file is included, the learnt decision function can be
saved with the file name of your choice. This can be reloaded using the
program <TT>loadsv</TT> (section <A HREF="reference.html#loadsv">5</A>) to
test new data at a later stage.

<P>After calculating a decision rule based on the training set, each of the
test examples is evaluated and the program outputs a list of statistics. If
you do not want to test a test set, you can specify ``<TT>/dev/null</TT>''
or an empty file as the second parameter. You can also specify the same
file as the training and testing set.
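<P>For example, a typical run might look like the following (the file names
here are hypothetical, chosen only to instantiate the argument order given
above):

<P><PRE>        sv train.dat test.dat my_params.par my_machine.svm</PRE>

<P>This trains on <TT>train.dat</TT> with the parameters in
<TT>my_params.par</TT>, evaluates <TT>test.dat</TT>, and saves the learnt
decision function to <TT>my_machine.svm</TT> for later reloading with
<TT>loadsv</TT>.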
<P>The output from the program will depend on whether the user is using the
SV Machine for pattern recognition, regression estimation, or multi-class
classification. First the output from the optimizer is given, followed by a
list of which examples in the training set are the support vectors.
Performance statistics giving the error on the training and testing sets
follow. After this, each support vector is listed along with the value of
its Lagrange multiplier (alpha value) and its deviation from the margin.

<P><H2><A NAME="SECTION00041000000000000000">Output from the <TT>sv</TT> Program</A></H2>

<P><PRE>        SV Machine parameters
        =====================
        Pattern Recognition
        Full polynomial
        Alphas unbounded
        Input values will be globally scaled between 0 and 1.
        Training data will not be posh chunked.
        Training data will not be sporty chunked.
        Number of SVM: 1
        Degree of polynomial: 2.
        Kernel Scale Factor: 256.
        Kernel Threshold: 1.
----------------------------------------
Positive SVs: 12 13 16 41 111 114 157 161
Negative SVs: 8 36 126 138 155 165
There are 14 SVs (8 positive and 6 negative).
Max alpha size: 3.78932
B0 is -1.87909
Objective = 9.71091
Training set:
Total samples:          200
Positive samples:       100
of which errors:        0
Negative samples:       100
of which errors:        0
----------------------------------------
Test set:
Total samples:          50
Positive samples:       25
of which errors:        0
Negative samples:       25
of which errors:        0
There are 1 lagrangian multipliers per support vector.
No.     alpha(0)        Deviation
  8      1.86799      3.77646e-08
 12     0.057789      6.97745e-08
 13      2.75386
 16     0.889041     -1.63897e-08
 36      1.53568      5.93671e-08
 41     0.730079      3.91323e-09
111     0.359107     -1.38041e-07</PRE>
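<P>Tying this output back to the parameter sections above, the following
sketch (plain Python, illustrative rather than part of the package; the
alpha values are copied from the listing) shows how the SV zero threshold
and the rule of thumb for <I>C</I> relate to these numbers:

<P><PRE>
# Alphas copied from the sv output listing above (first page only).
alphas = [1.86799, 0.057789, 2.75386, 0.889041, 1.53568, 0.730079, 0.359107]

SV_ZERO_THRESHOLD = 1e-16   # expert parameter 2: below this, not an SV
support = [a for a in alphas if a > SV_ZERO_THRESHOLD]
print(len(support), "of", len(alphas), "coefficients count as SVs")

# Rule of thumb from 'Setting the Bound on Alphas': after training with
# C = infinity, pick C slightly below the largest alpha (here 3.78932,
# the 'Max alpha size' reported in the listing above).
max_alpha = 3.78932
print("suggested C: a little under", max_alpha)
</PRE>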
