same order as in the training set)</PRE></DIR><P>A more detailed description of the parameters and how they link to the respective algorithms is given in the appendix of [<A href="#References">Joachims, 2002a</A>]. </P><P>The input file <TT>example_file</TT> contains the training examples. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format: </P><DIR><TT>&lt;line&gt; .=. &lt;target&gt; &lt;feature&gt;:&lt;value&gt; &lt;feature&gt;:&lt;value&gt; ... &lt;feature&gt;:&lt;value&gt; # &lt;info&gt;</TT><BR><TT>&lt;target&gt; .=. +1 | -1 | 0 | &lt;float&gt;</TT><BR><TT>&lt;feature&gt; .=. &lt;integer&gt; | "qid"</TT><BR><TT>&lt;value&gt; .=. &lt;float&gt;</TT><BR><TT>&lt;info&gt; .=. &lt;string&gt;</TT></DIR><P>The target value and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The string <TT>&lt;info&gt;</TT> can be used to pass additional information to the kernel (e.g. non-feature-vector data).</P><P>In classification mode, the target value denotes the class of the example. A target value of +1 marks a positive example, and -1 marks a negative example. So, for example, the line </P><blockquote> <P><tt>-1 1:0.43 3:0.12 9284:0.2 # abcdef</tt> </P></blockquote><P>specifies a negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string <tt>abcdef</tt> is stored with the vector, which can serve as a way of providing additional information for user-defined kernels. A class label of 0 indicates that this example should be classified using transduction. The predictions for the examples classified by transduction are written to the file specified through the -l option. The order of the predictions is the same as in the training data. 
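<BR><BR>As an illustration of the line format described above, the following Python sketch parses a single example line. It is not part of SVM-light: the names <TT>Example</TT> and <TT>parse_example</TT> are hypothetical, and three points are assumptions made for the sketch rather than documented behavior: any line starting with # is treated as a comment (not only the first lines), the value of "qid" is read as an integer, and "increasing" is enforced as strictly increasing.

```python
from typing import Dict, NamedTuple, Optional


class Example(NamedTuple):
    """One parsed line of an example file (hypothetical helper type)."""
    target: float               # +1, -1, 0 (unlabeled), or a real value
    qid: Optional[int]          # value of the special "qid" feature, if present
    features: Dict[int, float]  # sparse vector: feature number -> value
    info: str                   # text after '#', kept as an opaque string


def parse_example(line: str) -> Optional[Example]:
    """Parse one example line; return None for blank or comment lines."""
    stripped = line.strip()
    # Assumption: skip every line that starts with '#', wherever it occurs.
    if not stripped or stripped.startswith("#"):
        return None

    # Everything after the first '#' is the <info> string.
    body, _, info = stripped.partition("#")
    tokens = body.split()
    target = float(tokens[0])

    qid = None
    features = {}
    previous = None
    for token in tokens[1:]:
        name, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected <feature>:<value>, got {token!r}")
        if name == "qid":
            qid = int(value)  # assumption: qid values are integers
            continue
        number = int(name)
        # The format requires pairs ordered by increasing feature number.
        if previous is not None and number <= previous:
            raise ValueError("feature numbers must be increasing")
        features[number] = float(value)
        previous = number

    return Example(target, qid, features, info.strip())


if __name__ == "__main__":
    print(parse_example("-1 1:0.43 3:0.12 9284:0.2 # abcdef"))
```

Applied to the example line above, the sketch yields a target of -1, the three listed feature values, and the info string <tt>abcdef</tt>; features that are not listed are simply absent from the dictionary and are understood to be zero.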
</P><P>In regression mode, the <TT>&lt;target&gt;</TT> contains the real-valued target value.</P><P>In ranking mode [<A href="#References">Joachims, 2002c</A>], the target value is used to generate pairwise preference constraints (see <a href="http://striver.joachims.org/">STRIVER</a>). A preference constraint is included for all pairs of examples in the <TT>example_file</TT> for which the target value differs. The special feature "qid" can be used to restrict the generation of constraints. Two examples are considered for a pairwise preference constraint only if the value of "qid" is the same. For example, given the <TT>example_file</TT></P><BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"> <P><TT>3 qid:1 1:0.53 2:0.12<BR> 2 qid:1 1:0.13 2:0.1<BR> 7 qid:2 1:0.87 2:0.12 </TT></P></BLOCKQUOTE><P>a preference constraint is included only for the first and the second example (i.e., the first should be ranked higher than the second), but none that involves the third example, since it has a different "qid".</P><P>In all modes, the result of <TT>svm_learn</TT> is the model which is learned from the training data in <TT>example_file</TT>. The model is written to <TT>model_file</TT>. To make predictions on test examples, <TT>svm_classify</TT> reads this file. <TT>svm_classify</TT> is called with the following parameters: </P><DIR><P><TT>svm_classify [options] example_file model_file output_file</TT></P></DIR><P>Available options are: </P><blockquote><PRE>-h         Help. 
-v [0..3]  Verbosity level (default 2).
-f [0,1]   0: old output format of V1.0
           1: output the value of decision function (default)</PRE></blockquote><P>The test examples in <TT>example_file</TT> are given in the same format as the training examples (possibly with 0 as class label, indicating that the class is unknown). For all test examples in <TT>example_file</TT> the predicted values are written to <TT>output_file</TT>. There is one line per test example in <TT>output_file</TT> containing the value of the decision function on that example. For classification, the sign of this value determines the predicted class. For regression, it is the predicted value itself, and for ranking the value can be used to order the test examples. </P><P>If you want to find out more, try this <A href="svm_light_faq.html">FAQ</A>. </P><H2>Getting started: some Example Problems</H2><H3>Inductive SVM</H3><P>You will find an example text classification problem at </P><DIR><P><a href="http://download.joachims.org/svm_light/examples/example1.tar.gz" target="_top">http://download.joachims.org/svm_light/examples/example1.tar.gz</a></P></DIR><P>Download this file into your svm_light directory and unpack it with </P><DIR><P><TT>gunzip -c example1.tar.gz | tar xvf -</TT></P></DIR><P>This will create a subdirectory <TT>example1</TT>. Documents are represented as feature vectors. Each feature corresponds to a word stem (9947 features). 
The task is to learn which <a TARGET="_top" HREF="http://www.daviddlewis.com/resources/testcollections/reuters21578/">Reuters articles</a> are about "corporate acquisitions". There are 1000 positive and 1000 negative examples in the file <TT>train.dat</TT>. The file <TT>test.dat</TT> contains 600 test examples. The feature numbers correspond to the line numbers in the file <TT>words</TT>. To run the example, execute the commands: </P><DIR><P><TT>svm_learn example1/train.dat example1/model</TT> <BR><TT>svm_classify example1/test.dat example1/model example1/predictions</TT></P></DIR><P>The accuracy on the test set is printed to stdout. </P><H3>Transductive SVM</H3><P>To try out the transductive learner, you can use the following dataset (see also <a href="http://sgt.joachims.org/">Spectral Graph Transducer</a>). I compiled it from the same Reuters articles as used in the example for the inductive SVM. The dataset consists of only 10 training examples (5 positive and 5 negative) and the same 600 test examples as above. You can find it at </P><DIR><P><a href="http://download.joachims.org/svm_light/examples/example2.tar.gz" target="_top">http://download.joachims.org/svm_light/examples/example2.tar.gz</a></P></DIR><P>Download this file into your svm_light directory and unpack it with </P><DIR><P><TT>gunzip -c example2.tar.gz | tar xvf -</TT></P></DIR><P>This will create a subdirectory <TT>example2</TT>. To run the example, execute the commands: </P><DIR><P><TT>svm_learn example2/train_transduction.dat example2/model</TT> <BR><TT>svm_classify example2/test.dat example2/model example2/predictions</TT></P></DIR><P>The classification module is called only to get the accuracy printed. The transductive learner is invoked automatically, since <TT>train_transduction.dat</TT> contains unlabeled examples (i.e., the 600 test examples). You can compare the results to those of the inductive SVM by running: </P><BLOCKQUOTE><TT>svm_learn example2/train_induction.dat example2/model</TT> <BR><TT>svm_classify example2/test.dat example2/model example2/predictions</TT></BLOCKQUOTE><P>The file <TT>train_induction.dat</TT> contains the same 10 (labeled) training examples as <TT>train_transduction.dat</TT>. </P><H3>Ranking SVM</H3><P>For the ranking SVM [<A href="#References">Joachims, 2002c</A>], I created a toy example. It consists of only 12 training examples in 3 groups and 4 test examples. You can find it at </P><DIR><P><a href="http://download.joachims.org/svm_light/examples/example3.tar.gz" target="_top">http://download.joachims.org/svm_light/examples/example3.tar.gz</a></P></DIR><P>Download this file into your svm_light directory and unpack it with </P><DIR><P><TT>gunzip -c example3.tar.gz | tar xvf -</TT></P></DIR><P>This will create a subdirectory <TT>example3</TT>. To run the example, execute the commands: </P><DIR><P><TT>svm_learn -z p example3/train.dat example3/model</TT> <BR><TT>svm_classify example3/test.dat example3/model example3/predictions</TT></P></DIR><P>The output in the predictions file can be used to rank the test examples. If you do so, you will see that it predicts the correct ranking. The values in the predictions file have no meaning in an absolute sense. They are only used for ordering. </P><P>It can also be interesting to look at the "training error" of the ranking SVM. The equivalent of training error for a ranking SVM is the number of training pairs that are misordered by the learned model. To find those pairs, one can apply the model to the training file: </P>
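<P>Given the target values and "qid" values from a training file and one predicted value per example, the misordered pairs can be counted with a short script. The Python sketch below is only an illustration and not part of SVM-light: the function name <TT>count_misordered_pairs</TT> is hypothetical, the scores in the usage example are made up, and counting a tie between two predicted values as a misordering is an assumption of this sketch.</P>

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def count_misordered_pairs(
    targets: Sequence[float],
    qids: Sequence[Optional[int]],
    scores: Sequence[float],
) -> Tuple[int, int]:
    """Return (misordered, total) over all pairwise preference constraints.

    A pair forms a constraint only if both examples share the same qid
    and their target values differ, as described for ranking mode.
    """
    if not (len(targets) == len(qids) == len(scores)):
        raise ValueError("targets, qids and scores must have equal length")

    misordered = 0
    total = 0
    for i, j in combinations(range(len(targets)), 2):
        if qids[i] != qids[j] or targets[i] == targets[j]:
            continue  # no preference constraint for this pair
        total += 1
        # The pair is ordered correctly when the score difference has the
        # same sign as the target difference. Assumption: ties in the
        # scores count as misordered.
        if (targets[i] - targets[j]) * (scores[i] - scores[j]) <= 0:
            misordered += 1
    return misordered, total


if __name__ == "__main__":
    # Targets and qids of the three-line example file shown earlier;
    # the scores are hypothetical prediction values.
    targets = [3, 2, 7]
    qids = [1, 1, 2]
    scores = [0.9, 0.2, 1.5]
    print(count_misordered_pairs(targets, qids, scores))
```

<P>With the three-line example file from the description of ranking mode, only the first two examples form a constraint, so the total number of pairs is one regardless of the scores.</P>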