📄 index.html
字号:
<html> <head> <title>SVM-python</title> <link href="style.css" rel="stylesheet" type="text/css"> </head> <body><h1>SVM<sup><i>python</i></sup></h1><ul><li>High Level View<ul><li><a href="#overview">Introduction</a></li><li><a href="#building">Building</a></li><li><a href="#using">Using</a></li><li><a href="#learning">Overview of <code>svm_python_learn</code><li><a href="#classification">Overview of <code>svm_python_classify</code></a></ul><li>Low Level Reference<ul><li><a href="#objects">Objects</a></li><li><a href="#details">Details of User Functions</a></li><li><a href="#svmlight"><code>svmlight</code> Extension Module</a><li><a href="#parameters">Special Parameters</a><li><a href="#example">Example Module <code>multiclass</code></a></ul></ul><a class="bookmark" name="overview"><h2>Introduction</h2></a><p>Put simply, SVM<sup><i>python</i></sup> is <a href="http://svmlight.joachims.org/svm_struct.html">SVM<sup><i>struct</i></sup></a>, except that all of the C API functions that the user normally has to implement (except those dealing with C specific problems, most notably memory management) instead call a function of the same name in a <a href="http://www.python.org/">Python</a> module. You can write an SVM<sup><i>struct</i></sup> instance in Python without having to author any C code. SVM<sup><i>python</i></sup> tries to stay close to SVM<sup><i>struct</i></sup> in naming conventions and other behavior, but knowledge of SVM<sup><i>struct</i></sup>'s original C implementation is not required.</p><p>This document contains a general overview in the first few sections as well as a more detailed reference in later sections for SVM<sup><i>python</i></sup>. If you're already familiar with SVM<sup><i>struct</i></sup> and Python, it's possible to get a pretty good idea of how to use the package merely by browsing through <code>svmstruct.py</code> and <code>multiclass.py</code>. This document provides a more in depth view of how to use the package.</p><p>Note that this is not a conversion of SVM<sup><i>struct</i></sup> to Python. It is merely an <a href="http://docs.python.org/ext/embedding.html">embedding of Python</a> in existing C code. All code other than the user implemented API functions is still in C, including optimization.</p><p>SVM<sup><i>light</i></sup> is the basic underlying SVM learner, SVM<sup><i>struct</i></sup> a general framework to learn complex output spaces built upon SVM<sup><i>light</i></sup> for which one would write instantiations to learn in a particular setting, and SVM<sup><i>python</i></sup> extends SVM<sup><i>struct</i></sup> to allow such instantiations to be written in Python instead of in C. In SVM<sup><i>struct</i></sup>, the user implement various functions in the <code>svm_struct_api.c</code> file, which the underlying SVM<sup><i>struct</i></sup> code calls in order to learn a task. The intention of SVM<sup><i>struct</i></sup> is that the underlying code is constant, and all that a user needs to change is within <code>svm_struct_api.c</code> and <code>svm_struct_api_type.h</code>. SVM<sup><i>python</i></sup> works the same way, except all the functions that are to be implemented are instead implemented in a Python module (a <code>.py</code> file), and all these functions in <code>svm_struct_api.c</code> are instead glue code to call their embedded Python equivalents from the module, and all the types in <code>svm_struct_api_type.h</code> contain Python objects. The intention of SVM<sup><i>python</i></sup> is that is that the C code stays constant and the user writes new and modifies Python modules to implement specific tasks.<p>The primary advantages are that Python tends to be easier and faster to code than C, less resistant to change and code reorganization, tends to be <em>many</em> times more compact, there's no explicit memory management, and Python's object oriented-ness means that some tedious tasks in SVM<sup><i>struct</i></sup> can be easily replaced with default built in behavior.</p><p>My favorite example of this last point is that, since Python objects can be assigned any attribute, and since many Python objects are easily serializable with the <a href="http://docs.python.org/lib/module-pickle.html"><code>pickle</code></a> module, adding a field to the struct-model in Python code consists of a simple assignment like <code>sm.foo = 5</code> at some point, and that's it. If one were to use C code in SVM<sup><i>struct</i></sup>, one would add a field to the relevant struct, add an assignment, add code to write it to a model file, add code to parse it from a model file, and then test it to make sure all these little changes work well with each other.</p><p>The primary disadvantage to using SVM<sup><i>python</i></sup> is that it is slower than equivalent C code. For example, considering the time outside of SVM optimization, the Python implementation of multiclass classification takes 9 times the time as <a href="http://svmlight.joachims.org/svm_multiclass.html">SVM<sup><i>multiclass</i></sup></a>. However, on this task SVM optimization takes about 99.5% of the time anyway, so the increase is often negligible.</p><a class="bookmark" name="building"><h2>Building</h2></a>To build this, a simple <code>make</code> should do it, <em>unless</em> the Python library you want to use is not the library corresponding to the Python interpreter you get when you just type <code>python</code>.<p>You might want to modify the <code>Makefile</code> to modify the <code>PYTHON</code> variable to the path of the desired interpreter. When you install Python, you install a library and an interpreter. This interpreter is able to output where its corresponding library is stored. The <code>Makefile</code> calls the Python interpreter to get this information, as well as other important information relevant to building a C application with embedded Python. You can specify the path of your desired interpreter by setting <code>PYTHON</code> to something other than <code>python</code>.</p><p>When you build, the program will produce two executables, <code>svm_python_learn</code> for learning a model and <code>svm_python_classify</code> for classification with a learned model.</p><p>I have tried building SVM<sup><i>python</i></sup> with both Python 2.3 and 2.4 on OS X and Linux. Obviously, if the Python module you want to use usess features specific to Python 2.4 (like generator expressions or the long overdue <code>sorted</code>) you wouldn't be able to use the module with an SVM<sup><i>python</i></sup> built against the Python 2.3 library.<a class="bookmark" name="using"><h2>Using</h2></a><p>One annoying detail of embedded Python is that your <code>PYTHONPATH</code> environment variable has to contain "<code>.</code>" so the executable knows where to look for the module to load.</p><p>The file <code>svmstruct.py</code> is a Python module, and also contains documentation on all the functions which the C code may attempt to call. This is a good place to start reading if you are already familiar with SVM<sup><i>struct</i></sup> and want to get familiar with how to build a SVM<sup><i>python</i></sup> Python module. This describes what each function should do and, for non-required functions, describes the default behavior that happens if you <em>don't</em> implement them. The <code>multiclass.py</code> file is an example implementation of multiclass classification in Python.</p><p>Once you've written a Python module in the file <code>foo.py</code> based on <code>svmstruct.py</code> and you want to use SVM<sup><i>python</i></sup> with this module, you would use the following command line commands to learn a model and classify with a model respectively.</p><blockquote><code>./svm_python_learn --m foo [options] <train> <model><br>./svm_python_classify --m foo [options] <test> <model> <output></code></blockquote><p>Note that SVM<sup><i>python</i></sup> accepts the same arguments as SVM<sup><i>struct</i></sup> plus this extra <code>--m</code> option. If the <code>--m</code> option is omitted it is equivalent to including the command line arguments <code>--m svmstruct</code>. Note that though we put this command line option first, the <code>--m</code> option may occur anywhere in the option list.<a class="bookmark" name="learning"><h2>Overview of <code>svm_python_learn</code></h2></a><map name="learningmap"><area shape="rect" coords="39,116,175,161" href="#detail-print_struct_help"><area shape="rect" coords="13,330,200,374" href="#detail-find_most_violated_constraint"><area shape="rect" coords="14,397,201,439" href="#detail-psi"><area shape="rect" coords="30,462,187,505" href="#detail-loss"><area shape="rect" coords="276,58,449,100" href="#detail-parse_struct_parameters"><area shape="rect" coords="289,175,437,219" href="#detail-read_struct_examples"><area shape="rect" coords="307,242,419,286" href="#detail-init_struct_model"><area shape="rect" coords="292,308,433,350" href="#detail-init_struct_constraints"><area shape="rect" coords="280,491,446,536" href="#detail-print_struct_learning_stats"><area shape="rect" coords="301,558,425,600" href="#detail-write_struct_model"></map><img src="learning-tree.gif" alt="Flow Chart of the Learning Program" width="454" height="643" align="right" usemap="#learningmap"><p>Pictured is a diagram illustrating the flow of execution within <code>svm_python_learn</code>. This diagram also describes the SVM<sup><i>struct</i></sup> learning program pretty well, excepting the stuff particular to loading the Python module, and how structure parameters are <i>always</i> parsed to enable the program to load the Python module, and how everything is in C and all functions are required.</p><p>The <font color="red">red boxes</font> indicate things that are done wihin the underlying C code. The other boxes indicate functions to be implemented in the Python module. The <font color="blue">blue boxes</font> indicate functions that absolutely must be implemented or the program won't be able to execute. The <font color="green">green boxes</font> indicate functions that are not required, strictly speaking, because they have some default behavior. The <font color="#999900">yellow boxes</font> indicate functions that in the vast majority of cases are probably unnecessary to implement since the default behavior is probably acceptable.</p><p>The <code>svm_python_learn</code> program first checks whether the command line arguments are structured completely correctly. Whether they are or are not, it checks if a <code>--m</code> module is loaded and loads the Python module. If the arguments were not structured correctly, the Python module's help function is called to print out information to standard output, at which point the program exits. If, on the other hand, the arguments check out, the pattern-label example pairs are read from the indicated example file, some parameters for the learning model are set, some preliminary constraints are initialized, the learning model's structures are defined, and then the learning process begins.</p><p>This learning process repeatedly iterates over all training examples. For each example, the label associated with the most violated constraint for the pattern is found, the feature vector Ψ describing the relationship between the pattern and the label is computed, and the loss Δ is computed. From the Ψ and Δ, the program determines if the most violated constraint is violated <em>enough</em> to justify adding it to the model. If it is, then the constraint is added, and the program moves on to the next example. In the event that no constraints were added in an iteration, the algorithm either lowers its tolerance or, if minimum tolerance has been reached, ends the learning process.</p><p>Once learning has finished, statistics related to learning may be printed out, the model is written to a file, and the program exits.</p><a class="bookmark" name="classification"><h2>Overview of <code>svm_python_classify</code></h2></a><map name="classificationmap"><area shape="rect" coords="33,136,183,179" href="#detail-classify_struct_example"><area shape="rect" coords="49,201,169,246" href="#detail-write_label"><area shape="rect" coords="31,267,188,312" href="#detail-loss">
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -