<!doctype html public "-//w3c//dtd html 4.0//en">
<html>
<head>
<link rev="made" href="mailto:yusuke@is.s.u-tokyo.ac.jp">
<link rel="stylesheet" type="text/css" href="manual.css">
<title>Amis</title>
</head>
<body>
<h1>Amis - A maximum entropy estimator for feature forests</h1>
<div class=author>May 14th, 2003<br>Yusuke Miyao<br>Department of Computer Science, University of Tokyo<br><a href="mailto:yusuke@is.s.u-tokyo.ac.jp">yusuke@is.s.u-tokyo.ac.jp</a></div>
<a href="manual.ja.html">Japanese version</a>
<hr>
<h2><a name="index">Contents</a></h2>
<ul>
 <li><a href="#introduction">Introduction</a>
 <li><a href="#requirements">Requirements</a>
 <ul>
  <li><a href="#hardware">Hardware</a>
  <li><a href="#software">Software</a>
 </ul>
 <li><a href="#installation">Installation and startup</a>
 <ul>
  <li><a href="#install">Installation</a>
  <li><a href="#running">Startup</a>
 </ul>
 <li><a href="#external">External specs</a>
 <li><a href="#input_output">Input/output</a>
 <ul>
  <li><a href="#amis_input">Input (Amis format)</a>
  <li><a href="#amistree_input">Input (AmisTree format)</a>
  <li><a href="#amis_output">Output</a>
 </ul>
 <li><a href="#example">Example</a>
 <li><a href="#internal">Internal specs</a>
 <li><a href="#others">Misc.</a>
 <li><a href="#references">References</a>
</ul>
<hr>
<h2><a name="introduction">Introduction</a></h2>
<p>This software is a parameter estimator for <a href="#Berger1996">maximum entropy models [1]</a>.
Given a set of events as training data, the program outputs parameters that maximize
the likelihood of the training data.
The software supports the following functions.</p>
<ul>
 <li><a href="#Miyao2002">Parameter estimation algorithm for feature forests [2]</a>
 <li>Parameter estimation algorithms GIS, <a href="#Pietra1997">IIS [3]</a>, and <a href="#Nocedal1980">limited-memory BFGS [4]</a>
 <li>MAP estimation using Gaussian prior distributions
 <li>Optimized IIS algorithm
 <li>Selectable feature types: binary, integer, and real
 <li>Logging to files
 <li>Run-time configuration of various parameters
</ul>
<p>A maximum entropy model gives a conditional probability <span class=math>p(x|y)</span>
of an event <span class=math>e=&lt;x,y&gt;</span>, where <span class=math>x</span> is a
target event and <span class=math>y</span> is a history event.
Characteristics of an event are represented with a bundle of
<em>feature functions</em> (or <em>features</em> for short) <span class=math>f_i</span>.
Each characteristic corresponds to an index <span class=math>i</span>, and
<span class=math>f_i(x,y)</span> takes a positive value when an event
<span class=math>&lt;x,y&gt;</span> has the <span class=math>i</span>-th
characteristic.</p>
<p>Given an event <span class=math>e=&lt;x,y&gt;</span> and feature
functions <span class=math>f_i</span>, a maximum entropy model gives a
probability by the following formula.
<blockquote><span class=math>p(x|y) = 1/(Z_y) exp( sum_i( l_i * f_i(x,y) ) ) = 1/(Z_y) prod_i( a_i^f_i(x,y) )</span></blockquote>
<span class=math>l_i</span> (lambda) or <span class=math>a_i</span>
(alpha, with <span class=math>a_i = exp(l_i)</span>) is a model parameter; intuitively, it represents the
weight of a feature <span class=math>f_i</span>.
<span class=math>Z_y</span> is a normalizing factor that makes the
probabilities <span class=math>p(x|y)</span> sum to 1 over all
possible <span class=math>x</span> under a given <span class=math>y</span>.</p>
<p>Given a set of feature functions and training data of observed events
as input, Amis computes and outputs the optimal parameters <span class=math>a_i</span>.
The program supports several algorithms for
parameter estimation: GIS, IIS, BFGS, and modified versions of them that
support feature forests.</p>
<hr>
<h2><a name="requirements">Requirements</a></h2>
<h3><a name="hardware">Hardware</a></h3>
<dl>
 <dt>CPU <dd>PentiumIII 500MHz or above
 <dt>Memory <dd>256 MB or above
 <dt>Hard disk <dd>50 MB or above
</dl>
<p>The program can be compiled and run on IA machines of the above specs,
or on SPARC machines of equivalent specs. More memory and hard disk space will
be required depending on the size of the input data.</p>
<h3><a name="software">Software</a></h3>
<dl>
 <dt>OS <dd>Linux, Solaris
 <dt>Compiler <dd>g++ 3.2 or above (g++ 3.0.2 or above, g++ 2.95.3 or above)
 <dt>Library <dd>Standard C++ Library
</dl>
<hr>
<h2><a name="installation">Installation and startup</a></h2>
<h3><a name="install">Installation</a></h3>
<p>Amis provides a "configure" script and is compiled and installed with the
following steps. (In the following examples, % represents a
command prompt.)</p>
<ol>
 <li>Run the "configure" script<br>
 <pre>% ./configure</pre>
 By default, the executable is installed under /usr/local/.
 To install in another directory ($DIR in the example),
 specify the option as follows.
 <pre>% ./configure --prefix=$DIR</pre>
 Besides "--prefix", various other options are supported.
 <table border>
 <tr><th>Option<th>Default<th>Valid values<th>Effect</tr>
 <tr><td>--enable-debug<td>no<td>0 - 5 or no
 <td>Specify whether debug messages are printed. The greater the value, the more messages are printed.</tr>
 <tr><td>--enable-profile<td>no<td>0 - 5 or no
 <td>Specify whether profiling (measuring the execution time of each function) is enabled. The greater the value, the more functions are profiled.</tr>
 <tr><td>--enable-feature-lambda<td>no<td>no or yes
 <td>Whether to use lambda values instead of alphas in the computation of probabilities. Using alphas is faster, but using lambdas is more robust.
 If the default (using alphas) causes parameters to become infinity or nan, try this option.</tr>
 </table>
 For other options, see the help message of the configure script (printed by the --help option) or the manuals.
 <li>Compile the program with the "make" command
 <pre>% make</pre>
 The executable "amis" is created.
 <li>Check whether the compilation was successful.
 <pre>% make check</pre>
 <li>Install the executable and manuals.
 <pre>% make install</pre>
</ol>
<p>The above instructions will install "/usr/local/bin/amis".</p>
<h3><a name="running">Startup</a></h3>
<p>To start up Amis, execute "amis" with an argument specifying a
configuration file (<a href="#input_output">described later</a>).
<pre>% amis [configuration file]</pre>
When the argument is omitted, "amis.conf" is used as the configuration file
by default. When the configuration file is not found (or cannot be
read), the program stops with an error.</p>
<p>You can specify the following options at startup. Other options
are shown by the "-h" or "--help" option.
<table border>
<tr><th>Option<th>Default<th>Valid values<th>Effect</tr>
<tr><td>-h<td>nothing<td>nothing<td>Print help messages</tr>
<tr><td>-f<td>binary<td>[binary|integer|real]
 <td>Specify the type of features</tr>
<tr><td>-m<td>amis.model<td>file name
 <td>Specify the name of a model file (<a href="#input_output">described later</a>)</tr>
<tr><td>-e<td>amis.event<td>file name
 <td>Specify the name of an event file (<a href="#input_output">described later</a>)</tr>
<tr><td>-o<td>amis.output<td>file name
 <td>Specify the name of an output file (<a href="#input_output">described later</a>)</tr>
<tr><td>-l<td>amis.log<td>file name
 <td>Specify the name of a log file (<a href="#input_output">described later</a>)</tr>
<tr><td>-d<td>Amis<td>[Amis|AmisTree|AmisFix]
 <td>Specify the data file format</tr>
<tr><td>-a<td>GIS<td>[GIS|IIS|BFGS|BFGSMAP]
 <td>Specify the parameter estimation algorithm</tr>
<tr><td>-i<td>200<td>non-zero positive value
 <td>Specify the number of 
iterations</tr>
<tr><td>-n<td>200<td>non-zero positive value
 <td>Specify the number of iterations in Newton's method</tr>
<tr><td>-s<td>5<td>non-zero positive value
 <td>Specify the memory size for limited-memory BFGS</tr>
<tr><td>-r<td>1<td>non-zero positive value
 <td>Specify the interval at which the progress of computation is printed</tr>
<tr><td>-p<td>6<td>non-zero positive value
 <td>Specify the number of significant digits</tr>
<tr><td>-s<td>1.0<td>positive value
 <td>Specify the value of sigma used for MAP estimation</tr>
</table></p>
<p>Configuration values take effect in the following order of priority.
<ol>
 <li>Startup options
 <li>Configuration file
 <li>Default values
</ol></p>
<hr>
<h2><a name="external">External specs</a></h2>
<p>This program is composed of the following modules.
<ul>
 <li>Data file management (Amis, AmisTree, AmisFix)
 <li>Model
 <li>Event space (Standard, Feature forest)
 <li>Vector of empirical expectations
 <li>Vector of model expectations
 <li>Algorithm (GIS, IIS, BFGS, BFGSMAP)
 <li>Property
</ul>
The model file and the event file are processed by the responsible modules,
which create a model object and an event space object. Given the model and
event space objects, the algorithm module performs parameter
estimation and outputs the model data. The property module reads a
configuration file and controls the other modules.
All data structures are provided as a (template) library, so users can call the
estimator from their own programs instead of using the "amis" executable.
For details, see the "AmisDriver" class, which uses various
interfaces of the amis library.</p>
<pre>
 Model file  Event file     Configuration file
            |                        |
            V                        V
 -----------------------      ---------------
|                       |    |               |
|   Data file manager   |&lt;---|   Property    |
|                       |    |               |
 -----------------------      ---------------
   |             |                   |
   |             V                   V
   |        -----------       ---------------
   |       |           |     |               |
   |       |   Model   |----&gt;|               |
   |       |           |     |               |
   |        -----------      |   Algorithm   |---&gt; Output model
   |    ---------------      |               |
   |   |               |     |               |
   ---&gt;|  Event space  |----&gt;|               |
       |               |     |               |
        ---------------       ---------------
                                     |
                          -----------------------
                         |   Model expectation   |
                         | Empirical expectation |
                          -----------------------
</pre>
<ul>
 <li>Input<br> Configuration file, Model file, Event file
 <li>Output<br> Estimated model file
</ul>
<hr>
<h2><a name="input_output">Input/output</a></h2>
<p>This section explains the data file formats: Amis format and AmisTree
format. AmisTree format is used for estimating maximum entropy models
for feature forests.</p>
<p>In every file, text from # to the end of the line is a comment and is ignored.
Comments are treated as a space. Tokens are separated by spaces or
tabs, and a newline marks the end of a line. The colon (:) is a
special character. To use any of these special characters as
part of a token, escape the character with a backslash (\). A
backslash itself is written as \\.</p>
<h3><a name="amis_input">Input (Amis format)</a></h3>
<p>To use amis, a <a href="#amis_config_file">configuration file</a>, a
<a href="#amis_model_file">model file</a>, and an
<a href="#amis_event_file">event file</a> must be prepared. Each of
them is explained below.</p>
<h4><a name="amis_config_file">Configuration file</a></h4>
<p>In a configuration file, specify the name of each option and its value.
See the following example.
<pre>DATA_FORMAT Amis
FEATURE_TYPE integer
MODEL_FILE me.model
EVENT_FILE me.event