Amis - A maximum entropy estimator

Yusuke Miyao (yusuke@is.s.u-tokyo.ac.jp)
Dec. 14, 2001

Note: This document does not describe maximum entropy models for
feature forests.  For details on them, see the manual.


1. Introduction - A maximum entropy model

Amis is a parameter estimation tool for maximum entropy models.  A
maximum entropy model is a probabilistic model which assigns a
probability p(e) to each _event_ e.  To cope with the data-sparseness
problem, we classify events into equivalence classes by _feature
functions_ f(e), which map events to integer values.  Suppose that we
have an event e and a set of feature functions f_i(e).  A maximum
entropy model is then formulated as follows:

         1
  p(e) = - exp( sum( l_i * f_i(e) ) )
         Z

         1
       = - prod( a_i^f_i(e) )
         Z

where l_i (lambda) or a_i (alpha) is the model parameter which
corresponds to f_i (the two are related by a_i = exp(l_i)), and Z is a
normalizing factor, i.e., the summation of exp( sum( l_i * f_i(e) ) )
over all possible events.

That's all for a maximum entropy model, but we introduce some
modifications to the model.  Usually we have a very large or even
infinite number of events, and it is intractable to compute Z, which
requires summation over all events.  To tackle this problem, our model
gives a conditional probability instead of the above one.

            1
  p(x|y) = --- prod( a_i^f_i(x,y) )
           Z_y

Note that e = (x,y), where x is called a _target_ and y a _history_.
Since Z_y requires summation over all x, it is tractable when the set
of x is finite for every y.  For example, in a part-of-speech tagger,
x would be a part-of-speech, and y would be various features of a
target word.  We should mention that the set of x need not be finite
over all events; it only needs to be finite for each y.


2. Model and Event space

To estimate a maximum entropy model for a certain problem, we need two
types of data: an initial model and an event space.
An initial model is a starting point of parameter estimation, and is a
set of features and corresponding initial alpha values.  An event
space is training data consisting of a set of events, each of which
consists of a list of feature values.  We give the data to Amis in two
files called the "model file" and the "event file".  The format of the
files is described in Section 4.


3. Output

Running Amis with an initial model and an event space, we get an
estimated model as output.  The estimated model is a set of features
and corresponding estimated alpha values.  The format of the output
file is the same as that of the initial model file (note that a
feature name and an alpha value are separated by a tab, not spaces).
We can now compute a probability p(x|y) given an event (x,y) with the
above definition.


4. Data format

We prepare three files: a configuration file, an initial model file,
and an event file.  The format of each is described below.

Note: In all files, a string from '#' to the end of a line is treated
as a comment (ignored).  Tokens are separated by spaces or tabs, and a
newline ends a line.  If you want to use these characters in a token,
escape them with '\'.

a) Configuration file

A configuration file consists of lines, each holding a pair of a
property name and a value, like this:

----------------------------------------------------------------------
DATA_FORMAT	Amis
FEATURE_TYPE	integer
MODEL_FILE	me.model
EVENT_FILE	me.event
OUTPUT_FILE	me.output
LOG_FILE	me.log
ESTIMATION_ALGORITHM	GIS
NUM_ITERATIONS	200
REPORT_INTERVAL	1
PRECISION	6
----------------------------------------------------------------------

Each property has the following meaning:

DATA_FORMAT: Data format of model and event files.  Currently Amis
    supports the original Amis format ("Amis"), a format for
    fixed-target events ("AmisFix"), and a format for feature forests
    ("AmisTree").  The Amis format is described later in this
    section.  The default value is "Amis".

FEATURE_TYPE: The type of feature functions.  The default value is
    "binary".
    You can specify one of binary/integer/real.

MODEL_FILE: File name of the model file.  The default value is
    "amis.model".  You can specify multiple files separated by spaces.

EVENT_FILE: File name of the event file.  The default value is
    "amis.event".  You can specify multiple files separated by spaces.

OUTPUT_FILE: File name of the estimated model file.  The default value
    is "amis.output".

LOG_FILE: File name of the log file.  The default value is "amis.log".

ESTIMATION_ALGORITHM: Specify the algorithm for parameter estimation.
    Currently Amis supports the following algorithms:
        Generalized Iterative Scaling ("GIS"): Slower than IIS, but
            uses less memory.
        Optimized implementation of Improved Iterative Scaling ("IIS"):
            Faster than GIS, but uses much more memory.
        Limited-memory BFGS ("BFGS"): Application of a general-purpose
            function minimizer.  Usually much faster than GIS and IIS,
            and uses less memory.
        GIS and BFGS with MAP estimation ("GISMAP" and "BFGSMAP"): MAP
            estimation versions of GIS and BFGS.
    The default value is "GIS".

FEATURE_COUNT_HASH: Enable or disable the use of a "map" class instead
    of a vector for factoring in the IIS algorithm.  This option is
    meaningful only for the IIS algorithm.  If the system uses too
    much memory with the IIS algorithm, try this option.

NUM_ITERATIONS: The number of iterations.  The default value is 200.

MEMORY_SIZE: The memory size for limited-memory BFGS.  The default
    value is 5.

REPORT_INTERVAL: The log-likelihood, minimum update value, and maximum
    update value are reported at each interval.  The default value is
    1.

PRECISION: The precision of the data of a model and events.  The
    default value is 6.

EVENT_ON_FILE: Whether the event space is kept in a file or in memory.
    The default value is "FALSE", which means the event space is kept
    in memory.  Of course, the in-memory one is much faster.
    If you don't have enough memory to run the estimator, set it to
    "TRUE".

EVENT_ON_FILE_NAME: The name of a temporary file in which to put the
    event space.  The default value is "amis.event.tmp".  This
    property is meaningful only when the "EVENT_ON_FILE" property is
    set to "TRUE".

b) Model file

Initial and estimated model files have the same format, as described
below.  We can therefore use the estimated model file as an initial
model file to further improve the estimation.  A model file consists
of lines, each holding a pair of a feature name and its corresponding
initial alpha value.

----------------------------------------------------------------------
feature1    1.000000e+00
feature2    1.000000e+00
feature3    1.000000e+00
----------------------------------------------------------------------

A feature name can be any string which does not contain spaces or
colons.  Alpha values can be specified in fixed or scientific
notation.

c) Event file

An event file consists of event descriptions, each of which has an
event ID, an observed event, and complement events, which have
different targets x but the same history y.  Since we need to sum up
probabilities for all x under a given y, we must specify all the
complement events for each observed event.  An event file has the
following format:

----------------------------------------------------------------------
event_1
1    feature1:2 feature3
0    feature1
0    feature2:3

event_2
1    feature2 feature3:5
0    feature2
----------------------------------------------------------------------

Event descriptions are separated by a blank line.  A description
begins with an event ID, which is actually ignored by Amis, followed
by the description of an observed event.  Each of the following lines
describes a list of activated features, with the empirical count of
the event in the training data at the beginning of the line.
If you have exactly the same event e = (x,y) more than once in your
training data, you can pack the occurrences into one description.  The
remaining lines describe complement events, whose empirical counts are
0.  You can specify two or more lines which have positive empirical
counts, which means that the observed event is ambiguous.  You can
specify the feature value after ':' if the value of the feature
function is not 1.


5. Install

Since the detailed installation process is described in "INSTALL", we
only give an overview here.  ">" is a command-line prompt.

----------------------------------------------------------------------
> ./configure
creating cache ./config.cache
 ...
creating Makefile
creating config.h
> make
c++ -DHAVE_CONFIG_H -I. -I. -I.     -g -O2 -ffast-math -fomit-frame-pointer -fstrict-aliasing -w -Wall -c AmisFormat.cc
 ...
> make install
----------------------------------------------------------------------

If you want to install the system in your home directory, give the
"--prefix" option to the configure script.  You can see other options
with the "--help" option.  Options specific to this package are as
follows.

--enable-debug: Whether to print debugging messages.  Default is "no",
    which prints no debugging messages.  Specify a number (0-5) to
    enable messages.  A larger value prints more messages.

--enable-profile: Whether to profile the execution.  Default is "no",
    which enables no profiling.  Specify a number (0-5) to enable
    profiling.  A larger value enables more profiling.

--enable-feature-lambda: Whether to use lambda values instead of
    alphas in the computation of probabilities.  Using alphas is
    faster, but using lambdas is more robust.  If the default (using
    alphas) causes parameters to become infinity or nan, try this
    option.

You have two optional "make" targets.
"make check" will test the compiled code with some simple examples in
the "test/" directory.  "make html" will make HTML documents.

For maintainers:

You need the following steps to generate the "configure" script, which
is not included in the CVS repository.

----------------------------------------------------------------------
> aclocal; autoconf; autoheader; automake
----------------------------------------------------------------------

After that, follow the above installation procedure.  You will need to
repeat these steps whenever you modify "configure.in" or
"Makefile.am".


6. Running the estimator

To run Amis, first prepare the three data files described in Section
4, and run the following command:

    amis [OPTIONS] [CONFIG_FILE]

CONFIG_FILE is a configuration file as described in Section 4.a.  You
can override properties defined in CONFIG_FILE with the following
command-line options.

[-h|--help]: Print help messages.
[-f|--feature_type] feature_type: Same as the FEATURE_TYPE property.
[-m|--model-file] model_file: Same as the MODEL_FILE property.
[-e|--event-file] event_file: Same as the EVENT_FILE property.
[-o|--output-file] output_file: Same as the OUTPUT_FILE property.
[-l|--log-file] log_file: Same as the LOG_FILE property.
[-d|--data-format] data_format: Same as the DATA_FORMAT property.
[-a|--estimation-algorithm] estimation_algorithm: Same as the
    ESTIMATION_ALGORITHM property.
[-i|--num-iterations] num_iterations: Same as the NUM_ITERATIONS
    property.
[-n|--num-newton-iterations] num_newton_iterations: Same as the
    NUM_NEWTON_ITERATIONS property.
[-r|--report-interval] report_interval: Same as the REPORT_INTERVAL
    property.
[-p|--precision] precision: Same as the PRECISION property.


7. Example

This section describes how to use the estimator with a simple example.
Some other examples can be found in the "test/" directory.  Suppose we
are making a maximum entropy model for POS tagging.  For example, for
the sentence "I like handball", we can assign a POS to each word, like
"I/Noun like/Verb handball/Noun".
We consider the tagging task as a sequence of _events_, each of which
is the task of assigning one of the tags to a word given the previous
word and its tag.  For example, when we are looking at "like", we
select a tag (_target_) for "like" under the context (_history_) of
having "I/Noun" as the previous word.  In the Amis format, tagging
events are expressed as follows ("BOS" is for "Beginning Of
Sentence").

----------------------------------------------------------------------
event_BOS/BOS-I/Noun
1  BOS/BOS-I/Noun */*-I/Noun */*-*/Noun
0  */*-*/Verb
0  */*-*/Prep
0  */*-*/Modif

event_I/Noun-like/Verb
0  I/Noun-like/Noun */Noun-like/Noun */*-like/Noun */*-*/Noun
1  I/Noun-like/Verb */Noun-like/Verb */*-like/Verb */*-*/Verb
0  */Noun-like/Prep */*-like/Prep */*-*/Prep
0  */*-*/Modif

event_like/Verb-handball/Noun
...
----------------------------------------------------------------------

Each block separated by a blank line corresponds to one event.  The
first line of each block is the name of the event, which in fact has
no effect on the estimator (it is just for human readability).  The
other lines describe the features activated for the event.  Each line
corresponds to one target: in this example, Noun, Verb, Prep, and
Modif.  Every event description must have a line for every possible
target for the history.  The beginning of the line describes whether
the event is "observed" or not.
If the pair of target and history corresponding to the line is
observed in the training data, it has a positive number.  If it is not
observed, for example "assign Verb for I", it is 0.

The model file will be as follows.

----------------------------------------------------------------------
BOS/BOS-I/Noun  1.0
*/*-I/Noun      1.0
*/*-*/Noun      1.0
*/*-*/Verb      1.0
*/*-*/Prep      1.0
*/*-*/Modif     1.0
...
----------------------------------------------------------------------

Running the estimator, we get an output file like this:

----------------------------------------------------------------------
BOS/BOS-I/Noun  8.03
*/*-I/Noun      1.45
*/*-*/Noun      0.84
*/*-*/Verb      0.72
*/*-*/Prep      0.54
*/*-*/Modif     0.48
...
----------------------------------------------------------------------

Using these parameter values, we can compute the probability of an
event.  For example, the un-normalized score for the event "assign
Noun for I under the context BOS/BOS" is

  q(Noun|BOS/BOS-I/*) = prod( a_i^f_i(Noun|BOS/BOS-I/*) )
                      = 8.03 * 1.45 * 0.84
                      = 9.78

Similarly, q(Verb|...)=0.72, q(Prep|...)=0.54, q(Modif|...)=0.48.
Hence, the probability of the event of assigning Noun is

                         1
  p(Noun|BOS/BOS-I/*) = --- q(Noun|BOS/BOS-I/*)
                        Z_y

                               9.78
                      = -------------------
                        9.78+0.72+0.54+0.48

                      = 0.849


8. Note

a) You should exclude features with 0 empirical count.  Weights for
such features should be 1.0.  Including such features has no impact on
the estimation result, but makes the estimation slower.

c) If the estimator requires too much memory, try the EVENT_ON_FILE
option.  It might be slow, but requires much less memory.
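
Appendix (illustration only): The arithmetic of the worked example in
Section 7 can be reproduced with a short script.  The following Python
sketch is not part of Amis; the names "alphas", "candidates", "score"
and "conditional" are hypothetical and chosen only for this example.
It assumes the alpha values of the sample output file and the feature
lists of the event "event_BOS/BOS-I/Noun", with every feature value
equal to 1.

```python
from math import prod

# Alpha values copied from the sample output file in Section 7.
alphas = {
    "BOS/BOS-I/Noun": 8.03,
    "*/*-I/Noun": 1.45,
    "*/*-*/Noun": 0.84,
    "*/*-*/Verb": 0.72,
    "*/*-*/Prep": 0.54,
    "*/*-*/Modif": 0.48,
}

# Activated features for each target x under the same history y,
# taken from the event "event_BOS/BOS-I/Noun".  Each feature maps to
# its value f_i(x,y); a feature written without ":value" has value 1.
candidates = {
    "Noun": {"BOS/BOS-I/Noun": 1, "*/*-I/Noun": 1, "*/*-*/Noun": 1},
    "Verb": {"*/*-*/Verb": 1},
    "Prep": {"*/*-*/Prep": 1},
    "Modif": {"*/*-*/Modif": 1},
}


def score(features, alphas):
    """Un-normalized score q(x|y) = prod( a_i^f_i(x,y) )."""
    return prod(alphas[name] ** value for name, value in features.items())


def conditional(candidates, alphas):
    """p(x|y) = q(x|y) / Z_y, where Z_y sums q(x|y) over all targets x."""
    scores = {x: score(feats, alphas) for x, feats in candidates.items()}
    z_y = sum(scores.values())
    return {x: q / z_y for x, q in scores.items()}


probs = conditional(candidates, alphas)
print(round(probs["Noun"], 3))  # prints 0.849
```

Because Z_y is summed only over the targets listed for one history,
the script needs every complement event of that history, which is why
the event file must list all of them.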