<html>
<head>
<title>Pocket CRF</title>
<link type="text/css" rel="stylesheet" href="default.css">
</head>
<body>
<h1>Pocket CRF</h1>
<h2>Contents</h2>
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#highlights">Highlights</a></li>
<li><a href="#news">News</a></li>
<li><a href="#usage">Usage</a></li>
<li><a href="#reference">Reference</a></li>
<li><a href="#todo">To do</a></li>
</ul>
<h2><a name="introduction">Introduction</a></h2>
<p>Pocket CRF is a simple open source <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">Conditional Random Fields (CRFs)</a> package, developed for practical sequence labeling tasks in NLP research.</p>
<h2><a name="highlights">Highlights</a></h2>
<ul>
<li>No order limitation: n-gram features of any order can be used for training and testing.</li>
<li>Uses L1 norm regularization for fast feature selection.</li>
<li>Multi-threaded training.</li>
<li>Lower memory requirements for training.</li>
<li>Supports averaged perceptron and passive-aggressive training.</li>
<li>Uses the LBFGS method for fast training.</li>
<li>Can produce n-best outputs.</li>
<li>Can output marginal probabilities for all candidates.</li>
</ul>
<h2><a name="news">News</a></h2>
<ul>
<li>Pocket CRF 0.45</li>
<ul>
<li>Support online passive-aggressive learning, which requires far fewer iterations than CRF training while keeping comparable performance. To use passive-aggressive learning, add the option "-a 2" to the training command. Note that the number of iterations should be specified, e.g. "-i 10".</li>
<li>Add option "-m" for efficient training: load all data into memory for fast training, or save it to disk to reduce memory cost.</li>
<li>Support union of features; see "Training file format".</li>
</ul>
<li>Pocket CRF 0.44</li>
<ul>
<li>Speed up first-order Markov chain training</li>
<li>Add the averaged perceptron training algorithm, which requires far fewer iterations than CRF training while keeping comparable performance.
To use the averaged perceptron, add the option "-a 1" to the training command. Note that the number of iterations should be specified when the averaged perceptron is used, e.g. "-i 10".</li>
<li>Add option "-d" to specify the iteration depth in LBFGS</li>
<li>Templates that share the same "%y" no longer need to be placed next to each other</li>
</ul>
<li>Pocket CRF 0.43</li>
<ul>
<li>Fix several bugs</li>
<li>Lower memory requirement for training, which means you can use many more features than before. Empirically, you can use more than 30,000,000 features on a 32-bit system with 2 GB of memory. The memory requirement grows as more threads are used. The cost is additional disk space: about 128*feature_number bytes, and sometimes more. For example, 10,000,000 features require about 1.28 GB of disk space. Training also takes slightly longer because of the extra I/O. However, for large training sets, the additional time is trivial compared with the whole training process.</li>
</ul>
</ul>
<ul>
<li>Pocket CRF 0.42</li>
<ul>
<li>Fix several bugs</li>
<li>Support null features</li>
<li>Provide command</li>
<li>Old APIs from version 0.41 are retained</li>
</ul>
</ul>
<ul>
<li>Pocket CRF 0.41</li>
<ul>
<li>Refine the feature count cut-off method</li>
<li>Can perform L1 norm regularization for feature selection</li>
<li>Can perform multi-threaded training</li>
<li>Old APIs from version 0.40 are retained; version 0.40 model files can still be read</li>
</ul>
</ul>
<h2><a name="usage">Usage</a></h2>
<ul>
<li>Testing Pocket CRF on your computer</li>
<p>After downloading the Pocket CRF package, unzip it and switch to the directory that contains this document.<br>
On Windows, type the following command to learn a model:<br>
<b>crf_learn template train model</b><br>
After learning, type the following command to test:<br>
<b>crf_test model key result</b><br>
When you see "label precision:0.936", Pocket CRF has run correctly on your computer.<br><br>
If you use Linux or Cygwin, follow the
steps below:<br>
Type the following commands to build crf_test:<br>
<b>make</b><br>
<b>mv crf crf_test</b><br>
Edit main.cpp: go to line 368 and change the code "return main_test(argc,argv);" to "return main_learn(argc,argv);".<br>
Save the file and quit the editor, then type the following commands to build crf_learn:<br>
<b>make</b><br>
<b>mv crf crf_learn</b><br>
Then type the following commands to test:<br>
<b>./crf_learn template train model</b><br>
<b>./crf_test model key result</b><br>
You should then see "label precision:0.936", which means the test succeeded.<br>
</p>
<li>Training</li>
<p>To train a model with Pocket CRF, use a command like "<b>crf_learn template train model</b>". Three file names must be given in order: the template file, the training file, and the model file. In this example, the three names are "template", "train" and "model". The first two files must be prepared in advance; the last one is generated by Pocket CRF.</p>
<li>Training file format</li>
<p>Training data should be arranged as tables in the file. Each table represents one labeled sequence. Separate the tables with an empty line. Each row in a table holds the observation(s) and the label for the corresponding token, with columns separated by a tab character. The number of columns must be the same throughout the file, and the last column holds the label. Each cell in a row is a union of one or more observations, separated by spaces. Here is an example:</p>
<pre>At IN IN DT JJ O
the DT IN DT JJ NN B
same JJ IN DT JJ NN , I
time NN DT JJ NN , PRP I
, , JJ NN , PRP VBZ O
he PRP NN , PRP VBZ RB B
remains VBZ , PRP VBZ RB JJ O
fairly RB PRP VBZ RB JJ IN O
pessimistic JJ VBZ RB JJ IN DT O
about IN RB JJ IN DT NN O
the DT JJ IN DT NN IN B
outlook NN IN DT NN IN NNS I
for IN DT NN IN NNS , O
imports NNS NN IN NNS , VBN B
, , IN NNS , VBN VBD O
given VBN NNS , VBN VBD JJ O
continued VBD , VBN VBD JJ NN B
high JJ VBN VBD JJ NN CC I
consumer NN VBD JJ NN CC NN I
and CC JJ NN CC NN NNS I
capital NN NN CC NN NNS NNS I
goods NNS CC NN NNS NNS . I
inflows NNS NN NNS NNS . I
. . NNS NNS . O

He PRP PRP VBZ DT B
reckons VBZ PRP VBZ DT JJ O
the DT PRP VBZ DT JJ NN B
current JJ VBZ DT JJ NN NN I
account NN DT JJ NN NN MD I
deficit NN JJ NN NN MD VB I
will MD NN NN MD VB TO O
narrow VB NN MD VB TO RB O
to TO MD VB TO RB # O
only RB VB TO RB # CD B
# # TO RB # CD CD I
1.8 CD RB # CD CD IN I
billion CD # CD CD IN NNP I
in IN CD CD IN NNP . O
September NNP CD IN NNP . B
. . IN NNP . O
</pre>
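<p>The format described above can be checked with a small script before training. The following is a minimal Python sketch, not part of Pocket CRF: the function name <b>parse_training_data</b> and the sample data are hypothetical. It assumes that columns are separated by tab characters, observations inside a cell by spaces, and sequences by empty lines.</p>

```python
from typing import List, Tuple

# One token row: (cells, label). Each cell is a list of observations.
Row = Tuple[List[List[str]], str]


def parse_training_data(text: str) -> List[List[Row]]:
    """Split training text into sequences of (cells, label) rows.

    Hypothetical helper for illustration; it is not a Pocket CRF API.
    """
    sequences: List[List[Row]] = []
    current: List[Row] = []
    width = None  # number of columns, fixed by the first row seen
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current table (sequence).
            if current:
                sequences.append(current)
                current = []
            continue
        columns = line.split("\t")
        if len(columns) == 1:
            raise ValueError(f"line {line_no}: no tab-separated label column")
        if width is None:
            width = len(columns)
        elif len(columns) != width:
            raise ValueError(
                f"line {line_no}: expected {width} columns, got {len(columns)}"
            )
        # The last column is the label; every other cell is a union of
        # space-separated observations.
        cells = [cell.split() for cell in columns[:-1]]
        current.append((cells, columns[-1].strip()))
    if current:
        sequences.append(current)
    return sequences


# Hypothetical sample: tabs between columns, spaces inside a cell,
# an empty line between the two sequences.
SAMPLE = (
    "He PRP\tPRP VBZ\tB\n"
    "reckons VBZ\tPRP VBZ DT\tO\n"
    "\n"
    "At IN\tIN DT\tO\n"
)

if __name__ == "__main__":
    for sequence in parse_training_data(SAMPLE):
        print([label for _, label in sequence])
```

<p>Rejecting rows whose column count differs from the first row catches the most common formatting mistake, such as a space typed where a tab was intended, before a long training run starts.</p>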