<html>
	<head>
		<title>Pocket CRF</title>
		<link type="text/css" rel="stylesheet" href="default.css">
	</head>
<body>
	<h1>Pocket CRF</h1>
	<h2>Contents</h2>
	<ul>
		<li><a href="#introduction">Introduction</a></li>
		<li><a href="#highlights">Highlights</a></li>
		<li><a href="#news">News</a></li>
		<li><a href="#usage">Usage</a></li>
		<li><a href="#reference">Reference</a></li>
		<li><a href="#todo">To do</a></li>
	</ul>
	<h2><a name="introduction">Introduction</a></h2>
	<p>Pocket CRF is a simple open source <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">Conditional Random Fields (CRFs)</a>
	package, developed for practical sequence labeling tasks in NLP research.</p>

	<h2><a name="highlights">Highlights</a></h2>
	<ul>
		<li>No order limitation: n-gram features of any order can be used for training and testing.</li>
		<li>Uses L1 norm regularization for fast feature selection.</li>
		<li>Multi-threaded training.</li>
		<li>Lower memory requirement for training.</li>
		<li>Can perform averaged perceptron and passive aggressive training.</li>
		<li>Uses the LBFGS method for fast training.</li>
		<li>Can produce n-best outputs.</li>
		<li>Can output marginal probabilities for all candidates.</li>
	</ul>

	<h2><a name="news">News</a></h2>
	<ul>
		<li>Pocket CRF 0.45</li>
		<ul>
			<li>Support online passive aggressive learning, which requires far fewer iterations than CRF training while keeping comparable performance. To use passive aggressive learning, add the option "-a 2" to the training command. Note that the number of iterations should be specified, e.g. "-i 10".</li>
			<li>Add option "-m" for efficient training: load all data into memory for fast training, or save it to disk to reduce memory cost.</li>
			<li>Support unions of features; see "Training file format".</li>
		</ul>
		<li>Pocket CRF 0.44</li>
		<ul>
			<li>Speed up first-order Markov chain training.</li>
			<li>Add the averaged perceptron training algorithm, which requires far fewer iterations than CRF training while keeping comparable performance.
			To use the averaged perceptron, add the option "-a 1" to the training command. Note that the number of iterations should be specified
			when the averaged perceptron is used, e.g. "-i 10".</li>
			<li>Add option "-d" to specify the iteration depth in LBFGS.</li>
			<li>Templates that share the same "%y" no longer need to be placed next to each other.</li>
		</ul>
		<li>Pocket CRF 0.43</li>
		<ul>
			<li>Fix several bugs.</li>
			<li>Lower memory requirement for training, which means you can use many more features than before. Empirically,
			you can use more than 30,000,000 features on a 32-bit system with 2 GB of memory. The memory requirement grows as more threads
			are used. The cost is additional disk space: the volume is (sometimes more than) 128*feature_number
			bytes, e.g., if you use 10,000,000 features, 1.28 GB of disk space is required. The training process also takes a little
			longer due to the I/O operations.
			However, for large training data sets, the additional time cost is trivial compared
			with the whole training process.</li>
		</ul>
	</ul>
	<ul>
		<li>Pocket CRF 0.42</li>
		<ul>
			<li>Fix several bugs</li>
			<li>Support null features</li>
			<li>Provide command</li>
			<li>Old APIs from version 0.41 are retained</li>
		</ul>
	</ul>
	<ul>
		<li>Pocket CRF 0.41</li>
		<ul>
			<li>Refine the feature count cut-off method</li>
			<li>Can perform L1 norm regularization for feature selection</li>
			<li>Can perform multi-threaded training</li>
			<li>Old APIs from version 0.40 are retained; version 0.40 model files can still be read</li>
		</ul>
	</ul>
	<h2><a name="usage">Usage</a></h2>
	<ul>
		<li>Testing Pocket CRF on your computer</li>
		<p>After downloading the Pocket CRF package, unzip it and switch to the directory that contains this document.<br>
		On Windows, type the following command to learn a model:<br>
		<b>crf_learn template train model</b><br>
		After learning, type the following command to test:<br>
		<b>crf_test model key result</b><br>
		When you see "label precision:0.936", Pocket CRF has run correctly on your computer.<br><br>
		If you use Linux or Cygwin, follow the steps below:<br>
		Type the following commands to generate crf_test:<br>
		<b>make</b><br>
		<b>mv crf crf_test</b><br>
		Edit main.cpp, go to line 368, and change the code "return main_test(argc,argv);" to "return main_learn(argc,argv);".<br>
		Quit the editor, then type the following commands to generate crf_learn:<br>
		<b>make</b><br>
		<b>mv crf crf_learn</b><br>
		Then type the following commands to test:<br>
		<b>./crf_learn template train model</b><br>
		<b>./crf_test model key result</b><br>
		You should then see "label precision:0.936", which means the test succeeded.<br>
		</p>

		<li>Training</li>
		<p>
		To train a model with Pocket CRF, use a command like "<b>crf_learn template train model</b>". Three file names must
		be given in order: the template file, the training file, and the model file. In this example, the 3 names are "template", "train", and "model".
		The first 2 files must be prepared in advance, and the last file is generated by Pocket CRF.</p>
		<li>Training file format</li>
		<p>Training data is arranged as tables in the file. Each table represents one labeled sequence, and tables are
		separated by a blank line. Each row holds the observation(s) and the label for the corresponding token, with columns separated by
		a tab character. The number of columns must be the same throughout the file, and the last column holds the label. Each cell in a row holds a union of one or more observations, separated by spaces.
		Here is an example:</p>
		<pre>
At	IN	IN DT JJ	O
the	DT	IN DT JJ NN	B
same	JJ	IN DT JJ NN ,	I
time	NN	DT JJ NN , PRP	I
,	,	JJ NN , PRP VBZ	O
he	PRP	NN , PRP VBZ RB	B
remains	VBZ	, PRP VBZ RB JJ	O
fairly	RB	PRP VBZ RB JJ IN	O
pessimistic	JJ	VBZ RB JJ IN DT	O
about	IN	RB JJ IN DT NN	O
the	DT	JJ IN DT NN IN	B
outlook	NN	IN DT NN IN NNS	I
for	IN	DT NN IN NNS ,	O
imports	NNS	NN IN NNS , VBN	B
,	,	IN NNS , VBN VBD	O
given	VBN	NNS , VBN VBD JJ	O
continued	VBD	, VBN VBD JJ NN	B
high	JJ	VBN VBD JJ NN CC	I
consumer	NN	VBD JJ NN CC NN	I
and	CC	JJ NN CC NN NNS	I
capital	NN	NN CC NN NNS NNS	I
goods	NNS	CC NN NNS NNS .	I
inflows	NNS	NN NNS NNS .	I
.	.	NNS NNS .	O

He	PRP	PRP VBZ DT	B
reckons	VBZ	PRP VBZ DT JJ	O
the	DT	PRP VBZ DT JJ NN	B
current	JJ	VBZ DT JJ NN NN	I
account	NN	DT JJ NN NN MD	I
deficit	NN	JJ NN NN MD VB	I
will	MD	NN NN MD VB TO	O
narrow	VB	NN MD VB TO RB	O
to	TO	MD VB TO RB #	O
only	RB	VB TO RB # CD	B
#	#	TO RB # CD CD	I
1.8	CD	RB # CD CD IN	I
billion	CD	# CD CD IN NNP	I
in	IN	CD CD IN NNP .	O
September	NNP	CD IN NNP .	B
.	.	IN NNP .	O
		</pre>
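		<p>To make the layout above concrete, here is a minimal Python sketch that reads text in this format into sequences. It is not part of Pocket CRF: the function name <b>parse_training_data</b> and the returned structure are hypothetical, and the sketch assumes that blank lines separate sequences, tabs separate columns, and spaces separate the members of a union inside a cell.</p>

```python
def parse_training_data(text):
    """Parse training text into a list of sequences (hypothetical helper).

    Each sequence is a list of (cells, label) pairs. cells holds one list
    of observations per column, and label is the value of the last column.
    """
    sequences = []
    current = []
    width = None  # the column count must stay fixed throughout the file
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the sequence collected so far.
            if current:
                sequences.append(current)
                current = []
            continue
        columns = line.split("\t")
        if width is None:
            width = len(columns)
        elif len(columns) != width:
            raise ValueError(
                f"expected {width} columns, found {len(columns)}: {line!r}"
            )
        # Last column is the label; every other cell is a union of observations.
        *cells, label = columns
        current.append(([cell.split() for cell in cells], label.strip()))
    if current:
        sequences.append(current)
    return sequences


# Tiny sample: two sequences separated by a blank line.
sample = (
    "At\tIN\tIN DT JJ\tO\n"
    "the\tDT\tIN DT JJ NN\tB\n"
    "\n"
    "He\tPRP\tPRP VBZ DT\tB\n"
)
for sequence in parse_training_data(sample):
    print([label for _, label in sequence])
```

		<p>The sketch raises an error when a row has a different number of columns from the first row, mirroring the fixed-column requirement stated above. How Pocket CRF itself reacts to malformed rows is not covered here.</p>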