                </p>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=463>
                <p style="margin-top: 0; margin-bottom: 0">
                <font face="华文楷体"><a href="#evaluation">ICTCLAS Performance Evaluation</a></font>
                <ul>
                  <li>
                    <p style="margin-top: 0; margin-bottom: 0"><font face="华文楷体"> <a href="#973">ICTCLAS Results in the 973 Evaluation</a></font></li>
                  <li>
                    <p style="margin-top: 0; margin-bottom: 0"><font face="华文楷体"><a href="#acl">Results of the First International Chinese Word Segmentation Bakeoff</a></font></li>
                </ul>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=463>
                <p style="margin-top: 0; margin-bottom: 0">
                <font face="华文楷体"><a href="#Dairly">ICTCLAS Milestones</a></font>
                </p>
			</TD>
		</TR>
	</TBODY>
</TABLE>
<P><BR>[Hua-Ping Zhang 2003 : Chapter 1 - Introduction / 1]                             
</P>
<h2><a name="background"></a><span style="font-family:黑体;mso-ascii-font-family:
Arial">Background</span></h2>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
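&quot;MS Song\, \000B&#12;\, Beijing&quot;;mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">&nbsp;<font color="#800000">A word is the smallest meaningful unit of language that can be used independently</font></span><font color="#800000"><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">, </span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">but written Chinese takes the character as its basic unit and places no explicit delimiter between words. Chinese lexical analysis is therefore both the foundation of and the key to Chinese information processing. Any system that processes Chinese content loses considerable accuracy without a good Chinese lexical analyzer behind it. The main application areas of Chinese lexical analysis include:</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>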
</o:p>
</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Information retrieval (search engines)</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>
</o:p>
</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Machine translation</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>
</o:p>
</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Text classification, summarization, and filtering</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Information extraction</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;
mso-hansi-font-family:&quot;Times New Roman&quot;">Other fields related to Chinese content processing</span></font></p>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;"><font color="#800000">Chinese lexical analysis is also a very hard problem. The main difficulties are:</font></span></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
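mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Word segmentation: because Chinese words are not separated by spaces, each word has to be identified correctly within a continuous string of characters. Ambiguity is common: “的确切” can be split as “的确/切” or as “的/确切”, and “马上” can be one word meaning “immediately” or two words, “马/上”, meaning “on the horse”. Ambiguities of this kind are very frequent in Chinese and seriously interfere with word segmentation;</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>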
</o:p>
</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
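mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Unknown-word recognition: no dictionary can list every word. Large numbers of person names, place names, organization names, transliterated foreign names, and newly coined words, such as “王小山” (Wang Xiaoshan), “十里堡” (Shilipu), “北京计算机研究所” (Beijing Computer Research Institute), “瓦杰帕依” (Vajpayee), and “非典” (SARS), have to be recognized automatically by software. In Chinese these unknown words have no spaces marking their boundaries, and their components are ordinary meaningful characters, so they are very hard to recognize;</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>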
</o:p>
</span></font></p>
<p class="MsoNormal" style="margin-left:63.0pt;text-indent:-21.0pt;mso-list:l5 level1 lfo7;
tab-stops:list 63.0pt"><span lang="EN-US" style="font-family:
Wingdings"><font color="#800000">l<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                            
</span></font></span><font color="#800000"><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
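mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Part-of-speech tagging: many Chinese words belong to more than one word class. For example, “领导” can be a verb (“to lead”) or a noun (“leader”), so assigning the correct part of speech to every word is also difficult.</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>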
</o:p>
</span></font></p>
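<p>The segmentation ambiguity described above can be reproduced with a few lines of code. The sketch below enumerates every way to split a string into words from a dictionary. The function name <code>segmentations</code> and the tiny <code>LEXICON</code> are assumptions made for this illustration only; they are not part of ICTCLAS, which relies on a large probabilistic dictionary and a statistical model to choose among candidates.</p>

```python
def segmentations(text, lexicon):
    """Return every way to split text into words that all appear in lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this prefix as a word and segment the remainder recursively.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical toy lexicon covering only the two examples from the text.
LEXICON = {"的", "的确", "确切", "切", "马", "上", "马上"}

for phrase in ("的确切", "马上"):
    for words in segmentations(phrase, LEXICON):
        print("/".join(words))
```

<p>Both example strings yield two candidate segmentations each. Enumeration only exposes the ambiguity; picking the right candidate needs context, which is what a statistical model contributes.</p>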
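<span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\', '\000B&#12;\', Beijing; mso-hansi-font-family: 'MS Song\', '\000B&#12;\', Beijing"><font color="#800000">&nbsp;&nbsp;&nbsp;
Although Chinese lexical analysis has a long research history, many application systems still lack an integrated analyzer that addresses all of the problems above. Building on years of accumulated research, the Institute of Computing Technology of the Chinese Academy of Sciences spent two years developing the Chinese lexical analysis system ICTCLAS (Institute
of Computing Technology, Chinese Lexical Analysis System). The system has taken multiple first places in several well-known technical evaluations in China and abroad, including the evaluation organized by China's 973 expert group and the evaluation organized by SigHan, the international research body for Chinese language processing. It has had wide influence and has been adopted in the research, teaching, and commercial systems of many well-known universities, research institutes, and companies in China and abroad, with good economic and social returns.</font></span>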
<p>&gt;&gt;<span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\'',  '\000B \'', Beijing; mso-hansi-font-family: 'MS Song\'',  '\000B \'', Beijing"><a href="#chap1">Back</a>&nbsp; 
</span>&gt;|<span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\'',  '\000B \'', Beijing; mso-hansi-font-family: 'MS Song\'',  '\000B \'', Beijing"><a href="#header">Top</a></span></p>
<h2><span style="font-family:黑体;mso-ascii-font-family:
Arial"><A NAME="background"></A></span><span style="font-family: 黑体; mso-ascii-font-family: Arial"> 
</span><a name="ICTCLAS"><span lang="EN-US">ICTCLAS</span></a><span style="font-family: 黑体; mso-ascii-font-family: Arial; mso-bookmark: _Toc51384088"> Introduction</span></h2>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">ICTCLAS</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">'s defining feature is its use of a cascaded hidden Markov model (</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Hierarchical
Hidden Markov Model</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">), which brings the main problems of Chinese lexical analysis (word segmentation, unknown-word recognition, and part-of-speech tagging) into a single theoretical framework in order to obtain the best overall result.</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">
<o:p>
</o:p>
</span></p>
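<p>To give a feel for how a hidden Markov model resolves a word-class ambiguity such as “领导”, here is a minimal sketch of Viterbi decoding over a plain first-order HMM. It is not the cascaded model that ICTCLAS uses, and it does not reflect the ICTCLAS code. The tag names (<code>r</code>, <code>n</code>, <code>v</code>) and every probability below are invented for illustration.</p>

```python
def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for words under a first-order HMM."""
    # best[i][t]: probability of the best tag path that ends in tag t at word i.
    best = [{t: start[t] * emit[t].get(words[0], 0.0) for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at word i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] * trans[p][t])
            scores[t] = best[-1][prev] * trans[prev][t] * emit[t].get(word, 0.0)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model: r = pronoun, n = noun, v = verb.
TAGS = ["r", "n", "v"]
START = {"r": 0.4, "n": 0.4, "v": 0.2}
TRANS = {
    "r": {"r": 0.1, "n": 0.3, "v": 0.6},
    "n": {"r": 0.1, "n": 0.4, "v": 0.5},
    "v": {"r": 0.2, "n": 0.6, "v": 0.2},
}
EMIT = {
    "r": {"他": 1.0},
    "n": {"领导": 0.5, "团队": 0.5},
    "v": {"领导": 0.6, "批评": 0.4},
}

print(viterbi(["他", "领导", "团队"], TAGS, START, TRANS, EMIT))  # ['r', 'v', 'n']
print(viterbi(["他", "批评", "领导"], TAGS, START, TRANS, EMIT))  # ['r', 'v', 'n']
```

<p>With these made-up numbers, “领导” is tagged as a verb in the first sentence and as a noun in the second: the neighbouring tags, not the word alone, decide its class.</p>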
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
&quot;MS Song\, \000B&#12;\, Beijing&quot;;mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">The system provides Chinese word segmentation, part-of-speech tagging, named-entity recognition, new-word recognition, and user dictionaries.</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>
</o:p>
</span></p>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
&quot;MS Song\, \000B&#12;\, Beijing&quot;;mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Distinguishing features: it is written in </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">C/C++</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> and supports the </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Linux</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">, FreeBSD, and </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Windows</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> families of</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> operating systems; </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">ICTCLAS</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> is available in </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">GB2312</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> and </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">BIG5</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> versions, which handle Simplified and Traditional Chinese respectively; it supports the widely accepted word segmentation and part-of-speech standards, including the ICT part-of-speech tag set </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">ICTPOS3.0</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">, the Peking University standard, the University of Pennsylvania standard, the State Language Commission standard, and the standards of Academia Sinica in Taiwan and City University of Hong Kong; users can define their own output tag set and output format; multiple best results can be output on request; and every functional module can be detached and recombined.</span></p>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
&quot;MS Song\, \000B&#12;\, Beijing&quot;;mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">The ICT Chinese lexical analysis system </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">ICTCLAS</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> also provides a complete set of </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">API</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> interfaces </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">(</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">a dynamic-link library, a static library, library functions for </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Linux</span><span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\', '\000B&#12;\', Beijing; mso-hansi-font-family: 'MS Song\', '\000B&#12;\', Beijing"> and FreeBSD</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">, and a </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">COM</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> component</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">)</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> together with the corresponding probabilistic dictionaries. Developers can call </span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">ICTCLAS</span><span style="font-family:宋体;mso-ascii-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;;
mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"> directly from their own systems and build higher-level applications on top of its word segmentation and part-of-speech tagging.</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>
&nbsp;</o:p>
</span></p>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span style="font-family:宋体;mso-ascii-font-family:
&quot;MS Song\, \000B&#12;\, Beijing&quot;;mso-hansi-font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;">Engineers and researchers in related fields are welcome to use the system and to send us their feedback.</span><span lang="EN-US" style="font-family:&quot;MS Song\, \000B&#12;\, Beijing&quot;"><o:p>
</o:p>
</span></p>
<p>&gt;&gt;<span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\'',  '\000B \'', Beijing; mso-hansi-font-family: 'MS Song\'',  '\000B \'', Beijing"><a href="#chap1">Back</a>&nbsp; 
</span>&gt;|<span style="font-family: 宋体; mso-ascii-font-family: 'MS Song\'',  '\000B \'', Beijing; mso-hansi-font-family: 'MS Song\'',  '\000B \'', Beijing"><a href="#header">Top</a></span></p>
<h2><a name="evaluation"><span lang="EN-US">ICTCLAS</span></a><span style="font-family: 黑体; mso-ascii-font-family: Arial; mso-bookmark: _Toc51384094"> Performance Evaluation</span></h2>
<h3><a name="973"><span lang="EN-US" style="mso-bookmark: OLE_LINK1">ICTCLAS</span></a><span style="mso-bookmark: OLE_LINK1"><span style="font-family:
宋体;mso-ascii-font-family:&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;"> Results in the </span><span lang="EN-US">973</span><span style="font-family:宋体;mso-ascii-font-family:
&quot;Times New Roman&quot;;mso-hansi-font-family:&quot;Times New Roman&quot;"> Evaluation</span></span></h3>
<p class="MsoNormal" style="text-indent:21.0pt;mso-char-indent-count:2.0;
mso-char-indent-size:10.5pt"><span lang="EN-US" style="mso-bidi-font-size:10.5pt;
font-family:宋体;mso-hansi-font-family:&quot;Times New Roman&quot;;mso-font-kerning:0pt">On July 6, 2002, ICTCLAS took part in the </span><span style="font-family:宋体">national <span lang="EN-US">973 English-Chinese machine translation Phase II </span></span><span style="mso-bidi-font-size:10.5pt;font-family:宋体;mso-hansi-font-family:&quot;Times New Roman&quot;;
mso-font-kerning:0pt">open </span><span style="font-family:宋体">evaluation. The results were as follows:<span lang="EN-US"><o:p>
</o:p>
