<?xml version="1.0"?>
<doc>
<assembly>
<name>Lucene.Net.Analysis.Cn</name>
</assembly>
<members>
<member name="T:Lucene.Net.Analysis.Cn.ChineseAnalyzer">
<summary>
Title: ChineseAnalyzer
Description:
Subclass of org.apache.lucene.analysis.Analyzer,
built from a ChineseTokenizer and filtered with a ChineseFilter.
Copyright: Copyright (c) 2001
Company:
@author Yiyi Sun
@version $Id: ChineseAnalyzer.java, v 1.2 2003/01/22 20:54:47 ehatcher Exp $
</summary>
</member>
<member name="M:Lucene.Net.Analysis.Cn.ChineseAnalyzer.TokenStream(System.String,System.IO.TextReader)">
<summary>
Creates a TokenStream which tokenizes all the text in the provided Reader.
</summary>
<returns>A TokenStream built from a ChineseTokenizer and filtered with a ChineseFilter.</returns>
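<example>
A minimal usage sketch, not taken from the library itself; it assumes the classic
TokenStream.Next()/Token.TermText() enumeration API of contemporaneous Lucene.Net
releases, and the field name "contents" is arbitrary.
<code>
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn;

// Build the analyzer and tokenize a short Chinese string.
Analyzer analyzer = new ChineseAnalyzer();
TokenStream stream = analyzer.TokenStream("contents", new StringReader("中华人民共和国"));

// Assumed API: Next() returns the next Token, or null at the end of the stream.
Token token;
while ((token = stream.Next()) != null)
{
    System.Console.WriteLine(token.TermText());   // one Chinese character per line
}
</code>
</example>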
</member>
<member name="T:Lucene.Net.Analysis.Cn.ChineseFilter">
<summary>
Title: ChineseFilter
Description: Filter with a stop word table.
Rule: No digits are allowed.
English words/tokens must be longer than one character.
One Chinese character is treated as one Chinese word.
TO DO:
1. Add Chinese stop words, such as \ue400
2. Dictionary-based Chinese word extraction
3. Intelligent Chinese word extraction
Copyright: Copyright (c) 2001
Company:
@author Yiyi Sun
@version $Id: ChineseFilter.java, v 1.4 2003/01/23 12:49:33 ehatcher Exp $
</summary>
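<example>
A simplified illustration of the rules listed above, not the shipped filter
implementation; KeepTerm is a hypothetical helper name introduced only for this sketch.
<code>
// Hypothetical helper: decides whether a single term survives the filter.
static bool KeepTerm(string term, System.Collections.Hashtable stopTable)
{
    if (stopTable.ContainsKey(term))
        return false;                      // listed stop word
    foreach (char c in term)
        if (System.Char.IsDigit(c))
            return false;                  // "no digits" rule
    if (term.Length > 1)
        return true;                       // multi-character English word passes
    if (term[0] >= (char)128)
        return true;                       // a single Chinese character is one word
    return false;                          // one-letter English token is dropped
}
</code>
</example>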
</member>
<member name="T:Lucene.Net.Analysis.Cn.ChineseTokenizer">
<summary>
Title: ChineseTokenizer
Description: Extracts tokens from the stream using Character.getType()
Rule: Each Chinese character is emitted as a single token
Copyright: Copyright (c) 2001
Company:
The difference between the ChineseTokenizer and the
CJKTokenizer (id=23545) is their token-parsing logic.
For example, if the Chinese text "C1C2C3C4" is indexed,
the tokens returned by the ChineseTokenizer are C1, C2, C3, C4,
while the tokens returned by the CJKTokenizer are C1C2, C2C3, C3C4.
The index created by the CJKTokenizer is therefore much larger.
The problem is that when searching for C1, C1C2, C1C3,
C4C2, C1C2C3, and so on, the ChineseTokenizer works,
but the CJKTokenizer does not. A sketch of the two output
shapes follows in the example below.
@author Yiyi Sun
@version $Id: ChineseTokenizer.java, v 1.4 2003/03/02 13:56:03 otis Exp $
</summary>
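<example>
A sketch of the two token streams described above; the loops below only
demonstrate the shape of the output and are not the tokenizer code.
<code>
// Stand-ins for four consecutive Chinese characters.
string[] chars = { "C1", "C2", "C3", "C4" };

// ChineseTokenizer: one token per character.
foreach (string c in chars)
    System.Console.WriteLine(c);                          // C1, C2, C3, C4

// CJKTokenizer: overlapping two-character tokens.
for (int i = 1; i != chars.Length; i++)
    System.Console.WriteLine(chars[i - 1] + chars[i]);    // C1C2, C2C3, C3C4
</code>
</example>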
</member>
</members>
</doc>