<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--NewPage-->
<HTML>
<HEAD>
<!-- Generated by javadoc (build 1.4.2_04) on Wed Feb 14 11:49:19 EST 2007 -->
<TITLE>
ChineseTokenizer (Lucene 2.1.0 API)
</TITLE>
<META NAME="keywords" CONTENT="org.apache.lucene.analysis.cn.ChineseTokenizer class">
<LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../../stylesheet.css" TITLE="Style">
<SCRIPT type="text/javascript">
function windowTitle()
{
parent.document.title="ChineseTokenizer (Lucene 2.1.0 API)";
}
</SCRIPT>
</HEAD>
<BODY BGCOLOR="white" onload="windowTitle();">
<!-- ========= START OF TOP NAVBAR ======= -->
<A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY="">
<TR>
<TD COLSPAN=3 BGCOLOR="#EEEEFF" CLASS="NavBarCell1">
<A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">
<TR ALIGN="center" VALIGN="top">
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD>
<TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="class-use/ChineseTokenizer.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD>
<TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD>
</TR>
</TABLE>
</TD>
<TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM>
</EM>
</TD>
</TR>
<TR>
<TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">
<A HREF="../../../../../org/apache/lucene/analysis/cn/ChineseFilter.html" title="class in org.apache.lucene.analysis.cn"><B>PREV CLASS</B></A>
NEXT CLASS</FONT></TD>
<TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">
<A HREF="../../../../../index.html" target="_top"><B>FRAMES</B></A>
<A HREF="ChineseTokenizer.html" target="_top"><B>NO FRAMES</B></A>
<SCRIPT type="text/javascript">
<!--
if(window==top) {
document.writeln('<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>');
}
//-->
</SCRIPT>
<NOSCRIPT>
<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>
</NOSCRIPT>
</FONT></TD>
</TR>
<TR>
<TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">
SUMMARY: NESTED | <A HREF="#fields_inherited_from_class_org.apache.lucene.analysis.Tokenizer">FIELD</A> | <A HREF="#constructor_summary">CONSTR</A> | <A HREF="#method_summary">METHOD</A></FONT></TD>
<TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">
DETAIL: FIELD | <A HREF="#constructor_detail">CONSTR</A> | <A HREF="#method_detail">METHOD</A></FONT></TD>
</TR>
</TABLE>
<A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= -->
<HR>
<!-- ======== START OF CLASS DATA ======== -->
<H2>
<FONT SIZE="-1">
org.apache.lucene.analysis.cn</FONT>
<BR>
Class ChineseTokenizer</H2>
<PRE>
<A HREF="http://java.sun.com/j2se/1.4/docs/api/java/lang/Object.html" title="class or interface in java.lang">java.lang.Object</A>
<IMG SRC="../../../../../resources/inherit.gif" ALT="extended by"><A HREF="../../../../../org/apache/lucene/analysis/TokenStream.html" title="class in org.apache.lucene.analysis">org.apache.lucene.analysis.TokenStream</A>
<IMG SRC="../../../../../resources/inherit.gif" ALT="extended by"><A HREF="../../../../../org/apache/lucene/analysis/Tokenizer.html" title="class in org.apache.lucene.analysis">org.apache.lucene.analysis.Tokenizer</A>
<IMG SRC="../../../../../resources/inherit.gif" ALT="extended by"><B>org.apache.lucene.analysis.cn.ChineseTokenizer</B>
</PRE>
<HR>
<DL>
<DT>public final class <B>ChineseTokenizer</B><DT>extends <A HREF="../../../../../org/apache/lucene/analysis/Tokenizer.html" title="class in org.apache.lucene.analysis">Tokenizer</A></DL>
<P>
Extracts tokens from the stream using <CODE>Character.getType()</CODE>. Rule: each Chinese character becomes a single token. Copyright (c) 2001.
<P>
The ChineseTokenizer and the CJKTokenizer (id=23545) differ in their token parsing logic. For example, given the Chinese text "C1C2C3C4" to index, the ChineseTokenizer returns the tokens C1, C2, C3, C4, whereas the CJKTokenizer returns the tokens C1C2, C2C3, C3C4. The index created by the CJKTokenizer is therefore much larger. The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not.
<P>
<P>
<DL>
<DT><B>Version:</B></DT> <DD>1.0</DD><DT><B>Author:</B></DT> <DD>Yiyi Sun</DD></DL>
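<P>
The difference between the two parsing rules described above can be illustrated with a small stand-alone sketch. It does not use the Lucene classes: <CODE>TokenizerSketch</CODE>, <CODE>unigrams</CODE> and <CODE>bigrams</CODE> are hypothetical names introduced only for this illustration. It assumes input made of BMP characters and treats every character whose <CODE>Character.getType()</CODE> is <CODE>OTHER_LETTER</CODE> as a Chinese character. Everything else is skipped, which simplifies away how the real tokenizers handle Latin letters and digits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class TokenizerSketch {

    // One token per character of type OTHER_LETTER (the ChineseTokenizer-style rule).
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.getType(c) == Character.OTHER_LETTER) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    // Overlapping pairs of adjacent OTHER_LETTER characters (the CJKTokenizer-style rule).
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < text.length(); i++) {
            char a = text.charAt(i);
            char b = text.charAt(i + 1);
            if (Character.getType(a) == Character.OTHER_LETTER
                    && Character.getType(b) == Character.OTHER_LETTER) {
                tokens.add(text.substring(i, i + 2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Chinese characters written as Unicode escapes, standing in for C1C2C3C4.
        String text = "\u4e2d\u534e\u4eba\u6c11";
        System.out.println(unigrams(text)); // four single-character tokens
        System.out.println(bigrams(text));  // three overlapping two-character tokens
    }
}
```

With four characters the first rule yields four tokens and the second yields three overlapping pairs. A query for the single character C1 matches a token in the first list but none in the second.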
<HR>
<P>
<!-- ======== NESTED CLASS SUMMARY ======== -->
<!-- =========== FIELD SUMMARY =========== -->
<A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY="">
<TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor">
<TD COLSPAN=2><FONT SIZE="+2">
<B>Field Summary</B></FONT></TD>
</TR>
</TABLE>
<A NAME="fields_inherited_from_class_org.apache.lucene.analysis.Tokenizer"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY="">
<TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor">
<TD><B>Fields inherited from class org.apache.lucene.analysis.<A HREF="../../../../../org/apache/lucene/analysis/Tokenizer.html" title="class in org.apache.lucene.analysis">Tokenizer</A></B></TD>
</TR>
<TR BGCOLOR="white" CLASS="TableRowColor">
<TD><CODE><A HREF="../../../../../org/apache/lucene/analysis/Tokenizer.html#input">input</A></CODE></TD>
</TR>
</TABLE>
<!-- ======== CONSTRUCTOR SUMMARY ======== -->
<A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY="">
<TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor">
<TD COLSPAN=2><FONT SIZE="+2">
<B>Constructor Summary</B></FONT></TD>