
📄 chinesefilter.java

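📁 Jaoso news article publishing system 0.9.1final. Architecture: Struts+Spring+Hibernate. Main features: · News is written in an online editor, so articles can be edited as in Word; Simplified/Traditional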
package jaoso.framework.core.search.lucene.analyzer;

import org.apache.lucene.analysis.*;

import java.util.Hashtable;

/**
 * Title: ChineseFilter
 * Description: Filter with a stop word table.
 * Rules: No digits are allowed. An English word/token must be longer than 1
 * character. Each Chinese character is treated as one Chinese word.
 * TODO: 1. Add Chinese stop words, such as \ue400. 2. Dictionary-based Chinese
 * word extraction. 3. Intelligent Chinese word extraction.
 *
 * Copyright: Copyright (c) 2001 Company:
 *
 * @author Yiyi Sun
 * @version 1.0
 */
public final class ChineseFilter extends TokenFilter {
	// Only English now, Chinese to be added later.

	/** English stop words; a token whose text exactly matches one of these is dropped. */
	public static final String[] STOP_WORDS = { "and", "are", "as", "at", "be",
			"but", "by", "for", "if", "in", "into", "is", "it", "no", "not",
			"of", "on", "or", "such", "that", "the", "their", "then", "there",
			"these", "they", "this", "to", "was", "will", "with" };

	/** Lookup table built from {@link #STOP_WORDS}; each word is both key and value. */
	private Hashtable stopTable;

	/**
	 * Creates a new ChineseFilter object and fills the stop word table.
	 * 
	 * @param in
	 *            the token stream to filter
	 */
	public ChineseFilter(TokenStream in) {
		super(in);
		input = in;

		stopTable = new Hashtable(STOP_WORDS.length);

		for (int i = 0; i < STOP_WORDS.length; i++) {
			stopTable.put(STOP_WORDS[i], STOP_WORDS[i]);
		}
	}

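	/**
	 * Returns the next token from the input that passes the filter: it is not a
	 * stop word, and it starts either with an upper- or lowercase letter (and is
	 * longer than 1 character) or with a letter of type OTHER_LETTER, such as a
	 * Chinese character. All other tokens are skipped.
	 * 
	 * @return the next accepted token, or null when the input is exhausted
	 * 
	 * @throws java.io.IOException
	 *             if reading from the underlying token stream fails
	 */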
	public final Token next() throws java.io.IOException {
		for (Token token = input.next(); token != null; token = input.next()) {
			String text = token.termText();

			if (stopTable.get(text) == null) {
				switch (Character.getType(text.charAt(0))) {
				case Character.LOWERCASE_LETTER:
				case Character.UPPERCASE_LETTER:

					// An English word/token must be longer than 1 character.
					if (text.length() > 1) {
						return token;
					}

					break;

				case Character.OTHER_LETTER:

					// Each Chinese character is treated as one Chinese word.
					// Chinese word extraction is to be added here later.
					return token;
				}
			}
		}

		return null;
	}
}
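
The acceptance rule inside `next()` can be tried without a Lucene jar. The following is a minimal sketch, not part of the original source: the class `ChineseFilterRuleSketch` and its `keep` method are hypothetical names introduced here, and a `HashSet` stands in for the original `Hashtable`. It mirrors only the per-token decision (stop word check, then a switch on the type of the first character); the empty-string guard is an addition that the original code does not have.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone mirror of the decision made in ChineseFilter.next().
final class ChineseFilterRuleSketch {
    // Same English stop words as ChineseFilter.STOP_WORDS.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to",
            "was", "will", "with"));

    // Returns true if a token with this text would be passed through.
    static boolean keep(String text) {
        // Assumption: empty or null text is rejected (the original would throw on "").
        if (text == null || text.isEmpty() || STOP_WORDS.contains(text)) {
            return false;
        }
        switch (Character.getType(text.charAt(0))) {
        case Character.LOWERCASE_LETTER:
        case Character.UPPERCASE_LETTER:
            // English word/token: must be longer than 1 character.
            return text.length() > 1;
        case Character.OTHER_LETTER:
            // Letters without case, e.g. a Chinese character: always kept.
            return true;
        default:
            // Digits, punctuation and everything else are dropped.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "the", "a", "Lucene", "中", "42" };
        for (String s : samples) {
            System.out.println(s + " -> " + keep(s));
        }
    }
}
```

The stop word lookup is an exact, case-sensitive match, so `"The"` is kept while `"the"` is dropped; the filter therefore relies on an earlier stage to lowercase tokens if capitalized stop words should be removed as well.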
