
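chinesefilter.java

Version info: Jaoso news/article publishing system 0.9.1beta1. The former POPTEN news publishing system has been renamed Jaoso; Jaoso is not compatible with POPTEN, and no POPTEN-to-Jaoso upgrade tool is currently provided.
Language: Java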
package jaoso.framework.core.search.lucene.analyzer;

import org.apache.lucene.analysis.*;

import java.util.Hashtable;


/**
 * Title: ChineseFilter
 * Description: Token filter backed by a stop-word table.
 *              Rules: Tokens that do not start with a letter (e.g. digits) are dropped.
 *                     English words/tokens must be longer than one character.
 *                     Each Chinese character is treated as one Chinese word.
 * TO DO:
 *   1. Add Chinese stop words, such as \ue400
 *   2. Dictionary-based Chinese word extraction
 *   3. Intelligent Chinese word extraction
 *
 * Copyright:    Copyright (c) 2001
 * Company:
 * @author Yiyi Sun
 * @version 1.0
 *
 */
public final class ChineseFilter extends TokenFilter {

    //~ Static fields/initializers =============================================

    // Only English stop words for now; Chinese stop words are to be added later.

    /** English stop words; tokens that exactly match one of these are removed. */
    public static final String[] STOP_WORDS = {
            "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to",
            "was", "will", "with"
        };

    //~ Instance fields ========================================================

    /** Lookup table built from STOP_WORDS for fast membership tests. */
    private Hashtable stopTable;

    //~ Constructors ===========================================================

    /**
     * Creates a new ChineseFilter that wraps the given token stream.
     *
     * @param in the upstream token stream to filter
     */
    public ChineseFilter(TokenStream in) {
        super(in);
        input = in;

        stopTable = new Hashtable(STOP_WORDS.length);

        for (int i = 0; i < STOP_WORDS.length; i++) {

            stopTable.put(STOP_WORDS[i], STOP_WORDS[i]);
        }
    }

    //~ Methods ================================================================

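    /**
     * Returns the next token that is not a stop word and passes the
     * character rules described in the class comment.
     *
     * @return the next accepted token, or null when the input is exhausted
     *
     * @throws java.io.IOException if the underlying token stream fails
     */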
    public final Token next() throws java.io.IOException {

        for (Token token = input.next(); token != null; token = input.next()) {

            String text = token.termText();

            // Skip empty tokens (charAt(0) would throw) and stop words.
            if (text.length() > 0 && stopTable.get(text) == null) {

                switch (Character.getType(text.charAt(0))) {

                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:

                    // English words/tokens must be longer than one character.
                    if (text.length() > 1) {

                        return token;
                    }

                    break;

                case Character.OTHER_LETTER:

                    // Treat each Chinese character as one Chinese word.
                    // Chinese word extraction is to be added here later.
                    return token;

                default:

                    // Tokens starting with a digit or any other non-letter are dropped.
                    break;
                }
            }
        }

        return null;
    }
}
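
The filtering rules above can be tried without Lucene. The following is a minimal, Lucene-free sketch, assuming tokens arrive as plain Strings from an Iterator rather than as Lucene Token objects; the class ChineseFilterSketch and its filter helper are hypothetical names, not part of Jaoso or Lucene. It mirrors the pull loop in next(): skip stop words, keep Latin-letter tokens longer than one character, keep tokens whose first character is Character.OTHER_LETTER (the category of CJK ideographs), and drop everything else. It also skips empty tokens so that charAt(0) cannot fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, Lucene-free sketch of the rules ChineseFilter applies.
 * Tokens are plain Strings pulled from an Iterator instead of Lucene Tokens.
 */
final class ChineseFilterSketch {

    // Same English stop words as ChineseFilter.STOP_WORDS.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to",
            "was", "will", "with"));

    private final Iterator<String> input;

    ChineseFilterSketch(Iterator<String> input) {
        this.input = input;
    }

    /** Mirrors ChineseFilter.next(): returns the next kept token, or null at the end. */
    String next() {
        while (input.hasNext()) {
            String text = input.next();
            // Skip empty tokens and exact stop-word matches.
            if (text.isEmpty() || STOP_WORDS.contains(text)) {
                continue;
            }
            switch (Character.getType(text.charAt(0))) {
                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:
                    // English tokens must be longer than one character.
                    if (text.length() > 1) {
                        return text;
                    }
                    break;
                case Character.OTHER_LETTER:
                    // CJK ideographs fall here: one character, one word.
                    return text;
                default:
                    // Digits, punctuation and other non-letters are dropped.
                    break;
            }
        }
        return null;
    }

    /** Convenience helper: drains the filter into a list. */
    static List<String> filter(List<String> tokens) {
        ChineseFilterSketch f = new ChineseFilterSketch(tokens.iterator());
        List<String> out = new ArrayList<>();
        for (String t = f.next(); t != null; t = f.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u4e2d" and "\u6587" are the Chinese characters for "middle" and "text".
        List<String> tokens = Arrays.asList(
                "the", "lucene", "a", "2001", "\u4e2d", "\u6587", "search");
        System.out.println(filter(tokens));
    }
}
```

For the input tokens the, lucene, a, 2001, 中, 文, search, the sketch keeps lucene, 中, 文 and search: "the" is a stop word, "a" is a one-letter English token, and "2001" starts with a digit.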
