wordlistloader.java

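From "Chinese word segmentation: a modified version of the Chinese Academy of Sciences segmenter, implemented by calling a DLL from Java." · Java code · 105 lines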

package org.apache.lucene.analysis;

/**
 * Copyright 2004 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.BufferedReader;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Iterator;

/**
 * Loader for text files that represent a list of stopwords.
 *
 * @author Gerhard Schwarz
 * @version $Id: WordlistLoader.java 387550 2006-03-21 15:36:32Z yonik $
 */
public class WordlistLoader {

  /**
   * Loads a text file and adds every line as an entry to a HashSet (omitting
   * leading and trailing whitespace). Every line of the file should contain only
   * one word. The words need to be in lowercase if you make use of an
   * Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
   *
   * @param wordfile File containing the wordlist
   * @return A HashSet with the file's words
   */
  public static HashSet getWordSet(File wordfile) throws IOException {
    HashSet result = new HashSet();
    FileReader reader = null;
    try {
      reader = new FileReader(wordfile);
      result = getWordSet(reader);
    }
    finally {
      if (reader != null)
        reader.close();
    }
    return result;
  }

  /**
   * Reads lines from a Reader and adds every line as an entry to a HashSet (omitting
   * leading and trailing whitespace). Every line of the Reader should contain only
   * one word. The words need to be in lowercase if you make use of an
   * Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
   *
   * @param reader Reader containing the wordlist
   * @return A HashSet with the reader's words
   */
  public static HashSet getWordSet(Reader reader) throws IOException {
    HashSet result = new HashSet();
    BufferedReader br = null;
    try {
      if (reader instanceof BufferedReader) {
        br = (BufferedReader) reader;
      } else {
        br = new BufferedReader(reader);
      }
      String word = null;
      while ((word = br.readLine()) != null) {
        result.add(word.trim());
      }
    }
    finally {
      if (br != null)
        br.close();
    }
    return result;
  }

  /**
   * Builds a wordlist table, using words as both keys and values
   * for backward compatibility.
   *
   * @param wordSet   stopword set
   */
  private static Hashtable makeWordTable(HashSet wordSet) {
    Hashtable table = new Hashtable();
    for (Iterator iter = wordSet.iterator(); iter.hasNext();) {
      String word = (String)iter.next();
      table.put(word, word);
    }
    return table;
  }
}
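
The line-by-line loading described in the Javadoc above can be tried without a wordlist file. The following is a minimal sketch, not the original class: `WordSetSketch` and `loadWordSet` are hypothetical names, and the code swaps the raw `HashSet` and explicit `finally` block for generics and try-with-resources (Java 7 or later is assumed). Like `getWordSet(Reader)`, it trims each line and closes the reader when done, so a blank line would still end up as an empty-string entry.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name; not part of Lucene or the original listing.
public class WordSetSketch {

  // Mirrors getWordSet(Reader): one trimmed entry per input line.
  public static Set<String> loadWordSet(Reader reader) throws IOException {
    Set<String> result = new HashSet<>();
    // try-with-resources closes the reader, as the original finally block does.
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        result.add(line.trim());
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // An in-memory wordlist stands in for a stopword file.
    Set<String> stopwords = loadWordSet(new StringReader("  the \nand\nof\n"));
    System.out.println(stopwords.contains("the")); // true
    System.out.println(stopwords.size());          // 3
  }
}
```

Passing a `StringReader` keeps the example self-contained; a `FileReader` would take the same path, which is what `getWordSet(File)` does in the listing.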
