dictionaryfactoryimpl.java

Project: Chinese word segmentation component based on a dictionary and the maximum matching algorithm
Language: Java
package org.solol.mmseg.internal;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.solol.mmseg.core.Config;
import org.solol.mmseg.core.DictionaryFactory;
import org.solol.mmseg.core.IDictionary;

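/**
 * Default {@link DictionaryFactory} implementation. Lazily builds a single
 * {@link IDictionary} from the character lexicon (chars.lex) and the word
 * lexicon (words.lex) in the wordlist directory under the working directory.
 *
 * @author solo L
 */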
public class DictionaryFactoryImpl extends DictionaryFactory {

	private IDictionary dictionary = null;

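	/**
	 * Returns the shared dictionary, loading it on first use. The method is
	 * synchronized so that concurrent first calls cannot load the lexicon
	 * files twice or observe a half-initialized field.
	 */
	public synchronized IDictionary createDictionary() {
		if (dictionary == null) {
			dictionary = loadDictionary();
		}
		return dictionary;
	}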

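	private static IDictionary loadDictionary() {
		IDictionary dictionary = new Dictionary();
		// Paths are relative to the working directory. Forward slashes are
		// accepted as a separator on every platform; backslashes only work
		// on Windows.
		String chars = "./wordlist/chars.lex";
		loadWords(chars, dictionary, "UTF-8");
		String words = "./wordlist/words.lex";
		loadWords(words, dictionary, "UTF-8");
		return dictionary;
	}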

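	/**
	 * Adds every entry of one lexicon file to the dictionary. Blank lines and
	 * lines containing '#' are skipped. A line without a space is a bare word;
	 * otherwise the first space-separated field is the word and the second is
	 * its frequency. Words longer than Config.WORD_MAX_LENGTH are ignored.
	 */
	private static void loadWords(String fileName, IDictionary dictionary,
			String charSet) {
		// try-with-resources closes the reader chain even if reading fails.
		try (InputStream is = new FileInputStream(fileName);
				BufferedReader br = new BufferedReader(
						new InputStreamReader(is, charSet))) {
			String line;
			while ((line = br.readLine()) != null) {
				line = line.trim();
				// Skip blank lines and comment lines.
				if (line.isEmpty() || line.indexOf('#') != -1) {
					continue;
				}
				int space = line.indexOf(' ');
				if (space == -1) {
					// Bare word without a frequency.
					if (line.length() <= Config.WORD_MAX_LENGTH) {
						dictionary.addWord(line, Word.CJK_WORD);
					}
					continue;
				}
				String value = line.substring(0, space);
				if (value.length() > Config.WORD_MAX_LENGTH) {
					continue;
				}
				// The frequency is the second field; later fields are ignored.
				String rest = line.substring(space + 1).trim();
				int end = rest.indexOf(' ');
				String frequencyText = (end == -1) ? rest : rest.substring(0, end);
				try {
					int frequency = Integer.parseInt(frequencyText);
					dictionary.addWord(value, frequency, Word.CJK_WORD);
				} catch (NumberFormatException e) {
					// Skip entries whose frequency is not an integer.
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}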

}
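The project title describes a dictionary plus maximum matching segmenter, while the class above only loads the dictionary. As a rough illustration of how such a dictionary might be consumed, here is a minimal forward maximum matching (FMM) sketch. It is not taken from this project: the class `ForwardMaximumMatch`, the plain `Set<String>` standing in for `IDictionary`, and the `maxWordLength` parameter standing in for `Config.WORD_MAX_LENGTH` are all hypothetical. The `mmseg` package name suggests the project may use the MMSEG variant, which scores three-word chunks instead of greedily taking the longest word; that refinement is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of forward maximum matching. A Set<String> stands in
 * for IDictionary; maxWordLength plays the role of Config.WORD_MAX_LENGTH.
 */
class ForwardMaximumMatch {

    /** Splits text by always taking the longest dictionary word at each position. */
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        if (maxWordLength < 1) {
            throw new IllegalArgumentException("maxWordLength must be >= 1");
        }
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Try the longest candidate first, shrinking one char at a time.
            int end = Math.min(start + maxWordLength, text.length());
            while (end > start + 1
                    && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            // If no multi-character word matched, the single character is taken.
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "研究", "研究生", "生命", "命", "起源"));
        // Prints [研究生, 命, 起源]
        System.out.println(segment("研究生命起源", dictionary, 4));
    }
}
```

Greedy FMM grabs 研究生 and ends up with 研究生 / 命 / 起源, whereas the intended reading of 研究生命起源 is 研究 / 生命 / 起源. This is the kind of ambiguity that MMSEG's chunk-based rules are designed to resolve.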
