chunk.java

Chinese word segmentation component based on a dictionary and the maximum matching algorithm (Java)

/**
 * 
 */
package org.solol.mmseg.internal;

import org.solol.mmseg.core.IChunk;
import org.solol.mmseg.core.IWord;

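/**
 * A sequence of words together with lazily computed, cached statistics:
 * total length, average word length, spread of word lengths, and degree
 * of morphemic freedom.
 * 
 * @author solo L
 */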
public final class Chunk implements IChunk {

	private final IWord[] words;

	// Cached statistics; -1 means "not computed yet".
	private double averageLength = -1D;

	private double variance = -1D;

	private double degreeMorphemicFreedom = -1D;

	private int length = -1;

	public Chunk(IWord[] words) {
		this.words = words;
	}

	/*
	 * (non-Javadoc)
	 * 
	 * @see org.solol.mmseg.core.IChunk#getAverageLength()
	 */
	public double getAverageLength() {
		// Mean word length: total chunk length divided by the word count.
		if (averageLength == -1D) {
			averageLength = (double) getLength() / (double) words.length;
		}

		return averageLength;
	}

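	/*
	 * (non-Javadoc)
	 * 
	 * @see org.solol.mmseg.core.IChunk#getVariance()
	 */
	public double getVariance() {
		if (variance == -1D) {
			double average = getAverageLength();
			double sumOfSquares = 0D;
			for (int i = 0; i < words.length; i++) {
				double deviation = (double) words[i].getLength() - average;
				sumOfSquares += deviation * deviation;
			}

			// Note: the square root makes this the standard deviation of the
			// word lengths. Chunks are still ordered as they would be by the
			// variance, because the square root is monotonic.
			variance = Math.sqrt(sumOfSquares / (double) words.length);
		}
		return variance;
	}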

	/*
	 * (non-Javadoc)
	 * 
	 * @see org.solol.mmseg.core.IChunk#getWords()
	 */
	public IWord[] getWords() {
		return words;
	}

	/**
	 * Total length of all words in this chunk, computed once and cached.
	 */
	public int getLength() {
		if (length == -1) {
			length = 0;
			for (int i = 0; i < words.length; i++) {
				length += words[i].getLength();
			}
		}

		return length;
	}

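	/**
	 * Sum of the natural logarithms of the frequencies of all words of
	 * length 1 in this chunk, computed once and cached.
	 */
	public double getDegreeOfMorphemicFreedom() {
		if (degreeMorphemicFreedom == -1D) {
			degreeMorphemicFreedom = 0D;
			for (int i = 0; i < words.length; i++) {
				if (words[i].getLength() == 1) {
					double frequency = (double) words[i].getFrequency();
					degreeMorphemicFreedom += Math.log(frequency);
				}
			}
		}
		return degreeMorphemicFreedom;
	}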

}
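
The four statistics exposed by `Chunk` match the disambiguation rules usually described for MMSEG: prefer the longest chunk, then the largest average word length, then the smallest spread of word lengths, then the largest degree of morphemic freedom. How this component actually combines them is not shown in this file, so the following is only a minimal sketch of one possible ranking. `ChunkRankingSketch`, `SimpleChunk`, `compare`, and `best` are hypothetical names, and the stand-in works on plain `int` arrays because the `IWord` interface is not shown here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ChunkRankingSketch {

    /** Hypothetical stand-in for Chunk: plain arrays instead of IWord objects. */
    static final class SimpleChunk {
        final int[] wordLengths;
        final int[] frequencies;

        SimpleChunk(int[] wordLengths, int[] frequencies) {
            this.wordLengths = wordLengths;
            this.frequencies = frequencies;
        }

        int length() {
            int total = 0;
            for (int len : wordLengths) {
                total += len;
            }
            return total;
        }

        double averageLength() {
            return (double) length() / wordLengths.length;
        }

        /** Standard deviation of the word lengths, mirroring Chunk.getVariance(). */
        double variance() {
            double average = averageLength();
            double sumOfSquares = 0D;
            for (int len : wordLengths) {
                sumOfSquares += (len - average) * (len - average);
            }
            return Math.sqrt(sumOfSquares / wordLengths.length);
        }

        /** Sum of log(frequency) over the words of length 1. */
        double morphemicFreedom() {
            double sum = 0D;
            for (int i = 0; i < wordLengths.length; i++) {
                if (wordLengths[i] == 1) {
                    sum += Math.log(frequencies[i]);
                }
            }
            return sum;
        }
    }

    /** A negative result means that a ranks ahead of b. */
    static int compare(SimpleChunk a, SimpleChunk b) {
        int c = Integer.compare(b.length(), a.length());          // 1. longest chunk
        if (c != 0) return c;
        c = Double.compare(b.averageLength(), a.averageLength()); // 2. largest average
        if (c != 0) return c;
        c = Double.compare(a.variance(), b.variance());           // 3. smallest spread
        if (c != 0) return c;
        // 4. largest degree of morphemic freedom
        return Double.compare(b.morphemicFreedom(), a.morphemicFreedom());
    }

    /** Returns the candidate that ranks first under compare(). */
    static SimpleChunk best(List<SimpleChunk> candidates) {
        return Collections.min(candidates, ChunkRankingSketch::compare);
    }

    public static void main(String[] args) {
        SimpleChunk even = new SimpleChunk(new int[] {2, 2, 2}, new int[] {5, 5, 5});
        SimpleChunk uneven = new SimpleChunk(new int[] {1, 2, 3}, new int[] {5, 5, 5});
        // Same total length and average, so the smaller spread decides.
        System.out.println(best(Arrays.asList(uneven, even)) == even); // prints "true"
    }
}
```

Because `getVariance()` returns the square root of the variance, comparing its values gives the same order as comparing variances, so the third rule is unaffected by that choice.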
