segment.java

From "A topic-focused web crawler that crawls web pages related to a given topic" · Java code · 32 lines

/*2008-5-31*/
package com.segment;

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.mira.lucene.analysis.IK_CAnalyzer;

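/** Splits text into terms with IK_CAnalyzer and joins them with spaces. */
public class Segment {
	/**
	 * Runs the analyzer over s and returns the terms, each followed by one space.
	 * Uses the Lucene 2.x API (TokenStream.next() and Token.termText()).
	 */
	public static String show(Analyzer a, String s) throws Exception {
		StringReader reader = new StringReader(s);
		// The first argument is a field name, not the text; "content" is an arbitrary label.
		TokenStream ts = a.tokenStream("content", reader);
		StringBuilder sb = new StringBuilder();
		try {
			// next() returns null once the stream is exhausted.
			for (Token t = ts.next(); t != null; t = ts.next()) {
				sb.append(t.termText()).append(' ');
			}
		} finally {
			ts.close();
		}
		return sb.toString();
	}

	/** Segments s with a fresh IK_CAnalyzer. */
	public String segment(String s) throws Exception {
		Analyzer a = new IK_CAnalyzer();
		return show(a, s);
	}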
}
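
The token loop in `show` can be tried without Lucene or IK on the classpath. The following is a minimal sketch under stated assumptions: `TermSource`, `SegmentSketch`, `joinTerms`, and `fromList` are hypothetical names introduced only for illustration, and `TermSource` merely imitates the "return null at end of stream" contract of the old `TokenStream.next()`. It is not part of the original project.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the old TokenStream contract:
// next() returns the next term, or null when nothing is left.
interface TermSource {
    String next();
}

class SegmentSketch {
    // Same loop shape as Segment.show: pull terms until null,
    // appending each term followed by a single space.
    static String joinTerms(TermSource source) {
        StringBuilder out = new StringBuilder();
        for (String term = source.next(); term != null; term = source.next()) {
            out.append(term).append(' ');
        }
        return out.toString();
    }

    // Wraps a fixed list of terms; a real analyzer would derive them from text.
    static TermSource fromList(List<String> terms) {
        Iterator<String> it = terms.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        String joined = joinTerms(fromList(Arrays.asList("topic", "web", "crawler")));
        // Brackets make the trailing space visible in the printed result.
        System.out.println("[" + joined + "]");
    }
}
```

As in the original method, the result keeps a trailing space after the last term; callers that need a clean string can apply `trim()` to it.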
