📄 segment.java

📁 A topic-focused web crawler that fetches web pages related to a given topic
💻 JAVA
/*2008-5-31*/
package com.segment;

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.mira.lucene.analysis.IK_CAnalyzer;

public class Segment {
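	/**
	 * Tokenizes s with the given analyzer and returns the terms joined by
	 * single spaces (a trailing space follows the last term).
	 */
	public static String show(Analyzer a, String s) throws Exception {
		StringReader reader = new StringReader(s);
		// The first argument of tokenStream is the field name; this code
		// passes the input text itself there.
		TokenStream ts = a.tokenStream(s, reader);
		// StringBuilder avoids the cost of repeated String concatenation.
		StringBuilder sb = new StringBuilder();
		try {
			// Old Lucene 2.x token API: next() returns null at end of stream.
			for (Token t = ts.next(); t != null; t = ts.next()) {
				sb.append(t.termText()).append(' ');
			}
		} finally {
			ts.close();
		}
		return sb.toString();
	}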

	/** Segments s into space-separated terms using IK_CAnalyzer. */
	public String segment(String s) throws Exception {
		Analyzer a = new IK_CAnalyzer();
		return show(a, s);
	}


}
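
The class above needs the old Lucene 2.x and IK_CAnalyzer jars on the classpath. For trying out the same "terms joined by spaces" output shape without those dependencies, here is a minimal sketch that swaps in the JDK's `BreakIterator` as the tokenizer. This is an assumption for illustration only: `SegmentSketch` and `hasLetterOrDigit` are hypothetical names, and `BreakIterator` does not perform IK's dictionary-based Chinese segmentation, so its results on Chinese text will differ from `Segment.segment`.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in for Segment: same output shape (terms separated by
// spaces, trailing space included), but tokenized with the JDK's
// BreakIterator instead of IK_CAnalyzer.
final class SegmentSketch {

    static String segment(String s) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(s);
        StringBuilder sb = new StringBuilder();
        int start = words.first();
        // Walk consecutive boundaries; each [start, end) span is one piece.
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String piece = s.substring(start, end);
            // Skip pieces made only of whitespace or punctuation.
            if (hasLetterOrDigit(piece)) {
                sb.append(piece).append(' ');
            }
        }
        return sb.toString();
    }

    private static boolean hasLetterOrDigit(String piece) {
        for (int i = 0; i < piece.length(); i++) {
            if (Character.isLetterOrDigit(piece.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(segment("hello, world"));
    }
}
```

The loop has the same structure as `show`: pull one token at a time and append it followed by a space, so a caller that splits the result on spaces can treat both versions alike.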
