// dictrainer.java
// Simple word segmentation program: reads a PDF and writes a word-segmented txt file.
package WordSegment;
import java.io.*;
import java.util.*;


public class DicTrainer
{
	private Dictionary dic = new Dictionary();
	private Dictionary rel_dic = new Dictionary();

	public Dictionary getDic()
	{
		return dic;
	}
	
	/**
	 * Reads whitespace-separated words from fileName and adds every token to dic.
	 * Exits with status 1 if the file cannot be opened or read.
	 */
	public void Train(String fileName)
	{
		// try-with-resources closes the reader (and the underlying file) even on error.
		// FileNotFoundException is a subclass of IOException, so one handler covers both.
		// The file is decoded with the platform default charset, as in the original code.
		try (BufferedReader inStream = new BufferedReader(new InputStreamReader(new FileInputStream(fileName))))
		{
			String line;
			while ((line = inStream.readLine()) != null)
			{
				StringTokenizer st = new StringTokenizer(line);
				while (st.hasMoreTokens())
					dic.addWord(st.nextToken());
			}
		}
		catch (IOException e)
		{
			e.printStackTrace(System.err);
			System.exit(1);	// non-zero status: this is a failure path
		}
	}
        
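	/**
	 * Reads lines whose first token is a related word and whose second token is
	 * an integer frequency, and adds each pair to rel_dic.
	 * Blank or malformed lines are skipped instead of aborting the run.
	 */
	public void TrainMore(String fileName)
	{
		// The file is decoded with the platform default charset, as in the original code.
		try (BufferedReader inStream = new BufferedReader(new InputStreamReader(new FileInputStream(fileName))))
		{
			String line;
			while ((line = inStream.readLine()) != null)
			{
				StringTokenizer st = new StringTokenizer(line);
				// nextToken() throws NoSuchElementException on blank or short lines,
				// so require both a word and a frequency before reading them.
				if (st.countTokens() < 2)
					continue;
				String r_word = st.nextToken();
				try
				{
					Integer fre = Integer.valueOf(st.nextToken());
					rel_dic.addRelatedWord(r_word, fre);
				}
				catch (NumberFormatException e)
				{
					System.err.println("Skipping line with non-numeric frequency: " + line);
				}
			}
		}
		catch (IOException e)
		{
			e.printStackTrace(System.err);
			System.exit(1);	// non-zero status: this is a failure path
		}
	}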

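	// Serializes dic to fileName. Dictionary must implement java.io.Serializable,
	// otherwise writeObject throws NotSerializableException (an IOException).
	public void SaveDic(String fileName)
	{
		// try-with-resources closes the stream even if writeObject fails.
		try (ObjectOutputStream objout = new ObjectOutputStream(new FileOutputStream(fileName)))
		{
			objout.writeObject(dic);
		}
		catch (IOException e)
		{
			e.printStackTrace(System.err);
			System.exit(1);
		}
	}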
        
	// Serializes rel_dic to fileName; the same Serializable requirement applies.
	public void SaveRel_Dic(String fileName)
	{
		// try-with-resources closes the stream even if writeObject fails.
		try (ObjectOutputStream objout = new ObjectOutputStream(new FileOutputStream(fileName)))
		{
			objout.writeObject(rel_dic);
		}
		catch (IOException e)
		{
			e.printStackTrace(System.err);
			System.exit(1);
		}
	}
        
	public static void main(String[] args)
	{
		DicTrainer trainer = new DicTrainer();
		//trainer.Train("test.txt");
		//trainer.SaveDic("dic.dat");
		trainer.TrainMore("SogouR.mini.txt");
		trainer.SaveRel_Dic("rel_dic.dat");
	}
}
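
The class above only writes its dictionaries to disk; reading a saved file back is not shown. The following is a minimal, standalone sketch of that round trip, not the original author's code. Since the `Dictionary` class is not included here, a plain `HashMap<String, Integer>` stands in for it. The class name `RelDicRoundTrip`, the helpers `parse`, `save`, and `load`, and the sample input are all hypothetical. Lines are tokenized with `StringTokenizer` as in `TrainMore`, and this sketch additionally skips malformed lines.

```java
import java.io.*;
import java.util.*;

public class RelDicRoundTrip
{
	// Parses lines of "<word> <frequency>", skipping lines that are too short
	// or whose frequency is not an integer. HashMap is an assumed stand-in
	// for the Dictionary class, which is not shown in the source.
	public static HashMap<String, Integer> parse(BufferedReader in) throws IOException
	{
		HashMap<String, Integer> freq = new HashMap<>();
		String line;
		while ((line = in.readLine()) != null)
		{
			StringTokenizer st = new StringTokenizer(line);
			if (st.countTokens() < 2)
				continue;
			String word = st.nextToken();
			try
			{
				freq.put(word, Integer.valueOf(st.nextToken()));
			}
			catch (NumberFormatException e)
			{
				// Malformed frequency: ignore this line.
			}
		}
		return freq;
	}

	// Serializes obj to file with ObjectOutputStream, as SaveRel_Dic does.
	public static void save(Serializable obj, File file) throws IOException
	{
		try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file)))
		{
			out.writeObject(obj);
		}
	}

	// Reads back whatever object save() wrote.
	public static Object load(File file) throws IOException, ClassNotFoundException
	{
		try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file)))
		{
			return in.readObject();
		}
	}

	public static void main(String[] args) throws Exception
	{
		// Hypothetical sample data: one blank line and one malformed line.
		String sample = "apple 3\n\nbanana x\ncherry 7\n";
		HashMap<String, Integer> freq = parse(new BufferedReader(new StringReader(sample)));

		File tmp = File.createTempFile("rel_dic", ".dat");
		tmp.deleteOnExit();
		save(freq, tmp);

		// The reloaded map should equal the one that was saved.
		System.out.println(freq.equals(load(tmp)));
	}
}
```

With the real classes, the caller would cast the result of `readObject()` to `Dictionary`. That only works if `Dictionary` implements `java.io.Serializable` and the same class version is on the classpath when loading.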
