
Handling out-of-vocabulary words.txt

Handling out-of-vocabulary words in Chinese word segmentation with Java
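import java.io.*;
import java.sql.*;
import java.util.Hashtable;
import java.util.Locale;

// ChineseSegmenter is an external segmenter class and must be on the classpath.
public class CheckArtical{
    // Lexicon lookup table shared by all methods: word -> word.
    private static Hashtable<String, String> W_table = new Hashtable<String, String>();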

    // Prompts on stdin for a replacement for a word that is not in the lexicon.
    // "Q" exits the program, "Quit" keeps the original word, and any other
    // input is accepted only if it is itself a key in W_table.
    private static String ChangeWord(String word, String sentence){
        String chg_word = null;
        // Note: console input is decoded with the platform default charset.
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        System.out.println(word);
        System.out.println(" is not in lexicon,change it!");
        //System.out.println(sentence+"\n");
        try{
            while(true){
                chg_word = br.readLine();
                //System.out.println(chg_word);
                if(chg_word.equalsIgnoreCase("Q")) System.exit(1);
                if(chg_word.equalsIgnoreCase("Quit")){
                    chg_word = word;
                    break;
                }
                // Accept the typed word only if the lexicon contains it.
                chg_word = (String)W_table.get(chg_word);
                System.out.println(chg_word);
                if(chg_word == null)
                    System.out.println("try again\n");
                else
                    break;
            }
        }
        catch(Exception e){
            System.out.println(e);
            System.out.println("error in change word!");
            System.exit(1);
        }
        return chg_word;
    }

    // Loads every word from the worddatas table into W_table.
    private static void ConstructTable(){
        String JDBC_DRIVER = "com.mysql.jdbc.Driver";
        String Database_URL = "jdbc:mysql://localhost/wordbase";

        try{
            Class.forName(JDBC_DRIVER);
            // try-with-resources closes the result set, statement and connection.
            try(Connection con1 = DriverManager.getConnection(Database_URL, "root", "root");
                Statement stm1 = con1.createStatement();
                ResultSet rs = stm1.executeQuery("SELECT * FROM worddatas")){
                //construct w_table
                while(rs.next()){
                    String word = rs.getString(1);
                    // Re-decode the column: its bytes are treated as GB2312 text
                    // that the driver returned as ISO-8859-1.
                    byte[] Aword = word.getBytes("ISO-8859-1");
                    word = new String(Aword, "GB2312");

                    W_table.put(word, word);
                    System.out.println((String)W_table.get(word));
                }
            }
        }
        catch(Exception e){
            System.out.println(e);
            System.exit(1);
        }
    }

    public static void main(String[] args){
        int i, j;
        File in = new File(args[0]);
        File out = new File(args[1]);
        //File lex = new File(args[2]);
        //File hash = new File(args[3]);
        StringBuffer ori_sentence = new StringBuffer();
        StringBuffer chg_sentence = new StringBuffer();
        StringBuffer sentence = new StringBuffer();
        StringBuffer word = new StringBuffer();
        byte[] Aword = null;
        String GBword = null;
        int NumberOfRead = -1;
        char Line[] = new char[8192];

        Locale.setDefault(Locale.SIMPLIFIED_CHINESE);

        //construct hashtable
        ConstructTable();
        /* disabled: builds the table from a lexicon file instead of the database
        try{
            FileReader ilex = new FileReader(lex);
            FileWriter hw = new FileWriter(hash);
            do{
                NumberOfRead = ilex.read(Line);
                for(i=0; i<NumberOfRead; i++){
                    if(Line[i] != '\n'){
                        word.append(Line[i]);
                    }
                    else{
                        Aword = word.toString().getBytes("ISO-8859-1");
                        GBword = new String(Aword, "GB2312");
                        if(GBword != null)System.out.println(GBword);
                        W_table.put(GBword.intern(), GBword);
                        hw.write((String)W_table.get(GBword.intern()));
                        hw.flush();
                        word.delete(0, word.length());
                    }
                }
            }while(NumberOfRead == 8192);
        }
        catch(Exception e){
            System.out.println(e);
            System.exit(1);
        } */

        //segment and check
        ChineseSegmenter seg = ChineseSegmenter.getGBSegmenter();
        // FileReader decodes the input with the platform default charset.
        // try-with-resources closes both streams, which also flushes the output.
        try(FileReader ir = new FileReader(in);
            FileWriter ow = new FileWriter(out)){
            // Read until end of stream; read() may return fewer than 8192 chars mid-file.
            while((NumberOfRead = ir.read(Line)) != -1){
                for(i=0; i<NumberOfRead; i++){
                    // Collect characters until a sentence terminator is reached.
                    if((Line[i] != '。') && (Line[i] != '!') && (Line[i] != '?') && (Line[i] != '\n'))
                        sentence.append(Line[i]);
                    else{
                        // Segment the sentence; words come back separated by '#'.
                        ori_sentence.append(seg.segmentLine(sentence.toString(), "#"));
                        // System.out.println(ori_sentence.toString());
                        // Note: a word is looked up only when a '#' follows it, so this
                        // loop assumes segmentLine ends its output with the separator.
                        for(j=0; j<ori_sentence.length(); j++){
                            if(ori_sentence.charAt(j) != '#'){
                                word.append(ori_sentence.charAt(j));
                                System.out.println(word);
                            }
                            else{
                                //Aword = word.toString().getBytes("ISO-8859-1");
                                // GBword = new String(Aword, "GB2312");
                                GBword = (String)W_table.get(word.toString());
                                if(GBword == null){
                                    // Not in the lexicon: ask the user for a replacement.
                                    System.out.println(word.toString());
                                    chg_sentence.append(ChangeWord(word.toString(), ori_sentence.toString()));
                                }
                                else{
                                    chg_sentence.append(word.toString());
                                }
                                word.delete(0, word.length());
                            }
                        }
                        // Write the checked sentence followed by its terminator.
                        ow.write(chg_sentence.toString()+Line[i]);
                        ori_sentence.delete(0, ori_sentence.length());
                        chg_sentence.delete(0, chg_sentence.length());
                        sentence.delete(0, sentence.length());
                    }
                }
            }
        }
        catch(Exception e){
            System.out.println(e);
            System.exit(1);
        }
    }
}
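My hashtable is global. Inside the function ConstructTable, both put and get work fine and return the correct values. But in the other functions, main and ChangeWord, get never finds anything: every get call returns null! Could an expert please explain what is going on?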
<---->
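Responder 1:
This program does Chinese word segmentation and word correction. ChineseSegmenter is a segmenter that splits a Chinese sentence into individual words according to a lexicon. If any of the resulting words is not in the lexicon, the program replaces it with a word that is.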
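
The flow described above (segment, look each word up, replace the ones that are missing) can be sketched without the segmenter itself. This is only an illustration: the class `ReplaceUnknownWords`, the method `check`, and the `fixer` callback are hypothetical names, and the already-segmented input string stands in for whatever `segmentLine` returns. Splitting on the separator, rather than scanning for it, also handles a final word that has no separator after it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class ReplaceUnknownWords {
    // Hypothetical helper: 'segmented' stands in for the segmenter output,
    // with words separated by 'sep'. 'fixer' supplies a replacement for
    // every word that is not in the lexicon.
    static String check(String segmented, String sep, Set<String> lexicon,
                        UnaryOperator<String> fixer) {
        StringBuilder out = new StringBuilder();
        for (String w : segmented.split(Pattern.quote(sep))) {
            if (w.isEmpty()) continue; // skip empty pieces from leading or doubled separators
            out.append(lexicon.contains(w) ? w : fixer.apply(w));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three lexicon words, written as Unicode escapes to keep the source ASCII-only.
        Set<String> lexicon = new HashSet<>(
                Arrays.asList("\u6211", "\u559C\u6B22", "\u82F9\u679C"));
        // The third segment is a misspelling that is not in the lexicon.
        String segmented = "\u6211#\u559C\u6B22#\u5E73\u679C";
        // The fixer here always answers with the same lexicon word; the
        // original program asks the user on the console instead.
        String fixed = check(segmented, "#", lexicon, w -> "\u82F9\u679C");
        System.out.println(fixed.equals("\u6211\u559C\u6B22\u82F9\u679C"));
    }
}
```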
<---->
Responder 2:
Did you call trim() on the key?
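
A small sketch of why this matters, assuming the keys read from the file carry stray whitespace (for example a `\r` left over from Windows line endings, since the program only splits on `\n`). The class `TrimLookupDemo` and its helpers are hypothetical names. Note that `String.trim()` only removes characters up to U+0020, so the helper below uses `Character.isWhitespace`, which also covers the full-width space U+3000 that can appear in Chinese text.

```java
import java.util.Hashtable;

public class TrimLookupDemo {
    // Strips every leading and trailing char that Character.isWhitespace accepts.
    // This includes the full-width space U+3000, which String.trim() leaves in place.
    static String stripAll(String s) {
        int start = 0, end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    // Looks the key up only after cleaning it.
    static String lookup(Hashtable<String, String> table, String rawKey) {
        return table.get(stripAll(rawKey));
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("\u8BCD", "\u8BCD"); // a single-character word as the key

        System.out.println(table.get("\u8BCD\r") != null);          // raw key with a stray CR
        System.out.println(lookup(table, "\u8BCD\r") != null);      // cleaned key
        System.out.println(lookup(table, "\u3000\u8BCD ") != null); // full-width space removed too
    }
}
```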
<---->
Responder 3:
Since all your methods are static, the hashtable may not have been built yet at the point where main and ChangeWord use it.
Check whether W_table is empty:
if(W_table==null || W_table.size()==0){
ConstructTable();
}
<---->
Responder 4:
I tried that. In main, the size of W_table is not 0.
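
If the table is populated but lookups still miss, one thing worth checking is whether the key passed to get really contains the same characters as the key that was put. The sketch below is an assumption about a possible cause, not a confirmed diagnosis: it shows that the same bytes decoded with two different charsets give two different String keys, and it prints the UTF-16 code units so that look-alike keys can be compared. `KeyMismatchDemo`, `dump`, and `misdecode` are hypothetical names, and UTF-8 is used in place of GB2312 only because it is guaranteed to be present on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

public class KeyMismatchDemo {
    // Renders each UTF-16 code unit as four hex digits so keys can be compared exactly.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Simulates text whose bytes (in charset 'real') were decoded with charset 'wrong'.
    static String misdecode(String s, Charset real, Charset wrong) {
        return new String(s.getBytes(real), wrong);
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        String word = "\u4E2D\u6587"; // a two-character Chinese word
        table.put(word, word);

        // The same bytes, decoded with the wrong charset, as a file reader might do.
        String fromFile = misdecode(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(table.get(word) != null);     // identical characters
        System.out.println(table.get(fromFile) != null); // different characters
        System.out.println(dump(word));
        System.out.println(dump(fromFile));
    }
}
```

Printing `dump(...)` of a key right after `put` in ConstructTable and again right before `get` in main would show whether the two strings differ in encoding or in stray characters.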
<---->