# sort.txt
# From "Word segmentation for Chinese articles (feature: efficient)"
use Encode;
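open(DICTIONARY, '<', 'newdictionary.txt') || die "Cannot open the dictionary file newdictionary.txt: $!";
my %lexicon;    # dictionary hash
print "Loading dictionary...\n";
while (<DICTIONARY>)    # load every dictionary entry into %lexicon
{
    chomp $_;             # strip the trailing newline
    s/\r$//;              # also strip a carriage return left by Windows line endings
    next if $_ eq '';     # skip blank lines so the empty string never becomes a word
    $lexicon{$_} = $_;    # store each word with key and value identical
}
close(DICTIONARY) || die "Error closing newdictionary.txt: $!";
print "Dictionary loaded!\n";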
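print "Enter the full name of the input text file:\nPS: it must be in this folder!\n";
$source = <STDIN>;
chomp $source;    # drop the newline so the name matches the file on disk
open(SOURCE, '<', $source) || die "Cannot open the source file $source: $!";
print "----------------------------------------\n";
print "Enter the full name of the result text file:\nPS: it is written to this folder!\n";
$output = <STDIN>;
chomp $output;
print "Processing, please wait...\n";
open(OUTPUT, '>', $output) || die "Cannot open the result file $output: $!";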
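while (<SOURCE>)
{
    chomp;
    s/\r$//;                      # tolerate Windows line endings
    $_ = decode("gb2312", $_);    # bytes to characters, so split yields whole Chinese characters
    # Re-encode each character to GB2312 bytes so it compares equal to the
    # dictionary keys, which are raw bytes read without decoding.
    @sentence = map { encode("gb2312", $_) } split //, $_;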
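    # Forward maximum matching: take up to 10 characters, then shorten the
    # candidate from the right until it is a dictionary word or a single character.
    while (@sentence)
    {
        my @temp = splice(@sentence, 0, 10);    # fresh candidate window on every pass
        my $temporary = join("", @temp);
        while (!(exists $lexicon{$temporary}))
        {
            last if $#temp == 0;                # a single character is emitted as is
            unshift @sentence, pop(@temp);      # give the last character back to the sentence
            $temporary = join("", @temp);
        }
        print OUTPUT $temporary, " ";
    }
    print OUTPUT "\n";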
}
print "Done!\n";
close(SOURCE) || die "Error closing $source: $!";
close(OUTPUT) || die "Error closing $output: $!";