sort.txt

From "Word segmentation for Chinese articles. Feature: efficient"

use Encode;
open(DICTIONARY,"<newdictionary.txt")||die "Cannot open the dictionary file newdictionary.txt: $!";
my %lexicon;# dictionary hash
print "Loading the dictionary......\n";
while(<DICTIONARY>)# load every dictionary entry into %lexicon
{
	chomp $_;# strip the trailing newline
	$lexicon{$_}=$_;# store each word as both key and value; only the key is used for lookups
}
close(DICTIONARY)||die "Cannot close newdictionary.txt: $!";
print "Dictionary loaded!\n";

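print "Enter the full name of the input text file:\nPS: it must be in this folder!\n";
chomp($source=<STDIN>);# strip the newline so it does not end up in the file name
open(SOURCE,"<",$source)||die "Cannot open the source file $source: $!";
print "----------------------------------------------\n";
print "Enter the full name of the result text file:\nPS: it is written to this folder!\n";
chomp($output=<STDIN>);
print "Processing, please wait......\n";
open(OUTPUT,">",$output)||die "Cannot open the result file $output: $!";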

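while(<SOURCE>)
{
	chomp;
	# Decode the GB2312 bytes so the line is split into characters, not bytes.
	my @sentence=split //,decode("gb2312",$_);
	# Re-encode each character so it compares byte-for-byte with the dictionary keys.
	$_=encode("gb2312",$_) foreach @sentence;
	my @words;
	while(@sentence)
	{
		# Take a candidate of at most 10 characters from the front of the sentence.
		my $length=@sentence<10 ? scalar(@sentence) : 10;
		my @temp=splice(@sentence,0,$length);
		my $temporary=join("",@temp);
		# Forward maximum matching: give back the last character until the
		# candidate is a dictionary word or only one character is left.
		while(!(exists $lexicon{$temporary}) && @temp>1)
		{
			unshift @sentence,pop(@temp);
			$temporary=join("",@temp);
		}
		push @words,$temporary;
	}
	print OUTPUT join(" ",@words),"\n";
}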
print "Done!\n";
close(SOURCE)||die "Cannot close the source file: $!";
close(OUTPUT)||die "Cannot close the result file: $!";
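
The matching loop in the script is a form of forward maximum matching: take up to 10 characters, then shorten the candidate from the right until it is a dictionary word or a single character. The sketch below isolates that idea so it can be tried without input files. It is an illustration under assumptions, not the original author's code: the `segment` subroutine, its `$max_length` parameter, and the three-word in-memory dictionary are hypothetical, and it works on UTF-8 character strings rather than GB2312 bytes.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical in-memory dictionary; the script loads its words from newdictionary.txt.
my %lexicon = map { $_ => 1 } ("我们", "喜欢", "分词");

# Forward maximum matching over a decoded character string.
# $max_length is the window size (the script uses 10).
sub segment {
    my ($text, $lexicon, $max_length) = @_;
    my @chars = split //, $text;
    my @words;
    while (@chars) {
        my $length = @chars < $max_length ? scalar(@chars) : $max_length;
        # Shrink the candidate from the right until it is a dictionary word
        # or only one character is left.
        $length-- while $length > 1
            && !exists $lexicon->{ join("", @chars[0 .. $length - 1]) };
        push @words, join("", splice(@chars, 0, $length));
    }
    return @words;
}

my @words = segment("我们喜欢分词", \%lexicon, 10);
binmode(STDOUT, ":encoding(UTF-8)");
print join(" ", @words), "\n";    # prints: 我们 喜欢 分词
```

A character that starts no dictionary word falls through as a one-character token, so the loop always makes progress and every input character appears in the output exactly once.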
