wakati.pl

From “namazu. Although it is Japanese, it is also suitable for full-text search after indexing the words in files.” · Perl code · 99 lines total


#
# -*- Perl -*-
# $Id: wakati.pl,v 1.9 2000/01/31 06:24:04 satoru Exp $
# Copyright (C) 1997-1999 Satoru Takabayashi All rights reserved.
# Copyright (C) 2000 Namazu Project All rights reserved.
#     This is free software with ABSOLUTELY NO WARRANTY.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2, or (at your option)
#  any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
#  02111-1307, USA
#
#  This file must be encoded in EUC-JP encoding
#

package wakati;
use strict;

# Do wakatigaki (word segmentation) processing for a Japanese text.
sub wakatize_japanese ($) {
    my ($content) = @_;

    my @tmp = wakatize_japanese_sub($content);

    # Remove words consisting of only Hiragana characters
    # when -H option is specified.
    # Original of this code was contributed by <furukawa@tcp-ip.or.jp>.
    # [1997-11-13]
    # And do Okurigana processing. [1998-04-24]
    if ($var::Opt{'hiragana'} || $var::Opt{'okurigana'}) {
        for (my $ndx = 0; $ndx <= $#tmp; ++$ndx) {
            # Double every whitespace character and prepend a space, so that
            # adjacent words no longer share one separator and the " ... "
            # patterns below can match consecutive words.
            $tmp[$ndx] =~ s/(\s)/ $1/g;
            $tmp[$ndx] = ' ' . $tmp[$ndx];
            # In EUC-JP, Hiragana occupies \xa4\xa1 .. \xa4\xf3.
            if ($var::Opt{'okurigana'}) {
                $tmp[$ndx] =~ s/([^\xa4][\xa1-\xfe])+(\xa4[\xa1-\xf3])+ /$1 /g;
            }
            if ($var::Opt{'hiragana'}) {
                $tmp[$ndx] =~ s/ (\xa4[\xa1-\xf3])+ //g;
            }
        }
    }

    # Collect only noun words when -m option is specified.
    # (名詞 = "noun"; in the real file this literal is EUC-JP encoded.)
    if ($var::Opt{'noun'}) {
        $$content = "";
        $$content .= shift(@tmp) =~ /(.+ )名詞/ ? $1 : "" while @tmp;
    } else {
        $$content = join("\n", @tmp);
    }
    util::dprint(_("-- wakatized content --\n")."$$content\n");
}

sub wakatize_japanese_sub ($) {
    my ($content) = @_;
    my $str = "";
    my @tmp = ();

    if ($conf::WAKATI =~ /^module_(\w+)/) {
        my $module = $1;
        if ($module eq "kakasi") {
            $str = Text::Kakasi::do_kakasi($$content);
        } elsif ($module eq "chasen") {
            $str = Text::ChaSen::sparse_tostr_long($$content);
        } else {
            util::cdie(_("invalid wakati module: ")."$module\n");
        }
        util::dprint(_("-- wakatized bare content --\n")."$str\n\n");
        @tmp = split(/\n/, $str);
    } else {
        my $tmpfile = util::tmpnam("NMZ.wakati");
        util::dprint(_("wakati: using ")."$conf::WAKATI\n");
        # Don't use IPC::Open2 because it's not efficient.
        {
            my $fh_wakati = util::efopen("|$conf::WAKATI > $tmpfile");
            print $fh_wakati $$content;
        }
        {
            my $fh_wakati = util::efopen($tmpfile);
            @tmp = <$fh_wakati>;
            chomp @tmp;
        }
        unlink $tmpfile;
    }
    return @tmp;
}

1;
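The `-H` and okurigana filters in `wakatize_japanese` work on raw EUC-JP bytes, where `\xa4` is the lead byte of every Hiragana character. The sketch below re-expresses the same idea on decoded Unicode text with `\p{Hiragana}`, assuming each input line is already wakatized into words separated by single spaces; `filter_words` is a hypothetical helper, not part of Namazu. One difference worth noting: in the listing's okurigana regex the quantified group `([^\xa4][\xa1-\xfe])+` leaves only its last two-byte character in `$1`, whereas this sketch keeps the whole stem.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# filter_words: hypothetical helper, not part of Namazu.
# Input: one wakatized line, words separated by single spaces.
sub filter_words {
    my ($line, %opt) = @_;
    my @kept;
    for my $raw (split / /, $line) {
        my $word = $raw;
        next if $word eq '';
        if ($opt{okurigana}) {
            # Okurigana: drop trailing hiragana after a non-hiragana stem,
            # e.g. 読む -> 読, 勉強する -> 勉強.
            $word =~ s/^(\P{Hiragana}+)\p{Hiragana}+$/$1/;
        }
        # -H: drop words made only of hiragana, e.g. the particles は and を.
        next if $opt{hiragana} && $word =~ /^\p{Hiragana}+$/;
        push @kept, $word;
    }
    return join ' ', @kept;
}

print filter_words('私 は 本 を 読む', hiragana => 1, okurigana => 1), "\n";
# prints: 私 本 読
```

As in the listing, okurigana stripping runs before the Hiragana-only check, so a word is first reduced to its stem and only then tested for being pure Hiragana.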
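With `-m`, the listing keeps only nouns by testing each segmenter output line against `/(.+ )名詞/` inside a one-line `while` modifier. The sketch below unrolls that statement into an explicit loop. It assumes ChaSen-style lines of the form `surface part-of-speech`, where the part of speech starts with 名詞 for nouns; the sample lines and the helper name `collect_nouns` are made up for illustration.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# collect_nouns: hypothetical helper that mirrors
#   $$content .= shift(@tmp) =~ /(.+ )名詞/ ? $1 : "" while @tmp;
sub collect_nouns {
    my @lines = @_;
    my $content = '';
    for my $line (@lines) {
        # Keep the surface form plus its trailing space when a
        # part-of-speech field beginning with 名詞 (noun) follows it.
        $content .= $1 if $line =~ /(.+ )名詞/;
    }
    return $content;
}

# Assumed output format: "<surface> <part-of-speech>" per line.
my @sample = ('東京 名詞-固有名詞-地域-一般', 'に 助詞-格助詞-一般', '行く 動詞-自立');
print collect_nouns(@sample), "\n";
# prints: 東京 (followed by a trailing space)
```

Because each kept word carries its own trailing space, the concatenated result is already space-separated and needs no extra `join`.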
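When `$conf::WAKATI` names an external command rather than a Perl module, `wakatize_japanese_sub` pipes the text into that command, lets the shell redirect the command's output to a temporary file, and then reads the file back. A minimal sketch of the same pattern using the core `File::Temp` module follows; `run_filter` is a hypothetical helper, and `tr` merely stands in for a real segmenter such as KAKASI or ChaSen. Writing to a file instead of driving both ends of the child's pipes, as `IPC::Open2` would, also avoids the well-known deadlock in which parent and child both block on full pipe buffers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# run_filter: hypothetical helper. $cmd plays the role of $conf::WAKATI.
sub run_filter {
    my ($cmd, $text) = @_;

    # Create an empty temporary file to receive the command's output.
    my ($tmp_fh, $tmpfile) = tempfile('NMZ.wakatiXXXXXX', TMPDIR => 1, UNLINK => 0);
    close $tmp_fh;

    {
        # The shell runs "$cmd > $tmpfile"; we only write to its stdin.
        open(my $pipe, '|-', "$cmd > $tmpfile") or die "cannot start $cmd: $!";
        print {$pipe} $text;
        close($pipe) or die "$cmd exited with status $?";
    }

    # Read the segmented output back, one line per element.
    open(my $in, '<', $tmpfile) or die "cannot read $tmpfile: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;

    unlink $tmpfile;
    return @lines;
}

my @out = run_filter(q{tr ' ' '\n'}, "foo bar baz\n");
print join(',', @out), "\n";    # prints: foo,bar,baz
```

Closing the pipe before opening the temporary file matters: `close` waits for the child to exit, so the file is complete by the time it is read.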
