standardtokenizer.pm

From "Plucene-1.25.tar.gz" (the Perl version of Lucene) · Perl module (.pm) · 72 lines


package Plucene::Analysis::Standard::StandardTokenizer;

=head1 NAME

Plucene::Analysis::Standard::StandardTokenizer - standard tokenizer

=head1 SYNOPSIS

	# isa Plucene::Analysis::CharTokenizer

=head1 DESCRIPTION

This is the standard tokenizer.

This should be a good tokenizer for most European-language documents.

=head1 METHODS

=cut

use strict;
use warnings;

use base 'Plucene::Analysis::CharTokenizer';

# Don't blame me, blame the Plucene people!
my $alpha      = qr/\p{IsAlpha}+/;
my $apostrophe = qr/$alpha('$alpha)+/;
my $acronym    = qr/$alpha\.($alpha\.)+/;
my $company    = qr/$alpha(&|\@)$alpha/;
my $hostname   = qr/\w+(\.\w+)+/;
my $email      = qr/\w+\@$hostname/;
my $p          = qr/[_\/.,-]/;
my $hasdigit   = qr/\w*\d\w*/;
my $num        = qr/\w+$p$hasdigit|$hasdigit$p\w+
                   |\w+($p$hasdigit$p\w+)+
                   |$hasdigit($p\w+$p$hasdigit)+
                   |\w+$p$hasdigit($p\w+$p$hasdigit)+
                   |$hasdigit$p\w+($p$hasdigit$p\w+)+/x;

=head2 token_re

The regular expression for tokenising.

=cut

sub token_re {
	qr/
        $apostrophe | $acronym | $company | $hostname | $email | $num
        | \w+
    /x;
}

=head2 normalize

Remove 's and .

=cut

sub normalize {
	my $class = shift;

	# These are in the StandardFilter in Java, but Perl is not Java.
	# Thankfully.
	local $_ = shift;
	if (/$apostrophe/) { s/'s//; }
	if (/$company/)    { s/\.//g; }
	return $_;
}

1;
