📄 standardtokenizer.pm

📁 Plucene-1.25.tar.gz (Perl version of Lucene)
package Plucene::Analysis::Standard::StandardTokenizer;

=head1 NAME

Plucene::Analysis::Standard::StandardTokenizer - standard tokenizer

=head1 SYNOPSIS

	# isa Plucene::Analysis::CharTokenizer

=head1 DESCRIPTION

This is the standard tokenizer.

This should be a good tokenizer for most European-language documents.

=head1 METHODS

=cut

use strict;
use warnings;

use base 'Plucene::Analysis::CharTokenizer';

# Don't blame me, blame the Plucene people!
my $alpha      = qr/\p{IsAlpha}+/;
my $apostrophe = qr/$alpha('$alpha)+/;
my $acronym    = qr/$alpha\.($alpha\.)+/;
my $company    = qr/$alpha(&|\@)$alpha/;
my $hostname   = qr/\w+(\.\w+)+/;
my $email      = qr/\w+\@$hostname/;
my $p          = qr/[_\/.,-]/;
my $hasdigit   = qr/\w*\d\w*/;
my $num        = qr/\w+$p$hasdigit|$hasdigit$p\w+
                   |\w+($p$hasdigit$p\w+)+
                   |$hasdigit($p\w+$p$hasdigit)+
                   |\w+$p$hasdigit($p\w+$p$hasdigit)+
                   |$hasdigit$p\w+($p$hasdigit$p\w+)+/x;

=head2 token_re

The regular expression for tokenising.

=cut

sub token_re {
	qr/
        $apostrophe | $acronym | $company | $hostname | $email | $num
        | \w+
    /x;
}

=head2 normalize

Remove 's and .

=cut

sub normalize {
	my $class = shift;

	# These are in the StandardFilter in Java, but Perl is not Java.
	# Thankfully.
	local $_ = shift;
	if (/$apostrophe/) { s/'s//; }

	# NOTE: Lucene's Java StandardFilter strips dots from acronyms; this
	# check tests $company rather than $acronym, as in the original source.
	if (/$company/)    { s/\.//g; }
	return $_;
}

1;
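
To make the ordering of the alternation in token_re easier to follow, here is a minimal standalone sketch. It does not load Plucene and does not reproduce how Plucene::Analysis::CharTokenizer actually drives the regex; it only copies some of the patterns above and applies them in a plain global-match loop. The helper names tokenize and normalize_token are hypothetical, and $email and $num are left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns copied from the module above (only the ones this sketch needs).
my $alpha      = qr/\p{IsAlpha}+/;
my $apostrophe = qr/$alpha('$alpha)+/;
my $acronym    = qr/$alpha\.($alpha\.)+/;
my $company    = qr/$alpha(&|\@)$alpha/;
my $hostname   = qr/\w+(\.\w+)+/;

# Simplified alternation (assumption: $email and $num omitted for brevity).
# Earlier alternatives win when several could match at the same position.
my $token_re = qr/ $apostrophe | $acronym | $company | $hostname | \w+ /x;

# Hypothetical helper: collect successive matches of $token_re.
sub tokenize {
	my ($text) = @_;
	my @tokens;
	while ($text =~ /($token_re)/g) {
		push @tokens, $1;    # $1 is the outer group, i.e. the whole token
	}
	return @tokens;
}

# Same checks as normalize() above, written as a plain function
# on a lexical variable instead of a class method using $_.
sub normalize_token {
	my ($token) = @_;
	$token =~ s/'s//   if $token =~ /$apostrophe/;
	$token =~ s/\.//g  if $token =~ /$company/;
	return $token;
}

my @tokens     = tokenize("Plucene's AT&T v1.25");
my @normalized = map { normalize_token($_) } @tokens;

print join(" | ", @normalized), "\n";    # prints: Plucene | AT&T | v1.25
```

In this sample, "Plucene's" is caught by $apostrophe and loses its 's during normalization, "AT&T" is caught by $company, and "v1.25" is caught by $hostname because that alternative is tried before the plain \w+ fallback.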
