package Plucene::Analysis::Standard::StandardTokenizer;

=head1 NAME

Plucene::Analysis::Standard::StandardTokenizer - standard tokenizer

=head1 SYNOPSIS

    # isa Plucene::Analysis::CharTokenizer

=head1 DESCRIPTION

This is the standard tokenizer.

This should be a good tokenizer for most European-language documents.

=head1 METHODS

=cut

use strict;
use warnings;

use base 'Plucene::Analysis::CharTokenizer';

# Don't blame me, blame the Plucene people!
my $alpha      = qr/\p{IsAlpha}+/;
my $apostrophe = qr/$alpha('$alpha)+/;
my $acronym    = qr/$alpha\.($alpha\.)+/;
my $company    = qr/$alpha(&|\@)$alpha/;
my $hostname   = qr/\w+(\.\w+)+/;
my $email      = qr/\w+\@$hostname/;
my $p          = qr/[_\/.,-]/;
my $hasdigit   = qr/\w*\d\w*/;
my $num        = qr/\w+$p$hasdigit|$hasdigit$p\w+
                  |\w+($p$hasdigit$p\w+)+
                  |$hasdigit($p\w+$p$hasdigit)+
                  |\w+$p$hasdigit($p\w+$p$hasdigit)+
                  |$hasdigit$p\w+($p$hasdigit$p\w+)+/x;

=head2 token_re

The regular expression for tokenising.

=cut

sub token_re {
    qr/
        $apostrophe | $acronym | $company | $hostname | $email | $num | \w+
    /x;
}

=head2 normalize

Remove 's and .

=cut

sub normalize {
    my $class = shift;

    # These are in the StandardFilter in Java, but Perl is not Java.
    # Thankfully.
    local $_ = shift;
    if (/$apostrophe/) { s/'s//; }
    if (/$company/)    { s/\.//g; }
    return $_;
}

1;
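
=head1 EXAMPLE

The following is a minimal, standalone sketch of how the ordered
alternation in C<token_re> picks tokens. It is an illustration only, not
part of this module: the patterns are copied out so the script runs
without Plucene, the hostname, email and number alternatives are left
out for brevity, and the C<while> loop is an assumed stand-in for
whatever Plucene::Analysis::CharTokenizer actually does with the
expression. The variable names C<$token_re> and C<$text> are
hypothetical.

    use strict;
    use warnings;

    # Patterns copied from this module (assumption: used unchanged).
    my $alpha      = qr/\p{IsAlpha}+/;
    my $apostrophe = qr/$alpha('$alpha)+/;
    my $acronym    = qr/$alpha\.($alpha\.)+/;
    my $company    = qr/$alpha(&|\@)$alpha/;

    # Reduced alternation. Perl tries the branches left to right and
    # takes the first one that matches, not the longest one.
    my $token_re = qr/ $apostrophe | $acronym | $company | \w+ /x;

    my $text = "O'Reilly's book about I.B.M. and AT&T";

    # Print each match of the whole alternation on its own line.
    while ($text =~ /($token_re)/g) {
        print "$1\n";
    }

Because the branches are tried in order, a more specific pattern such as
C<$apostrophe> or C<$company> has to appear before the catch-all C<\w+>,
otherwise C<\w+> would split C<AT&T> at the ampersand.

=cut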