package KinoSearch::Analysis::Token;

1;

__END__

__H__

#ifndef H_KINOSEARCH_ANALYSIS_TOKEN
#define H_KINOSEARCH_ANALYSIS_TOKEN 1

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "KinoSearchUtilMemManager.h"

typedef struct Token Token;

struct Token {
    char   *text;
    STRLEN  len;
    I32     start_offset;
    I32     end_offset;
    I32     pos_inc;
    Token  *next;
    Token  *prev;
};

Token* Kino_Token_new(char* text, STRLEN len, I32 start_offset,
                      I32 end_offset, I32 pos_inc);
void Kino_Token_destroy(Token*);

#endif /* include guard */

__C__

#include "KinoSearchAnalysisToken.h"

Token*
Kino_Token_new(char* text, STRLEN len, I32 start_offset,
               I32 end_offset, I32 pos_inc) {
    Token *token;

    /* allocate */
    Kino_New(0, token, 1, Token);

    /* allocate and assign */
    token->text = Kino_savepvn(text, len);

    /* assign */
    token->len          = len;
    token->start_offset = start_offset;
    token->end_offset   = end_offset;
    token->pos_inc      = pos_inc;

    /* init */
    token->next = NULL;
    token->prev = NULL;

    return token;
}

void
Kino_Token_destroy(Token *token) {
    Kino_Safefree(token->text);
    Kino_Safefree(token);
}

__POD__

=head1 NAME

KinoSearch::Analysis::Token - unit of text

=head1 SYNOPSIS

    # private class - no public API

=head1 PRIVATE CLASS

You can't actually instantiate a Token object at the Perl level -- however,
you can affect individual Tokens within a TokenBatch by way of TokenBatch's
(experimental) API.

=head1 DESCRIPTION

Token is the fundamental unit used by KinoSearch's Analyzer subclasses. Each
Token has 4 attributes: text, start_offset, end_offset, and pos_inc (for
position increment).

The text of a token is a string.

A Token's start_offset and end_offset locate it within a larger text, even if
the Token's text attribute gets modified -- by stemming, for instance. The
Token for "beating" in the text "beating a dead horse" begins life with a
start_offset of 0 and an end_offset of 7; after stemming, the text is "beat",
but the end_offset is still 7.

The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching. Ordinarily, Tokens are assigned consecutive
position numbers: 0, 1, and 2 for "three blind mice". However, if you set the
position increment for "blind" to, say, 1000, then the three tokens will end
up assigned to positions 0, 1, and 1001 -- and will no longer produce a phrase
match for the query '"three blind mice"'.

=head1 COPYRIGHT

Copyright 2006-2007 Marvin Humphrey

=head1 LICENSE, DISCLAIMER, BUGS, etc.

See L<KinoSearch|KinoSearch> version 0.163.

=cut