token.pm
package KinoSearch::Analysis::Token;
1;

__END__

__H__

#ifndef H_KINOSEARCH_ANALYSIS_TOKEN
#define H_KINOSEARCH_ANALYSIS_TOKEN 1

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "KinoSearchUtilMemManager.h"

typedef struct Token Token;

struct Token {
    char   *text;
    STRLEN  len;
    I32     start_offset;
    I32     end_offset;
    I32     pos_inc;
    Token  *next;
    Token  *prev;
};

Token* Kino_Token_new(char* text, STRLEN len, I32 start_offset,
                      I32 end_offset, I32 pos_inc);
void Kino_Token_destroy(Token*);

#endif /* include guard */

__C__

#include "KinoSearchAnalysisToken.h"

Token*
Kino_Token_new(char* text, STRLEN len, I32 start_offset, I32 end_offset,
               I32 pos_inc) {
    Token *token;

    /* allocate */
    Kino_New(0, token, 1, Token);

    /* allocate and assign */
    token->text = Kino_savepvn(text, len);

    /* assign */
    token->len          = len;
    token->start_offset = start_offset;
    token->end_offset   = end_offset;
    token->pos_inc      = pos_inc;

    /* init */
    token->next = NULL;
    token->prev = NULL;

    return token;
}

void
Kino_Token_destroy(Token *token) {
    Kino_Safefree(token->text);
    Kino_Safefree(token);
}

__POD__

=head1 NAME

KinoSearch::Analysis::Token - unit of text

=head1 SYNOPSIS

    # private class - no public API

=head1 PRIVATE CLASS

You can't actually instantiate a Token object at the Perl level -- however,
you can affect individual Tokens within a TokenBatch by way of TokenBatch's
(experimental) API.

=head1 DESCRIPTION

Token is the fundamental unit used by KinoSearch's Analyzer subclasses.  Each
Token has 4 attributes: text, start_offset, end_offset, and pos_inc (for
position increment).

The text of a token is a string.

A Token's start_offset and end_offset locate it within a larger text, even if
the Token's text attribute gets modified -- by stemming, for instance.  The
Token for "beating" in the text "beating a dead horse" begins life with a
start_offset of 0 and an end_offset of 7; after stemming, the text is "beat",
but the end_offset is still 7.

The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching.  Ordinarily, Tokens are assigned consecutive
position numbers: 0, 1, and 2 for "three blind mice".  However, if you set the
position increment for "blind" to, say, 1000, then the three tokens will end
up assigned to positions 0, 1, and 1001 -- and will no longer produce a phrase
match for the query '"three blind mice"'.

=head1 COPYRIGHT

Copyright 2006-2007 Marvin Humphrey

=head1 LICENSE, DISCLAIMER, BUGS, etc.

See L<KinoSearch|KinoSearch> version 0.163.

=cut
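The Token struct and constructor depend on perl.h types (STRLEN, I32) and KinoSearch's memory macros (Kino_New, Kino_savepvn, Kino_Safefree), so they do not compile on their own. The following is a minimal standalone sketch, assuming plain malloc/free and size_t/int as stand-ins for those, that reproduces the "beating" example from the DESCRIPTION: a stemming step rewrites text and len but leaves start_offset and end_offset pointing at the original span. The names token_new, token_destroy, and toy_stem are hypothetical, and toy_stem is a deliberately crude suffix stripper, not KinoSearch's stemmer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone mirror of the Token struct: Perl's STRLEN and I32
 * are replaced with size_t and int so this compiles without perl.h. */
typedef struct Token Token;
struct Token {
    char   *text;
    size_t  len;
    int     start_offset;
    int     end_offset;
    int     pos_inc;
    Token  *next;
    Token  *prev;
};

/* malloc/free stand in for Kino_New, Kino_savepvn and Kino_Safefree.
 * The text is copied, so the Token owns its own NUL-terminated buffer. */
static Token *token_new(const char *text, size_t len, int start_offset,
                        int end_offset, int pos_inc) {
    Token *token = malloc(sizeof *token);
    if (token == NULL) return NULL;
    token->text = malloc(len + 1);
    if (token->text == NULL) {
        free(token);
        return NULL;
    }
    memcpy(token->text, text, len);
    token->text[len] = '\0';
    token->len          = len;
    token->start_offset = start_offset;
    token->end_offset   = end_offset;
    token->pos_inc      = pos_inc;
    token->next = NULL;
    token->prev = NULL;
    return token;
}

/* Frees the copied text first, then the Token itself. */
static void token_destroy(Token *token) {
    free(token->text);
    free(token);
}

/* Hypothetical toy "stemmer": strips a trailing "ing". Only text and len
 * change; start_offset and end_offset still describe the original span. */
static void toy_stem(Token *token) {
    assert(token != NULL && token->text != NULL);
    if (token->len > 3 && strcmp(token->text + token->len - 3, "ing") == 0) {
        token->len -= 3;
        token->text[token->len] = '\0';
    }
}
```

For example, token_new("beating", 7, 0, 7, 1) followed by toy_stem leaves text as "beat" with len 4, while start_offset and end_offset remain 0 and 7, so the span 0..7 of "beating a dead horse" still recovers the unstemmed word.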
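The phrase-matching example in the DESCRIPTION can be checked with a small sketch. KinoSearch's actual position bookkeeping is not part of this file, so the rule below is an assumption inferred from the worked example: each token takes the current position, and its own pos_inc is then added to reach the next token's position, which yields 0, 1, and 1001 when "blind" carries a pos_inc of 1000. assign_positions and is_consecutive are hypothetical helpers, and is_consecutive is only a simplified stand-in for a phrase matcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts per-token position increments into
 * absolute positions. Assumed rule (inferred from the POD example): the
 * token takes the current position, then its pos_inc advances the counter
 * for the next token. */
static void assign_positions(const int *pos_incs, size_t count,
                             int *positions) {
    int position = 0;
    assert((pos_incs != NULL && positions != NULL) || count == 0);
    for (size_t i = 0; i < count; i++) {
        positions[i] = position;
        position += pos_incs[i];
    }
}

/* Hypothetical, simplified phrase check: returns 1 only if every token sits
 * exactly one position after the previous one, 0 otherwise. */
static int is_consecutive(const int *positions, size_t count) {
    for (size_t i = 1; i < count; i++) {
        if (positions[i] != positions[i - 1] + 1) return 0;
    }
    return 1;
}
```

With pos_incs of {1, 1000, 1} for "three blind mice", assign_positions produces 0, 1, and 1001, and is_consecutive rejects them; with the default pos_inc of 1 for every token, the same helpers give 0, 1, and 2, which is_consecutive accepts.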
