
readme.tokenizers

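From sqlite-3.4.1, an embedded database: a powerful open-source database that has brought unprecedented benefits to learning, research and development, and the growth of small companies.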
1. FTS2 Tokenizers

  When creating a new full-text table, FTS2 allows the user to select
  the text tokenizer implementation to be used when indexing text
  by specifying a "tokenizer" clause as part of the CREATE VIRTUAL TABLE
  statement:

    CREATE VIRTUAL TABLE <table-name> USING fts2(
      <columns ...> [, tokenizer <tokenizer-name> [<tokenizer-args>]]
    );

  The built-in tokenizers (valid values to pass as <tokenizer-name>) are
  "simple" and "porter".

  <tokenizer-args> should consist of zero or more white-space separated
  arguments to pass to the selected tokenizer implementation. The
  interpretation of the arguments, if any, depends on the individual
  tokenizer.

2. Custom Tokenizers

  FTS2 allows users to provide custom tokenizer implementations. The
  interface used to create a new tokenizer is defined and described in
  the fts2_tokenizer.h source file.

  Registering a new FTS2 tokenizer is similar to registering a new
  virtual table module with SQLite. The user passes a pointer to a
  structure containing pointers to various callback functions that
  make up the implementation of the new tokenizer type. For tokenizers,
  the structure (defined in fts2_tokenizer.h) is called
  "sqlite3_tokenizer_module".

  FTS2 does not expose a C-function that users call to register new
  tokenizer types with a database handle. Instead, the pointer must
  be encoded as an SQL blob value and passed to FTS2 through the SQL
  engine by evaluating a special scalar function, "fts2_tokenizer()".
  The fts2_tokenizer() function may be called with one or two arguments,
  as follows:

    SELECT fts2_tokenizer(<tokenizer-name>);
    SELECT fts2_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>);

  Where <tokenizer-name> is a string identifying the tokenizer and
  <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module
  structure encoded as an SQL blob. If the second argument is present,
  it is registered as tokenizer <tokenizer-name> and a copy of it
  is returned.
  If only one argument is passed, a pointer to the tokenizer
  implementation currently registered as <tokenizer-name> is returned,
  encoded as a blob. Or, if no such tokenizer exists, an SQL exception
  (error) is raised.

  SECURITY: If the fts2 extension is used in an environment where potentially
    malicious users may execute arbitrary SQL (e.g. Gears), they should be
    prevented from invoking the fts2_tokenizer() function, possibly using the
    authorisation callback.

  See "Sample code" below for an example of calling the fts2_tokenizer()
  function from C code.

3. ICU Library Tokenizers

  If this extension is compiled with the SQLITE_ENABLE_ICU pre-processor
  symbol defined, then there exists a built-in tokenizer named "icu"
  implemented using the ICU library. The first argument passed to the
  xCreate() method (see fts2_tokenizer.h) of this tokenizer may be
  an ICU locale identifier, such as "tr_TR" for Turkish as used
  in Turkey, or "en_AU" for English as used in Australia. For example:

    "CREATE VIRTUAL TABLE thai_text USING fts2(text, tokenizer icu th_TH)"

  The ICU tokenizer implementation is very simple. It splits the input
  text according to the ICU rules for finding word boundaries and discards
  any tokens that consist entirely of white-space. This may be suitable
  for some applications in some locales, but not all. If more complex
  processing is required, for example to implement stemming or
  discard punctuation, this can be done by creating a tokenizer
  implementation that uses the ICU tokenizer as part of its implementation.

  When using the ICU tokenizer this way, it is safe to overwrite the
  contents of the strings returned by the xNext() method (see
  fts2_tokenizer.h).

4. Sample code.

  The following two code samples illustrate the way C code should invoke
  the fts2_tokenizer() scalar function:

      #include <string.h>            /* memcpy() */
      #include "sqlite3.h"
      #include "fts2_tokenizer.h"    /* sqlite3_tokenizer_module */

      /* Register module p under the name zName. The blob bound to the
      ** second parameter holds the pointer value itself, not a copy of
      ** the structure it points to. */
      int registerTokenizer(
        sqlite3 *db, 
        char *zName, 
        const sqlite3_tokenizer_module *p
      ){
        int rc;
        sqlite3_stmt *pStmt;
        const char zSql[] = "SELECT fts2_tokenizer(?, ?)";

        rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
        if( rc!=SQLITE_OK ){
          return rc;
        }

        sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
        sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC);
        sqlite3_step(pStmt);

        return sqlite3_finalize(pStmt);
      }

      /* Look up the module registered as zName. On return, *pp is the
      ** module pointer, or 0 if no blob was returned. */
      int queryTokenizer(
        sqlite3 *db, 
        char *zName,  
        const sqlite3_tokenizer_module **pp
      ){
        int rc;
        sqlite3_stmt *pStmt;
        const char zSql[] = "SELECT fts2_tokenizer(?)";

        *pp = 0;
        rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
        if( rc!=SQLITE_OK ){
          return rc;
        }

        sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
        if( SQLITE_ROW==sqlite3_step(pStmt) ){
          if( sqlite3_column_type(pStmt, 0)==SQLITE_BLOB ){
            memcpy(pp, sqlite3_column_blob(pStmt, 0), sizeof(*pp));
          }
        }

        return sqlite3_finalize(pStmt);
      }
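
  The blob exchanged with fts2_tokenizer() carries the bytes of the pointer
  itself, which is why the samples above bind &p with length sizeof(p) and
  read the result back with memcpy(). The sketch below isolates that round
  trip without any SQLite calls. The names demo_module, encodeModulePtr()
  and decodeModulePtr() are hypothetical and exist only for illustration;
  demo_module stands in for sqlite3_tokenizer_module. The size check in the
  decoder is an extra precaution added here, not something the samples
  above perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sqlite3_tokenizer_module. Only its address
** matters for this demonstration. */
typedef struct demo_module {
  int iVersion;
} demo_module;

/* Copy the pointer value (not the structure it points to) into zBlob.
** zBlob must have room for sizeof(p) bytes. Returns the number of bytes
** written, i.e. the length a caller would pass to sqlite3_bind_blob(). */
static size_t encodeModulePtr(const demo_module *p, unsigned char *zBlob){
  assert( zBlob!=0 );
  memcpy(zBlob, &p, sizeof(p));
  return sizeof(p);
}

/* Recover a pointer from a blob of nBlob bytes. Returns 0 if the blob
** is not exactly the size of a pointer. */
static const demo_module *decodeModulePtr(
  const unsigned char *zBlob,
  size_t nBlob
){
  const demo_module *p = 0;
  if( zBlob!=0 && nBlob==sizeof(p) ){
    memcpy(&p, zBlob, sizeof(p));
  }
  return p;
}
```

  Because only the address travels through the SQL engine, the structure
  it points to has to remain valid for as long as the tokenizer may be
  used, and the blob is meaningful only inside the process that created it.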
