readme.tsearch2

Tsearch2 - full text search extension for PostgreSQL

   [1]Online version of this document is available

   Tsearch2 is a full text engine, fully integrated into the PostgreSQL
   RDBMS.

Main features

     * Full online update
     * Supports multiple table-driven configurations
     * Flexible and rich linguistic support (dictionaries, stop words),
       thesaurus
     * Full multibyte (UTF-8) support
     * Sophisticated ranking functions with support for proximity and
       structure information (rank, rank_cd)
     * Index support (GiST and GIN) with concurrency and recovery support
     * Rich query language with query rewriting support
     * Headline support (text fragments with highlighted search terms)
     * Ability to plug in custom dictionaries and parsers
     * Template generator for tsearch2 dictionaries with [2]snowball
       stemmer support
     * It is mature (5 years of development)

   Tsearch2, in a nutshell, provides an FTS operator (contains) for the
   new data types representing a document (tsvector) and a query
   (tsquery). Table-driven configuration allows the creation of custom
   searches using standard SQL commands.

   tsvector is a searchable data type representing a document. It is a
   set of unique words along with their positional information in the
   document, organized in a special structure optimized for fast access
   and lookup. Each entry can be labelled to reflect its importance in
   the document.

   tsquery is a data type for textual queries with support for boolean
   operators. It consists of lexemes (optionally labelled) with boolean
   operators between them.

   Table-driven configuration allows you to specify:
     * the parser, which is used to break a document into lexemes
     * which lexemes to index and the way they are processed
     * the dictionaries to be used, along with stop word recognition.

OpenFTS vs Tsearch2

   [3]OpenFTS is a middleware between application and database.
   OpenFTS uses tsearch2 for storage and the database engine as a query
   executor (searching). Everything else, i.e. parsing of documents,
   query processing and linguistics, is carried out on the client side.
   That is why OpenFTS has its own configuration table (fts_conf) and
   works with its own set of dictionaries. OpenFTS is more flexible,
   because it can be used in a multi-server architecture with separate
   machines for the document repository (documents can be stored in the
   filesystem), the database and the query engine.

   See [4]Documentation Roadmap for links to documentation.

Authors

     * Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
     * Teodor Sigaev <teodor@sigaev.ru>, Moscow, Moscow University, Russia

Contributors

     * Robert John Shepherd and Andrew J. Kopciuch submitted
       "Introduction to tsearch" (Robert - tsearch v1, Andrew - tsearch
       v2)
     * Brandon Craig Rhodes wrote "Tsearch2 Guide" and "Tsearch2
       Reference" and proposed a new naming convention for tsearch V2

Sponsors

     * ABC Startsiden - compound words support
     * University of Mannheim for UTF-8 support (in 8.2)
     * jfg:networks ([5]http://www.jfg-networks.com/) for GIN -
       Generalized Inverted index (in 8.2)
     * Georgia Public Library Service and LibLime, Inc.
       for Thesaurus dictionary
     * PostGIS community - GiST Concurrency and Recovery

   The authors are grateful to the Russian Foundation for Basic Research
   and Delta-Soft Ltd., Moscow, Russia for support.

Limitations

     * Length of lexeme < 2 KB
     * Length of tsvector (lexemes + positions) < 1 MB
     * The number of lexemes < 4^32
     * 0 < Positional information < 16383
     * No more than 256 positions per lexeme
     * The number of nodes (lexemes + operations) in tsquery < 32768

References

     * GiST development site -
       [6]http://www.sai.msu.su/~megera/postgres/gist
     * GIN development - [7]http://www.sigaev.ru/gin/
     * OpenFTS home page - [8]http://openfts.sourceforge.net/
     * Mailing list -
       [9]http://sourceforge.net/mailarchive/forum.php?forum=openfts-general

Documentation Roadmap

     * Several docs are available from the docs/ subdirectory
          + "Tsearch V2 Introduction" by Andrew Kopciuch
          + "Tsearch2 Guide" by Brandon Rhodes
          + "Tsearch2 Reference" by Brandon Rhodes
     * Readme.gendict in the gendict/ subdirectory
          + Also, check the [10]Gendict tutorial
     * Check the [11]tsearch2 Wiki pages for various documentation

Support

   The authors strongly recommend using the [12]openfts-general or
   [13]pgsql-general mailing lists for questions and discussions.

Development History

   Latest news
   For the PostgreSQL 8.2 release we added:
     * multibyte (UTF-8) support
     * Thesaurus dictionary
     * Query rewriting
     * the rank_cd relevance function now supports different weights of
       lexemes
     * GIN support, which improves the scalability of tsearch2

   Pre-tsearch era
          Development of OpenFTS began in 2000 after we realized that we
          needed a search engine optimized for online updates with access
          to metadata from the database. This is essential for online
          news agencies, web portals, digital libraries, etc.
          Most available search engines use an inverted index, which is
          very fast for searching but very slow for online updates.
          Incrementally updating an inverted index is a complex
          engineering task, while we needed something light, free and
          able to access metadata from the database. The last requirement
          was very important because in a real-life application the
          search engine should always consult metadata (topic,
          permissions, date range, version, etc.). We use PostgreSQL
          extensively as a database backend and have no intention of
          moving away from it, so the problem was to find a data
          structure and a fast way to access it. PostgreSQL has a rather
          unique data type for storing sets (think of words) - arrays -
          but lacks index access to them. During our research we found a
          paper by Joseph Hellerstein, who introduced an interesting data
          structure suitable for sets - the RD-tree (Russian Doll tree).
          Further research led us to the idea of using GiST to implement
          the RD-tree, but at that time the GiST code had been untouched
          for a long time and contained several bugs. After the work on
          improving GiST for version 7.0.3 of PostgreSQL was done, we
          were able to implement the RD-tree and use it for index access
          to arrays of integers. This implementation was ideally suited
          to small arrays and eliminated complex joins, but was
          practically useless for indexing large arrays. The next
          improvement came from the idea of representing a document by a
          single bit-signature, a so-called superimposed signature (see
          "Index Structures for Databases Containing Data Items with
          Set-valued Attributes", 1997, Sven Helmer, for details).
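
          The superimposed-signature idea mentioned above can be
          sketched roughly as follows. This is only an illustration in
          Python, not the contrib/intarray or tsearch code: the 64-bit
          width, the SHA-256 hash and all function names are assumptions
          made for the example.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed width, chosen only for illustration


def word_bit(word: str) -> int:
    """Map a word to one bit position using a stable hash (an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SIGNATURE_BITS


def signature(words) -> int:
    """OR together one bit per word: the superimposed signature of the set."""
    sig = 0
    for word in words:
        sig |= 1 << word_bit(word)
    return sig


def may_contain(doc_sig: int, query_sig: int) -> bool:
    """Return True if every bit of the query is set in the document.

    True can be a false positive, because different words may share a bit,
    so the document itself has to be rechecked. False is always definitive.
    """
    return (doc_sig & query_sig) == query_sig


doc = signature(["full", "text", "search", "engine"])
print(may_contain(doc, signature(["text", "search"])))  # True: a subset always matches
```

          The signature has a fixed size no matter how many words the
          document contains, which is what makes it attractive for
          large sets; the price is that a match only means "possibly
          present" and has to be verified against the real data.
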
          We developed the contrib/intarray module and used it for full
          text indexing.

   tsearch v1
          It was inconvenient to use integer ids instead of words, so we
          introduced a new data type called 'txtidx' - a searchable
          (textual) data type with indexed access. This was the first
          step of our work on implementing a built-in PostgreSQL full
          text search engine. Even though tsearch v1 had many features of
          a search engine, it lacked configuration support and relevance
          ranking. People were encouraged to use OpenFTS, which provided
          relevance ranking based on positional information and flexible
          configuration. OpenFTS v.0.34 is the last version based on
          tsearch v1.

   tsearch V2
          People recognized tsearch as a powerful tool for full text
          searching and insisted on adding ranking support, better
          configurability, etc. We had already thought about moving most
          of the features of OpenFTS to tsearch, and in early 2003 we
          decided to work on a new version of tsearch. We abandoned the
          auxiliary index tables that OpenFTS used to store positional
          information and modified the txtidx type to store it
          internally. We added table-driven configuration, support for
          ispell dictionaries and snowball stemmers, and the ability to
          specify which types of lexemes to index. It is now possible to
          generate headlines of documents with highlighted search terms.
          These changes make tsearch more user-friendly and turn it into
          a really powerful full text search engine. Brandon Rhodes
          proposed renaming the tsearch functions for consistency, and we
          renamed the txtidx type to tsvector, along with other things.
          To give users of tsearch v1 a smooth upgrade path, we named the
          module tsearch2.
          Since version 0.35 OpenFTS uses tsearch2.

References

   1. http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/Tsearch_V2_Readme.html
   2. http://snowball.tartarus.org/
   3. http://openfts.sourceforge.net/
   4. file://localhost/u/megera/WWW/postgres/gist/tsearch/V2/docs/Tsearch_V2_Readme82.html#dm
   5. http://www.jfg-networks.com/
   6. http://www.sai.msu.su/~megera/postgres/gist
   7. http://www.sigaev.ru/gin/
   8. http://openfts.sourceforge.net/
   9. http://sourceforge.net/mailarchive/forum.php?forum=openfts-general
  10. http://www.sai.msu.su/~megera/wiki/Gendict
  11. http://www.sai.msu.su/~megera/wiki/Tsearch2
  12. http://sourceforge.net/mailarchive/forum.php?forum=openfts-general
  13. http://archives.postgresql.org/pgsql-general/