📄 libtour.html

📁 FastDb是高效的内存数据库系统
💻 HTML
📖 第 1 页 / 共 2 页
字号:
上一页 12
<H2 ALIGN=CENTER>RSTree</H2><P>The R*-tree is very much like the R-tree, having identical keys andsupporting identical query predicates. But it contains some optimizationsthat often make t outperform simple R-trees during lookup, by paying forsome additional complexity during insertion. First, it has different Penaltyand PickSplit methods. Second, it uses a technique known as <I>Forced Reinsert</I>to keep the tree fuller and better balanced than an R-tree.</P><P>Forced reinsert works as follows -- when an entry is inserted into afull node, some percentage of the entries on that node are deleted, andreinserted into the tree. If the node is at a non-leaf level, the reinsertionplaces the deleted entries back into nodes at that level. &nbsp;If reinsertiondiscovers a full node, then it splits it as usual. This sporadic reinsertionof elements accomodates changes in the data distribution as the tree fillsin.</P><P>Forced reinsert is supported by libGiST, and the RSTree implementationsimply takes advantage of it. &nbsp;The PickSplit and Penalty methods forRSTree are a bit complicated, as proposed by the inventors of the R*-tree.</P><UL><LI>RT.h:</LI><UL><LI><TT>int RT::ForcedReinsert()</TT>: this method is overriden here toreturn 1 rather than 0, turning on libGiST's forced reinsert logic.</LI><LI><TT>GiSTlist&lt;GiSTentry*&gt; RT::RemoveTop(GiSTnode *node)</TT>:this method deletes the top k% of entries on <TT>node</TT>, where k = <TT>GiST::RemoveRatio()</TT>,and returns them in a linked list. Beckmann et al. recommend removing thek% furthest from the center of the bounding box of the node. The defaultbehavior in <TT>GiST::RemoveTop</TT> is just to remove the first k% ofentries from the left of the node, so we overload it here to use Beckmann,et al's technique.</LI></UL><LI>RTentry.h:</LI><UL><LI>class <TT>RTpenalty</TT>: the standard <TT>GiSTPenalty</TT> class isjust a container for a double. The R*-tree paper specifies a way of resolvingties if two penalties are equal, and yet <I>another</I> technique for resolvingthe tie if the first tie-breaker doesn't help. In order to support this,the <TT>RTpenalty</TT> class contains 3 doubles, and its comparison methodsbreak ties accordingly.</LI></UL></UL><P>These are the only significant differences between RSTree and RTree.</P><CENTER><P><HR WIDTH="100%"></P></CENTER><H2 ALIGN=CENTER>BTree</H2><P>B+-trees have 3 fundamental differences from R-trees:</P><OL><LI>the keys in a B+-tree are in ascending order from left to right oneach page.</LI><LI>range scans in a B+-tree do not traverse the tree; rather, they findthe leftmost item and scan across the leaves of the tree until they exceedthe right boundary of the range.</LI><LI>in a B+-tree node, there are 1 fewer keys than there are pointers</LI></OL><P>The first of these two issues are handled in our BTree implementationby informing libGiST that the tree has the IsOrdered property. &nbsp;Thisis described below. &nbsp;The third of these issues is handled via <I>keycompression</I>: two virtual methods of GiSTentry -- Compress and Decompress-- were not used in our RTree implementation, but are added for BTree.They allow keys to be compressed on disk to take up less space and improvefanout. &nbsp;When keys are read off disk, they are decompressed, and beforethey are written back, they are compressed. The BTree implementation essentiallycompresses the first key on a node to nothing, which effectively givesthe standard B+-tree layout of <I>n-1</I> keys for <I>n</I> pointers.</P><P>We now describe the salient parts of the BTree implementation that makeit differ from the RTree and RSTree implementations. &nbsp;First, of course,the keys in a BTree are different -- in uncompressed format, they are alphabeticranges of words (our demo works on textual data -- see the <TT>BTkey</TT>class in BTentry.h). The operators that can appear in BTpredicates arethe standard ordering operators: <TT>&lt;, &lt;=, =, &gt;=, &gt;; </TT>otherwise,BTpredicates are very similar to RTpredicates. &nbsp;Further differencesare described by file:</P><UL><LI>BT.h:</LI><UL><LI><TT>BT::IsOrdered()</TT>: this method, inherited from the <TT>GiST</TT>class, defaults to returning 0, signifying FALSE. In the case of BTree,it returns 1, signifying TRUE, to tell libGiST to ensure that keys areordered on each page, and that range queries are handled without the usualdepth-first traversal<SUP>1</SUP>.</LI></UL><LI>BTnode.h</LI><UL><LI><TT>BTnode::Insert(const GiSTentry &amp;entry)</TT>: This routine includesan assertion for debugging purposes, and then calls its ancestor method<TT>GiSTnode::Insert</TT>. &nbsp;You could remove this descendant methodwithout harm, but it's better to leave it in to prevent problems aftermodification of the code.</LI><LI><TT>BTnode::InsertBefore(const GiSTentry &amp;entry, int index)</TT>:This routine requires some explanation. &nbsp;Its ancestor, <TT>GiSTnode::InsertBefore</TT>,<TT></TT>simply inserts entry <TT>entry</TT> onto this node before the entryat the <TT>index</TT>'th spot, shuffling that entry and its neighbors overto the right. However, <TT>BTnode::InsertBefore</TT> overrides this behaviorwith some extra detail.</LI><P>In uncompressed format, BTree keys are pairs of words demarcating arange, closed on the left end and open on the right (e.g. the range <TT>[&quot;albacore&quot;,&quot;apple&quot;)</TT> contains albacore, all the words between albacoreand apple, but not apple). The leftmost entry (at index 0) has a left endof <TT>-infinity</TT>, and the rightmost entry has a right end of <TT>infinity</TT>.All other entries have a left end equal to their left neighbor's rightend, and a right end equal to their right neighbors left end. <TT>BTnode::InsertBefore</TT>ensures that these properties remain true after insertion.</P><P>(<B>N.B.:</B> The above is not true for leaf nodes, which have keysthat are ranges containing <I>only</I> the value of the item, e.g. <TT>[&quot;albacore&quot;,&quot;albacore)</TT>. This is why <TT>GiSTnode::Insert </TT>works for leafnodes but not for internal nodes.)</P><LI><TT>BTnode::Coalesce(const GiSTnode&amp; node, const GiSTentry&amp;entry)</TT>: This method is also overridden to ensure the properties above.We are moving entries from <TT>node</TT>, which is to our right, to <TT>this</TT>.If <TT>this</TT> is a non-empty internal page, the rightmost entry on <TT>this</TT>has an upperbound of +inf, and the leftmost entry on <TT>node</TT> hasa lower bound of -inf. This code takes care of that -- both these valuesshould be set to the right bound of <TT>entry (</TT>our parent entry).</LI></UL><LI>BTentry.h</LI><UL><LI>class <TT>BTkey</TT> is a simple string class, with special cases forrepresenting negative and positive infinity</LI><LI>class <TT>BTintkey</TT> is a simple class for pairs of <TT>BTkey</TT>s,representing a range. It is the key type for <TT>BTentry</TT>.</LI><LI><TT>GiSTcompressedEntry BTEntry::Compress()</TT>: compresses a keyand copies the pointer into the result. In this case, if it's the leftmostentry on the node it compresses down to 0 bytes. &nbsp;Any other entrycompresses down to its lower boundary.</LI><LI><TT>void BTEntry::Decompress(const GiSTcompressedEntry entry)</TT>:decompresses a key and copies the pointer into the result. In this case,it reverses the logic of compress by peeking at the neighbor to the rightto get the upper bound. &nbsp;If this is the leftmost entry, the lowerbound is <TT>-inf</TT>.</LI></UL></UL><P>Otherwise there should be no great surprises in the BTree code.</P><CENTER><P><HR WIDTH="100%"></P></CENTER><H2 ALIGN=CENTER>Writing your own methods</H2><P>If you've made it this far, you should be ready to develop your ownindexing code using libGiST. &nbsp;If you have ordered data and mostlyare supporting range predicates, you should probably start with the BTreesample code and modify it to suit. &nbsp;If you have unordered data, it'sprobably best to start with the RTree code, and when you get things working,try adding Forced Resinsert (a la RSTree) and see if it helps.</P><P><HR WIDTH="100%"></P><P><SUP>1</SUP>In v0.9b1 range queries are handled depth-first even if<TT>IsOrdered()</TT> returns 1. This is a bug, which we hope to fix ina subsequent release.</P></BODY></HTML>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -