📄 libtour.html
字号:
<H2 ALIGN=CENTER>RSTree</H2>
<P>The R*-tree is very much like the R-tree, having identical keys and
supporting identical query predicates. But it contains some optimizations
that often make t outperform simple R-trees during lookup, by paying for
some additional complexity during insertion. First, it has different Penalty
and PickSplit methods. Second, it uses a technique known as <I>Forced Reinsert</I>
to keep the tree fuller and better balanced than an R-tree.</P>
<P>Forced reinsert works as follows -- when an entry is inserted into a
full node, some percentage of the entries on that node are deleted, and
reinserted into the tree. If the node is at a non-leaf level, the reinsertion
places the deleted entries back into nodes at that level. If reinsertion
discovers a full node, then it splits it as usual. This sporadic reinsertion
of elements accomodates changes in the data distribution as the tree fills
in.</P>
<P>Forced reinsert is supported by libGiST, and the RSTree implementation
simply takes advantage of it. The PickSplit and Penalty methods for
RSTree are a bit complicated, as proposed by the inventors of the R*-tree.</P>
<UL>
<LI>RT.h:</LI>
<UL>
<LI><TT>int RT::ForcedReinsert()</TT>: this method is overriden here to
return 1 rather than 0, turning on libGiST's forced reinsert logic.</LI>
<LI><TT>GiSTlist<GiSTentry*> RT::RemoveTop(GiSTnode *node)</TT>:
this method deletes the top k% of entries on <TT>node</TT>, where k = <TT>GiST::RemoveRatio()</TT>,
and returns them in a linked list. Beckmann et al. recommend removing the
k% furthest from the center of the bounding box of the node. The default
behavior in <TT>GiST::RemoveTop</TT> is just to remove the first k% of
entries from the left of the node, so we overload it here to use Beckmann,
et al's technique.</LI>
</UL>
<LI>RTentry.h:</LI>
<UL>
<LI>class <TT>RTpenalty</TT>: the standard <TT>GiSTPenalty</TT> class is
just a container for a double. The R*-tree paper specifies a way of resolving
ties if two penalties are equal, and yet <I>another</I> technique for resolving
the tie if the first tie-breaker doesn't help. In order to support this,
the <TT>RTpenalty</TT> class contains 3 doubles, and its comparison methods
break ties accordingly.</LI>
</UL>
</UL>
<P>These are the only significant differences between RSTree and RTree.</P>
<CENTER><P>
<HR WIDTH="100%"></P></CENTER>
<H2 ALIGN=CENTER>BTree</H2>
<P>B+-trees have 3 fundamental differences from R-trees:</P>
<OL>
<LI>the keys in a B+-tree are in ascending order from left to right on
each page.</LI>
<LI>range scans in a B+-tree do not traverse the tree; rather, they find
the leftmost item and scan across the leaves of the tree until they exceed
the right boundary of the range.</LI>
<LI>in a B+-tree node, there are 1 fewer keys than there are pointers</LI>
</OL>
<P>The first of these two issues are handled in our BTree implementation
by informing libGiST that the tree has the IsOrdered property. This
is described below. The third of these issues is handled via <I>key
compression</I>: two virtual methods of GiSTentry -- Compress and Decompress
-- were not used in our RTree implementation, but are added for BTree.
They allow keys to be compressed on disk to take up less space and improve
fanout. When keys are read off disk, they are decompressed, and before
they are written back, they are compressed. The BTree implementation essentially
compresses the first key on a node to nothing, which effectively gives
the standard B+-tree layout of <I>n-1</I> keys for <I>n</I> pointers.</P>
<P>We now describe the salient parts of the BTree implementation that make
it differ from the RTree and RSTree implementations. First, of course,
the keys in a BTree are different -- in uncompressed format, they are alphabetic
ranges of words (our demo works on textual data -- see the <TT>BTkey</TT>
class in BTentry.h). The operators that can appear in BTpredicates are
the standard ordering operators: <TT><, <=, =, >=, >; </TT>otherwise,
BTpredicates are very similar to RTpredicates. Further differences
are described by file:</P>
<UL>
<LI>BT.h:</LI>
<UL>
<LI><TT>BT::IsOrdered()</TT>: this method, inherited from the <TT>GiST</TT>
class, defaults to returning 0, signifying FALSE. In the case of BTree,
it returns 1, signifying TRUE, to tell libGiST to ensure that keys are
ordered on each page, and that range queries are handled without the usual
depth-first traversal<SUP>1</SUP>.</LI>
</UL>
<LI>BTnode.h</LI>
<UL>
<LI><TT>BTnode::Insert(const GiSTentry &entry)</TT>: This routine includes
an assertion for debugging purposes, and then calls its ancestor method
<TT>GiSTnode::Insert</TT>. You could remove this descendant method
without harm, but it's better to leave it in to prevent problems after
modification of the code.</LI>
<LI><TT>BTnode::InsertBefore(const GiSTentry &entry, int index)</TT>:
This routine requires some explanation. Its ancestor, <TT>GiSTnode::InsertBefore</TT>,<TT>
</TT>simply inserts entry <TT>entry</TT> onto this node before the entry
at the <TT>index</TT>'th spot, shuffling that entry and its neighbors over
to the right. However, <TT>BTnode::InsertBefore</TT> overrides this behavior
with some extra detail.</LI>
<P>In uncompressed format, BTree keys are pairs of words demarcating a
range, closed on the left end and open on the right (e.g. the range <TT>["albacore",
"apple")</TT> contains albacore, all the words between albacore
and apple, but not apple). The leftmost entry (at index 0) has a left end
of <TT>-infinity</TT>, and the rightmost entry has a right end of <TT>infinity</TT>.
All other entries have a left end equal to their left neighbor's right
end, and a right end equal to their right neighbors left end. <TT>BTnode::InsertBefore</TT>
ensures that these properties remain true after insertion.</P>
<P>(<B>N.B.:</B> The above is not true for leaf nodes, which have keys
that are ranges containing <I>only</I> the value of the item, e.g. <TT>["albacore",
"albacore)</TT>. This is why <TT>GiSTnode::Insert </TT>works for leaf
nodes but not for internal nodes.)</P>
<LI><TT>BTnode::Coalesce(const GiSTnode& node, const GiSTentry&
entry)</TT>: This method is also overridden to ensure the properties above.
We are moving entries from <TT>node</TT>, which is to our right, to <TT>this</TT>.
If <TT>this</TT> is a non-empty internal page, the rightmost entry on <TT>this</TT>
has an upperbound of +inf, and the leftmost entry on <TT>node</TT> has
a lower bound of -inf. This code takes care of that -- both these values
should be set to the right bound of <TT>entry (</TT>our parent entry).</LI>
</UL>
<LI>BTentry.h</LI>
<UL>
<LI>class <TT>BTkey</TT> is a simple string class, with special cases for
representing negative and positive infinity</LI>
<LI>class <TT>BTintkey</TT> is a simple class for pairs of <TT>BTkey</TT>s,
representing a range. It is the key type for <TT>BTentry</TT>.</LI>
<LI><TT>GiSTcompressedEntry BTEntry::Compress()</TT>: compresses a key
and copies the pointer into the result. In this case, if it's the leftmost
entry on the node it compresses down to 0 bytes. Any other entry
compresses down to its lower boundary.</LI>
<LI><TT>void BTEntry::Decompress(const GiSTcompressedEntry entry)</TT>:
decompresses a key and copies the pointer into the result. In this case,
it reverses the logic of compress by peeking at the neighbor to the right
to get the upper bound. If this is the leftmost entry, the lower
bound is <TT>-inf</TT>.</LI>
</UL>
</UL>
<P>Otherwise there should be no great surprises in the BTree code.</P>
<CENTER><P>
<HR WIDTH="100%"></P></CENTER>
<H2 ALIGN=CENTER>Writing your own methods</H2>
<P>If you've made it this far, you should be ready to develop your own
indexing code using libGiST. If you have ordered data and mostly
are supporting range predicates, you should probably start with the BTree
sample code and modify it to suit. If you have unordered data, it's
probably best to start with the RTree code, and when you get things working,
try adding Forced Resinsert (a la RSTree) and see if it helps.</P>
<P>
<HR WIDTH="100%"></P>
<P><SUP>1</SUP>In v0.9b1 range queries are handled depth-first even if
<TT>IsOrdered()</TT> returns 1. This is a bug, which we hope to fix in
a subsequent release.</P>
</BODY>
</HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -