📄 libtour.html
字号:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE></TITLE>
<META NAME="Author" CONTENT="Joe Hellerstein">
<META NAME="GENERATOR" CONTENT="Mozilla/3.0b5aGold (WinNT; I) [Netscape]">
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#FF0000" VLINK="#800080" ALINK="#0000FF">
<H1>A Guided Tour of Sample Code<IMG SRC="gist.gif" HEIGHT=194 WIDTH=254 ALIGN=RIGHT></H1>
<CENTER><P>
<HR WIDTH="100%"></P></CENTER>
<P>The libGiST package is a C++ library that provides the basic GiST functionality.
By itself it does nothing, since some of the important methods are <I>pure
virtual</I>. However, with a little coding on your part, you can make it
index data in a variety of ways, and support just about any query.</P>
<P>We have provided some examples to get you started. The RTree directory
contains code which uses libGiST as the basis for a simple R-tree-like
storage/retrieval system that indexes 2-dimensional box data, and supports
natural geographic queries. The RSTree directory provides the same
functionality as the RTree directory, but uses the more efficient (and
complex) R*-tree techniques invented by Beckman, et al. [BKSS90].
The BTree directory contains code which uses libGiST to implement a simple
B+-tree storage/retrieval system that indexes textual data, to support
alphabetic equality and range queries.</P>
<P>As an introduction to libGiST, we will step through the header files
for each of these small projects, which will introduce you to increasingly
powerful features of libGiST. We will go through the simplest project,
RTree, in some detail, and then just describe the features of RSTree and
BTree that are different from RTree. Probably the easiest way to
use libGiST is to cannibalize some of this sample code, so it's worth looking
at carefully.</P>
<CENTER><P>
<HR WIDTH="100%"></P></CENTER>
<H2 ALIGN=CENTER>RTree</H2>
<P>R-trees were first proposed by Guttman [Gut84] to index geographic data
(points, lines, and polygons), in order to efficiently find data that satisfy
geographic predicates (i.e., <TT>key {contains,contained-in,overlaps,equals}
constant</TT>). The keys in a (2D) R-tree are "bounding boxes",
i.e. rectangles which are aligned with the x and y axes. A key signifies
that all objects stored below it fall within that geographic area. If you
want more detail on R-trees, Guttman's paper makes for fairly accessible
reading.</P>
<P>In order to extend libGiST to support R-tree behavior, we need four
of the basic GiST methods mentioned in the original GiST paper:</P>
<UL>
<LI>Consistent: compares a query rectangle and a key rectangle to see if
they correspond in a way appropriate to the query predicate. One tricky
bit in R-trees is that entries on internal nodes are treated differently
than entries at the leaves. (<TT>RTpredicate::Consistent(const RTpredicate
&)i</TT>n RTpredicate.h).</LI>
<LI>Union: given a set of geographic objects, finds the minimal bounding
box containing them all. (<TT>GiSTentry *RTnode::Union() </TT>in
RTnode.h)</LI>
<LI>Penalty: returns a <TT>GiSTPenalty</TT> (just a <TT>double</TT>, really)
representing the "badness" of inserting a new entry below this
entry -- for R-trees, this is measured by the increase in area of this
entry's bounding box required to accomodate the new entry. (<TT>GiSTpenalty
*RTentry::Penalty(const GiSTentry)</TT> in RTentry.h)</LI>
<LI>PickSplit: Guttman proposes 3 different algorithms in his paper; we
use his polynomial-time algorithm in our implementation. ( <TT>GiSTnode
*RTnode::PickSplit()</TT> in RTnode.h)</LI>
</UL>
<P>We will also need various utility methods like constructors and destructors
for our RTree classes, which inherit most (but not all!) of their behavior
from libGiST. We won't talk about the constructors and destructors, which
are pretty straightforward, but the rest of the simple utility methods
are as follows (organized by file):</P>
<UL>
<LI>RT.h: contains the <TT>RT</TT> class for R-trees, which inherits from
the <TT>GiST</TT> class of libGiST</LI>
<UL>
<LI><TT>GiSTobjid RT::IsA(): </TT>Not strictly necessary, but useful
for debugging. All classes which inherit from the GiSTobject class
can define an <TT>IsA</TT> method to identify themselves. The choice of
object tags is in <TT>libGiST/GiSTdefs.h</TT> if you wish to extend it.</LI>
<LI><TT>GiSTnode *RT::CreateNode()</TT>: This constructor method is
<I>critical</I>, as it ensures that our trees are actually instances of
the RT class, and hence will use the appropriate RT methods. <B>N.B.: </B>Your
programs will compile but <I>probably not work</I> if you don't define
constructors and other basic allocation utilties for all your classes!!
This is because libGiST makes heavy use of the virtual function mechanism
of C++, which requires that objects be allocated within their appropriate
subclasses.</LI>
<LI><TT>GiSTstore *RT::CreateStore(): </TT>this method identifies and passes
control to the storage subsystem of your choice. libGiST is distributed
with one storage subsystem, GiSTfile, which uses the file system to store
indices. Smarter storage systems (e.g. DBMSs of various sorts, or
buffer management packages) may be plugged in via inheritance to improve
the speed and/or functionality of your code.</LI>
</UL>
<LI>RTnode.h: defines the <TT>RTnode</TT> class for nodes in an R-tree,
which inherits from <TT>GiSTnode</TT></LI>
<UL>
<LI><TT>GiSTobjid RTnode::IsA()</TT>: as above</LI>
<LI><TT>GiSTobject *RTnode::Copy()</TT>: as above, <I>crucial</I> to the
correctness of your code.</LI>
<LI><TT>GiSTnode *RTnode::PickSplit()</TT> : One of the basic GiST methods.
Note that the splitting is effected by actually deleting entries from <TT>*this</TT>,
and putting them on the page which is returned by the method.</LI>
<LI><TT>GiSTentry *RTnode::Union()</TT>: another of the basic GiST methods
described above</LI>
<LI><TT>GiSTentry *RTnode::CreateEntry():</TT> used to generate an entry
of the appropriate subclass of <TT>GiSTentry</TT>, which in this case is
<TT>RTentry</TT>. <I> Like the </I><TT>Copy</TT><I> methods, this
is crucial!</I></LI>
<LI>i<TT>nt RTnode::FixedLength()</TT>: tells libGiST two things -- that
the keys in our nodes have a fixed length, and what that length is. If
our entries were of variable length (as will be the case in BTree below),
this method would return 0.</LI>
</UL>
<LI>RTentry.h: defines the <TT>RTentry </TT>clas for entries in an R-tree,
which inherits from <TT>GiSTnode</TT>. Also defines the <TT>RTkey</TT>
class, which represents bounding boxes. <B>N.B.:</B> The <TT>GiSTentry</TT>
class contains a member <TT>GiSTobject *key</TT>, which is used to hold
keys. So our <TT>RTkey</TT> class must inherit from <TT>GiSTobject.</TT></LI>
<UL>
<LI><TT>RTkey::Copy()</TT>: again, <I>critical.</I></LI>
<LI>A variety of bounding-box methods, which are relatively self-explanatory.</LI>
<LI><TT>RTkey &RTentry::Key()</TT>: casts the key member, which is
of class <TT>GiSTobject </TT>to class <TT>RTkey.</TT> Used to prevent
compiler warnings.</LI>
<LI><TT>GiSTobject *RTentry::Copy()</TT>: as above, you can't do without
it.</LI>
<LI><TT>void RTentry::InitKey()</TT>: this method allocates a new key for
the entry -- like the <TT>Copy</TT> methods, it's crucial that it return
an instance of the appropriate key subclass (in this case, <TT>RTkey</TT>).</LI>
<LI><TT>GiSTpenalty * RTentry::Penalty(const GiSTentry &newEntry)</TT>:
as discussed above</LI>
<LI><TT>int RTentry::CompressedLength()</TT>: since libGiST supports compressed
keys (discussed in BTree below), this method is required to calculate the
length of this <I>key</I> (not the entry) when compressed. In this case
it's just <TT>sizeof(RTkey)</TT>.</LI>
<LI>A variety of other methods for accessing/setting properties of the
key.</LI>
</UL>
<LI>RTpredicate.h: Home of the <TT>RTpredicate</TT> class, which contains
the <TT>Consistent</TT> method. In addition, <TT>RTpredicate</TT> has two
important members, <TT>oper</TT> and <TT>value</TT>. The former distinguishes
between the different box-comparison operators (<TT>Contains, Contained,
Overlap, Equal)</TT> and the latter holds the bounding box to which we're
comparing.</LI>
</UL>
<P>A quick look at the .cpp files will show that there's little more in
them than what's described above -- the only method of any complexity is
the PickSplit algorithm, which is taken directly from Guttman's paper.</P>
<CENTER><P>
<HR WIDTH="100%"></P></CENTER>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -