📄 bushv055.txt

📁 国外网站上的一些精典的C程序
💻 TXT
📖 第 1 页 / 共 2 页
字号:
上一页 12

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ BUSH (in btree.c) _____

BUSH is a Driver Routine, which creates a B-Tree by inserting Node Data for one
Record at a time into the Bushy Tree Structure. This builds BUSHY.DAT, a B-Tree
Index for WEIRD.DAT. Note that BUSH and other functions use some Global
Variables. BUSH takes the two parameters needed to insert a single Node Element,
the ASCII String Key derived from the String in WEIRD.DAT, and the fpos_t File
Position of the corresponding Data File Record.

BUSH calls CHOPDOWN to pass control from the Root Node of the Bushy Tree to the
correct Insertion Position at Level-0. The actual Node Insertion is done by
calling function STICKIN. This process may need to be repeated, because whenever
a Node splits, another Insertion will be needed on the next Level up. This
requirement is signalled by putting more==1, and if it gets to Root Level, the
Root Node is split by calling function NEWROOT.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ CHOPDOWN (in btree.c) _____

CHOPDOWN uses the Binary Chop method to descend the Bushy Tree Node Hierarchy,
storing Node Data at each Level in a BRANCH Structure. By checking the File
Position stored for a given Level, CHOPDOWN may detect that BRANCH already
contains the required Node Data in Memory. If not, CHOPDOWN reads the Node Data
from Disk, halting the program if it finds a Node at the wrong Level.

CHOPDOWN does not insert Node Elements. It finds the correct Insertion Position
at Level-0 for a Key, but is equally useful for finding a Key which is already
present. If Duplicate Keys exist, CHOPDOWN will locate the first one. You might
want to insert a new Node Element at that point, or start scanning successive
Keys until you find the one you wanted. You can also give CHOPDOWN the NULL
Pointer as a Key, whereupon it will automatically choose Array Position-0 for
all Nodes down the hierarchy. This is useful when scanning through consecutive
Nodes.

The Binary Chop method needs a Binary Comparison function for Strings, and this
is provided by the Collation Routine COLX7F. To find the Insertion Position at
Level-0, or the first Key in a series of Duplicates, you need to search for the
smallest Key greater than or equal to your given Key. This means that every
Level-0 Node must be represented in its Level-1 Parent by its Maximum Key. You
need to use the same Binary Chop method at Level-1, because Duplicate Keys could
occur at Level-1 too. Every Node must be represented by its Maximum Key for the
Binary Chop to work down the hierarchy. This leads to the requirement for an
Overall Maximum Key, which is the Sentinel. Without the Sentinel, the algorithm
would fail to slot any temporary maximum into the endmost Node.

CHOPDOWN starts with a given ASCII String Key and the fpos_t File Position of
the next Node to search. This could be the Root Node, or any lower Level. Within
each Node, CHOPDOWN searches for the Array Position corresponding to the given
Key, and this yields the File Position of the next Node on the Level below. The
Array Position found within any Node, will contain the smallest Key greater than
or equal to the given Key.

CHOPDOWN finishes with the Node Data at every Level stored in the BRANCH
Structure Array. The required result is the Array Position within the Node at
Level-0. Remember that Nodes on Level-0 differ from those on other Levels. The
higher-Level Nodes contain File Positions for other Nodes within the Bushy Tree
File. But the Level-0 Nodes contain File Positions of Records in the Data File
which is Indexed by the Bushy Tree. So the real result is stored within the
Global Variable BRANCH, not the Return Value of CHOPDOWN.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ STICKIN (in btree.c) _____

STICKIN inserts a Node Element at any Level of the Bushy Tree hierarchy. If
necessary it splits Nodes, and later inserts corresponding Node Elements further
up the Bushy Tree.

STICKIN works with the Node Data contained in Memory, in the BRANCH Structure
Array. Initially STICKIN is called at Level-0. Its first task is to observe the
Insertion Position found by CHOPDOWN, and shift along all Node Elements in equal
or higher Array Positions. The new Key and File Position Data are then inserted
at the correct Array Position in the working area of the BRANCH Structure. If
the resulting Node Data will fit into one Disk Node, the corresponding Disk Node
is rewritten.

STICKIN may detect that the Node Data is now one Element too long to fit on
Disk, so it splits the Node into two Siblings. The High Half must be written to
Disk at the File Position of the old Node, because the High Half contains the
Maximum Key which is expected at that File Position. The Low Half must be
written to Disk at the next vacant B-Tree File Position, with its own Maximum
Key from the middle of the old Node. An entry for this new Node needs to be
inserted within the Parent Node, at the Insertion Position which was stored in
the BRANCH Structure by CHOPDOWN.

The Return Value of STICKIN signals this further Insertion requirement. The
Maximum Key and File Position of the Low Half Node are returned as Pointer
Parameters. Details of the Low Half Node remain stored in the BRANCH Structure.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ NEWROOT (in btree.c) _____

NEWROOT is called when the Root Node gets full and must be split into two
Siblings, both Children of a new Root Node. Its first task is to reallocate
Dynamic Memory to add a new Level to the BRANCH Structure Array.

The new Root Node always has two Elements. Element-0 contains the Maximum Key
and File Position of the Low Half of the old Root Node, values which are
obtained from the previous call to STICKIN. Element-1 contains the Sentinel Key,
which is the Overall Maximum Key at every Level including the old Root, with the
File Position of the old Root Node, which is obtained from the B-Tree File
Header.

The New Root is written at the next vacant B-Tree File Position, and the Tree
Header is rewritten with the New Root details.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ COLX7F (in btree.c) _____

COLX7F is a basic Collation Routine which arranges ASCII Characters below '\x7F'
in order, particularly Alphanumeric Characters (1-9,Aa-Zz). As given, COLX7F
won't collate accented letters and other non-English characters above '\x7F'.
But it will collate punctuation in a reasonable way, with Separators towards the
beginning of the Collation Sequence. Obviously it isn't very useful to collate
Control Characters except for Testing Purposes, as I did here.

COLX7F uses a Lookup Table to convert ASCII Values to Collation Rankings. Then
it compares the Collation Rankings of successive Characters, generally operating
like a clone of strncmp. The Collation Sequence is documented in a Comma
Separated Variable File COLX7F.CSV, which you can feed into your Database or
Spreadsheet to devise modifications, or extend the method to your desired Code
Page.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ TREESCAN (in treescan.c) _____

TREESCAN is the Driver Routine for validating the Bushy Tree. It opens the Data
File and the Bushy Tree File, reading both File Headers and the Sentinel Record.
A BRANCH Structure Array is set up for the known number of Levels in the Bushy
Tree. Like the other functions in treescan.c, TREESCAN uses a number of Global
Variables, which are deliberately distinguished from the Global Variables in
btree.c.

TREESCAN calls NODETREK to traverse the entire Bushy Tree, checking that all
Bushy Tree Key references are located correctly in the Data File. Then TREESCAN
goes through the Data File, calling SCANZERO to ensure that all Records
including Duplicates are found in the Bushy Tree. TREESCAN displays the time
taken for both these operations.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ NODETREK (in treescan.c) _____

NODETREK scans all Nodes in the Bushy Tree hierarchy Recursively, though without
using Recursive calls to itself. NODETREK also writes CHECKORD.DAT, an
Externally Sorted version of WEIRD.DAT showing the String Keys collated in
order.

NODETREK calls CHOPDOWN with the Search Key set to NULL, which simply descends
the B-Tree hierarchy via the first Element of every Node. All Node Data for this
path is stored in the BRANCH Structure Array.

At Level-0, each Node is scanned. Every Element is compared to its predecessor
and its corresponding Record in the Data File. If the two Elements do not
conform to the Collation Sequence of COLX7F, or if a Key does not correspond to
the ASCII String in the Data File, the process stops with an Error Message. The

Data File Record is then written to CHECKORD.DAT, giving an Externally Sorted
version of WEIRD.DAT in Human Readable form. Now you can see the Sentinel Record
at the end, with its peculiar Rubout Symbol for '\x7F'.

At the end of each Level-0 Node, the algorithm ascends the Bushy Tree hierarchy
until it detects an incompletely scanned Node at a higher Level. Along this
path, it compares the End-Element Keys of Nodes to each other and to the Key of
the final Higher Level Node. If these keys are not all identical, the process
stops with an Error Message, because every Node should be referenced by its
Maximum Key.

NODETREK then increments the Array Position within the incompletely scanned
Node, and loops back to call CHOPDOWN and descend the Node hierarchy again to
repeat the process. In this way the Bushy Tree is scanned in a Recursive
pattern, but using the BRANCH Structure Array for storage rather than the Stack.

NODETREK terminates when the Root Node and its Children have all been scanned,
and the algorithm attempts to ascend past the Root Level. The Key comparisons up
this final path, should all match the Sentinel Key '\x7F', as the Sentinel
should signal an End Of Records condition. The process stops with another Error
Message if this does not happen.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ SCANZERO (in treescan.c) _____

SCANZERO locates Keys at Level-0 of the Bushy Tree File, then scans through the
Node and subsequent Nodes if necessary, to find Duplicate Keys.

Initially SCANZERO calls CHOPDOWN to locate the first occurrence of the Key at
Level-0. CHOPDOWN stores all the Node Data for its path down the B-Tree
hierarchy, in the BRANCH Structure Array. SCANZERO then scans for a matching
Key, returning 0 when it finds a mismatch. Otherwise SCANZERO returns 1 when it
finds a Key with a File Position matching the correct Data File Record Position.

When SCANZERO detects the End Element of a Node, it uses the Node Data stored in
the BRANCH Structure Array to ascend the Node hierarchy until it detects an
incompletely scanned Node at a higher Level. SCANZERO increments the Array
Position within this Node, then calls CHOPDOWN with the Search Key set to NULL,
descending the Node hierarchy to the first Element of the next Level-0 Node.

SCANZERO loops back until it finds a return condition for a matching Data File
Record or a mismatched Key.

#### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%% #### %%%%

_____ TODO _____

I haven't really considered Deletions. I could suggest several possibilities, as
there are various aspects to the Deletion Problem, depending on whether you want
an Undelete Option, and whether you want to purge Deleted Records from your Data
File as well as from all your B-Tree Indexes.

But the simplest method is simply to flag Records as Deleted in the Data File,
giving you the option of Undeletion. When the Deleted Records become inconvenient
you can purge them by rewriting the Data File and rebuilding all the Bushy Tree
Indexes. In practice I suspect this is the most effective approach.

This version of the Bushy Tree builds one B-Tree Index, for ASCII Strings. But
the same coding could be used for any Data Type provided you remember to convert
all Keys to Big Endian form on a PC. Some Bit Manipulation will be required to
convert Integers and Floats into their corresponding Keys, but the same routines
should work. The format of Floating Point Numbers was actually designed to enable
them to be collated this way.

Finally this Documentation could probably be improved, especially for the benefit
of non English speakers. Please email any Feedback to mitchell.frank@bigwig.net.
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -