<h1><a name="085e_163e">19.3 Deleting a key from a B-tree<U><a name="085e_163e"></U></h1><P>
<a name="085e_163b"><a name="085e_163c">Deletion from a B-tree is analogous to insertion but a little more complicated. We sketch how it works instead of presenting the complete pseudocode.<P>
<a name="085e_163d">Assume that procedure B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT> is asked to delete the key <I>k </I>from the subtree rooted at <I>x</I>. This procedure is structured to guarantee that whenever B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT> is called recursively on a node <I>x</I>, the number of keys in <I>x</I> is at least the minimum degree <I>t</I>. Note that this condition requires one more key than the minimum required by the usual B-tree conditions, so that sometimes a key may have to be moved into a child node before recursion descends to that child. This strengthened condition allows us to delete a key from the tree in one downward pass without having to "back up" (with one exception, which we'll explain). The following specification for deletion from a B-tree should be interpreted with the understanding that if it ever happens that the root node <I>x</I> becomes an internal node having no keys, then <I>x</I> is deleted and <I>x'</I>s only child <I>c</I><SUB>1</SUB>[<I>x</I>] becomes the new root of the tree, decreasing the height of the tree by one and preserving the property that the root of the tree contains at least one key (unless the tree is empty).<P>
Figure 19.8 illustrates the various cases of deleting keys from a B-tree.<P>
1. If the key <I>k</I> is in node <I>x</I> and <I>x</I> is a leaf, delete the key <I>k</I> from <I>x</I>.<P>
2. If the key <I>k</I> is in node <I>x</I> and <I>x</I> is an internal node, do the following.<P>
a. If the child <I>y</I> that precedes <I>k</I> in node <I>x</I> has at least <I>t</I> keys, then find the predecessor <I>k</I>' of <I>k</I> in the subtree rooted at <I>y</I>. Recursively delete <I>k</I>', and replace <I>k</I> by <I>k</I>' in <I>x</I>. (Finding <I>k</I>' and deleting it can be performed in a single downward pass.)<P>
b. Symmetrically, if the child <I>z</I> that follows <I>k</I> in node <I>x</I> has at least <I>t</I> keys, then find the successor <I>k</I>' of <I>k</I> in the subtree rooted at <I>z</I>. Recursively delete <I>k</I>', and replace <I>k</I> by <I>k</I>' in <I>x</I>. (Finding <I>k</I><I>'</I> and deleting it can be performed in a single downward pass.)<P>
c. Otherwise, if both <I>y</I> and <I>z</I> have only <I>t</I> - 1 keys, merge <I>k</I> and all of <I>z</I> into <I>y</I>, so that <I>x</I> loses both <I>k</I> and the pointer to <I>z</I>, and <I>y</I> now contains 2<I>t</I> - 1 keys. Then, free <I>z</I> and recursively delete <I>k</I> from <I>y</I>.<P>
3. If the key <I>k</I> is not present in internal node <I>x</I>, determine the root <I>c<SUB>i</I></SUB>[<I>x</I>] of the appropriate subtree that must contain <I>k</I>, if <I>k</I> is in the tree at all. If <I>c<SUB>i</I></SUB>[<I>x</I>] has only <I>t</I> - 1 keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least <I>t</I> keys. Then, finish by recursing on the appropriate child of <I>x</I>.<P>
a. If <I>c<SUB>i</I></SUB>[<I>x</I>] has only <I>t</I> - 1 keys but has a sibling with <I>t</I> keys, give <I>c<SUB>i</I></SUB>[<I>x</I>] an extra key by moving a key from <I>x</I> down into <I>c<SUB>i</I></SUB>[<I>x</I>], moving a key from <I>c<SUB>i</I></SUB>[<I>x</I>]'s immediate left or right sibling up into <I>x</I>, and moving the appropriate child from the sibling into <I>c<SUB>i</I></SUB>[<I>x</I>].<P>
<img src="396_a.gif"><P>
<h4><a name="085e_163f">Figure 19.8 Deleting keys from a B-tree. The minimum degree for this B-tree is t = 3, so a node (other than the root) cannot have less than 2 keys. Nodes that are modified are lightly shaded. (a) The B-tree of Figure 19.7(e). (b) Deletion of F. This is case 1: simple deletion from a leaf. (c) Deletion of M. This is case 2a: the predecessor L of M is moved up to take M's position. (d) Deletion of G. This is case 2c: G is pushed down to make node DEGJK, and then G is deleted from this leaf (case 1). (e) Deletion of D. This is case 3b: the recursion can't descend to node CL because it has only 2 keys, so P is pushed down and merged with CL and TX to form CLPTX; then, D is deleted from a leaf (case 1). (e') After (d), the root is deleted and the tree shrinks in height by one. (f) Deletion of B. This is case 3a: C is moved to fill B's position and E is moved to fill C's position.<a name="085e_163f"></sub></sup></h4><P>
<img src="397_a.gif"><P>
b. If <I>c<SUB>i</I></SUB>[<I>x</I>] and all of <I>c<SUB>i</I></SUB>[<I>x</I>]'s siblings have <I>t</I> - 1 keys, merge <I>c<SUB>i</I></SUB> with one sibling, which involves moving a key from <I>x</I> down into the new merged node to become the median key for that node.<P>
Since most of the keys in a B-tree are in the leaves, we may expect that in practice, deletion operations are most often used to delete keys from leaves. The B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT> procedure then acts in one downward pass through the tree, without having to back up. When deleting a key in an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor (cases 2a and 2b).<P>
Although this procedure seems complicated, it involves only <I>O</I>(<I>h</I>) disk operations for a B-tree of height <I>h</I>, since only <I>O</I>(1) calls to <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> and <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>WRITE</FONT> are made between recursive invocations of the procedure. The CPU time required is <I>O</I>(<I>th</I>) = <I>O(t </I>log<I><SUB>t</SUB> n</I>).<P>
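To make the case analysis above concrete, here is one possible rendering of it as code. This is an illustrative sketch in Python, not the book's B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT> pseudocode: it assumes an in-memory tree (so the <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> and <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>WRITE</FONT> steps are omitted), a hypothetical <FONT FACE="Courier New" SIZE=2>BTreeNode</FONT> class that keeps keys and children in Python lists, and the minimum degree <I>t</I> passed as a parameter.<P>
<pre>
# Illustrative sketch only; the branches mirror cases 1, 2a-2c, and 3a-3b above.

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []        # sorted list of keys
        self.children = []    # children[i] roots the subtree left of keys[i]
        self.leaf = leaf

def delete(root, k, t):
    """Delete key k from the tree with minimum degree t; return the new root."""
    _delete(root, k, t)
    if not root.leaf and len(root.keys) == 0:
        return root.children[0]   # root lost its last key: tree shrinks by one level
    return root

def _delete(x, k, t):
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and x.keys[i] == k:        # k is in node x
        if x.leaf:                                # case 1: delete from a leaf
            x.keys.pop(i)
            return
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                      # case 2a: use the predecessor
            pred = _max_key(y)
            x.keys[i] = pred
            _delete(y, pred, t)
        elif len(z.keys) >= t:                    # case 2b: use the successor
            succ = _min_key(z)
            x.keys[i] = succ
            _delete(z, succ, t)
        else:                                     # case 2c: merge k and z into y
            y.keys.append(x.keys.pop(i))
            y.keys.extend(z.keys)
            y.children.extend(z.children)
            x.children.pop(i + 1)
            _delete(y, k, t)
    elif not x.leaf:                              # case 3: k belongs in child c_i
        if len(x.children[i].keys) == t - 1:
            i = _grow_child(x, i, t)              # case 3a or 3b
        _delete(x.children[i], k, t)
    # if x is a leaf and k is absent, there is nothing to delete

def _grow_child(x, i, t):
    """Ensure x.children[i] has at least t keys; return its (possibly new) index."""
    c = x.children[i]
    left = x.children[i - 1] if i > 0 else None
    right = x.children[i + 1] if i + 1 < len(x.children) else None
    if left and len(left.keys) >= t:              # case 3a: borrow through x from the left
        c.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = left.keys.pop()
        if not left.leaf:
            c.children.insert(0, left.children.pop())
    elif right and len(right.keys) >= t:          # case 3a: borrow through x from the right
        c.keys.append(x.keys[i])
        x.keys[i] = right.keys.pop(0)
        if not right.leaf:
            c.children.append(right.children.pop(0))
    elif left:                                    # case 3b: merge c into its left sibling
        left.keys.append(x.keys.pop(i - 1))
        left.keys.extend(c.keys)
        left.children.extend(c.children)
        x.children.pop(i)
        i -= 1
    else:                                         # case 3b: merge the right sibling into c
        c.keys.append(x.keys.pop(i))
        c.keys.extend(right.keys)
        c.children.extend(right.children)
        x.children.pop(i + 1)
    return i

def _min_key(x):                                  # leftmost key in the subtree rooted at x
    while not x.leaf:
        x = x.children[0]
    return x.keys[0]

def _max_key(x):                                  # rightmost key in the subtree rooted at x
    while not x.leaf:
        x = x.children[-1]
    return x.keys[-1]
</pre>
The <FONT FACE="Courier New" SIZE=2>delete</FONT> wrapper implements the rule stated at the start of this section: if the root ends up as an internal node with no keys, its only child becomes the new root and the height of the tree decreases by one.<P>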
<h2><a name="085f_0001">Exercises<a name="085f_0001"></h2><P>
<a name="085f_0002">19.3-1<a name="085f_0002"><P>
Show the results of deleting <I>C, P</I>, and <I>V</I>, in order, from the tree of Figure 19.8(f).<P>
<a name="085f_0003">19.3-2<a name="085f_0003"><P>
Write pseudocode for B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT>.<P>
<P>
<P>
<h1><a name="0860_1646">Problems<a name="0860_1646"></h1><P>
<a name="0860_1647">19-1 Stacks on secondary storage<a name="0860_1647"><P>
<a name="0860_163e"><a name="0860_163f">Consider implementing a stack in a computer that has a relatively small amount of fast primary memory and a relatively large amount of slower disk storage. The operations <FONT FACE="Courier New" SIZE=2>PUSH</FONT> and <FONT FACE="Courier New" SIZE=2>POP</FONT> are supported on single-word values. The stack we wish to support can grow to be much larger than can fit in memory, and thus most of it must be stored on disk.<P>
<a name="0860_1640">A simple, but inefficient, stack implementation keeps the entire stack on disk. We maintain in memory a stack pointer, which is the disk address of the top element on the stack. If the pointer has value <I>p</I>, the top element is the (<I>p</I> mod <I>m</I>)th word on page <FONT FACE="Courier New" SIZE=2><img src="../images/hfbrdl12.gif"><I>p/m</I><img src="../images/hfbrdr12.gif"><I></I></FONT> of the disk, where <I>m</I> is the number of words per page.<P>
To implement the <FONT FACE="Courier New" SIZE=2>PUSH</FONT> operation, we increment the stack pointer, read the appropriate page into memory from disk, copy the element to be pushed to the appropriate word on the page, and write the page back to disk. A <FONT FACE="Courier New" SIZE=2>POP</FONT> operation is similar. We decrement the stack pointer, read in the appropriate page from disk, and return the top of the stack. We need not write back the page, since it was not modified.<P>
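The following short Python sketch simulates this simple scheme. The <FONT FACE="Courier New" SIZE=2>Disk</FONT> class, its access counter, and the tiny page size are illustrative assumptions rather than part of the problem statement, but the page arithmetic (<I>p</I> mod <I>m</I> and ⌊<I>p</I>/<I>m</I>⌋) follows the description above.<P>
<pre>
# Simulation of the simple all-on-disk stack; Disk and WORDS_PER_PAGE are
# illustrative assumptions standing in for real DISK-READ / DISK-WRITE calls.

WORDS_PER_PAGE = 4                   # m, kept tiny so page boundaries show up quickly

class Disk:
    def __init__(self):
        self.pages = {}              # page number -> list of m words
        self.accesses = 0            # every page read or write counts as one access

    def read(self, page_no):
        self.accesses += 1
        return list(self.pages.get(page_no, [0] * WORDS_PER_PAGE))

    def write(self, page_no, words):
        self.accesses += 1
        self.pages[page_no] = list(words)

class SimpleDiskStack:
    def __init__(self, disk):
        self.disk = disk
        self.top = -1                # stack pointer p: disk address of the top element

    def push(self, value):
        self.top += 1
        page_no, offset = self.top // WORDS_PER_PAGE, self.top % WORDS_PER_PAGE
        page = self.disk.read(page_no)    # one read ...
        page[offset] = value
        self.disk.write(page_no, page)    # ... and one write for every PUSH

    def pop(self):
        page_no, offset = self.top // WORDS_PER_PAGE, self.top % WORDS_PER_PAGE
        page = self.disk.read(page_no)    # one read for every POP; no write-back needed
        self.top -= 1
        return page[offset]
</pre>
In this scheme every <FONT FACE="Courier New" SIZE=2>PUSH</FONT> performs a page read and a page write, and every <FONT FACE="Courier New" SIZE=2>POP</FONT> performs a page read, no matter where the top of the stack lies.<P>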
Because disk operations are relatively expensive, we use the total number of disk accesses as a figure of merit for any implementation. We also count CPU time, but we charge Θ(<I>m</I>) for any disk access to a page of <I>m</I> words.<P>
<I><B>a.</I></B> Asymptotically, what is the worst-case number of disk accesses for <I>n </I>stack operations using this simple implementation? What is the CPU time for <I>n</I> stack operations? (Express your answer in terms of <I>m</I> and <I>n </I>for this and subsequent parts.)<P>
Now, consider a stack implementation in which we keep one page of the stack in memory. (We also maintain a small amount of memory to keep track of which page is currently in memory.) We can perform a stack operation only if the relevant disk page resides in memory. If necessary, the page currently in memory can be written to the disk and the new page read in from the disk to memory. If the relevant disk page is already in memory, then no disk accesses are required.<P>
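A sketch of this buffered variant, reusing the hypothetical <FONT FACE="Courier New" SIZE=2>Disk</FONT> class and page size from the previous sketch, might look as follows; it touches the disk only when the relevant page is not already resident.<P>
<pre>
# Sketch of the variant that keeps one stack page in memory; it reuses the
# hypothetical Disk class and WORDS_PER_PAGE constant from the previous sketch.

class BufferedDiskStack:
    def __init__(self, disk):
        self.disk = disk
        self.top = -1
        self.page_no = 0                      # which disk page the buffer currently holds
        self.page = [0] * WORDS_PER_PAGE      # in-memory copy of that page
        self.dirty = False                    # has the buffer been modified since loading?

    def _ensure(self, page_no):
        """Bring the needed page into memory, writing the old one back if necessary."""
        if page_no != self.page_no:
            if self.dirty:
                self.disk.write(self.page_no, self.page)
            self.page = self.disk.read(page_no)
            self.page_no = page_no
            self.dirty = False

    def push(self, value):
        self.top += 1
        self._ensure(self.top // WORDS_PER_PAGE)
        self.page[self.top % WORDS_PER_PAGE] = value
        self.dirty = True                     # no disk access at all if the page was resident

    def pop(self):
        self._ensure(self.top // WORDS_PER_PAGE)
        value = self.page[self.top % WORDS_PER_PAGE]
        self.top -= 1
        return value
</pre>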
<I><B>b.</I></B> What is the worst-case number of disk accesses required for <I>n</I> <FONT FACE="Courier New" SIZE=2>PUSH</FONT> operations? What is the CPU time?<P>
<I><B>c.</I></B> What is the worst-case number of disk accesses required for <I>n</I> stack operations? What is the CPU time?<P>
Suppose that we now implement the stack by keeping two pages in memory (in addition to a small number of words for bookkeeping).<P>
<a name="0860_1641"><I><B>d. </I></B>Describe how to manage the stack pages so that the amortized number of disk accesses for any stack operation is <I>O</I>(1<I>/m</I>) and the amortized CPU time for any stack operation is <I>O</I>(1).<P>
<a name="0860_1648">19-2 Joining and splitting 2-3-4 trees<a name="0860_1648"><P>
<a name="0860_1642"><a name="0860_1643"><a name="0860_1644"><a name="0860_1645">The <I><B>join</I></B> operation takes two dynamic sets <I>S</I>' and <I>S</I>\" and an element <I>x </I>such that for any <I>x</I>'<I><img src="../images/memof12.gif"> S</I>' and <I>x</I>\" <I><img src="../images/memof12.gif"> S</I>\"<I>, we have </I>key<I>[</I>x<I>'] < </I>key<I>[</I>x<I>] < </I>key<I>[</I>x<I>\"]. It returns a set </I>S = S<I>' <img src="../images/wideu.gif"></I> <I>{</I>x<I>}</I> <I><img src="../images/wideu.gif"></I> S<I>\". The </I><B>split<I></B> operation is like an "inverse" join: given a dynamic set </I>S<I> and an element </I>x <I><img src="../images/memof12.gif"> S</I>, it creates a set <I>S</I>' consisting of all elements in <I>S - </I>{<I>x</I>} whose keys are less than <I>key</I>[<I>x</I>] and a set <I>S</I>\" consisting of all elements in <I>S - </I>{<I>x</I>} whose keys are greater than <I>key</I>[<I>x</I>]. In this problem, we investigate how to implement these operations on 2-3-4 trees. We assume for convenience that elements consist only of keys and that all key values are distinct.<P>
<I><B>a. </I></B>Show how to maintain, for every node <I>x</I> of a 2-3-4 tree, the height of the subtree rooted at <I>x</I> as a field <I>height</I>[<I>x</I>]. Make sure that your implementation does not affect the asymptotic running times of searching, insertion, and deletion.<P>
<I><B>b.</I></B> Show how to implement the join operation. Given two 2-3-4 trees <I>T</I>' and <I>T</I>″ and a key <I>k</I>, the join should run in <I>O</I>(|<I>h</I>' - <I>h</I>″|) time, where <I>h</I>' and <I>h</I>″ are the heights of <I>T</I>' and <I>T</I>″, respectively.<P>
<I><B>c. </I></B>Consider the path <I>p</I> from the root of a 2-3-4 tree <I>T</I> to a given key <I>k</I>, the set <I>S</I>' of keys in <I>T</I> that are less than <I>k</I>, and the set <I>S</I>″ of keys in <I>T</I> that are greater than <I>k</I>. Show that <I>p</I> breaks <I>S</I>' into a set of trees {<I>T</I>'<SUB>0</SUB>, <I>T</I>'<SUB>1</SUB>, . . . , <I>T</I>'<SUB><I>m</I></SUB>} and a set of keys {<I>k</I>'<SUB>1</SUB>, <I>k</I>'<SUB>2</SUB>, . . . , <I>k</I>'<SUB><I>m</I></SUB>}, where, for <I>i</I> = 1, 2, . . . , <I>m</I>, we have <I>y</I> < <I>k</I>'<SUB><I>i</I></SUB> < <I>z</I> for any keys <I>y</I> ∈ <I>T</I>'<SUB><I>i</I>-1</SUB> and <I>z</I> ∈ <I>T</I>'<SUB><I>i</I></SUB>. What is the relationship between the heights of <I>T</I>'<SUB><I>i</I>-1</SUB> and <I>T</I>'<SUB><I>i</I></SUB>? Describe how <I>p</I> breaks <I>S</I>″ into sets of trees and keys.<P>
<I><B>d.</I></B> Show how to implement the split operation on <I>T</I>. Use the join operation to assemble the keys in <I>S</I>' into a single 2-3-4 tree <I>T</I>' and the keys in <I>S</I>″ into a single 2-3-4 tree <I>T</I>″. The running time of the split operation should be <I>O</I>(lg <I>n</I>), where <I>n</I> is the number of keys in <I>T</I>. (<I>Hint:</I> The costs for joining should telescope.)<P>
<P>
<h1>Chapter notes</h1><P>
Knuth [123], Aho, Hopcroft, and Ullman [4], and Sedgewick [175] give further discussions of balanced-tree schemes and B-trees. Comer [48] provides a comprehensive survey of B-trees. Guibas and Sedgewick [93] discuss the relationships among various kinds of balanced-tree schemes, including red-black trees and 2-3-4 trees.<P>
<a name="0861_1646"><a name="0861_1647"><a name="0861_1648"><a name="0861_1649"><a name="0861_164a">In 1970, J. E. Hopcroft invented 2-3 trees, a precursor to B-trees and 2-3-4 trees, in which every internal node has either two or three children. B-trees were introduced by Bayer and McCreight in 1972 [18]; they did not explain their choice of name.<P>
<P>
<P>
<P>
</BODY></HTML>