📄 chap19.htm

📁 美国麻省理工的算法导论,英文版,看后程序员宏观的能力培养有好处
💻 HTM
📖 第 1 页 / 共 4 页
字号:
<pre>13  <B>for</B> <I>j</I> <img src="../images/arrlt12.gif"> <I>n</I>[<I>x</I>] <B>downto</B> <I>i</I></sub></sup></pre><P>
<pre>14        <B>do</B> key<SUB>j+1</SUB>[x] <img src="../images/arrlt12.gif"> key<SUB>j</SUB>[x]</sub></sup></pre><P>
<pre>15  <I>key<SUB>i</I></SUB>[<I>x</I>] <img src="../images/arrlt12.gif"> key<SUB>t<I></SUB>[</I>y<I>]</I></sub></sup></pre><P>
<pre>16  <I>n</I>[<I>x</I>]<I> </I><img src="../images/arrlt12.gif"> n<I>[</I>x<I>] + 1</I></sub></sup></pre><P>
<pre>17  DISK-WRITE(<I>y</I>)</sub></sup></pre><P>
<pre>18  DISK-WRITE(<I>z</I>)</sub></sup></pre><P>
<pre>19  DISK-WRITE(<I>x</I>)</sub></sup></pre><P>
B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT> works by straightforward &quot;cutting and pasting. &quot; Here, <I>y</I> is the <I>i</I>th child of <I>x</I> and is the node being split. Node <I>y</I> originally has 2<I>t - </I>1 children but is reduced to <I>t - </I>1 children by this operation. Node<I> z</I> &quot;adopts&quot; the <I>t - </I>1 largest children of <I>y</I>, and <I>z</I> becomes a new child of <I>x</I>, positioned just after y in <I>x</I>'s table of children. The median key of <I>y</I> moves up to become the key in <I>x</I> that separates <I>y</I> and <I>z</I>.<P>
Lines 1-8 create node <I>z</I> and give it the larger <I>t </I>- 1 keys and corresponding <I>t</I> children of <I>y</I>. Line 9 adjusts the key count for<I> y</I>. Finally, lines 10-16 insert <I>z</I> as a child of <I>x</I>, move the median key from <I>y</I> up to <I>x</I> in order to separate <I>y</I> from <I>z</I>, and adjust <I>x</I>'s key count. Lines 17-19 write out all modified disk pages. The CPU time used by B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT> is <img src="../images/bound.gif"> (<I>t</I>), due to the loops on lines 4-5 and 7-8. (The other loops run for at most <I>t</I> iterations.)<P>
<P>







<h2>Inserting a key into a B-tree</h2><P>
<a name="085c_1632"><a name="085c_1633">Inserting a key <I>k</I> into a B-tree<I> T</I> of height <I>h</I> is done in a single pass down the tree, requiring <I>O(h</I>) disk accesses. The CPU time required is <I>O</I>(<I>th</I>)<I> = O</I>(<I>t </I>log<I><SUB>t</SUB> n</I>). The B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> procedure uses B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT> to guarantee that the recursion never descends to a full node.<P>
<img src="391_a.gif"><P>
<h4><a name="085c_1636">Figure 19.6 Splitting the root with t = 4. Root node r is split in two, and a new root node s is created. The new root contains the median key of r and has the two halves of r as children. The B-tree grows in height by one when the root is split.<a name="085c_1636"></sub></sup></h4><P>
<pre><a name="085c_1634">B-TREE-INSERT(<I>T,k</I>)</sub></sup></pre><P>
<pre>1  <I>r</I> <img src="../images/arrlt12.gif"> <I>root</I>[<I>T</I>]</sub></sup></pre><P>
<pre>2  <B>if</B> <I>n</I>[<I>r</I>] = 2<I>t</I> - 1</sub></sup></pre><P>
<pre>3      <B>then</B> <I>s</I> <img src="../images/arrlt12.gif"> ALLOCATE-NODE()</sub></sup></pre><P>
<pre>4           <I>root</I>[<I>T</I>] <img src="../images/arrlt12.gif"> <I>s</I></sub></sup></pre><P>
<pre>5           <I>leaf</I>[<I>s</I>] <img src="../images/arrlt12.gif"> FALSE</sub></sup></pre><P>
<pre>6           <I>n</I>[<I>s</I>] <img src="../images/arrlt12.gif"> 0</sub></sup></pre><P>
<pre>7           <I>c</I><SUP>1</SUP>[<I>s</I>] <img src="../images/arrlt12.gif"> <I>r</I></sub></sup></pre><P>
<pre>8           B-TREE-SPLIT-CHILD(<I>s,1,r</I>)</sub></sup></pre><P>
<pre>9           B-TREE-INSERT-NONFULL(<I>s,k</I>)</sub></sup></pre><P>
<pre>10  <B>else</B> B-TREE-INSERT-NONFULL(<I>r,k</I>)</sub></sup></pre><P>
Lines 3-9 handle the case in which the root node <I>r</I> is full: the root is split and a new node <I>s</I> (having two children) becomes the root. Splitting the root is the only way to increase the height of a B-tree. Figure 19.6 illustrates this case. Unlike a binary search tree, a B-tree increases in height at the top instead of at the bottom. The procedure finishes by calling B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> to perform the insertion of key <I>k</I> in the tree rooted at the nonfull root node. B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> recurses as necessary down the tree, at all times guaranteeing that the node to which it recurses is not full by calling B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT> as necessary. <P>
The auxiliary recursive procedure B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> inserts key <I>k</I> into node x, which is assumed to be nonfull when the procedure is called. The operation of B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> and the recursive operation of B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> guarantee that this assumption is true.<P>
<pre><a name="085c_1635">B-TREE-INSERT-NONFULL(<I>x,k</I>)</sub></sup></pre><P>
<pre>1  <I>i</I> <img src="../images/arrlt12.gif"> <I>n</I>[<I>x</I>]</sub></sup></pre><P>
<pre>2  <B>if</B> <I>leaf</I>[<I>x</I>]</sub></sup></pre><P>
<pre>3<B>      then while</B> <I>i</I> <img src="../images/gteq.gif"> 1 and <I>k</I> &lt; <I>key<SUP>i</I></SUP>[<I>x</I>]</sub></sup></pre><P>
<pre>4               <B>do</B> <I>key<SUB>i</I>+1</SUB> [<I>x</I>] <img src="../images/arrlt12.gif"> <I>key<SUP>i</I></SUP>[<I>x</I>]</sub></sup></pre><P>
<pre>5                  <I>i</I> <img src="../images/arrlt12.gif"> <I>i</I> - 1</sub></sup></pre><P>
<pre>6           <I>key<SUB>i</I>+1</SUB>[<I>x</I>] <img src="../images/arrlt12.gif"> <I>k</I></sub></sup></pre><P>
<pre>7           <I>n</I>[<I>x</I>] <img src="../images/arrlt12.gif"> n<I>[</I>x<I>] + 1</I></sub></sup></pre><P>
<pre>8           DISK-WRITE(<I>x</I>)</sub></sup></pre><P>
<pre>9  <B>else while</B><I> i </I><img src="../images/gteq.gif"> 1 and <I>k</I> &lt; <I>key<SUP>i</I></SUP>[<I>x</I>]</sub></sup></pre><P>
<pre>10               <B>do</B> <I>i </I><img src="../images/arrlt12.gif"> i<I> - 1</I></sub></sup></pre><P>
<pre>11       <I>i</I> <img src="../images/arrlt12.gif"> <I>i</I> + 1</sub></sup></pre><P>
<pre>12       DISK-READ(<I>c<SUP>i</I></SUP>[<I>x</I>])</sub></sup></pre><P>
<pre>13<B>       if</B> <I>n</I>[<I>c<SUP>i</I></SUP>[<I>x</I>]] = 2<I>t</I> - 1</sub></sup></pre><P>
<pre>14<B>          then</B> B-TREE-SPLIT-CHILD(<I>x,i,c<SUP>i</I></SUP>[<I>x</I>])</sub></sup></pre><P>
<pre>15<B>               if</B> <I>k</I> &gt; <I>key<SUP>i</I></SUP>[<I>x</I>]</sub></sup></pre><P>
<pre>16<B>                  then</B> <I>i</I> <img src="../images/arrlt12.gif"> <I>i</I> + 1</sub></sup></pre><P>
<pre>17       B-TREE-INSERT-NONFULL(<I>c<SUP>i</I></SUP>[<I>x</I>]<I>,k</I>)</sub></sup></pre><P>
The B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> procedure works as follows. Lines 3<I>-</I>8 handle the case in which <I>x</I> is a leaf node by inserting key <I>k</I> into <I>x</I>. If <I>x</I> is not a leaf node, then we must insert <I>k</I> into the appropriate leaf node in the subtree rooted at internal node <I>x</I>. In this case, lines 9-11 determine the child of <I>x</I> to which the recursion descends. Line 13 detects whether the recursion would descend to a full child, in which case line 14 uses B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT> to split that child into two nonfull children, and lines 15-16 determine which of the two children is now the correct one to descend to. (Note that there is no need for a <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT>(<I>c<SUB>i</I></SUB>[<I>x</I>]) after line 16 increments <I>i</I>, since the recursion will descend in this case to a child that was just created by B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SPLIT</FONT>-<FONT FACE="Courier New" SIZE=2>CHILD</FONT>.) The net effect of lines 13-16 is thus to guarantee that the procedure never recurses to a full node. Line 17 then recurses to insert <I>k</I> into the appropriate subtree. Figure 19.7 illustrates the various cases of inserting into a B-tree.<P>
The number of disk accesses performed by B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> is <I>O</I>(<I>h</I>) for a B-tree of height <I>h</I>, since only <I>O</I>(1) <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> and <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>WRITE</FONT> operations are performed between calls to B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT>. The total CPU time used is <I>O</I>(<I>th</I>) = <I>O</I>(<I>t</I>1og<I><SUB>t </SUB>n</I>). Since B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>-<FONT FACE="Courier New" SIZE=2>NONFULL</FONT> is tail-recursive, it can be alternatively implemented as a <B>while</B> loop, demonstrating that the number of pages that need to be in main memory at any time is <I>O</I>(1).<P>
<img src="393_a.gif"><P>
<h4><a name="085c_1637">Figure 19.7 Inserting keys into a B-tree. The minimum degree t for this B-tree is 3, so a node can hold at most 5 keys. Nodes that are modified by the insertion process are lightly shaded. (a) The initial tree for this example. (b) The result of inserting B into the initial tree; this is a simple insertion into a leaf node. (c) The result of inserting Q into the previous tree. The node RSTUV is split into two nodes containing RS and UV, the key T is moved up to the root, and Q is inserted in the leftmost of the two halves (the RS node). (d) The result of inserting L into the previous tree. The root is split right away, since it is full, and the B-tree grows in height by one. Then L is inserted into the leaf containing JK. (e) The result of inserting F into the previous tree. The node ABCDE is split before F is inserted into the rightmost of the two halves (the DE node).<a name="085c_1637"></sub></sup></h4><P>
<P>







<h2><a name="085d_163b">Exercises<a name="085d_163b"></h2><P>
<a name="085d_163c">19.2-1<a name="085d_163c"><P>
Show the results of inserting the keys<P>
<pre>F, S, Q, K, C, L, H, T, V, W, M, R, N, P, A, B, X, Y, D, Z, E</sub></sup></pre><P>
in order into an empty B-tree. Only draw the configurations of the tree just before some node must split, and also draw the final configuration.<P>
<a name="085d_163d">19.2-2<a name="085d_163d"><P>
Explain under what circumstances, if any, redundant <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> or <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>WRITE</FONT> operations are performed during the course of executing a call to B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT>. (A redundant <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> is a <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>READ</FONT> for a page that is already in memory. A redundant <FONT FACE="Courier New" SIZE=2>DISK</FONT>-<FONT FACE="Courier New" SIZE=2>WRITE</FONT> writes to disk a page of information that is identical to what is already stored there.)<P>
<a name="085d_163e">19.2-3<a name="085d_163e"><P>
<a name="085d_1636"><a name="085d_1637"><a name="085d_1638">Explain how to find the minimum key stored in a B-tree and how to find the predecessor of a given key stored in a B-tree.<P>
<a name="085d_163f">19.2-4<a name="085d_163f"><P>
Suppose that the keys {1, 2, . . . , <I>n</I>} are inserted into an empty B-tree with minimum degree 2. How many nodes does the final B-tree have?<P>
<a name="085d_1640">19.2-5<a name="085d_1640"><P>
Since leaf nodes require no pointers to children, they could conceivably use a different (larger)<I> t</I> value than internal nodes for the same disk page size. Show how to modify the procedures for creating and inserting into a B-tree to handle this variation.<P>
<a name="085d_1641">19.2-6<a name="085d_1641"><P>
<a name="085d_1639"><a name="085d_163a">Suppose that B-<FONT FACE="Courier New" SIZE=2>TREE</FONT>-<FONT FACE="Courier New" SIZE=2>SEARCH</FONT> is implemented to use binary search rather than linear search within each node. Show that this makes the CPU time required <I>O</I>(lg <I>n</I>), independently of how <I>t</I> might be chosen as a function of <I>n</I>.<P>
<a name="085d_1642">19.2-7<a name="085d_1642"><P>
Suppose that disk hardware allows us to choose the size of a disk page arbitrarily, but that the time it takes to read the disk page is <I>a + bt</I>, where <I>a</I> and <I>b</I> are specified constants and <I>t</I> is the minimum degree for a B-tree using pages of the selected size. Describe how to choose <I>t</I> so as to minimize (approximately) the B-tree search time. Suggest an optimal value of <I>t</I> for the case in which <I>a</I> = 30 milliseconds and <I>b</I> = 40 microseconds. <P>
<P>


<P>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -