📄 chap10.htm

📁 美国麻省理工的算法导论,英文版,看后程序员宏观的能力培养有好处
💻 HTM
📖 第 1 页 / 共 3 页
字号:
<pre>3  <I>q</I> <img src="../images/arrlt12.gif"> RANDOMIZED-PARTITION(<I>A, p, r</I>)</sub></sup></pre><P>
<pre>4  <I>k</I> <img src="../images/arrlt12.gif"> <I>q</I> - <I>p</I> + 1</sub></sup></pre><P>
<pre>5  if <I>i </I><img src="../images/lteq12.gif"> <I>k</I></sub></sup></pre><P>
<pre>6      <B>then return</B> RANDOMIZED-SELECT(<I>A, p, q, i</I>)</sub></sup></pre><P>
<pre>7      <B>else return</B> RANDOMIZED-SELECT(<I>A, q</I> + 1<I>, r, i </I>-<I> </I>k<I>)</I></sub></sup></pre><P>
After <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>PARTITION</FONT> is executed in line 3 of the algorithm, the array A[<I>p .. r</I>] is partitioned into two nonempty subarrays <I>A</I>[<I>p .. q</I>] and <I>A</I>[<I>q + </I>1 <I>.. r</I>] such that each element of <I>A</I>[<I>p .. q</I>] is less than each element of <I>A</I>[<I>q + </I>1 <I>.. r</I>]. Line 4 of the algorithm computes the number <I>k</I> of elements in the subarray <I>A</I>[<I>p .. q</I>]. The algorithm now determines in which of the two subarrays <I>A</I>[<I>p .. q</I>] and <I>A</I>[<I>q + </I>1 <I>.. r</I>] the <I>i</I>th smallest element lies. If <I>i</I> <img src="../images/lteq12.gif"> <I>k</I>, then the desired element lies on the low side of the partition, and it is recursively selected from the subarray in line 6. If <I>i</I> &gt; <I>k</I>, however, then the desired element lies on the high side of the partition. Since we already know <I>k</I> values that are smaller than the <I>i</I>th smallest element of <I>A</I>[<I>p .. r</I>]--namely, the elements of <I>A</I>[<I>p .. q</I>]--the desired element is the (<I>i - k</I>)th smallest element of <I>A</I>[<I>q + </I>1 <I>.. r</I>], which is found recursively in line 7.<P>
<a name="079c_13a8">The worst-case running time for <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> is <img src="../images/bound.gif">(<I>n</I><SUP>2</SUP>), even to find the minimum, because we could be extremely unlucky and always partition around the largest remaining element. The algorithm works well in the average case, though, and because it is randomized, no particular input elicits the worst-case behavior.<P>
We can obtain an upper bound <I>T</I>(<I>n</I>) on the expected time required by <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> on an input array of <I>n</I> elements as follows. We observed in Section 8.4 that the algorithm <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>PARTITION</FONT> produces a partition whose low side has 1 element with probability 2/<I>n</I> and <I>i</I> elements with probability 1/<I>n</I> for <I>i</I> = 2, 3, . . . , <I>n</I> - 1. Assuming that <I>T</I>(<I>n</I>) is monotonically increasing, in the worst case <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> is always unlucky in that the <I>i</I>th element is determined to be on the larger side of the partition. Thus, we get the recurrence<P>
<img src="188_a.gif"><P>
The second line follows from the first since max(1, <I>n</I> - 1 ) = <I>n</I> - 1 and<P>
<img src="188_b.gif"><P>
If <I>n</I> is odd, each term <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/2<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT>), <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/2<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT> + 1), . . . ,<I>T</I>(<I>n</I> - 1) appears twice in the summation, and if <I>n</I> is even, each term <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/2<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT> + 1), <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/2<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT> + 2), . . . , <I>T</I>(<I>n</I> - 1) appears twice and the term <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/2<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT>) appears once. In either case, the summation of the first line is bounded from above by the summation of the second line. The third line follows from the second since in the worst case <I>T</I>(<I>n</I> - 1) = <I>O</I>(<I>n</I><SUP>2</SUP>), and thus the term <img src="188_c.gif"> can be absorbed by the term <I>O</I>(<I>n</I>).<P>
We solve the recurrence by substitution. Assume that <I>T</I>(<I>n</I>) <img src="../images/lteq12.gif"> <I>cn</I> for some constant <I>c</I> that satisfies the initial conditions of the recurrence. Using this inductive hypothesis, we have<P>
<img src="189_a.gif"><P>
since we can pick <I>c</I> large enough so that <I>c</I>(<I>n</I>/4 + 1/2) dominates the <I>O</I>(<I>n</I>) term.<P>
Thus, any order statistic, and in particular the median, can be determined on average in linear time.<P>





<h2><a name="079d_13aa">Exercises<a name="079d_13aa"></h2><P>
<a name="079d_13ab">10.2-1<a name="079d_13ab"><P>
<a name="079d_13a9">Write an iterative version of <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT>.<P>
<a name="079d_13ac">10.2-2<a name="079d_13ac"><P>
Suppose we use <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> to select the minimum element of the array <I>A</I> = <img src="../images/lftwdchv.gif">3, 2, 9, 0, 7, 5, 4, 8, 6, 1<img src="../images/wdrtchv.gif">. Describe a sequence of partitions that results in a worst-case performance of <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT>.<P>
<a name="079d_13ad">10.2-3<a name="079d_13ad"><P>
Recall that in the presence of equal elements, the <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>PARTITION</FONT> procedure partitions the subarray <I>A</I>[<I>p . . r</I>] into two nonempty subarrays <I>A</I>[<I>p . . q</I>] and <I>A</I>[<I>q + </I>1 <I>. . r</I>] such that each element in <I>A</I>[<I>p . . q</I>] is less than <I>or equal to</I> every element in <I>A</I>[<I>q + </I>1 . <I>. r</I>]. If equal elements are present, does the <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> procedure work correctly?<P>
<P>


<P>







<h1><a name="079e_13ae">10.3 Selection in worst-case linear time<a name="079e_13ae"></h1><P>
<a name="079e_13aa">We now examine a selection algorithm whose running time is <I>O</I>(<I>n</I>) in the worst case. Like <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT>, the algorithm <FONT FACE="Courier New" SIZE=2>SELECT</FONT> finds the desired element by recursively partitioning the input array. The idea behind the algorithm, however, is to <I>guarantee</I> a good split when the array is partitioned. <FONT FACE="Courier New" SIZE=2>SELECT</FONT> uses the deterministic partitioning algorithm P<FONT FACE="Courier New" SIZE=2>ARTITION </FONT>from quicksort (see Section 8.1), modified to take the element to partition around as an input parameter.<P>
<img src="190_a.gif"><P>
<h4><a name="079e_13af">Figure 10.1 Analysis of the algorithm <FONT FACE="Courier New" SIZE=2>SELECT</FONT>. The n elements are represented by small circles, and each group occupies a column. The medians of the groups are whitened, and the median-of-medians x is labeled. Arrows are drawn from larger elements to smaller, from which it can be seen that 3 out of every group of 5 elements to the right of x are greater than x, and 3 out of every group of 5 elements to the left of x are less than x. The elements greater than x are shown on a shaded background.<a name="079e_13af"></sub></sup></h4><P>
<a name="079e_13ab">The <FONT FACE="Courier New" SIZE=2>SELECT</FONT> algorithm determines the <I>i</I>th smallest of an input array of <I>n</I> elements by executing the following steps.<P>
1.     Divide the <I>n</I> elements of the input array into <FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrdl12.gif"><I>n/5</I><img src="../images/hfbrdr12.gif"></FONT> groups of 5 elements each and at most one group made up of the remaining <I>n</I> mod 5 elements.<P>
2.     Find the median of each of the <FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n/5</I><img src="../images/hfbrur14.gif"></FONT> groups by insertion sorting the elements of each group (of which there are 5 at most) and taking its middle element. (If the group has an even number of elements, take the larger of the two medians.)<P>
3.     Use <FONT FACE="Courier New" SIZE=2>SELECT</FONT> recursively to find the median <I>x</I> of the <FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n/5</I><img src="../images/hfbrur14.gif"></FONT> medians found in step 2.<P>
4.     Partition the input array around the median-of-medians <I>x</I> using a modified version of <FONT FACE="Courier New" SIZE=2>PARTITION</FONT>. Let <I>k</I> be the number of elements on the low side of the partition, so that <I>n - k</I> is the number of elements on the high side.<P>
5.     Use <FONT FACE="Courier New" SIZE=2>SELECT</FONT> recursively to find the <I>i</I>th smallest element on the low side if <I>i</I> <img src="../images/lteq12.gif"> <I>k</I>, or the (<I>i</I> - <I>k</I>)th smallest element on the high side if <I>i</I> &gt; <I>k</I>.<P>
To analyze the running time of <FONT FACE="Courier New" SIZE=2>SELECT</FONT>, we first determine a lower bound on the number of elements that are greater than the partitioning element <I>x</I>. Figure 10.1 is helpful in visualizing this bookkeeping. At least half of the medians found in step 2 are greater than or equal to the median-of-medians <I>x</I>. Thus, at least half of the     <FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n/5</I><img src="../images/hfbrur14.gif"></FONT> groups contribute 3 elements that are greater than <I>x</I>, except for the one group that has fewer than 5 elements if 5 does not divide <I>n</I> exactly, and the one group containing <I>x </I>itself. Discounting these two groups, it follows that the number of elements greater than <I>x</I> is at least<P>
<img src="191_a.gif"><P>
Similarly, the number of elements that are less than <I>x</I> is at least 3<I>n</I>/10 - 6. Thus, in the worst case, <FONT FACE="Courier New" SIZE=2>SELECT</FONT> is called recursively on at most 7<I>n</I>/10 + 6 elements in step 5.<P>
We can now develop a recurrence for the worst-case running time <I>T</I>(<I>n</I>) of the algorithm <FONT FACE="Courier New" SIZE=2>SELECT</FONT>. Steps 1, 2, and 4 take <I>O</I>(<I>n</I>) time. (Step 2 consists of <I>O</I>(<I>n</I>) calls of insertion sort on sets of size <I>O</I>(1).) Step 3 takes time <I>T</I>(<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n/5</I><img src="../images/hfbrur14.gif"></FONT>), and step 5 takes time at most <I>T</I>(7<I>n </I>/ 10 + 6), assuming that <I>T</I> is monotonically increasing. Note that 7<I>n </I>/ 10 + 6 &lt; <I>n</I> for <I>n</I> &gt; 20 and that any input of 80 or fewer elements requires <I>O</I>(1) time. We can therefore obtain the recurrence<P>
<img src="191_b.gif"><P>
We show that the running time is linear by substitution. Assume that <I>T</I>(<I>n</I>) <img src="../images/lteq12.gif"> <I>cn</I> for some constant <I>c</I> and all <I>n</I> <img src="../images/lteq12.gif"> 80. Substituting this inductive hypothesis into the right-hand side of the recurrence yields<P>
<pre><I>T</I>(<I>n</I>) <img src="../images/lteq12.gif"> <I>c </I><img src="../images/hfbrul14.gif"><I>n</I>/5<img src="../images/hfbrur14.gif"> + <I>c</I>(7<I>n</I>/10 + 6) + <I>O</I>(<I>n</I>)</sub></sup></pre><P>
<pre><img src="../images/lteq12.gif"> <I>cn</I>/5 + <I>c</I> + 7<I>cn</I>/10 + 6<I>c</I> + <I>O</I>(<I>n</I>)</sub></sup></pre><P>
<pre><img src="../images/lteq12.gif"> 9<I>cn</I>/10 + 7<I>c</I> + <I>O</I>(<I>n</I>)</sub></sup></pre><P>
<pre><img src="../images/lteq12.gif"> <I>cn</I>,</sub></sup></pre><P>
since we can pick <I>c</I> large enough so that <I>c</I>(<I>n </I>/ 10 - 7) is larger than the function described by the <I>O</I>(<I>n</I>) term for all <I>n</I> &gt; 80. The worst-case running time of <FONT FACE="Courier New" SIZE=2>SELECT</FONT> is therefore linear.<P>
<a name="079e_13ac"><a name="079e_13ad">As in a comparison sort (see Section 9.1), <FONT FACE="Courier New" SIZE=2>SELECT</FONT> and <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>SELECT</FONT> determine information about the relative order of elements only by comparing elements. Thus, the linear-time behavior is not a result of assumptions about the input, as was the case for the sorting algorithms in Chapter 9. Sorting requires <img src="../images/omega12.gif">(<I>n</I> 1g <I>n</I>) time in the comparison model, even on average (see Problem 9-1), and thus the method of sorting and indexing presented in the introduction to this chapter is asymptotically inefficient.<P>





<h2><a name="079f_13b2">Exercises<a name="079f_13b2"></h2><P>
<a name="079f_13b3">10.3-1<a name="079f_13b3"><P>
<a name="079f_13ae">In the algorithm <FONT FACE="Courier New" SIZE=2>SELECT</FONT>, the input elements are divided into groups of 5. Will the algorithm work in linear time if they are divided into groups of 7? How about groups of 3?<P>
<a name="079f_13b4">10.3-2<a name="079f_13b4"><P>
<a name="079f_13af">Analyze <FONT FACE="Courier New" SIZE=2>SELECT</FONT> to show that the number of elements greater than the median-of-medians <I>x</I> and the number of elements less than <I>x</I> is at least <FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrul14.gif"><I>n</I></FONT>/4<FONT FACE="Times New Roman" SIZE=2><img src="../images/hfbrur14.gif"></FONT> if <I>n</I> <img src="../images/gteq.gif"> 38.<P>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -