<h2>Best-case partitioning</h2><P>
If the partitioning procedure produces two regions of size <I>n</I>/2, quicksort runs much faster. The recurrence is then<P>
<pre><I>T</I>(<I>n</I>) = 2<I>T</I>(<I>n</I>/2) + <img src="../images/bound.gif">(<I>n</I>),</pre><P>
which by case 2 of the master theorem (Theorem 4.1) has solution <I>T</I>(<I>n</I>) = <img src="../images/bound.gif">(<I>n</I> lg <I>n</I>). Thus, this best-case partitioning produces a much faster algorithm. Figure 8.3 shows the recursion tree for this best-case execution of quicksort.<P>
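Since the argument here is purely about the recurrence, a quick numeric check can make it concrete. The sketch below (in Python, not the book's pseudocode) evaluates <I>T</I>(<I>n</I>) = 2<I>T</I>(<I>n</I>/2) + <I>n</I> with boundary cost taken to be 0, and compares it with <I>n</I> lg <I>n</I>; at powers of two the two agree exactly.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    # Best-case recurrence T(n) = 2 T(n/2) + n, taking the Theta(n)
    # term to be exactly n and the boundary cost T(1) to be 0.
    if n <= 1:
        return 0
    return 2 * T(n // 2) + n

for k in (10, 15, 20):
    n = 2 ** k
    print(n, T(n), T(n) / (n * log2(n)))  # ratio is exactly 1.0 here
```

For <I>n</I> = 2<SUP><I>k</I></SUP> the recurrence unrolls to exactly <I>k</I> levels of cost <I>n</I> each, which is why the ratio is precisely 1.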
<P>
<h2>Balanced partitioning</h2><P>
The average-case running time of quicksort is much closer to the best case than to the worst case, as the analyses in Section 8.4 will show. The key to understanding why this might be true is to understand how the balance of the partitioning is reflected in the recurrence that describes the running time.<P>
Suppose, for example, that the partitioning algorithm always produces a 9-to-1 proportional split, which at first blush seems quite unbalanced. We then obtain the recurrence<P>
<pre><I>T</I>(<I>n</I>) = <I>T</I>(9<I>n</I>/10) + <I>T</I>(<I>n</I>/10) + <I>n</I></pre><P>
on the running time of quicksort, where we have replaced <img src="../images/bound.gif">(<I>n</I>) by <I>n</I> for convenience. Figure 8.4 shows the recursion tree for this recurrence. Notice that every level of the tree has cost <I>n</I>, until a boundary condition is reached at depth log<SUB>10</SUB> <I>n</I> = <img src="../images/bound.gif">(lg <I>n</I>), and then the levels have cost at most <I>n</I>. The recursion terminates at depth log<SUB>10/9 </SUB><I>n</I> = <img src="../images/bound.gif">(lg <I>n</I>). The total cost of quicksort is therefore <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>). Thus, with a 9-to-1 proportional split at every level of recursion, which intuitively seems quite unbalanced, quicksort runs in <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>) time--asymptotically the same as if the split were right down the middle. In fact, even a 99-to-1 split yields an <I>O</I>(<I>n</I> lg <I>n</I>) running time. The reason is that any split of <I>constant</I> proportionality yields a recursion tree of depth <img src="../images/bound.gif">(lg <I>n</I>), where the cost at each level is <I>O</I>(<I>n</I>). The running time is therefore <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>) whenever the split has constant proportionality.<P>
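The same kind of numeric check works for the 9-to-1 recurrence. The Python sketch below (a sanity check, not part of the text) evaluates <I>T</I>(<I>n</I>) = <I>T</I>(9<I>n</I>/10) + <I>T</I>(<I>n</I>/10) + <I>n</I> with integer floors and shows that <I>T</I>(<I>n</I>)/(<I>n</I> lg <I>n</I>) stays bounded by a small constant, consistent with the <img src="../images/bound.gif">(<I>n</I> lg <I>n</I>) claim.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    # 9-to-1 recurrence T(n) = T(9n/10) + T(n/10) + n with integer
    # floors and boundary condition T(n) = 0 for n <= 1.
    if n <= 1:
        return 0
    return T(9 * n // 10) + T(n // 10) + n

for n in (10**3, 10**4, 10**5, 10**6):
    # the ratio settles near a constant as n grows
    print(n, round(T(n) / (n * log2(n)), 2))
```

The ratio approaches a constant rather than growing with <I>n</I>, which is exactly what "any split of constant proportionality yields <img src="../images/bound.gif">(<I>n</I> lg <I>n</I>)" predicts.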
<img src="158_a.gif"><P>
<h4><a name="077a_0001">Figure 8.3 A recursion tree for <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> in which <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> always balances the two sides of the partition equally (the best case). The resulting running time is <img src="../images/bound.gif">(n lg n).<a name="077a_0001"></h4><P>
<img src="158_b.gif"><P>
<h4><a name="077a_0002">Figure 8.4 A recursion tree for <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> in which <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> always produces a 9-to-1 split, yielding a running time of <img src="../images/bound.gif">(n lg n).<a name="077a_0002"></h4><P>
<P>
<h2>Intuition for the average case</h2><P>
To develop a clear notion of the average case for quicksort, we must make an assumption about how frequently we expect to encounter the various inputs. A common assumption is that all permutations of the input numbers are equally likely. We shall discuss this assumption in the next section, but first let's explore its ramifications.<P>
When we run quicksort on a random input array, it is unlikely that the partitioning always happens in the same way at every level, as our informal analysis has assumed. We expect that some of the splits will be reasonably well balanced and that some will be fairly unbalanced. For example, Exercise 8.2-5 asks you to show that about 80 percent of the time <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> produces a split that is more balanced than 9 to 1, and about 20 percent of the time it produces a split that is less balanced than 9 to 1.<P>
In the average case, <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> produces a mix of "good" and "bad" splits. In a recursion tree for an average-case execution of <FONT FACE="Courier New" SIZE=2>PARTITION</FONT>, the good and bad splits are distributed randomly throughout the tree. Suppose for the sake of intuition, however, that the good and bad splits alternate levels in the tree, and that the good splits are best-case splits and the bad splits are worst-case splits. Figure 8.5(a) shows the splits at two consecutive levels in the recursion tree. At the root of the tree, the cost is <I>n</I> for partitioning and the subarrays produced have sizes <I>n</I> - 1 and 1: the worst case. At the next level, the subarray of size <I>n </I>- 1 is best-case partitioned into two subarrays of size (<I>n</I> - 1)/2. Let's assume that the boundary-condition cost is 1 for the subarray of size 1.<P>
The combination of the bad split followed by the good split produces three subarrays of sizes 1, (<I>n</I> - 1)/2, and (<I>n</I> - 1)/2 at a combined cost of 2<I>n</I> - 1 = <img src="../images/bound.gif">(<I>n</I>). Certainly, this situation is no worse than that in Figure 8.5(b), namely a single level of partitioning that produces two subarrays of sizes (<I>n</I> - 1)/2 + 1 and (<I>n</I> - 1)/2 at a cost of <I>n</I> = <img src="../images/bound.gif">(<I>n</I>). Yet this latter situation is very nearly balanced, certainly better than 9 to 1. Intuitively, the <img src="../images/bound.gif">(<I>n</I>) cost of the bad split can be absorbed into the <img src="../images/bound.gif">(<I>n</I>) cost of the good split, and the resulting split is good. Thus, the running time of quicksort, when levels alternate between good and bad splits, is like the running time for good splits alone: still <I>O</I>(<I>n</I> lg <I>n</I>), but with a slightly larger constant hidden by the <I>O</I>-notation. We shall give a rigorous analysis of the average case in Section 8.4.2.<P>
<img src="160_a.gif"><P>
<h4><a name="077b_0001">Figure 8.5 (a) Two levels of a recursion tree for quicksort. The partitioning at the root costs n and produces a "bad" split: two subarrays of sizes 1 and n - 1. The partitioning of the subarray of size n - 1 costs n - 1 and produces a "good" split: two subarrays of size (n - 1)/2. (b) A single level of a recursion tree that is worse than the combined levels in (a), yet very well balanced.<a name="077b_0001"></h4><P>
<P>
<h2><a name="077c_1347">Exercises<a name="077c_1347"></h2><P>
<a name="077c_1348">8.2-1<a name="077c_1348"><P>
Show that the running time of <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> is <img src="../images/bound.gif">(<I>n </I>lg <I>n</I>) when all elements of array <I>A</I> have the same value.<P>
<a name="077c_1349">8.2-2<a name="077c_1349"><P>
Show that the running time of <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> is <img src="../images/bound.gif">(<I>n</I><SUP>2</SUP>) when the array <I>A</I> is sorted in nonincreasing order.<P>
<a name="077c_134a">8.2-3<a name="077c_134a"><P>
Banks often record transactions on an account in order of the times of the transactions, but many people like to receive their bank statements with checks listed in order by check number. People usually write checks in order by check number, and merchants usually cash them with reasonable dispatch. The problem of converting time-of-transaction ordering to check-number ordering is therefore the problem of sorting almost-sorted input. Argue that the procedure <FONT FACE="Courier New" SIZE=2>INSERTION</FONT>-<FONT FACE="Courier New" SIZE=2>SORT</FONT> would tend to beat the procedure <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> on this problem.<P>
<a name="077c_134b">8.2-4<a name="077c_134b"><P>
<a name="077c_1345">Suppose that the splits at every level of quicksort are in the proportion 1 - <img src="../images/alpha12.gif"> to <img src="../images/alpha12.gif">, where 0 < <img src="../images/alpha12.gif"> <img src="../images/lteq12.gif"> 1/2 is a constant. Show that the minimum depth of a leaf in the recursion tree is approximately - lg <I>n</I>/lg <img src="../images/alpha12.gif"> and the maximum depth is approximately - lg <I>n</I>/lg(1 - <img src="../images/alpha12.gif">). (Don't worry about integer round-off.)<P>
<a name="077c_134c">8.2-5<a name="077c_134c"><P>
<a name="077c_1346">Argue that for any constant 0 < <img src="../images/alpha12.gif"> <img src="../images/lteq12.gif"> 1/2, the probability is approximately 1 - 2<img src="../images/alpha12.gif"> that on a random input array, <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> produces a split more balanced than 1 - <img src="../images/alpha12.gif"> to <img src="../images/alpha12.gif">. For what value of <img src="../images/alpha12.gif"> are the odds even that the split is more balanced than less balanced?<P>
<P>
<P>
<h1><a name="077d_1351">8.3 Randomized versions of quicksort<a name="077d_1351"></h1><P>
<a name="077d_1347"><a name="077d_1348">In exploring the average-case behavior of quicksort, we have made an assumption that all permutations of the input numbers are equally likely. When this assumption on the distribution of the inputs is valid, many people regard quicksort as the algorithm of choice for large enough inputs. In an engineering situation, however, we cannot always expect it to hold. (See Exercise 8.2-3.) This section introduces the notion of a randomized algorithm and presents two randomized versions of quicksort that overcome the assumption that all permutations of the input numbers are equally likely.<P>
An alternative to <I>assuming</I> a distribution of inputs is to <I>impose</I> a distribution. For example, suppose that before sorting the input array, quicksort randomly permutes the elements to enforce the property that every permutation is equally likely. (Exercise 8.3-4 asks for an algorithm that randomly permutes the elements of an array of size <I>n</I> in time <I>O</I>(<I>n</I>).) This modification does not improve the worst-case running time of the algorithm, but it does make the running time independent of the input ordering.<P>
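One standard way to meet this <I>O</I>(<I>n</I>) bound is the Fisher-Yates shuffle, sketched below in Python as an illustration (Exercise 8.3-4 asks you to develop such a procedure yourself, so you may wish to try it before reading on).

```python
import random

def random_permute(A):
    # In-place Fisher-Yates shuffle: for each position i, swap A[i]
    # with a uniformly random element of A[i..n-1]. Each of the n!
    # permutations is produced with equal probability, in O(n) time.
    n = len(A)
    for i in range(n - 1):
        j = random.randint(i, n - 1)
        A[i], A[j] = A[j], A[i]

A = [1, 2, 3, 4, 5]
random_permute(A)
print(A)  # some permutation of [1, 2, 3, 4, 5]
```

Each position makes one call to the random-number generator and one exchange, so the running time is linear in <I>n</I>.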
<a name="077d_1349"><a name="077d_134a"><a name="077d_134b"><a name="077d_134c">We call an algorithm <I><B>randomized</B></I> if its behavior is determined not only by the input but also by values produced by a <I><B>random-number generator</B></I>. We shall assume that we have at our disposal a random-number generator <FONT FACE="Courier New" SIZE=2>RANDOM</FONT>. A call to <FONT FACE="Courier New" SIZE=2>RANDOM</FONT>(<I>a,b</I>) returns an integer between <I>a</I> and <I>b</I>, inclusive, with each such integer being equally likely. For example, <FONT FACE="Courier New" SIZE=2>RANDOM</FONT>(0, 1) produces a 0 with probability 1/2 and a 1 with probability 1/2. Each integer returned by <FONT FACE="Courier New" SIZE=2>RANDOM</FONT> is independent of the integers returned on previous calls. You may imagine <FONT FACE="Courier New" SIZE=2>RANDOM</FONT> as rolling a (<I>b</I> - <I>a</I> + 1)-sided die to obtain its output. (In practice, most programming environments offer a <I><B>pseudorandom-number generator</B></I>: a deterministic algorithm that returns numbers that "look" statistically random.)<P>
This randomized version of quicksort has an interesting property that is also possessed by many other randomized algorithms: <I>no particular input elicits its worst-case behavior</I>. Instead, its worst case depends on the random-number generator. You cannot deliberately produce a bad input array for quicksort, since the random permutation makes the input order irrelevant. The randomized algorithm performs badly only if the random-number generator produces an unlucky permutation to be sorted. Exercise 13.4-4 shows that almost all permutations cause quicksort to perform nearly as well as the average case: there are <I>very</I> few permutations that cause near-worst-case behavior.<P>
A randomized strategy is typically useful when there are many ways in which an algorithm can proceed but it is difficult to determine a way that is guaranteed to be good. If many of the alternatives are good, simply choosing one randomly can yield a good strategy. Often, an algorithm must make many choices during its execution. If the benefits of good choices outweigh the costs of bad choices, a random selection of good and bad choices can yield an efficient algorithm. We noted in Section 8.2 that a mixture of good and bad splits yields a good running time for quicksort, and thus it makes sense that randomized versions of the algorithm should perform well.<P>
<a name="077d_134d"><a name="077d_134e">By modifying the <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> procedure, we can design another randomized version of quicksort that uses this random-choice strategy. At each step of the quicksort algorithm, before the array is partitioned, we exchange element <I>A</I>[<I>p</I>] with an element chosen at random from <I>A</I>[<I>p</I> . . <I>r</I>]. This modification ensures that the pivot element <I>x</I> = <I>A</I>[<I>p</I>] is equally likely to be any of the <I>r</I> - <I>p</I> + 1 elements in the subarray. Thus, we expect the split of the input array to be reasonably well balanced on average. The randomized algorithm based on randomly permuting the input array also works well on average, but it is somewhat more difficult to analyze than this version.<P>
<a name="077d_134f">The changes to <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> and <FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT> are small. In the new partition procedure, we simply implement the swap before actually partitioning:<P>
<pre>RANDOMIZED-PARTITION(<I>A,p,r</I>)</pre><P>
<pre>1 <I>i</I> <img src="../images/arrlt12.gif"> RANDOM(<I>p,r</I>)</pre><P>
<pre>2 exchange <I>A</I>[<I>p</I>] <img src="../images/dblarr12.gif"> <I>A</I>[<I>i</I>]</pre><P>
<pre>3 <B>return</B> PARTITION(<I>A,p,r</I>)</pre><P>
<a name="077d_1350">We now make the new quicksort call <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>PARTITION</FONT> in place of <FONT FACE="Courier New" SIZE=2>PARTITION</FONT>:<P>
<pre>RANDOMIZED-QUICKSORT(<I>A,p,r</I>)</pre><P>
<pre>1 <B>if</B> <I>p</I> < <I>r</I></pre><P>
<pre>2 <B>then</B> <I>q</I> <img src="../images/arrlt12.gif"> RANDOMIZED-PARTITION(<I>A,p,r</I>)</pre><P>
<pre>3 RANDOMIZED-QUICKSORT(<I>A,p,q</I>)</pre><P>
<pre>4 RANDOMIZED-QUICKSORT(<I>A,q</I> + 1,<I>r</I>)</pre><P>
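The pseudocode above translates directly into a runnable sketch. The Python version below is an illustration, not the book's own code: Python lists are 0-indexed, and <FONT FACE="Courier New" SIZE=2>PARTITION</FONT> here is the Hoare-style procedure from earlier in this chapter, which returns an index <I>q</I> suitable for the recursive calls on <I>A</I>[<I>p</I> . . <I>q</I>] and <I>A</I>[<I>q</I> + 1 . . <I>r</I>].

```python
import random

def partition(A, p, r):
    # Hoare-style PARTITION: pivot x = A[p]; grow two regions from the
    # ends until every element of A[p..j] is <= x and every element of
    # A[j+1..r] is >= x, then return j.
    x = A[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while A[j] > x:
            j -= 1
        i += 1
        while A[i] < x:
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j

def randomized_partition(A, p, r):
    i = random.randint(p, r)   # line 1: i <- RANDOM(p, r)
    A[p], A[i] = A[i], A[p]    # line 2: exchange A[p] <-> A[i]
    return partition(A, p, r)  # line 3: return PARTITION(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q)
        randomized_quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]
randomized_quicksort(A, 0, len(A) - 1)
print(A)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that the only change from ordinary quicksort is the single exchange before partitioning, which is all that is needed to make the pivot equally likely to be any element of the subarray.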
We analyze this algorithm in the next section.<P>
<h2><a name="077e_1355">Exercises<a name="077e_1355"></h2><P>
<a name="077e_1356">8.3-1<a name="077e_1356"><P>
<a name="077e_1351">Why do we analyze the average-case performance of a randomized algorithm and not its worst-case performance?<P>
<a name="077e_1357">8.3-2<a name="077e_1357"><P>
<a name="077e_1352">During the running of the procedure <FONT FACE="Courier New" SIZE=2>RANDOMIZED</FONT>-<FONT FACE="Courier New" SIZE=2>QUICKSORT</FONT>, how many calls are made to the random-number generator <FONT FACE="Courier New" SIZE=2>RANDOM</FONT> in the worst case? How does the answer change in the best case?<P>
<a name="077e_1358">8.3-3<a name="077e_1358"><P>
<a name="077e_1353">Describe an implementation of the procedure <FONT FACE="Courier New" SIZE=2>RANDOM</FONT>(<I>a, b</I>) that uses only fair coin flips. What is the expected running time of your procedure?<P>
<a name="077e_1359">8.3-4<a name="077e_1359"><P>
<a name="077e_1354">Give a <img src="../images/bound.gif">(<I>n</I>)-time, randomized procedure that takes as input an array <I>A</I>[1 . . <I>n</I>] and performs a random permutation on the array elements.<P>
<P>
<P>