📄 chap09.htm
字号:
<HTML><HEAD>
<TITLE>Intro to Algorithms: CHAPTER 9: SORTING IN LINEAR TIME</TITLE></HEAD><BODY BGCOLOR="#FFFFFF">
<a href="chap10.htm"><img align=right src="../../images/next.gif" alt="Next Chapter" border=0></A>
<a href="toc.htm"><img align=right src="../../images/toc.gif" alt="Return to Table of Contents" border=0></A>
<a href="chap08.htm"><img align=right src="../../images/prev.gif" alt="Previous Chapter" border=0></A>
<h1><a name="0789_1378">CHAPTER 9: SORTING IN LINEAR TIME<a name="0789_1378"></h1><P>
We have now introduced several algorithms that can sort <I>n</I> numbers in <I>O</I>(<I>n </I>lg <I>n</I>) time. Merge sort and heapsort achieve this upper bound in the worst case; quicksort achieves it on average. Moreover, for each of these algorithms, we can produce a sequence of <I>n</I> input numbers that causes the algorithm to run in <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>) time.<P>
<a name="0789_1377">These algorithms share an interesting property: <I>the sorted order they determine is based only on comparisons between the input elements</I>. We call such sorting algorithms <I><B>comparison sorts</I></B>. All the sorting algorithms introduced thus far are comparison sorts.<P>
In Section 9.1, we shall prove that any comparison sort must make <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>) comparisons in the worst case to sort a sequence of <I>n</I> elements. Thus, merge sort and heapsort are asymptotically optimal, and no comparison sort exists that is faster by more than a constant factor.<P>
Sections 9.2 and 9.3 examine three sorting algorithms--counting sort, radix sort, and bucket sort--that run in linear time. Needless to say, these algorithms use operations other than comparisons to determine the sorted order. Consequently, the <img src="../images/omega12.gif">(<I>n</I> lg <I>n</I>) lower bound does not apply to them.<P>
<h1><a name="078b_137a">9.1 Lower bounds for sorting<a name="078b_137a"></h1><P>
<a name="078b_1378"><a name="078b_1379">In a comparison sort, we use only comparisons between elements to gain order information about an input sequence <img src="../images/lftwdchv.gif"><I>a</I><SUB>1</SUB>, <I>a</I><SUB>2</SUB>, . . . ,<I>a<SUB>n</SUB>) </I>That is, given two elements <I>a<SUB><FONT FACE="Courier New" SIZE=2>i</I></FONT></SUB> and <I>a<SUB><FONT FACE="Courier New" SIZE=2>j</I></FONT></SUB>, we perform one of the tests <I>a<SUB>i</SUB> < a<SUB>j</SUB>, a<SUB>i</SUB> <img src="../images/lteq12.gif"> a<SUB>j</SUB>, a<SUB>i</SUB> = a<SUB>j</SUB>, a<SUB>i</SUB> <img src="../images/gteq.gif"> a<SUB>j</SUB>, or a<SUB>i</SUB> > a<SUB>j</I></SUB> to determine their relative order. We may not inspect the values of the elements or gain order information about them in any other way.<P>
In this section, we assume without loss of generality that all of the input elements are distinct. Given this assumption, comparisons of the form <I>a<SUB>i</I></SUB> = <I>a<SUB>j</I></SUB> are useless, so we can assume that no comparisons of this form are made. We also note that the comparisons <I>a<SUB>i</I></SUB> <img src="../images/lteq12.gif"> <I>a<SUB>j</I></SUB>, <I>a<SUB>i</I></SUB> <img src="../images/gteq.gif"> <I>a<SUB>j</I></SUB>, <I>a<SUB>i</I></SUB> > <I>a<SUB>j</I></SUB>, and <I>a<SUB>i</I></SUB> < <I>a<SUB>j</I></SUB> are all equivalent in that they yield identical information about the relative order of <I>a<SUB>i</I></SUB> and <I>a<SUB>j</I></SUB>. We therefore assume that all comparisons have the form <I>a<SUB>i</I></SUB> <img src="../images/lteq12.gif"> <I>a<SUB>j</I></SUB>.<P>
<img src="173_a.gif"><P>
<h4><a name="078b_137b">Figure 9.1 The decision tree for insertion sort operating on three elements. There are 3! = 6 possible permutations of the input elements, so the decision tree must have at least 6 leaves.<a name="078b_137b"></sub></sup></h4><P>
<h2>The decision-tree model</h2><P>
<a name="078c_137a"><a name="078c_137b">Comparison sorts can be viewed abstractly in terms of <I><B>decision trees</I></B>. A decision tree represents the comparisons performed by a sorting algorithm when it operates on an input of a given size. Control, data movement, and all other aspects of the algorithm are ignored. Figure 9.1 shows the decision tree corresponding to the insertion sort algorithm from Section 1.1 operating on an input sequence of three elements.<P>
In a decision tree, each internal node is annotated by <I>a<SUB>i</SUB> </I>:<I> a<SUB>j</I></SUB> for some <I>i</I> and <I>j</I> in the range 1 <img src="../images/lteq12.gif"> <I>i</I>, <I>j</I> <img src="../images/lteq12.gif"> <I>n</I>, where <I>n</I> is the number of elements in the input sequence. Each leaf is annotated by a permutation <img src="../images/lftwdchv.gif"><img src="../images/piuc.gif">(1),<img src="../images/piuc.gif">(2), . . . ,<img src="../images/piuc.gif">(<I>n</I>)<img src="../images/wdrtchv.gif">. (See Section 6.1 for background on permutations.) The execution of the sorting algorithm corresponds to tracing a path from the root of the decision tree to a leaf. At each internal node, a comparison <I>a<SUB>i</I></SUB> <img src="../images/lteq12.gif"> <I>a<SUB>j</I></SUB> is made. The left subtree then dictates subsequent comparisons for <I>a<SUB>i</I></SUB> <img src="../images/lteq12.gif"> <I>a<SUB>j</I></SUB>, and the right subtree dictates subsequent comparisons for <I>a<SUB>i</I></SUB> > <I>a<SUB>j</I></SUB>. When we come to a leaf, the sorting algorithm has established the ordering <I>a</I><img src="../images/piuc.gif"><I>(1)<SUB><img src="../images/lteq12.gif"> </SUB></I>a<I><img src="../images/piuc.gif">(2)<SUB> <img src="../images/lteq12.gif"> . . . <img src="../images/lteq12.gif"> </I>a</SUB><FONT FACE="Courier New" SIZE=2><I><SUB><img src="../images/piuc.gif"></I>(<I>n</I>).</FONT></SUB> Each of the <I>n</I>! permutations on <I>n</I> elements must appear as one of the leaves of the decision tree for the sorting algorithm to sort properly.<P>
<P>
<h2>A lower bound for the worst case</h2><P>
The length of the longest path from the root of a decision tree to any of its leaves represents the worst-case number of comparisons the sorting algorithm performs. Consequently, the worst-case number of comparisons for a comparison sort corresponds to the height of its decision tree. A lower bound on the heights of decision trees is therefore a lower bound on the running time of any comparison sort algorithm. The following theorem establishes such a lower bound.<P>
<a name="078d_137d">Theorem 9.1<a name="078d_137d"><P>
<a name="078d_137c">Any decision tree that sorts <I>n</I> elements has height <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>).<P>
<I><B>Proof </I></B>Consider a decision tree of height <I>h</I> that sorts <I>n</I> elements. Since there are <I>n</I>! permutations of <I>n</I> elements, each permutation representing a distinct sorted order, the tree must have at least <I>n</I>! leaves. Since a binary tree of height <I>h</I> has no more than 2<I><SUP>h</I></SUP> leaves, we have<P>
<pre><I>n</I>! <img src="../images/lteq12.gif"> 2<I><SUP>h</I></SUP> ,</sub></sup></pre><P>
which, by taking logarithms, implies<P>
<pre><I>h</I> <img src="../images/gteq.gif"> lg(<I>n</I>!) ,</sub></sup></pre><P>
since the lg function is monotonically increasing. From Stirling's approximation (2.11), we have<P>
<img src="174_a.gif"><P>
where <I>e</I> = 2.71828 . . . is the base of natural logarithms; thus<P>
<img src="174_b.gif"><P>
<a name="078d_137e">Corollary 9.2<a name="078d_137e"><P>
Heapsort and merge sort are asymptotically optimal comparison sorts.<P>
<I><B>Proof </I></B>The <I>O</I>(<I>n </I>lg <I>n</I>) upper bounds on the running times for heapsort and merge sort match the <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>) worst-case lower bound from Theorem 9.1. <P>
<P>
<h2><a name="078e_1380">Exercises<a name="078e_1380"></h2><P>
<a name="078e_1381">9.1-1<a name="078e_1381"><P>
<a name="078e_137d"><a name="078e_137e"><a name="078e_137f">What is the smallest possible depth of a leaf in a decision tree for a sorting algorithm?<P>
<a name="078e_1382">9.1-2<a name="078e_1382"><P>
Obtain asymptotically tight bounds on lg(<I>n</I>!) without using Stirling's approximation. Instead, evaluate the summation <img src="174_c.gif"> using techniques from Section 3.2.<P>
<a name="078e_1383">9.1-3<a name="078e_1383"><P>
Show that there is no comparison sort whose running time is linear for at least half of the <I>n</I>! inputs of length <I>n</I>. What about a fraction of 1/<I>n</I> of the inputs of length <I>n</I>? What about a fraction 1/2<I><SUP>n</I></SUP>?<P>
<a name="078e_1384">9.1-4<a name="078e_1384"><P>
Professor Solomon claims that the <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>) lower bound for sorting <I>n</I> numbers does not apply to his computer environment, in which the control flow of a program can split three ways after a single comparison <I>a<SUB>i </I></SUB>:<I><SUB> </SUB>a<SUB>j</I>, according to whether </SUB><I>a<SUB>i</I></SUB> < <I>a<SUB>j</I></SUB>, <I>a<SUB>i</I></SUB> = <I>a<SUB>j</I></SUB>, or <I>a<SUB>i</I></SUB> > <I>a<SUB>j</I></SUB>. Show that the professor is wrong by proving that the number of three-way comparisons required to sort <I>n</I> elements is still <img src="../images/omega12.gif">(<I>n </I>lg <I>n</I>).<P>
<a name="078e_1385">9.1-5<a name="078e_1385"><P>
Prove that 2<I>n</I> - 1 comparisons are necessary in the worst case to merge two sorted lists containing <I>n</I> elements each.<P>
<a name="078e_1386">9.1-6<a name="078e_1386"><P>
You are given a sequence of<I> n</I> elements to sort. The input sequence consists of <I>n</I>/<I>k</I> subsequences, each containing <I>k</I> elements. The elements in a given subsequence are all smaller than the elements in the succeeding subsequence and larger than the elements in the preceding subsequence. Thus, all that is needed to sort the whole sequence of length <I>n</I> is to sort the <I>k</I> elements in each of the <I>n</I>/<I>k</I> subsequences. Show an <img src="../images/omega12.gif">(<I>n </I>lg <I>k</I>) lower bound on the number of comparisons needed to solve this variant of the sorting problem. (<I>Hint</I>: It is not rigorous to simply combine the lower bounds for the individual subsequences.)<P>
<P>
<P>
<h1><a name="078f_1386">9.2 Counting sort<a name="078f_1386"></h1><P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -