📄 chap09.htm
字号:
<a name="078f_1380"><a name="078f_1381"><a name="078f_1382"><a name="078f_1383"><I><B>Counting sort</I></B> assumes that each of the <I>n</I> input elements is an integer in the range 1 to <I>k</I>, for some integer <I>k</I>. When <I>k</I> = <I>O</I>(<I>n</I>), the sort runs in <I>O</I>(<I>n</I>) time.<P>
The basic idea of counting sort is to determine, for each input element <I>x</I>, the number of elements less than <I>x</I>. This information can be used to place element <I>x</I> directly into its position in the output array. For example, if there are 17 elements less than <I>x</I>, then <I>x</I> belongs in output position 18. This scheme must be modified slightly to handle the situation in which several elements have the same value, since we don't want to put them all in the same position.<P>
In the code for counting sort, we assume that the input is an array <I>A</I>[1 . . <I>n</I>], and thus <I>length</I>[<I>A</I>] = <I>n</I>. We require two other arrays: the array <I>B</I>[1 . . <I>n</I>] holds the sorted output, and the array <I>C</I>[1 . . <I>k</I>] provides temporary working storage.<P>
<img src="176_a.gif"><P>
<h4><a name="078f_1387">Figure 9.2 The operation of <FONT FACE="Courier New" SIZE=2>COUNTING<FONT FACE="Times New Roman" SIZE=2>-<FONT FACE="Courier New" SIZE=2>SORT</FONT></FONT></FONT> on an input array A[1 . . 8], where each element ofA is a positive integer no larger than k = 6. (a) The array A and the auxiliary array C after line 4. (b) The array C after line 7. (c)-(e) The output array B and the auxiliary array C after one, two, and three iterations of the loop in lines 9-11, respectively. Only the lightly shaded elements of array B have been filled in. (f) The final sorted output array B.<a name="078f_1387"></sub></sup></h4><P>
<img src="176_b.gif"><P>
<a name="078f_1384">Counting sort is illustrated in Figure 9.2. After the initialization in lines 1-2, we inspect each input element in lines 3-4. If the value of an input element is <I>i</I>, we increment <I>C</I>[<I>i</I>]. Thus, after lines 3-4, <I>C</I>[<I>i</I>] holds the number of input elements equal to <I>i</I> for each integer <I>i</I> = 1, 2, . . . , <I>k</I>. In lines 6-7, we determine for each <I>i </I>= 1, 2, . . . , <I>k</I>, how many input elements are less than or equal to <I>i</I>; this is done by keeping a running sum of the array <I>C</I>.<P>
Finally, in lines 9-11, we place each element <I>A</I>[<I>j</I>] in its correct sorted position in the output array <I>B</I>. If all <I>n</I> elements are distinct, then when we first enter line 9, for each <I>A</I>[<I>j</I>], the value <I>C</I>[<I>A</I>[<I>j</I>]] is the correct final position of <I>A</I>[<I>j</I>] in the output array, since there are <I>C</I>[<I>A</I>[<I>j</I>]] elements less than or equal to <I>A</I>[<I>j</I>]. Because the elements might not be distinct, we decrement <I>C</I>[<I>A</I>[<I>j</I>]] each time we place a value<I> A</I>[<I>j</I>]<I> </I>into the B array; this<I> </I>causes the next input element with a value equal to<I> A</I>[<I>j</I>], if one exists, to<I> </I>go to the position immediately before<I> A</I>[<I>j</I>] in the output array.<P>
How much time does counting sort require? The <B>for</B> loop of lines 1-2 takes time<I> O</I>(<I>k</I>), the <B>for</B> loop of lines 3-4 takes time <I>O</I>(<I>n</I>), the <B>for</B> loop of lines 6-7 takes time<I> O</I>(<I>k</I>), and the <B>for</B> loop of lines 9-11 takes time <I>O</I>(<I>n</I>). Thus, the overall time is <I>O</I>(<I>k + n</I>). In practice, we usually use counting sort when we have <I>k = O</I>(<I>n</I>), in which case the running time is <I>O</I>(<I>n</I>).<P>
Counting sort beats the lower bound of <img src="../images/omega12.gif">(<I>n </I>1g <I>n</I>) proved in Section 9.1 because it is not a comparison sort. In fact, no comparisons between input elements occur anywhere in the code. Instead, counting sort uses the actual values of the elements to index into an array. The <img src="../images/omega12.gif">(<I>n</I> lg <I>n</I>) lower bound for sorting does not apply when we depart from the comparison-sort model.<P>
<a name="078f_1385">An important property of counting sort is that it is <I><B>stable:</I></B> numbers with the same value appear in the output array in the same order as they do in the input array. That is, ties between two numbers are broken by the rule that whichever number appears first in the input array appears first in the output array. Of course, the property of stability is important only when satellite data are carried around with the element being sorted. We shall see why stability is important in the next section.<P>
<h2><a name="0790_1387">Exercises<a name="0790_1387"></h2><P>
<a name="0790_1388">9.2-1<a name="0790_1388"><P>
Using Figure 9.2 as a model, illustrate the operation of <FONT FACE="Courier New" SIZE=2>COUNTING</FONT>-<FONT FACE="Courier New" SIZE=2>SORT </FONT>on the array A = <img src="../images/lftwdchv.gif">7, 1, 3, 1, 2, 4, 5, 7, 2, 4, 3<img src="../images/wdrtchv.gif">.<P>
<a name="0790_1389">9.2-2<a name="0790_1389"><P>
<a name="0790_1386">Prove that <FONT FACE="Courier New" SIZE=2>COUNTING</FONT>-<FONT FACE="Courier New" SIZE=2>SORT</FONT> is stable.<P>
<a name="0790_138a">9.2-3<a name="0790_138a"><P>
Suppose that the <B>for</B> loop in line 9 of the <FONT FACE="Courier New" SIZE=2>COUNTING</FONT>-<FONT FACE="Courier New" SIZE=2>SORT</FONT> procedure is rewritten:<P>
<pre>9 <B>for</B><I> j </I><img src="../images/arrlt12.gif"> 1 <B>to</B> <I>length</I>[<I>A</I>]</sub></sup></pre><P>
Show that the algorithm still works properly. Is the modified algorithm stable?<P>
<a name="0790_138b">9.2-4<a name="0790_138b"><P>
Suppose that the output of the sorting algorithm is a data stream such as a graphics display. Modify <FONT FACE="Courier New" SIZE=2>COUNTING</FONT>-<FONT FACE="Courier New" SIZE=2>SORT</FONT> to produce the output in sorted order without using any substantial additional storage besides that in <I>A</I> and <I>C. </I>(<I>Hint:</I> Link elements of <I>A</I> that have the same key into lists. Where is a "free" place to keep the pointers for the linked list?)<P>
<a name="0790_138c">9.2-5<a name="0790_138c"><P>
Describe an algorithm that, given<I> n</I> integers in the range 1 to <I>k,</I> preprocesses its input and then answers any query about how many of the <I>n </I>integers fall into a range [<I>a</I> . . <I>b</I>] in <I>O</I>(1) time. Your algorithm should use <I>O</I>(<I>n </I>+ <I>k</I>) preprocessing time.<P>
<P>
<P>
<h1><a name="0791_138b">9.3 Radix sort<a name="0791_138b"></h1><P>
<a name="0791_1387"><a name="0791_1388"><a name="0791_1389"><I><B>Radix sort</I></B> is the algorithm used by the card-sorting machines you now find only in computer museums. The cards are organized into 80 columns, and in each column a hole can be punched in one of 12 places. The sorter can be mechanically "programmed" to examine a given column of each card in a deck and distribute the card into one of 12 bins depending on which place has been punched. An operator can then gather the cards bin by bin, so that cards with the first place punched are on top of cards with the second place punched, and so on.<P>
For decimal digits, only 10 places are used in each column. (The other two places are used for encoding nonnumeric characters.) A <I>d</I>-digit number would then occupy a field of <I>d</I> columns. Since the card sorter can look at only one column at a time, the problem of sorting <I>n</I> cards on a <I>d</I>-digit number requires a sorting algorithm.<P>
Intuitively, one might want to sort numbers on their <I>most significant </I>digit, sort each of the resulting bins recursively, and then combine the decks in order. Unfortunately, since the cards in 9 of the 10 bins must be put aside to sort each of the bins, this procedure generates many intermediate piles of cards that must be kept track of. (See Exercise 9.3-5.)<P>
Radix sort solves the problem of card sorting counterintuitively by sorting on the <I>least significant</I> digit first. The cards are then combined into a single deck, with the cards in the 0 bin preceding the cards in the 1 bin preceding the cards in the 2 bin, and so on. Then the entire deck is sorted again on the second least-significant digit and recombined in a like manner. The process continues until the cards have been sorted on all <I>d</I> digits. Remarkably, at that point the cards are fully sorted on the <I>d</I>-digit number. Thus, only <I>d</I> passes through the deck are required to sort. Figure 9.3 shows how radix sort operates on a "deck" of seven 3-digit numbers.<P>
It is essential that the digit sorts in this algorithm be stable. The sort performed by a card sorter is stable, but the operator has to be wary about not changing the order of the cards as they come out of a bin, even though all the cards in a bin have the same digit in the chosen column.<P>
In a typical computer, which is a sequential random-access machine, radix sort is sometimes used to sort records of information that are keyed by multiple fields. For example, we might wish to sort dates by three keys: year, month, and day. We could run a sorting algorithm with a comparison function that, given two dates, compares years, and if there is a tie, compares months, and if another tie occurs, compares days. Alternatively, we could sort the information three times with a stable sort: first on day, next on month, and finally on year.<P>
<pre>329 720 720 329</sub></sup></pre><P>
<pre>457 355 329 355</sub></sup></pre><P>
<pre>657 436 436 436</sub></sup></pre><P>
<pre>839 <img src="../images/rtbigar.gif"> 457 <img src="../images/rtbigar.gif"> 839 <img src="../images/rtbigar.gif"> 457</sub></sup></pre><P>
<pre>436 657 355 657</sub></sup></pre><P>
<pre>720 329 457 720</sub></sup></pre><P>
<pre>355 839 657 839</sub></sup></pre><P>
<pre> <img src="../images/arrup.gif"> <img src="../images/arrup.gif"> <img src="../images/arrup.gif"></sub></sup></pre><P>
<h4><a name="0791_138c">Figure 9.3 The operation of radix sort on a list of seven 3-digit numbers. The first column is the input. The remaining columns show the list after successive sorts on increasingly significant digit positions. The vertical arrows indicate the digit position sorted on to produce each list from the previous one.<a name="0791_138c"></sub></sup></h4><P>
<a name="0791_138a">The code for radix sort is straightforward. The following procedure assumes that each element in the <I>n</I>-element array <I>A</I> has <I>d </I>digits, where digit 1 is the lowest-order digit and digit <I>d</I> is the highest-order digit.<P>
<pre>RADIX-SORT(<I>A, d</I>)</sub></sup></pre><P>
<pre>1 <B>for</B> <I>i </I><img src="../images/arrlt12.gif"> 1 to <I>d</I></sub></sup></pre><P>
<pre>2 <B>do</B> use a stable sort to sort array <I>A</I> on digit <I>i</I></sub></sup></pre><P>
The correctness of radix sort follows by induction on the column being sorted (see Exercise 9.3-3). The analysis of the running time depends on the stable sort used as the intermediate sorting algorithm. When each digit is in the range <I>1</I> to <I>k</I>, and <I>k</I> is not too large, counting sort is the obvious choice. Each pass over<I> n</I> <I>d</I>-digit numbers then takes time <img src="../images/bound.gif">(<I>n + k</I>)<I>.</I> There are <I>d</I> passes, so the total time for radix sort is <img src="../images/bound.gif">(<I>dn + kd</I>)<I>.</I> When <I>d</I> is constant and <I>k = O</I>(<I>n</I>)<I>,</I> radix sort runs in linear time.<P>
Some computer scientists like to think of the number of bits in a computer word as being <img src="../images/bound.gif">(lg <I>n</I>). For concreteness, let's say that <I>d</I> lg <I>n</I> is the number of bits, where <I>d</I> is a positive constant. Then, if each number to be sorted fits in one computer word, we can treat it as a <I>d</I>-digit number in radix-<I>n</I> notation. As a concrete example, consider sorting 1 million 64-bit numbers. By treating these numbers as four-digit, radix-2<SUP>l6</SUP> numbers, we can sort them in just four passes using radix sort. This compares favorably with a typical <img src="../images/bound.gif">(<I>n</I> 1g <I>n</I>) comparison sort, which requires approximately 1g <I>n</I> = 20 operations per number to be sorted. Unfortunately, the version of radix sort that uses counting sort as the intermediate stable sort does not sort in place, which many of the <img src="../images/bound.gif">(<I>n</I> lg <I>n</I>) comparison sorts do. Thus, when primary memory storage is at a premium, an algorithm such as quicksort may be preferable.<P>
<h2><a name="0792_1390">Exercises<a name="0792_1390"></h2><P>
<a name="0792_1391">9.3-1<a name="0792_1391"><P>
Using Figure 9.3 as a model, illustrate the operation of <FONT FACE="Courier New" SIZE=2>RADIX</FONT>-<FONT FACE="Courier New" SIZE=2>SORT</FONT> on the following list of English words: COW, DOG, SEA, RUG, ROW, MOB, BOX, TAB, BAR, EAR, TAR, DIG, BIG, TEA, NOW, FOX.<P>
<a name="0792_1392">9.3-2<a name="0792_1392"><P>
<a name="0792_138b"><a name="0792_138c"><a name="0792_138d"><a name="0792_138e"><a name="0792_138f">Which of the following sorting algorithms are stable: insertion sort, merge sort, heapsort, and quicksort? Give a simple scheme that makes any sorting algorithm stable. How much additional time and space does your scheme entail?<P>
<a name="0792_1393">9.3-3<a name="0792_1393"><P>
Use induction to prove that radix sort works. Where does your proof need the assumption that the intermediate sort is stable?<P>
<a name="0792_1394">9.3-4<a name="0792_1394"><P>
Show how to sort<I> n</I> integers in the range 1 to <I>n</I><SUP>2</SUP> in <I>O</I>(<I>n</I>) time.<P>
<a name="0792_1395">9.3-5<a name="0792_1395"><P>
In the first card-sorting algorithm in this section, exactly how many sorting passes are needed to sort <I>d</I>-digit decimal numbers in the worst case? How many piles of cards would an operator need to keep track of in the worst case?<P>
<P>
<P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -