📄 chap22.htm
字号:
<h2>Heuristics to improve the running time</h2><P>
So far, we have not improved on the linked-list implementation. A sequence of <I>n</I> - 1 <FONT FACE="Courier New" SIZE=2>UNION</FONT> operations may create a tree that is just a linear chain of <I>n</I> nodes. By using two heuristics, however, we can achieve a running time that is almost linear in the total number of operations <I>m</I>.<P>
<a name="0892_16fa"><a name="0892_16fb">The first heuristic, <I><B>union by rank</I></B>, is similar to the weighted-union heuristic we used with the linked-list representation. The idea is to make the root of the tree with fewer nodes point to the root of the tree with more nodes. Rather than explicitly keeping track of the size of the subtree rooted at each node, we shall use an approach that eases the analysis. For each node, we maintain a <I><B>rank</I></B> that approximates the logarithm of the subtree size and is also an upper bound on the height of the node. In union by rank, the root with smaller rank is made to point to the root with larger rank during a <FONT FACE="Courier New" SIZE=2>UNION</FONT> operation.<P>
<a name="0892_16fc">The second heuristic, <I><B>path compression</I></B>, is also quite simple and very effective. As shown in Figure 22.5, we use it during <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations to make each node on the find path point directly to the root. Path compression does not change any ranks.<P>
<img src="448_a.gif"><P>
<h4><a name="0892_16fd">Figure 22.5 Path compression during the operation <FONT FACE="Courier New" SIZE=2>FIND-SET</FONT>. Arrows and self-loops at roots are omitted. (a) A tree representing a set prior to executing <FONT FACE="Courier New" SIZE=2>FIND<FONT FACE="Times New Roman" SIZE=2>-<FONT FACE="Courier New" SIZE=2>SET</FONT></FONT></FONT>(a). Triangles represent subtrees whose roots are the nodes shown. Each node has a pointer to its parent. (b) The same set after executing <FONT FACE="Courier New" SIZE=2>FIND<FONT FACE="Times New Roman" SIZE=2>-<FONT FACE="Courier New" SIZE=2>SET </FONT></FONT></FONT>(a). Each node on the find path now points directly to the root.<a name="0892_16fd"></sub></sup></h4><P>
<P>
<h2>Pseudocode for disjoint-set forests</h2><P>
To implement a disjoint-set forest with the union-by-rank heuristic, we must keep track of ranks. With each node <I>x</I>, we maintain the integer value <I>rank</I>[<I>x</I>], which is an upper bound on the height of <I>x</I> (the number of edges in the longest path between <I>x</I> and a descendant leaf). When a singleton set is created by <FONT FACE="Courier New" SIZE=2>MAKE</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT>, the initial rank of the single node in the corresponding tree is 0. Each <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operation leaves all ranks unchanged. When applying <FONT FACE="Courier New" SIZE=2>UNION</FONT> to two trees, we make the root of higher rank the parent of the root of lower rank. In case of a tie, we arbitrarily choose one of the roots as the parent and increment its rank.<P>
Let us put this method into pseudocode. We designate the parent of node <I>x</I> by <I>p</I>[<I>x</I>]. The <FONT FACE="Courier New" SIZE=2>LINK</FONT> procedure, a subroutine called by <FONT FACE="Courier New" SIZE=2>UNION</FONT>, takes pointers to two roots as inputs.<P>
<pre><a name="0893_16fd">MAKE-SET(<I>x</I>)</sub></sup></pre><P>
<pre>1 <I>p</I>[<I>x</I>] <img src="../images/arrlt12.gif"> <I>x</I></sub></sup></pre><P>
<pre>2 <I>rank</I>[<I>x</I>]<img src="../images/arrlt12.gif"> 0</sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="0893_16fe">UNION(<I>x,y</I>)</sub></sup></pre><P>
<pre>1 LINK(FIND-SET(<I>x</I>), FIND-SET(<I>y</I>))</sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="0893_16ff">LINK(<I>x</I>,<I>y</I>)</sub></sup></pre><P>
<pre>1 <B>if</B> <I>rank</I>[<I>x</I>] > <I>rank</I>[<I>y</I>]</sub></sup></pre><P>
<pre>2 <B>then</B> <I>p</I>[<I>y</I>] <img src="../images/arrlt12.gif"> <I>x</I></sub></sup></pre><P>
<pre>3 <B>else</B> <I>p</I>[<I>x</I>] <img src="../images/arrlt12.gif"> <I>y</I></sub></sup></pre><P>
<pre>4 <B>if</B> <I>rank</I>[<I>x</I>] = <I>rank</I>[<I>y</I>]</sub></sup></pre><P>
<pre>5 <B>then </B><I>rank</I>[<I>y</I>] <img src="../images/arrlt12.gif"> <I>rank</I>[<I>y</I>] + 1</sub></sup></pre><P>
<a name="0893_1700">The <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> procedure with path compression is quite simple.<P>
<pre>FIND-SET(<I>x</I>)</sub></sup></pre><P>
<pre>1 <B>if</B> <I>x</I> <img src="../images/noteq.gif"> <I>p</I>[<I>x</I>]</sub></sup></pre><P>
<pre>2 <B>then </B><I>p</I>[<I>x</I>] <img src="../images/arrlt12.gif"> FIND-SET(<I>p</I>[<I>x</I>])</sub></sup></pre><P>
<pre>3 <B>return</B> <I>p</I>[<I>x</I>]</sub></sup></pre><P>
<a name="0893_1701">The <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> procedure is a <I><B>two-pass method</I></B>: it makes one pass up the find path to find the root, and it makes a second pass back down the find path to update each node so that it points directly to the root. Each call of <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT>(<I>x</I>) returns <I>p</I>[<I>x</I>] in line 3. If <I>x</I> is the root, then line 2 is not executed and <I>p</I>[<I>x</I>] =<I> x </I>is returned. This is the case in which the recursion bottoms out. Otherwise, line 2 is executed, and the recursive call with parameter <I>p</I>[<I>x</I>] returns a pointer to the root. Line 2 updates node <I>x</I> to point directly to the root, and this pointer is returned in line 3.<P>
<P>
<h2>Effect of the heuristics on the running time</h2><P>
Separately, either union by rank or path compression improves the running time of the operations on disjoint-set forests, and the improvement is even greater when the two heuristics are used together. Alone, union by rank yields the same running time as we achieved with the weighted union heuristic for the list representation: the resulting implementation runs in time <I>O</I>(<I>m</I> lg <I>n</I>) (see Exercise 22. 4-3). This bound is tight (see Exercise 22.3-3). Although we shall not prove it here, if there are <I>n</I> <FONT FACE="Courier New" SIZE=2>MAKE</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations (and hence at most <I>n</I> - 1 <FONT FACE="Courier New" SIZE=2>UNION</FONT> operations) and â <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations, the path-compression heuristic alone gives a worst-case running time of <img src="../images/bound.gif">(â log(<SUB>1 + â/<I>n</I>)</SUB><I>n</I>) if â <img src="../images/gteq.gif"> <I>n</I> and <img src="../images/bound.gif">(<I>n </I>+ â lg <I>n</I>) if â < <I>n</I>.<P>
When we use both union by rank and path compression, the worst-case running time is <I>O</I>(<I>m</I> <img src="../images/alpha12.gif"><I>(</I>m<I>, </I>n<I>)), where <img src="../images/alpha12.gif"></I>(<I>m</I>, <I>n</I>) is the <I>very</I> slowly growing inverse of Ackermann's function, which we define in Section 22.4. In any conceivable application of a disjoint-set data structure, <img src="../images/alpha12.gif"><I></I>(<I>m</I>, <I>n</I>) <img src="../images/lteq12.gif"> 4; thus, we can view the running time as linear in <I>m</I> in all practical situations. In Section 22.4, we prove the slightly weaker bound of <I>O</I>(<I>m</I> lg* <I>n</I>).<P>
<P>
<h2><a name="0895_1705">Exercises<a name="0895_1705"></h2><P>
<a name="0895_1706">22.3-1<a name="0895_1706"><P>
<a name="0895_1702"><a name="0895_1703">Do Exercise 22.2-2 using a disjoint-set forest with union by rank and path compression.<P>
<a name="0895_1707">22.3-2<a name="0895_1707"><P>
<a name="0895_1704">Write a nonrecursive version of <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> with path compression.<P>
<a name="0895_1708">22.3-3<a name="0895_1708"><P>
Give a sequence of <I>m</I> <FONT FACE="Courier New" SIZE=2>MAKE</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT>, <FONT FACE="Courier New" SIZE=2>UNION</FONT>, and <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations, <I>n</I> of which are <FONT FACE="Courier New" SIZE=2>MAKE</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations, that takes <img src="../images/omega12.gif">(<I>m </I>lg <I>n</I>) time when we use union by rank only.<P>
<a name="0895_1709">22.3-4<a name="0895_1709"><P>
Show that any sequence of <I>m</I> <FONT FACE="Courier New" SIZE=2>MAKE</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT>, <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT>, and <FONT FACE="Courier New" SIZE=2>UNION</FONT> operations, where all the <FONT FACE="Courier New" SIZE=2>UNION</FONT> operations appear before any of the <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>SET</FONT> operations, takes only <I>O</I>(<I>m</I>) time if both path compression and union by rank are used. What happens in the same situation if only the path-compression heuristic is used?<P>
<P>
<P>
<h1><a name="0896_0001">* 22.4 Analysis of union by rank with path compression<a name="0896_0001"></h1><P>
As noted in Section 22.3, the running time of the combined union-by-rank and path-compression heuristic is <I>O</I>(<I>m</I> <img src="../images/alpha12.gif"><I></I>(<I>m</I>, <I>n</I>)) for <I>m</I> disjoint-set operations on <I>n</I> elements. In this section, we shall examine the function <img src="../images/alpha12.gif"> to see just how slowly it grows. Then, rather than presenting the very complex proof of the <I>O</I>(<I>m</I> <img src="../images/alpha12.gif"><I></I>(<I>m</I>, <I>n</I>)) running time, we shall offer a simpler proof of a slightly weaker upper bound on the running time: <I>O</I>(<I>m</I> lg* <I>n</I>).<P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -