📄 chap30.htm

📁 程序设计的艺术,此书是英文版的,经久不衰,非常的经典
💻 HTM
📖 第 1 页 / 共 5 页
字号:
<pre>1<B>  for</B> each processor <I>i</I>, in parallel</sub></sup></pre><P>
<pre>2<B>      do</B> <I>y</I>[<I>i</I>] <img src="../images/arrlt12.gif"> <I>x</I>[<I>i</I>]</sub></sup></pre><P>
<pre>3<B>  while</B> there exists an object <I>i</I> such that <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"> NIL</sub></sup></pre><P>
<pre>4<B>      do for</B> each processor <I>i,</I> in parallel</sub></sup></pre><P>
<pre>5<B>             do if</B> <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"> NIL</sub></sup></pre><P>
<pre>6<B>                   then</B> <I>y</I>[<I>next</I>[<I>i</I>]] <img src="../images/arrlt12.gif"> <I>y</I>[<I>i</I>] <img src="../images/circx.gif"> <I>y</I>[<I>next</I>[<I>i</I>]]</sub></sup></pre><P>
<pre>7                        <I>next</I>[<I>i</I>] <img src="../images/arrlt12.gif"> <I>next</I>[<I>next</I>[<I>i</I>]]</sub></sup></pre><P>
The pseudocode and Figure 30.3 indicate the similarity between this algorithm and L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT>. The only differences are the initialization and the updating of <I>d</I> or <I>y</I> values. In L<FONT FACE="Courier New" SIZE=2>ist-</FONT>R<FONT FACE="Courier New" SIZE=2>ank</FONT>, processor <I>i</I> updates <I>d</I>[<I>i</I>]--its own <I>d</I> value--whereas in <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>PREFIX</FONT>, processor <I>i</I> updates <I>y</I>[<I>next</I>[<I>i</I>]]--another processor's <I>y</I> value. Note that L<FONT FACE="Courier New" SIZE=2>IST-</FONT><FONT FACE="Courier New" SIZE=2>PREFIX</FONT> is EREW for the same reason as L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT>: pointer jumping maintains the invariant that for distinct objects <I>i</I> and <I>j</I>, either <I>next</I>[<I>i</I>]<I> </I><img src="../images/noteq.gif"><I> next</I>[<I>j</I>]<I> </I>or<I> next</I>[<I>i</I>]<I> = next</I>[<I>j</I>]<I> =<FONT FACE="Courier New" SIZE=2> NIL</FONT>.</I><P>
Figure 30.3 shows the state of the list before each iteration of the <B>while </B>loop. The procedure maintains the invariant that at the end of the <I>t</I>th execution of the <B>while</B> loop, the <I>k</I>th processor stores [max(1,<I> k - </I>2<I><SUP>t</SUP> + </I>1),<I> k</I>],<I> </I>for <I>k</I> = 1, 2, . . . , <I>n</I>. In the first iteration, the <I>k</I>th list object points initially to the (<I>k</I> + 1)st object, except that the last object has a <FONT FACE="Courier New" SIZE=2>NIL</FONT> pointer. Line 6 causes the <I>k</I>th object, for <I>k</I> = 1, 2, . . . , <I>n</I> - 1, to fetch the value [<I>k </I>+ 1, <I>k </I>+ 1] from its successor. It then performs the operation [<I>k, k</I>] <img src="../images/circx.gif"> [<I>k + </I>1,<I> k </I>+ 1], yielding [<I>k, k </I>+ 1], which it stores back into its successor. The <I>next</I> pointers are then jumped as in <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT>, and the result of the first iteration appears in Figure 30.3(b). We can view the second iteration similarly. For <I>k</I> = 1, 2, . . . , <I>n</I> - 2, the <I>k</I>th object fetches the value [<I>k + </I>1,<I> k </I>+ 2] from its successor (as defined by the new value in its field <I>next</I>), and then it stores [<I>k - </I>1,<I> k</I>]<I> </I><img src="../images/circx.gif"> <I></I>[<I>k + </I>1,<I> k + </I>2]<I> = </I>[<I>k - </I>1,<I> k + </I>2] into its successor. The result is shown in Figure 30.3(c). In the third and final iteration, only the first two list objects have non-<FONT FACE="Courier New" SIZE=2>NIL</FONT> pointers, and they fetch values from their successors in their respective lists. The final result appears in Figure 30.3(d). The key observation that makes <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>PREFIX</FONT> work is that at each step, if we perform a prefix computation on each of the several existing lists, each object obtains its correct value.<P>
<img src="697_a.gif"><P>
<h4><a name="0951_19d6">Figure 30.3 The parallel prefix algorithm <FONT FACE="Courier New" SIZE=2>LIST-PREFIX</FONT> on a linked list. (a) The initial y value of the kth object in the list is [k, k]. The next pointer of the kth object points to the (k + 1)st object, or <FONT FACE="Courier New" SIZE=2>NIL</FONT> for the last object. (b)-(d) The y and next values before each test in line 3. The final answer is in part (d), in which the y value for the kth object is [1, k] for all k.<a name="0951_19d6"></sub></sup></h4><P>
Since the two algorithms use the same pointer-jumping mechanism, <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>PREFIX</FONT> has the same analysis as <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT>: the running time is <I>O</I>(lg<I> n</I>) on an EREW PRAM, and the total work performed is <img src="../images/bound.gif">(<I>n </I>1g<I> n</I>).<P>
<P>







<h2><a name="0952_19d9">30.1.3 The Euler-tour technique<a name="0952_19d9"></h2><P>
<a name="0952_19d5"><a name="0952_19d6">In this section, we shall introduce the Euler-tour technique and show how it can be applied to the problem of computing the depth of each node in an <I>n</I>-node binary tree. A key step in this <I>O</I>(lg<I> n</I>)-time EREW algorithm is a parallel prefix computation.<P>
To store binary trees in a PRAM, we use a simple binary-tree representation of the sort presented in Section 11.4. Each node <I>i</I> has fields <I>parent</I>[<I>i</I>], <I>left</I>[<I>i</I>], and <I>right</I>[<I>i</I>], which point to node <I>i</I>'s parent, left child, and right child, respectively. Let us assume that each node is identified by a non-negative integer. For reasons that will soon become apparent, we associate not one but three processors with each node; we call these the node's <I>A, B</I>, and <I>C</I> processors. We should be able to map between a node and its three processors easily; for example, node <I>i</I> might be associated with processors 3<I>i</I>, 3<I>i </I>+ 1, and 3<I>i </I>+ 2.<P>
<a name="0952_19d7">Computing the depth of each node in an <I>n</I>-node tree takes <I>O</I>(<I>n</I>) time on a serial RAM. A simple parallel algorithm to compute depths propagates a &quot;wave&quot; downward from the root of the tree. The wave reaches all nodes at the same depth simultaneously, and thus by incrementing a counter carried along with the wave, we can compute the depth of each node. This parallel algorithm works well on a complete binary tree, since it runs in time proportional to the tree's height. The height of the tree could be as large as <I>n</I> - 1, however, in which case the algorithm would run in <img src="../images/bound.gif">(<I>n</I>) time--no better than the serial algorithm. Using the Euler-tour technique, however, we can compute node depths in <I>O</I>(l<I>g n</I>) time on an EREW PRAM, whatever the height of the tree.<P>
<a name="0952_19d8">An <I><B>Euler tour</I></B> of a graph is a cycle that traverses each edge exactly once, although it may visit a vertex more than once. By Problem 23-3, a connected, directed graph has an Euler tour if and only if for all vertices <I>v</I>, the in-degree of <I>v</I> equals the out-degree of <I>v</I>. Since each undirected edge (<I>u, v</I>) in an undirected graph maps to two directed edges (<I>u, v</I>) and (<I>v, u</I>) in the directed version, the directed version of any connected, undirected graph--and therefore of any undirected tree--has an Euler tour.<P>
To compute the depths of nodes in a binary tree <I>T</I>, we first form an Euler tour of the directed version of <I>T</I> (viewed as an undirected graph). The tour corresponds to a walk of the tree and is represented in Figure 30.4(a) by a linked list running through the nodes of the tree. Its structure is as follows:<P>
<img src="../images/dot12.gif">     A node's <I>A</I> processor points to the <I>A</I> processor of its left child, if it exists, and otherwise to its own <I>B</I> processor.<P>
<img src="../images/dot12.gif">     A node's <I>B</I> processor points to the <I>A</I> processor of its right child, if it exists, and otherwise to its own <I>C</I> processor.<P>
<img src="../images/dot12.gif">     A node's <I>C</I> processor points to the <I>B</I> processor of its parent if it is a left child and to the <I>C</I> processor of its parent if it is a right child. The root's <I>C</I> processor points to <FONT FACE="Courier New" SIZE=2>NIL</FONT>.<P>
Thus, the head of the linked list formed by the Euler tour is the root's <I>A</I> processor, and the tail is the root's <I>C</I> processor. Given the pointers composing the original tree, an Euler tour can be constructed in <I>O</I>(1) time.<P>
Once we have the linked list representing the Euler tour of <I>T</I>, we place a 1 in each <I>A</I> processor, a 0 in each <I>B</I> processor, and a - 1 in each <I>C </I>processor, as shown in Figure 30.4(a). We then perform a parallel prefix computation using ordinary addition as the associative operation, as we did in Section 30.1.2. Figure 30.4(b) shows the result of the parallel prefix computation.<P>
<img src="699_a.gif"><P>
<h4><a name="0952_19da">Figure 30.4 Using the Euler-tour technique to compute the depth of each node in a binary tree. (a) The Euler tour is a list corresponding to a walk of the tree. Each processor contains a number used by a parallel prefix computation to compute node depths. (b) The result of the parallel prefix computation on the linked list from (a). The C processor of each node (blackened) contains the node's depth. (You can verify the result of this prefix computation by computing it serially.)<a name="0952_19da"></sub></sup></h4><P>
We claim that after performing the parallel prefix computation, the depth of each node resides in the node's <I>C</I> processor. Why? The numbers are placed into the <I>A</I>, <I>B</I>, and <I>C </I>processors in such a way that the net effect of visiting a subtree is to add 0 to the running sum. The <I>A</I> processor of each node <I>i</I> contributes 1 to the running sum in <I>i</I>'s left subtree, reflecting the depth of <I>i</I>'s left child being one greater than the depth of <I>i</I>. The <I>B </I>processor contributes 0 because the depth of node <I>i</I>'s left child equals the depth of node <I>i</I>'s right child. The <I>C</I> processor contributes - 1, so that from the perspective of node <I>i</I>'s parent, the entire visit to the subtree rooted at node <I>i</I> has no effect on the running sum.<P>
The list representing the Euler tour can be computed in <I>O</I>(1) time. It has 3<I>n</I> objects, and thus the the parallel prefix computation takes only <I>O</I>(lg <I>n</I>)<I> </I>time. Thus, the total amount of time to compute all node depths is <I>O</I>(lg<I> n</I>). Because no concurrent memory accesses are needed, the algorithm is an EREW algorithm.<P>
<P>







<h2><a name="0953_0001">Exercises<a name="0953_0001"></h2><P>
<a name="0953_0002">30.1-1<a name="0953_0002"><P>
Give an <I>O</I>(lg <I>n</I>)-time EREW algorithm that determines for each object in an <I>n</I>-object list whether it is the middle (<img src="../images/hfbrdl12.gif"><I>n</I>/2<img src="../images/hfbrdr12.gif">th) object.<P>
<a name="0953_0003">30.1-2<a name="0953_0003"><P>
Give an <I>O</I>(lg <I>n</I>)-time EREW algorithm to perform the prefix computation on an array <I>x</I>[1. . <I>n</I>]. Do not use pointers, but perform index computations directly.<P>
<a name="0953_0004">30.1-3<a name="0953_0004"><P>
Suppose that each object in an <I>n</I>-object list <I>L</I> is colored either red or blue. Give an efficient EREW algorithm to form two lists from the objects in <I>L</I>: one consisting of the blue objects and one consisting of the red objects.<P>
<a name="0953_0005">30.1-4<a name="0953_0005"><P>
An EREW PRAM has <I>n</I> objects distributed among several disjoint circular lists. Give an efficient algorithm that determines an arbitrary representative object for each list and acquaints each object in the list with the identity of the representative. Assume that each processor knows its own unique index.<P>
<a name="0953_0006">30.1-5<a name="0953_0006"><P>
Give an <I>O</I>(lg<I> n</I>)-time EREW algorithm to compute the size of the subtree rooted at each node of an <I>n</I>-node binary tree. (<I>Hint</I>: Take the difference of two values in a running sum along an Euler tour.)<P>
<a name="0953_0007">30.1-6<a name="0953_0007"><P>
Give an efficient EREW algorithm to compute preorder, inorder, and post-order numberings for an arbitrary binary tree.<P>
<a name="0953_0008">30.1-7<a name="0953_0008"><P>
Extend the Euler-tour technique from binary trees to ordered trees with arbitrary node degrees. Specifically, describe a representation for ordered trees that allows the Euler-tour technique to be applied. Give an EREW algorithm to compute the node depths of an <I>n</I>-node ordered tree in <I>O</I>(lg <I>n</I>) time.<P>
<a name="0953_0009">30.1-8<a name="0953_0009"><P>
Describe an <I>O</I>(l<I>g n</I>)-time EREW implementation of <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT> that performs the loop-termination test explicitly. (<I>Hint</I>: Interleave the test with the loop body.)<P>
<P>


<P>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -