📄 chap30.htm

📁 程序设计的艺术,此书是英文版的,经久不衰,非常的经典
💻 HTM
📖 第 1 页 / 共 5 页
字号:
An efficient parallel solution, requiring only <I>O</I>(lg <I>n</I>) time, is given by the following parallel pseudocode.<P>
<pre><a name="094e_19cc">LIST-RANK(<I>L</I>)</sub></sup></pre><P>
<pre>1<B>  for</B> each processor <I>i</I>, in parallel</sub></sup></pre><P>
<pre>2<B>       do if</B> <I>next</I>[<I>i</I>] = NIL</sub></sup></pre><P>
<pre>3<B>             then</B> <I>d</I>[<I>i</I>] <img src="../images/arrlt12.gif"> 0</sub></sup></pre><P>
<pre>4<B>             else</B> <I>d</I>[<I>i</I>] <img src="../images/arrlt12.gif"> 1</sub></sup></pre><P>
<pre>5<B>  while</B> there exists an object <I>i</I> such that <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"> NIL</sub></sup></pre><P>
<pre>6<B>      do for</B> each processor <I>i,</I> in parallel</sub></sup></pre><P>
<pre>7<B>             do if</B> <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"> NIL</sub></sup></pre><P>
<pre>8<B>                   then</B> <I>d</I>[<I>i</I>] <img src="../images/arrlt12.gif"> <I>d</I>[<I>i</I>] + <I>d</I>[<I>next</I>[<I>i</I>]]</sub></sup></pre><P>
<pre>9                        <I>next</I>[i] <img src="../images/arrlt12.gif"> <I>next</I>[<I>next</I>[<I>i</I>]]</sub></sup></pre><P>
Figure 30.2 shows how the algorithm computes the distances. Each part of the figure shows the state of the list before an iteration of the <B>while </B> loop of lines 5-9. Part (a) shows the list just after initialization. In the first iteration, the first 5 list objects have non-<FONT FACE="Courier New" SIZE=2>nil</FONT> pointers, so that lines 8-9 are executed by their responsible processors. The result appears in part (b) of the figure. In the second iteration, only the first 4 objects have non-<FONT FACE="Courier New" SIZE=2>NIL</FONT> pointers; the result of this iteration is shown in part (c). In the third iteration, only the first 2 objects are operated on, and the final result, in which all objects have <FONT FACE="Courier New" SIZE=2>NIL</FONT> pointers, appears in part (d).<P>
The idea implemented by line 9, in which we set <I>next</I>[<I>i</I>]<I> </I><img src="../images/arrlt12.gif"><I> next</I>[<I>next</I>[<I>i</I>]] for all non-<FONT FACE="Courier New" SIZE=2>nil</FONT> pointers <I>next</I>[<I>i</I>]<I>,</I> is called <I><B>pointer jumping</I></B>. Note that th  e pointer fields are changed by pointer jumping, thus destroying the structure of the list. If the list structure must be preserved, then we make copies of the <I>next</I> pointers and use the copies to compute the distances.<P>





<h3>Correctness</h3><P>
<FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT> maintains the invariant that at the beginning of each iteration of the <B>while</B> loop of lines 5-9, for each object <I>i,</I> if we add the <I>d</I> values in the sublist headed by <I>i,</I> we obtain the correct distance from <I>i</I> to the end of the original list <I>L</I>. In Figure 30.2(b), for example, the sublist headed by object 3 is the sequence <img src="../images/lftwdchv.gif">3,6,0<img src="../images/wdrtchv.gif"> whose <I>d</I> values 2, 2, and 1 sum to 5, its distance from the end of the original list. The reason the invariant is maintained is that when each object &quot;splices out&quot; its successor in the list, it adds its successor's <I>d</I> value to its own.<P>
Observe that for this pointer-jumping algorithm to work correctly, the parallel memory accesses must be synchronized. Each execution of line 9 can update several <I>next</I> pointers. We rely on all the memory reads on the right-hand side of the assignment (reading <I>next</I>[<I>next</I>[<I>i</I>]]) occurring before any of the memory writes (writing <I>next</I>[<I>i</I>]) on the left-hand side.<P>
Now let us see why <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT> is an EREW algorithm. Because each processor is responsible for at most one object, every read and write in lines 2-7 is exclusive, as are the writes in lines 8-9. Observe that pointer jumping maintains the invariant that for any two distinct objects <I>i</I> and <I>j, </I>either <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"><I> next</I>[<I>j</I>]<I> or next</I>[<I>i</I>]<I> = next</I>[<I>j</I>] = <FONT FACE="Courier New" SIZE=2>NIL</FONT>. This invariant is certainly true for the initial list, and it is maintained by line 9. Because all non-<FONT FACE="Courier New" SIZE=2>NIL</FONT> <I>next</I> values are distinct, all reads in line 9 are exclusive.<P>
We do need to assume that some synchronization is performed in line 8 if all reads are to be exclusive. In particular, we require that all processors <I>i </I>read <I>d</I>[<I>i</I>] and then <I>d</I>[<I>next</I>[<I>i</I>]]<I>.</I> With this synchronization, if an object <I>i </I>has <I>next</I>[<I>i</I>] <img src="../images/noteq.gif"> <FONT FACE="Courier New" SIZE=2>NIL</FONT> and there is another object <I>j</I> pointing to <I>i</I> (that is, <I>next</I>[<I>j</I>]<I> = i</I>)<I>,</I> then the first read fetches <I>d</I>[<I>i</I>] for processor <I>i</I> and the second read fetches <I>d</I>[<I>i</I>] for processor <I>j.</I> Thus, L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT> is an EREW algorithm.<P>
From here on, we ignore such details of synchronization and assume that the PRAM and its pseudocode programming environment act in a consistent, synchronized manner, with all processors executing reads and writes at the same time.<P>
<P>







<h3>Analysis</h3><P>
We now show that if there are <I>n</I> objects in list <I>L,</I> then L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT> takes <I>O</I>(lg <I>n</I>) time. Since the initialization takes <I>O</I>(1) time and each iteration of the <B>while</B> loop takes <I>O</I>(1) time, it suffices to show that there are exactly <FONT FACE="Courier New" SIZE=2><img src="../images/hfbrul14.gif"></FONT>lg <I>n</I><FONT FACE="Courier New" SIZE=2><img src="../images/hfbrur14.gif"></FONT> iterations. The key observation is that each step of pointer jumping transforms each list into two interleaved lists: one consisting of the objects in even positions and the other consisting of objects in odd positions. Thus, each pointer-jumping step doubles the number of lists and halves their lengths. By the end of <FONT FACE="Courier New" SIZE=2><img src="../images/hfbrul14.gif"></FONT>lg <I>n</I><FONT FACE="Courier New" SIZE=2><img src="../images/hfbrur14.gif"></FONT> iterations, therefore, all lists contain only one object.<P>
We are assuming that the termination test in line 5 takes <I>O</I>(1) time, presumably due to a control network in the EREW PRAM. Exercise 30.1-8 asks you to describe an<I> O</I>(1g <I>n</I>)-time EREW implementation of L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank </FONT>that performs the termination test explicitly in the pseudocode.<P>
<a name="0950_19cd">Besides parallel running time, there is another interesting performance measure for parallel algorithms. We define the <I><B>work</I></B> performed by a parallel algorithm as the product of its running time and the number of processors it requires. Intuitively, the work is the amount of computing that a serial RAM performs when it simulates the parallel algorithm.<P>
The procedure L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT> performs <img src="../images/bound.gif">(<I>n</I> lg <I>n</I>) work, since it requires <I>n </I>processors and runs in <img src="../images/bound.gif">(lg <I>n</I>) time. The straightforward serial algorithm for the list-ranking problem runs in <img src="../images/bound.gif">(<I>n</I>) time, indicating that more work is performed by L<FONT FACE="Courier New" SIZE=2>ist</FONT>-R<FONT FACE="Courier New" SIZE=2>ank</FONT> than is absolutely necessary, but only by a logarithmic factor.<P>
<a name="0950_19ce">We define a PRAM algorithm <I>A</I> to be <I><B>work-efficient</I></B> with respect to another (serial or parallel) algorithm <I>B</I> for the same problem if the work performed by <I>A</I> is within a constant factor of the work performed by <I>B</I>. We also say more simply that a PRAM algorithm <I>A </I>is <I><B>work-efficient</I></B> if  it is work-efficient with respect to the best possible algorithm on a serial RAM. Since the best possible serial algorithm for list ranking runs in <img src="../images/bound.gif">(<I>n</I>) time on a serial RAM, <FONT FACE="Courier New" SIZE=2>LIST</FONT>-<FONT FACE="Courier New" SIZE=2>RANK</FONT> is not work-efficient. We shall present a work-efficient parallel algorithm for list ranking in Section 30.4.<P>
<P>


<P>







<h2><a name="0951_19d5">30.1.2 Parallel prefix on a list<a name="0951_19d5"></h2><P>
<a name="0951_19cf"><a name="0951_19d0">The technique of pointer jumping extends well beyond the application of list ranking. Section 29.2.2 shows how, in the context of arithmetic circuits, a &quot;prefix&quot; computation can be used to perform binary addition quickly. We now investigate how pointer jumping can be used to perform prefix computations. Our EREW algorithm for the prefix problem runs in <I>O</I>(lg<I> n</I>) time on <I>n</I>-object lists.<P>
<a name="0951_19d1"><a name="0951_19d2"><a name="0951_19d3">A <I><B>prefix computation</I></B> is defined in terms of a binary, associative operator <img src="../images/circx.gif">. The computation takes as input a sequence <img src="../images/lftwdchv.gif"><I>x</I><SUB>1</SUB>, <I>x<SUB>2</I></SUB>, . . . , <I>x<SUB>n</I></SUB><img src="../images/wdrtchv.gif"> and produces as output a sequence <img src="../images/lftwdchv.gif"><I>y</I><SUB>1</SUB>, <I>y</I>, . . . , <I>y<SUB>n</SUB></I><img src="../images/wdrtchv.gif"><I> </I>such that <I>y</I><SUB>1</SUB> = <I>x</I><SUB>1</SUB> and<P>
<pre>y<I><SUB>k  </SUB>=  y<SUB>k</I>-1</SUB><img src="../images/circx.gif"> <I>x<SUB>k</I></sub></sup></pre><P>
<pre>=  <I>x</I><SUB>1</SUB> <img src="../images/circx.gif"> <I>x</I><SUB>2 </SUB><img src="../images/circx.gif"> . . . <img src="../images/circx.gif"> <I>x<SUB>k</I></sub></sup></pre><P>
for <I>k</I> = 2,3, . . ., <I>n.</I> In other words, each <I>y<SUB>k</I></SUB> is obtained by &quot;multiplying&quot; together the first <I>k</I> elements of the sequence of<I> x<SUB>k</I></SUB>--hence, the term &quot;prefix.<FONT FACE="CG Times (W1)" SIZE=2>&quot;</FONT> (The definition in Chapter 29 indexes the sequences from 0, whereas this definitionindexes from 1--an inessential difference.)<P>
As an example of a prefix computation, suppose that every element of an <I>n</I>-object list contains the value 1, and let <img src="../images/circx.gif"> be ordinary addition. Since the<I> k</I>th element of the list contains the value<I> x<SUB>k</I></SUB> = 1 for <I>k</I> = 1,2, . . ., <I>n,</I> a prefix computation produces <I>y<SUB>k</I></SUB> = <I>k,</I> the index of the <I>k</I>th element. Thus, another way to perform list ranking is to reverse the list (which can be done in <I>O</I>(1) time), perform this prefix computation, and subtract 1 from each value computed.<P>
We now show how an EREW algorithm can compute parallel prefixes in <I>O</I>(lg <I>n</I>) time on <I>n</I>-object lists. For convenience, we define the notation<P>
<pre>[<I>i, j</I>]<I> = x<SUB>i </I></SUB><img src="../images/circx.gif"><I><SUB> </SUB>x<SUB>i+</I>1</SUB><img src="../images/circx.gif"><I> <img src="../images/dot10.gif"><img src="../images/dot10.gif"><img src="../images/dot10.gif"> </I><img src="../images/circx.gif"><I> x<SUB>j  </I></sub></sup></pre><P>
for integers <I>i</I> and <I>j</I> in the range 1 <img src="../images/lteq12.gif"> <I>i </I><img src="../images/lteq12.gif"> j <I><img src="../images/lteq12.gif"> n.</I> Then, [<I>k,k</I>]<I> = x<SUB>k</I></SUB> for<P>
<pre><I>k = </I>1, 2,<I>...</I>, <I>n</I>, and</sub></sup></pre><P>
<pre>[<I>i,k</I>]<I> = </I>[<I>i</I>,<I> j</I>]<I> </I><img src="../images/circx.gif"><I> </I>[<I>j+</I>1, <I>k</I>]</sub></sup></pre><P>
for 0 <img src="../images/lteq12.gif"><I> i</I> <img src="../images/lteq12.gif"> <I>j </I>&lt; <I>k</I> <img src="../images/lteq12.gif"> <I>n</I>. In terms of this notation, the goal of a prefix computation is to compute <I>y<SUB>k</SUB> = </I>[1,<I> k</I>]<I> for k = </I>1, 2,<I> . . . </I>,<I> n.</I><P>
When we perform a prefix computation on a list, we wish the order of the input sequence <img src="../images/lftwdchv.gif"><I>x</I><SUB>1</SUB>, <I>x</I><SUB>2</SUB>, . . . , <I>x</I><SUB>n</SUB><img src="../images/wdrtchv.gif"> to be determined by how the objects are linked together in the list, and not by the index of the object in the array of memory that stores objects. (Exercise 30.1-2 asks for a prefix algorithm for arrays.) The following EREW algorithm starts with a value <I>x</I>[<I>i</I>] in each object <I>i</I> in a list<I> L</I>. If object<I> i</I> is the <I>k</I>th object from the beginning of the list, then <I>x</I>[<I>i</I>]<I> = x<SUB>k</I></SUB> is the <I>k</I>th element of the input sequence. Thus, the parallel prefix computation produces <I>y</I>[<I>i</I>]<I> = y<SUB>k</SUB> = </I>[1,<I> k</I>].<P>
<pre><a name="0951_19d4">LIST-PREFIX(<I>L</I>)</sub></sup></pre><P>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -