⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 chap30.htm

📁 介绍算法的书籍
💻 HTM
📖 第 1 页 / 共 5 页
字号:
The EREW model does not have this powerful AND facility. Any EREW algorithm that computes the maximum of <I>n</I> elements takes <img src="../images/omega12.gif">(lg <I>n</I>) time. The proof is conceptually similar to the lower-bound argument for finding the root of a binary tree. In that proof, we looked at how many nodes can &quot;know&quot; the identity of the root and showed that it at most doubles for each step. For the problem of computing the maximum of <I>n</I> elements, we consider how many elements &quot;think&quot; that they might possibly be the maximum. Intuitively, with each step of an EREW PRAM, this number can at most halve, which leads to the <img src="../images/omega12.gif">(lg <I>n</I>) lower bound.<P>
Remarkably, the <img src="../images/omega12.gif">(lg <I>n</I>) lower bound for computing the maximum holds even if we permit concurrent reading; that is, it holds for CREW algorithms. Cook, Dwork, and Reischuk [50] show, in fact, that any CREW algorithm for finding the maximum of <I>n</I> elements must run in <img src="../images/omega12.gif">(lg <I>n</I>) time, even with an unlimited number of processors and unlimited memory. Their lower bound also holds for the problem of computing the AND of <I>n</I> boolean values.<P>
<P>







<h2>Simulating a CRCW algorithm with an EREW algorithm</h2><P>
<a name="0957_19e4"><a name="0957_19e5">We now know that CRCW algorithms can solve some problems more quickly than can EREW algorithms. Moreover, any EREW algorithm can be executed on a CRCW PRAM. Thus, the CRCW model is strictly more powerful than the EREW model. But how much more powerful is it? In Section 30.3, we shall show that a <I>p</I>-processor EREW PRAM can sort <I>p </I>numbers in <I>O</I>(lg <I>p</I>) time. We now use this result to provide a theoretical upper bound on the power of a CRCW PRAM over an EREW PRAM.<P>
<a name="0957_19e6">Theorem 30.1<a name="0957_19e6"><P>
A <I>p</I>-processor CRCW algorithm can be no more than <I>O</I>(lg <I>p</I>) times faster than the best <I>p</I>-processor EREW algorithm for the same problem.<P>
<I><B>Proof</I></B>     The proof is a simulation argument. We simulate each step of the CRCW algorithm with an <I>O</I>(lg <I>p</I>)-time EREW computation. Because the processing power of both machines is the same, we need only focus on memory accessing. We only present the proof for simulating concurrent writes here. Implementation of concurrent reading is left as Exercise 30.2-8.<P>
The <I>p</I> processors in the EREW PRAM simulate a concurrent write of the CRCW algorithm using an auxiliary array <I>A</I> of length <I>p</I>. Figure 30.7 illustrates the idea. When CRCW processor P<I><SUB>i</I></SUB>, for <I>i</I> = 0, 1, . . . , <I>p</I> - 1, desires to write a datum <I>x<SUB>i</I></SUB> to a location <I>l<SUB>i</I></SUB>,<SUB> </SUB>each corresponding EREW processor <I>P<SUB>i</I></SUB> instead writes the ordered pair (<I>l<SUB>i</I></SUB>, <I>x<SUB>i</I></SUB>) to location <I>A</I>[<I><SUB>i</I></SUB>]. These writes are exclusive, since each processor writes to a distinct memory location. Then, the array <I>A</I> is sorted by the first coordinate of the ordered pairs in <I>O</I>(lg <I>p</I>) time, which causes all data written to the same location to be brought together in the output.<P>
<img src="707_a.gif"><P>
<h4><a name="0957_19e7">Figure 30.7 Simulating a concurrent write on an EREW PRAM. (a) A step of a common-CRCW algorithm in which 6 processors write concurrently to global memory. (b) Simulating the step on an EREW PRAM. First, ordered pairs containing location and data are written to an array A. The array is then sorted. By comparing adjacent elements in the array, we ensure that only the first of each group of identical writes into global memory is implemented. In this case, processors P<SUB>0</SUB>, P<SUB>2</SUB>, and P<SUB>5</SUB> perform the write.<a name="0957_19e7"></sub></sup></h4><P>
Each EREW processor <I>P<SUB>i</I></SUB>, for <I>i</I> = 1, 2, . . . , <I>p</I> - 1, now inspects <I>A</I>[<I>i</I>] = (<I>l<SUB>j</I></SUB>, <I>x<SUB>j</I></SUB>) and <I>A</I>[<I>i </I>- 1] = (<I>l<SUB>k</I></SUB>, <I>x<SUB>k</I></SUB>), where <I>j</I> and <I>k</I> are values in the range 0 <img src="../images/lteq12.gif"> <I>j</I>, <I>k</I> <img src="../images/lteq12.gif"> <I>p </I>- 1. If <I>l<SUB>j </I></SUB><img src="../images/noteq.gif"> <I>l<SUB>k</SUB> </I>or <I>i</I> = 0, then processor <I>P<SUB>i</I></SUB>, for <I>i</I> = 0, 1, . . . , <I>p</I> - 1, writes the datum <I>x<SUB>j</I></SUB> to location <I>l<SUB>j</I></SUB> in global memory. Otherwise, the processor does nothing. Since the array <I>A</I> is sorted by first coordinate, only one of the processors writing to any given location actually succeeds, and thus the write is exclusive. This process thus implements each step of concurrent writing in the common-CRCW model in <I>O</I>(lg <I>p</I>) time.      <P>
Other models for concurrent writing can be simulated as well. (See Exercise 30.2-9.)<P>
The issue arises, therefore, of which model is preferable--CRCW or EREW--and if CRCW, which CRCW model. Advocates of the CRCW models point out that they are easier to program than the EREW model and that their algorithms run faster. Critics contend that hardware to implement concurrent memory operations is slower than hardware to implement exc lusive memory operations, and thus the faster running time of CRCW algorithms is fictitious. In reality, they say, one cannot find the maximum of <I>n</I> values in <I>O</I>(1) time.<P>
Others say that the PRAM--either EREW or CRCW--is the wrong model entirely. Processors must be interconnected by a communication network, and the communication network should be part of the model. Processors should only be able to communicate with their neighbors in the network.<P>
It is quite clear that the issue of the &quot;right&quot; parallel model is not going to be easily settled in favor of any one model. The important thing to realize, however, is that these models are just that: models. For a given real-world situation, the various models apply to differing extents. The degree to which the model matches the engineering situation is the degree to which algorithmic analyses in the model will predict real-world phenomena. It is important to study the various parallel models and algorithms, therefore, so that as the field of parallel computing grows, an enlightened consensus on which paradigms of parallel computing are best suited for implementation can emerge.<P>
<P>







<h2><a name="0958_19f7">Exercises<a name="0958_19f7"></h2><P>
<a name="0958_19f8">30.2-1<a name="0958_19f8"><P>
<a name="0958_19e6"><a name="0958_19e7"><a name="0958_19e8">Suppose we know that a forest of binary trees consists of only a single tree with <I>n</I> nodes. Show that with this assumption, a CREW implementation of <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>ROOTS</FONT> can be made to run in <I>O</I>(1) time, independent of the depth of the tree. Argue that any EREW algorithm takes <img src="../images/omega12.gif">(lg <I>n</I>) time.<P>
<a name="0958_19f9">30.2-2<a name="0958_19f9"><P>
<a name="0958_19e9"><a name="0958_19ea"><a name="0958_19eb"><a name="0958_19ec">Give an EREW algorithm for <FONT FACE="Courier New" SIZE=2>FIND</FONT>-<FONT FACE="Courier New" SIZE=2>ROOTS</FONT> that runs in <I>O</I>(lg <I>n</I>) time on a forest of <I>n</I> nodes.<P>
<a name="0958_19fa">30.2-3<a name="0958_19fa"><P>
<a name="0958_19ed"><a name="0958_19ee">Give an <I>n</I>-processor CRCW algorithm that can compute the OR of <I>n</I> boolean values in <I>O</I>(1) time.<P>
<a name="0958_19fb">30.2-4<a name="0958_19fb"><P>
Describe an efficient CRCW algorithm to multiply two <I>n</I> <img src="../images/mult.gif"> <I>n</I> boolean matrices using <I>n</I><SUP>3</SUP> processors.<P>
<a name="0958_19fc">30.2-5<a name="0958_19fc"><P>
<a name="0958_19ef"><a name="0958_19f0"><a name="0958_19f1"><a name="0958_19f2">Describe an <I>O</I>(lg <I>n</I>)-time EREW algorithm to multiply two <I>n</I> <img src="../images/mult.gif"> <I>n</I> matrices of real numbers using <I>n</I><SUP>3</SUP> processors. Is there a faster common-CRCW algorithm? Is there a faster algorithm in one of the stronger CRCW models?<P>
<a name="0958_19fd">30.2-6<a name="0958_19fd"><P>
<a name="0958_19f3"><a name="0958_19f4">Prove that for any constant <FONT FACE="Courier New" SIZE=2><img src="../images/memof12.gif"></FONT> &gt; 0, there is an <I>O</I>(1)-time CRCW algorithm using <I>O</I>(<I>n</I><SUP>1+</SUP><img src="../images/memof12.gif">) processors to find the maximum element of an <I>n</I>-element array.<P>
<a name="0958_19fe">30.2-7<a name="0958_19fe"><P>
<a name="0958_19f5"><a name="0958_19f6">Show how to merge two sorted arrays, each with <I>n</I> numbers, in <I>O</I>(1) time using a priority-CRCW algorithm. Describe how to use this algorithm to sort in <I>O</I>(lg <I>n</I>) time. Is your sorting algorithm work-efficient?<P>
<a name="0958_19ff">30.2-8<a name="0958_19ff"><P>
Complete the proof of Theorem 30.1 by describing how a concurrent read on a <I>p</I>-processor CRCW PRAM is implemented in <I>O</I>(lg <I>p</I>) time on a <I>p</I>-processor EREW PRAM.<P>
<a name="0958_1a00">30.2-9<a name="0958_1a00"><P>
Show how a <I>p</I>-processor EREW PRAM can implement a <I>p</I>-processor combining-CRCW PRAM with only <I>O</I>(lg <I>p</I>) performance loss. (<I>Hint:</I>Use a parallel prefix computation.)<P>
<P>


<P>







<h1><a name="0959_1a0a">30.3 Brent's theorem and work efficiency<a name="0959_1a0a"></h1><P>
Brent's theorem shows how we can efficiently simulate a combinational circuit by a PRAM. Using this theorem, we can adapt many of the results for sorting networks from Chapter 28 and many of the results for arithmetic circuits from Chapter 29 to the PRAM model. Readers unfamiliar with combinational circuits may wish to review Section 29.1.<P>
<a name="0959_19f7"><a name="0959_19f8"><a name="0959_19f9"><a name="0959_19fa"><a name="0959_19fb"><a name="0959_19fc"><a name="0959_19fd"><a name="0959_19fe"><a name="0959_19ff">A <I><B>combinational circuit</I></B> is an acyclic network of <I><B>combinational elements</I></B>. Each combinational element has one or

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -