http://www.cs.cornell.edu/Info/People/barber/516fin/pcmrivl.html
to a process handler's virtual address, which is then invoked from Active Messages.<p>
<hr><a name="5.0"></a><a href="#home">Go Back</a>
<h2>5.0 Performance Results</h2>
We ran our shared-memory experiments on a quad-processor SparcStation 10 running SunOS. Our networked implementation was tested on 4 ATM-connected SparcStation 20s running SunOS. We constructed two test cases, named <b>Test 1</b> and <b>Test 2</b>, which perform the following image operations:<p>
<ul>
<li><b>Test 1:</b> There are 2 input sequences of images. The first image sequence (IM1) is scaled, rotated, and copied four times. The resulting output is then overlaid onto the second image sequence (IM2), and then output.<p>
<li><b>Test 2:</b> There are 2 input sequences of images. IM1 is scaled, rotated, and copied four times. IM2 is smoothed. The output from IM1 is then overlaid on top of the output for IM2.
</ul>
Overall, Test 2 is a more computationally expensive set of operations than Test 1, a fact illustrated by our experimental results.<p>
<img src="http://www.cs.cornell.edu/Info/People/barber/516fin/presen/sld011.gif"><p>
From the graphed results above, the shared-memory implementation performs somewhat better than our networked implementation. Both implementations, however, perform better than their serial counterparts (the green bar graph). One observation was that the networked implementation exhibited a large spread of timings across frames, which we attributed to our process getting preempted.
This behavior was not visible in the shared-memory implementation, as its process sleeps while waiting for the semaphores to change, whereas the process in the networked implementation busy-waits. An interrupt-driven implementation of Active Messages would likely cure this.<p>
<b>Note: In all tests, the processor speeds are roughly equal.</b> Results:<p>
<ul>
<li><b>Shared Memory:</b> In both Tests 1 and 2, the performance gains exhibited the following patterns:<p>
<ul>
<li><i>From 1 to 2 Processors</i>: Performance is nearly doubled.
<li><i>From 2 to 3 Processors</i>: Again, performance is nearly doubled.
<li><i>From 3 to 4 Processors</i>: The performance increase is negligible. Performance stops improving either because the communication overhead exceeds the performance gain, or because the processors are sub-optimally load-balanced (probably the latter).
</ul><p>
<li><b>Networked Implementation:</b><p>
<ul>
<li><i>From 1 to 2 Processors</i>: Performance is nearly doubled.
<li><i>From 2 to 3 Processors</i>: There is a small improvement in performance; however, the shared-memory implementation appears to do a little better.
<li><i>From 3 to 4 Processors</i>: The performance increase is again negligible. The explanation is probably the same as in the shared-memory experiment.
</ul>
</ul>
<hr><a name="6.0"></a><a href="#home">Go Back</a>
<h2>6.0 Extensions &amp; Robustness</h2>
There are a number of improvements that can and should be made to the overall performance and robustness of our parallelization scheme.<p>
<ol>
<li><b>Improve the Load-Balance:</b> The largest improvement involves improving the load balance among the RIVL processes by using a "Hungry Puppy" strategy for dividing up the work. Our current implementations statically allocate work to each RIVL process.
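Such a static split can be sketched as follows. This is an illustrative C fragment, not code from the actual PRIVL source; <code>rows_for</code>, <code>struct band</code>, and the row-based decomposition are all assumed names for the "function of the number of processes and the process ID" idea.

```c
/* Illustrative sketch of static work allocation: each RIVL process is
 * assigned a contiguous band of output rows, computed purely from its
 * process ID and the total process count (no communication needed).
 * All names here are hypothetical. */
struct band { int start; int count; };

static struct band rows_for(int pid, int nprocs, int height)
{
    struct band b;
    int base  = height / nprocs;   /* rows every process gets          */
    int extra = height % nprocs;   /* leftover rows, one per low pid   */
    b.start = pid * base + (pid < extra ? pid : extra);
    b.count = base + (pid < extra ? 1 : 0);
    return b;
}
```

For example, with 4 processes and a 480-row frame, each process handles a fixed 120-row band regardless of how expensive its band happens to be, which is exactly why the scheme can load-balance poorly.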
The location and the amount of data needed by each RIVL process is determined as a function of the number of processes and the process ID. As indicated by our experimental results, there is no significant boost from 3 to 4 RIVL processes using our shared-memory implementation. We can partly attribute this to a sub-optimal load balance.<p>
Modifying the networked implementation should prove more troublesome: while improving the overall load balance, it will probably increase the communication overhead, as more Active Messages will have to be sent and processed.<p>
Modifying the shared-memory version should be easier. The current synchronization mechanism is implemented using UNIX semaphores: no RIVL process is allowed to begin executing the next frame until all RIVL processes have completed execution of the current frame. The output image is currently divided up by the number of processes available for work. We could improve the load balance for this implementation by doing two things: (1) dividing the output-image work regions into more numerous, smaller segments; and (2) for the current frame, allowing RIVL processes to complete their output segment and then grab another segment from a Still-Need-to-be-Computed Queue residing on the master process. This will improve the load balance by letting less-busy processes take on a larger share of the output image, while giving busier processors the time they need to compute their data without becoming a bottleneck for the entire output image.<p>
<li><b>Improve Reliability and Fault-Tolerance:</b> In real-time systems, it is not uncommon for things to go wrong. Specifically, what should happen in the event that a slave RIVL process crashes? Our current implementations do not account for such mishaps.
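The per-frame synchronization described above for the shared-memory version might look roughly like this. It is a sketch only: POSIX semaphores stand in for the SysV ("UNIX") semaphores of the actual implementation, and all names are illustrative.

```c
#include <semaphore.h>

/* Sketch of the per-frame barrier: no process may start frame f+1 until
 * every process has finished frame f.  Processes sleep on the semaphore
 * rather than busy-waiting.  All names are hypothetical. */
struct frame_barrier {
    sem_t lock;      /* protects `arrived`                            */
    sem_t all_done;  /* posted nprocs times once everyone has arrived */
    int arrived;
    int nprocs;
};

static void barrier_init(struct frame_barrier *b, int nprocs)
{
    sem_init(&b->lock, 0, 1);
    sem_init(&b->all_done, 0, 0);
    b->arrived = 0;
    b->nprocs = nprocs;
}

/* Called by every process at the end of a frame. */
static void frame_done(struct frame_barrier *b)
{
    sem_wait(&b->lock);
    if (++b->arrived == b->nprocs) {
        b->arrived = 0;
        for (int i = 0; i < b->nprocs; i++)
            sem_post(&b->all_done);   /* release everyone, self included */
    }
    sem_post(&b->lock);
    sem_wait(&b->all_done);           /* sleep until the last arrival */
}
```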
If a process were to malfunction, due to either hardware or communication failure, our implementation would fail.<p>
<li><b>Port our ATM-Sparc Implementations over to Fast Ethernet PCs:</b> In designing any system, cost is always an issue. The purpose of implementing PRIVL over Active Messages was to exploit the lower cost of workstations and networks as compared to expensive parallel machines. The cost of higher-performance PCs is rapidly declining, so adapting our implementations to Fast Ethernet is a natural step in reducing the cost of high-performance CM PRIVL. The actual transition from ATM-Sparc to Fast Ethernet-PC is merely a matter of getting Active Messages to work over Fast Ethernet.<p>
</ol>
<hr><a name="7.0"></a><a href="#home">Go Back</a>
<h2>7.0 Conclusions</h2>
We were looking for significant speedups in Parallel CM RIVL as we moved from 1 to N processors (N being no more than 4). Our results are definitely encouraging: in both our shared-memory implementation and our networked implementation, we obtained good speedups up to four processors. To process real-time data, we need to approach a frame-processing rate of close to 30 frames per second, or roughly 33 ms per frame. For the operations we have tested, we will require upwards of 30 similar processors to achieve the desired frame rate.<p>
We do not have results for more than four processors. However, by examining our results, we can determine that under the current implementations, the processes running Parallel CM RIVL will not be load-balanced.<p>
Unfortunately, we must conclude that our implementations as they stand will not scale to the upwards of 30 processors needed to achieve the desired frame rate. However, further work is under way to address this load-balancing problem.
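The Still-Need-to-be-Computed Queue proposed in Section 6.0 can be sketched as below. This is a hypothetical shared-memory fragment, not code from our implementation: for small fixed-size segments, the "queue" reduces to a mutex-protected next-segment counter.

```c
#include <pthread.h>

/* Sketch of the proposed Still-Need-to-be-Computed Queue for the
 * shared-memory version: the output image is cut into many small
 * segments, and each worker repeatedly grabs the next unclaimed one,
 * so faster processes naturally take on more of the frame.
 * All names are illustrative. */
struct seg_queue {
    pthread_mutex_t lock;
    int next;    /* index of the next unclaimed segment   */
    int total;   /* total number of segments in the frame */
};

/* Returns the claimed segment index, or -1 when the frame is done. */
static int grab_segment(struct seg_queue *q)
{
    int s = -1;
    pthread_mutex_lock(&q->lock);
    if (q->next < q->total)
        s = q->next++;
    pthread_mutex_unlock(&q->lock);
    return s;
}
```

A worker's frame loop then becomes "while ((s = grab_segment(&q)) != -1) compute segment s", with the per-frame barrier still marking the frame boundary.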
Furthermore, a "Hungry Puppy" object-tracking algorithm is currently being incorporated into PRIVL; the experimental results from this should be available shortly.<p>
We have, however, made significant progress in parallelizing CM RIVL. CM RIVL is a non-trivial application, and our parallelization scheme works for most of the standard RIVL image operations.<p>
<hr><a name="8.0"></a><a href="#home">Go Back</a>
<h2>8.0 References</h2>
<ul>
<li>Jonathan Swartz, Brian C. Smith, <a href="http://www.cs.cornell.edu/Info/Projects/zeno/Rivl/Rivl-mm95/mm-95.html"><i>A Resolution Independent Video Language</i></a>, Proc. of the Third ACM International Conference on Multimedia, San Francisco, CA, November 5-9, 1995.<p>
<li>Lawrence A. Rowe, Brian C. Smith, <i>Continuous Media Player</i>, Third International Workshop on Network and Operating Systems Support for Digital Audio and Video, Nov. 12-13, 1992, San Diego, CA.<p>
<li>Brian C. Smith, Lawrence A. Rowe, Stephen C. Yen, <a href="http://www.cs.cornell.edu/Info/Projects/zeno/Tcl-DP/tcl-dp.ps"><i>Tcl Distributed Programming</i></a>, Proc. of the 1993 Tcl/Tk Workshop, Berkeley, CA, June 1993.<p>
<li>von Eicken, T., D. E. Culler, S. C. Goldstein, and K. E. Schauser, <a href="http://www.cs.cornell.edu/Info/Projects/CAM/isca92.ps"><i>Active Messages: a Mechanism for Integrated Communication and Computation</i></a>, Proc. of the 19th Int'l Symp. on Computer Architecture, May 1992, Gold Coast, Australia.<p>
<li>Anindya Basu, Vineet Buch, Werner Vogels, Thorsten von Eicken, <a href="http://www.cs.cornell.edu/Info/Projects/U-Net/sosp.ps"><i>U-Net: A User-Level Network Interface for Parallel and Distributed Computing</i></a>, Proc. of the 15th ACM Symposium on Operating Systems Principles, Copper Mountain, Colorado, December 3-6, 1995.<p>
<li>Sugata Mukhopadhyay, Arun Verma, <a href="http://www.cs.cornell.edu/Info/Courses/Fall-95/CS631/final-projects/Integratig-Rivl-and-CMT/final.html"><i>CMRivL - A Programmable Video Gateway</i></a>, Cornell University, Spring '96.<p>
</ul>
</body></html>