
http://www.cs.utexas.edu/users/less/research.html

This data set contains WWW pages collected from computer science departments of various universities.
MIME-Version: 1.0
Server: CERN/3.0
Date: Tuesday, 07-Jan-97 15:44:32 GMT
Content-Type: text/html
Content-Length: 11186
Last-Modified: Friday, 13-Dec-96 15:35:10 GMT

<HTML><HEAD><TITLE>LESS Research Agenda</TITLE><LINK REV="made" HREF="mailto:rdb@cs.utexas.edu"></HEAD><BODY text="#0a6080" bgcolor="#fff0cc">
<H1 align=center>Laboratory for Experimental Software Systems<br>Research Agenda</H1>

<p>The <A HREF=http://www.cs.utexas.edu/users/less/Welcome.html>Laboratory for Experimental Software Systems (LESS)</A> at the <A HREF=http://www.cs.utexas.edu>University of Texas at Austin's Department of Computer Sciences</A> was formed in September 1996 by four new faculty members --- <A HREF=http://www.cs.utexas.edu/users/lorenzo>Lorenzo Alvisi</A>, <A HREF="http://www.cs.utexas.edu/users/rdb">Robert Blumofe</A>, <A HREF="http://www.cs.utexas.edu/users/dahlin">Mike Dahlin</A>, and <A HREF="http://www.cs.utexas.edu/users/lin">Calvin Lin</A> --- to aggregate resources and promote collaboration on research in experimental software systems, particularly in the areas of programming support and fault tolerance for cluster and web-based applications.  This document gives a brief overview of research being conducted in the LESS lab.</p>

<p><b>Fault-tolerant parallel computing with distributed shared memory (Alvisi and Blumofe).</b> Prior work has shown that the combination of a &quot;well structured&quot; parallel programming model, the randomized &quot;work-stealing&quot; scheduling algorithm, and the &quot;dag consistency&quot; coherence model of distributed shared memory (a combination that forms the basis for the <i>Cilk</i> parallel language and runtime system) yields efficient and predictable performance both in theory and in practice.
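As a concrete illustration of the scheduling model described above, Java's later `ForkJoinPool` implements randomized work stealing for exactly this kind of well-structured (fully strict) divide-and-conquer computation. The sketch below is ours, not the Cilk runtime; it shows the pattern in which a parent waits only on children it spawned itself:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// A "well structured" (fully strict) computation: each activation joins
// only children it forked itself, which is the property that makes
// randomized work stealing efficient and predictable.
public class FibTask extends RecursiveTask<Long> {
    private final int n;
    FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) return (long) n;
        FibTask left = new FibTask(n - 1);
        left.fork();                        // may be stolen by an idle worker
        long right = new FibTask(n - 2).compute();  // run locally meanwhile
        return right + left.join();         // parent waits only on its own child
    }

    public static void main(String[] args) {
        long result = new ForkJoinPool().invoke(new FibTask(30));
        System.out.println(result);         // 832040
    }
}
```

Idle workers steal forked tasks from the oldest end of a busy worker's deque, exactly the discipline whose algorithmic properties the following paragraphs exploit for fault tolerance.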
Furthermore, we claim that by using an <i>end-to-end design</i>, algorithmic properties of this combination can be leveraged to make such a system fault tolerant with extremely low overhead and without redundant computation (except during recovery).</p>

<p>We propose to use a combination of two new techniques --- &quot;return transactions&quot; and &quot;causal logging of reconciles&quot; --- that take advantage of the following key algorithmic property of the well structuring, work stealing, and dag consistency combination.  When a procedure activation is stolen, the modifications made to shared memory by the stolen activation and all of its descendants need not be seen by any other extant activation except for the stolen activation's parent.  Moreover, these modifications need not be seen by the parent until after the stolen activation returns.</p>

<p>The <i>return transactions</i> technique uses this fact to turn each stolen activation into an atomic transaction.  This technique, coupled with uncoordinated checkpoints, has already been shown to be effective for a functional programming model.  In general, however, with distributed shared memory, this technique is not sufficient, as it requires that all modifications to shared memory made by a stolen activation and all of its descendants be buffered to create an atomic transaction when the stolen activation returns.</p>

<p>To avoid potentially huge amounts of buffering, <i>causal logging of reconciles</i> will use causal message-logging techniques to allow modifications to shared memory to be flushed (reconciled) to backing store even before the stolen activation returns.  In general, causal message-logging requires that extra information of a fixed size be piggy-backed on each message, which effectively logs the message (without requiring a synchronous write to stable storage).
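A toy sketch of the piggybacking idea behind causal message logging (the class and field names here are ours for illustration, not from any actual system): each process keeps its receipt records ("determinants") in volatile memory and copies them onto every outgoing message, so after the sender crashes, some surviving process still holds what is needed to replay its messages in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of causal message logging: no synchronous write to
// stable storage is needed, because each message carries a copy of all
// determinants (receipt records) its sender currently knows about.
class CausalLogProcess {
    final String name;
    final List<String> determinants = new ArrayList<>();  // in-memory log

    CausalLogProcess(String name) { this.name = name; }

    // Deliver a message and record its determinant (sender#sequence) locally.
    void receive(String from, int seq, String payload) {
        determinants.add(from + "#" + seq);
    }

    // Send: the payload travels with everything this process knows, so the
    // receiver becomes an additional holder of the sender's log.
    void send(CausalLogProcess to, int seq, String payload) {
        to.determinants.addAll(determinants);  // piggy-backed determinants
        to.receive(name, seq, payload);
    }

    public static void main(String[] args) {
        CausalLogProcess p = new CausalLogProcess("p");
        CausalLogProcess q = new CausalLogProcess("q");
        p.receive("client", 1, "request");  // p's receipt is logged at p...
        p.send(q, 1, "forward");            // ...and now survives at q too
        System.out.println(q.determinants); // [client#1, p#1]
    }
}
```

If p now fails, q's copy of `client#1` is enough to replay p's delivery order during recovery, which is what lets reconciles be flushed early without synchronous logging.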
With well structuring, work stealing, and dag consistency, however, this logging only needs to be done for a specific subset of the reconcile messages, and this overhead can be amortized against the cost of work stealing.</p>

<p><b>Reliable parallel scientific subroutine libraries (Blumofe).</b> Traditionally, parallel scientific subroutine libraries, such as various parallel implementations of the Basic Linear Algebra Subroutines (BLAS), have been coded by statically partitioning work among a static set of processes or threads.  This approach has been very successful for traditional parallel platforms in which each program runs on a static set of (effectively) dedicated processors.  With the growing use and acceptance of SMPs and clusters for parallel computation, however, this assumption of dedicated resources is no longer valid, and it has been shown that applications and libraries coded with static partitioning have very unreliable performance when run on non-dedicated resources.  On the other hand, it has been shown that by using <i>wait-free synchronization</i> techniques and dynamic partitioning (such as with work stealing), performance becomes very reliable.  To make this point, we propose to code and make available a set of libraries, including BLAS, for SMPs (and later clusters) that use these techniques to deliver reliable and predictable performance on shared resources.</p>

<p><b>wFS: An adaptive data framework for web computing (Dahlin).</b> Although an increasing amount of valuable data resides on the web, current &quot;browser-centric&quot; data-access protocols limit its use.  This project seeks to provide stronger cache consistency and data update guarantees that will enable new classes of web-based applications.  Because the physical characteristics of the Internet make it expensive to provide some of these guarantees, wFS will pursue an adaptive and application-specific approach.
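The contrast between static and dynamic partitioning drawn above can be made concrete. In this sketch (ours, not from any BLAS implementation), threads claim the next block of iterations from a shared atomic counter instead of being handed a fixed slice up front, so a worker slowed by a competing job cannot strand "its" share of the work:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of dynamic partitioning for a dot product: idle
// threads repeatedly claim the next chunk from a shared counter, so load
// balances itself even on non-dedicated processors.
public class DynamicDot {
    static double dot(double[] a, double[] b, int nThreads) {
        final int chunk = 1024;
        AtomicInteger next = new AtomicInteger(0);     // lock-free work pointer
        double[] partial = new double[nThreads];       // one slot per thread
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                int start;
                while ((start = next.getAndAdd(chunk)) < a.length) {
                    int end = Math.min(start + chunk, a.length);
                    double s = 0;
                    for (int i = start; i < end; i++) s += a[i] * b[i];
                    partial[id] += s;
                }
            });
            ts[t].start();
        }
        double sum = 0;
        for (int t = 0; t < nThreads; t++) {
            try { ts[t].join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sum += partial[t];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10000;
        double[] a = new double[n], b = new double[n];
        for (int i = 0; i < n; i++) { a[i] = 1; b[i] = 2; }
        System.out.println(dot(a, b, 4));  // 20000.0
    }
}
```

The single `getAndAdd` is the wait-free synchronization point; a statically partitioned version would instead run at the speed of its slowest thread.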
The system will provide a range of consistency and update options with different guarantees and different costs, and applications will pay for only the guarantees that they require.  For example, a web browser may emphasize scalability and continue to use the current read-only and weak cache consistency approach.  Conversely, a distributed parallel computation may require transactional updates and strict cache consistency even if these guarantees limit its scalability to a few hundred nodes.  Two key aspects of the project will be providing a framework for instantiating different consistency and update algorithms under a common interface and providing quantitative criteria that applications can use to select appropriate algorithms.</p>

<p><b>Lightweight fault-tolerance (Alvisi and Vin).</b> The objective of this research is to support and enable a new class of truly distributed and fault-tolerant applications in which distributed agents communicate through messages as well as files.  Our proposed <i>lightweight fault-tolerance</i> will have the following properties.
<ul>
<li>It will integrate with applications in a way that is transparent to the application programmer.</li>
<li>Its use will require few additional resources and have a negligible impact on performance during failure-free executions.</li>
<li>Its cost will be very low for the most common failures, and it will scale depending on the severity and number of failures that need to be tolerated.</li>
<li>It will address software-generated faults effectively.</li>
</ul>
To achieve transparency, we plan to engineer our solution as middleware.  To minimize dedicated resources, we plan to use rollback recovery techniques.
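Rollback recovery, named just above, can be sketched in a few lines (this is an illustrative toy of ours, not the proposed middleware): state is periodically checkpointed, and after a failure the process resumes from the most recent checkpoint instead of restarting from scratch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of rollback recovery: snapshot application state at
// checkpoints; on failure, restore the most recent snapshot.
class Checkpointed {
    private Map<String, Integer> state = new HashMap<>();
    private final Deque<Map<String, Integer>> checkpoints = new ArrayDeque<>();

    void set(String key, int value) { state.put(key, value); }
    Integer get(String key) { return state.get(key); }

    void checkpoint() { checkpoints.push(new HashMap<>(state)); }  // snapshot

    void rollback() {  // recover from the most recent checkpoint
        if (!checkpoints.isEmpty()) state = new HashMap<>(checkpoints.pop());
    }

    public static void main(String[] args) {
        Checkpointed app = new Checkpointed();
        app.set("progress", 10);
        app.checkpoint();
        app.set("progress", 99);            // work corrupted by a failure...
        app.rollback();
        System.out.println(app.get("progress"));  // 10
    }
}
```

The causal logging discussed next addresses what a checkpoint alone cannot: replaying, in the original order, the messages received since the last snapshot.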
To minimize the impact on application performance and to scale the cost of our solution with the number of failures that need to be tolerated, we plan to use <i>causal logging</i>.</p>

<p>Using current techniques, tolerating hardware-generated faults is possible, but at the cost of potentially forcing the application to block for every I/O operation while data critical to recovery are logged to stable storage.  Specifically, one cannot assume that a file read during the execution will still be available in its original form during recovery.  Hence, input from the file system must be synchronously logged to stable storage.  Furthermore, since the file system in general cannot roll back, the application must delay output to the file system until it executes an <i>output commit</i> protocol, which requires synchronous logging to stable storage.  Tolerating transient software-generated faults --- the so-called <i>Heisenbugs</i> --- through rollback-based techniques becomes more problematic as well, since frequent writes to the file system can limit the extent to which a process can roll back.</p>

<p>To address these problems, the middleware that we plan to build will present the file system to the application not as a detached component of the external environment, but as an integrated partner that can be trusted to provide the data needed during recovery.  We expect that this will drastically reduce the costs incurred by the application in performing file I/O.  Specifically, our solution will have the following benefits.
<UL>
<li>Avoid synchronous logging of input data.  If a client fails, the middleware and the file system cooperate to guarantee that during recovery, the client will receive the same data as it received before failing.</li>
<li>Avoid synchronous writes to the file server due to file sharing.  In our solution, clients can pass dirty data directly to each other without using the file server to make the data stable.
The middleware guarantees that any dirty data kept in the volatile memory of a client <i>c</i>, and passed to another client without first being saved to the file server, can be regenerated during recovery if <i>c</i> fails.</li>
<li>Avoid a synchronous output commit protocol before writing a file.  The middleware and the file system cooperate to guarantee that, if the client crashes, the application's state in which the output was generated will never be rolled back.</li>
<li>Enhance the effectiveness of rollback-based techniques for software fault-tolerance.  The middleware allows a client that experiences a Heisenbug to roll back past its last write to the file system, increasing the likelihood of successful recovery.</li>
</UL></p>

<p><b>Parallel computing on the world-wide web with Java (Alvisi, Blumofe, Dahlin, and Lin).</b> This project will use Java as the basis for a new parallel computing infrastructure for the world-wide web, to be called <i>Jem</i> (pronounced &quot;gem&quot;).  The Jem language will augment Java with simple primitives to express parallelism while maintaining the well-structured property.  The Jem virtual machine runtime system will use work stealing and dag consistency, and it will provide transparent lightweight fault tolerance as described above.  These properties, in combination with existing Java technology, will allow Jem programs to run across heterogeneous and untrusting resources.
Thus, applications of national and international importance, such as climate modeling, can be coded in Jem and run reliably on the aggregated resources of the entire world-wide web, and applications of corporate importance, such as scheduling, data mining, and simulation, can be coded in Jem and run reliably on the aggregated resources of the enterprise intranet.</p>

<HR>
<p>Back to <A HREF="http://www.cs.utexas.edu/users/less/Welcome.html">LESS</A></p>
<ADDRESS>Last modified: December 13, 1996<BR><A HREF="http://www.cs.utexas.edu/users/rdb">Robert Blumofe<BR></A><A HREF="mailto:rdb@cs.utexas.edu">rdb@cs.utexas.edu</A></ADDRESS></BODY></HTML>
