📄 rfc992.txt

📁 RFC 的详细文档！
💻 TXT
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页

                                                       K. P. Birman (Cornell)
Network Working Group                                  T. A. Joseph (Cornell)
Request for Comments: 992                              November 1986



       On Communication Support for Fault Tolerant Process Groups

                     K. P. Birman and T. A. Joseph
             Dept. of Computer Science, Cornell University
                           Ithaca, N.Y. 14853
                              607-255-9199


1. Status of this Memo.

   This memo describes a collection of multicast communication primi-
   tives integrated with a mechanism for handling process failure and
   recovery.  These primitives facilitate the implementation of fault-
   tolerant process groups, which can be used to provide distributed
   services in an environment subject to non-malicious crash failures.
   Unlike other process group approaches, such as Cheriton's "host
   groups" (RFC's 966, 988, [Cheriton]), our approach provides powerful
   guarantees about the behavior of the communication subsystem when
   process group membership is changing dynamically, for example due to
   process or site failures, recoveries, or migration of a process from
   one site to another.  Our approach also addresses delivery ordering
   issues that arise when multiple clients communicate with a process
   group concurrently, or a single client transmits multiple multicast
   messages to a group without pausing to wait until each is received.
   Moreover, the cost of the approach is low.  An implementation is be-
   ing undertaken at Cornell as part of the ISIS project.

   Here, we argue that the form of "best effort" reliability provided by
   host groups may not address the requirements of those researchers who
   are building fault tolerant software.  Our basic premise is that re-
   liable handling of failures, recoveries, and dynamic process migra-
   tion are important aspects of programming in distributed environ-
   ments, and that communication support that provides unpredictable
   behavior in the presence of such events places an unacceptable burden
   of complexity on higher level application software.  This complexity
   does not arise when using the fault-tolerant process group alterna-
   tive.

   This memo summarizes our approach and briefly contrasts it with other
   process group approaches.  For a detailed discussion, together with
   figures that clarify the details of the approach, readers are re-
   ferred to the papers cited below.

   Distribution of this memo is unlimited.




Birman & Joseph                                                 [Page 1]

RFC 992                                                    November 1986


2. Acknowledgments

   This memo was adopted from a paper presented at the Asilomar workshop
   on fault-tolerant distributed computing, March 1986, and summarizes
   material from a technical report that was issued by Cornell Universi-
   ty, Dept. of Computer Science, in August 1985, which will appear in
   ACM Transactions on Computer Systems in February 1987 [Birman-b].
   Copies of these paper, and other relevant papers, are available on
   request from the author: Dept. of Computer Science, Cornell Universi-
   ty, Ithaca, New York 14853. (birman@gvax.cs.cornell.edu).  The ISIS
   project also maintains a mailing list.  To be added to this list,
   contact M. Schmizzi (schiz@gvax.cs.cornell.edu).

   This work was supported by the Defense Advanced Research Projects
   Agency (DoD) under ARPA order 5378, Contract MDA903-85-C-0124, and by
   the National Science Foundation under grant DCR-8412582.  The views,
   opinions and findings contained in this report are those of the au-
   thors and should not be construed as an official Department of De-
   fense position, policy, or decision.

3. Introduction

   At Cornell, we recently completed a prototype of the ISIS system,
   which transforms abstract type specifications into fault-tolerant
   distributed implementations, while insulating users from the mechan-
   isms by which fault-tolerance is achieved.  This version of ISIS, re-
   ported in [Birman-a], supports transactional resilient objects as a
   basic programming abstraction.  Our current work undertakes to pro-
   vide a much broader range of fault-tolerant programming mechanisms,
   including fault-tolerant distributed bulletin boards [Birman-c] and
   fault-tolerant remote procedure calls on process groups [Birman-b].
   The approach to communication that we report here arose as part of
   this new version of the ISIS system.

   Unreliable communication primitives, such as the multicast group com-
   munication primitives proposed in RFC's 966 and 988 and in [Cheri-
   ton], leave some uncertainty in the delivery status of a message when
   failures and other exceptional events occur during communication.
   Instead, a form of "best effort" delivery is provided, but with the
   possibility that some member of a group of processes did not receive
   the message if the group membership was changing just as communica-
   tion took place.  When we tried to use this sort of primitive in our
   original work on ISIS, which must behave reliably in the presence of
   such events, we had to address this aspect at an application level.
   The resulting software was complex, difficult to reason about, and
   filled with obscure bugs, and we were eventually forced to abandon
   the entire approach as infeasible.

   A wide range of reliable communication primitives have been proposed
   in the literature, and we became convinced that by using them, the
   complexity of our software could be greatly reduced.  These range



Birman & Joseph                                                 [Page 2]

RFC 992                                                    November 1986


   from reliable and atomic broadcast [Chang] [Cristian] [Schneider] to
   Byzantine agreement [Strong].  For several reasons, however, the ex-
   isting work does not solve the problem at hand.  The most obvious is
   that they do not provide a mechanism for sending a message to all the
   members of a group when the membership is changing dynamically (the
   "group addressing" problem).  In addition, one can identify delivery
   ordering issues and questions regarding the detection of communica-
   tion failures that should be handled within the broadcast mechanism.
   These motivate a careful reexamination of the entire reliable broad-
   cast problem.

   The multicast primitives we report here are designed to respect
   several sorts of ordering constraints, and have cost and latency that
   varies depending on the nature of the constraint required [Birman-b]
   [Joseph-a] [Joseph-b].  Failure and recovery are integrated into the
   communication subsystem by treating these events as a special sort of
   multicast issued on behalf of a process that has failed or recovered.
   The primitives are presented in the context of fault tolerant process
   groups: groups of processes that cooperate to implement some distri-
   buted algorithm or service, and which need to see consistent order-
   ings of system events in order to achieve mutually consistent
   behavior.  Such groups are similar to the host groups of the V system
   and the ones described in RFC's 966 and 988, but provide guarantees
   of consistency in just the situations where a host group provides a
   "best effort" delivery which may sometimes be erroneous.

   It is helpful to think of our primitives as providing a logical or
   "virtual" form of reliability: rather than addressing physical
   delivery issues, they ensure that a client will never observe a sys-
   tem state "inconsistent" with the assumption that reliable delivery
   has occurred.  Readers familiar with serializability theory may want
   to think of this as a weaker analog: in serializability, one allows
   interleaved executions of operations provided that the resulting sys-
   tem state is consistent with the assumption that execution was
   sequential.  Similarly, reliable communication primitives permit de-
   viations from the reliable delivery abstraction provided that the
   resulting system state is indistinguishable from one in which reli-
   able delivery actually did occur.

   Using our primitives, the ISIS system achieved both high levels of
   concurrency and suprisingly good performance.  Equally important, its
   structure was made suprisingly simple, making it feasible to reason
   about the correctness of the algorithms that are needed to maintain
   high availability even when failures, recoveries, or process migra-
   tion occurs.  More recently, we have applied the same approach to a
   variety of other problems in distributed computing, and even designed
   a consistent, fault tolerant, distributed bulletin board data struc-
   ture (a generalized version of the blackboards used in artificial in-
   telligence programs), with equally good results [Birman-c].  Thus, we
   feel that the approach has been shown to work in a variety of set-
   tings where unreliable primitives simply could not be used.



Birman & Joseph                                                 [Page 3]

RFC 992                                                    November 1986


   In the remainder of this memo we summarize the issues and alterna-
   tives that the designer of a distributed system is presented with,
   focusing on two styles of support for fault-tolerant computing: re-
   mote procedure calls coupled with a transactional execution facility,
   such as is used in the ARGUS system [Liskov], and the fault-tolerant
   process group mechanism mentioned above.  We argue that transactional
   interactions are too restrictive to support the sort of mechanism
   needed, and then show how our primitives can be used to provide such
   a mechanism.  We conclude by speculating on future directions in
   which this work might be taken.

4. Issues in fault-tolerance

   The difficulty of constructing fault-tolerant distributed software
   can be traced to a number of interrelated issues.  The list that fol-
   lows is not exhaustive, but attempts to touch on the principal con-
   siderations that must be addressed in any such system:

      [1]Synchronization.  Distributed systems offer the potential for
      large amounts of concurrency, and it is usually desirable to
      operate at as high a level of concurrency as possible.  However,
      when we move from a sequential execution environment to a con-
      current one, it becomes necessary to synchronize actions that may
      conflict in their access to shared data or entail communication
      with overlapping sets of processes.  Thus, a mechanism is needed
      for ordering conflicting events.  Additional problems that can
      arise in this context include deadlock avoidance or detection,
      livelock avoidance, etc.

      [2]Failure detection.  It is usually necessary for a fault-
      tolerant application to have a consistent picture of which com-
      ponents fail, and in what order. Timeout, the most common mechan-
      ism for detecting failure, is unsatisfactory, because there are
      many situations in which a healthy component can timeout with
      respect to one component without this being detected by some
      another.  Failure detection under more rigorous requirements
      requires an agreement protocol that is related to Byzantine agree-
      ment [Strong] [Hadzilacos].  Regardless of how this problem is
      solved, some sort of reliable failure detection mechanism will be
      needed in any fault-tolerant distributed system.

      [3] Consistency.  When a group of processes cooperate in a distri-
      buted system, it is necessary to ensure that the operational
      processes have consistent views of the state of the group as a
      whole.  For example, if process p believes that some property A
      holds, and on the basis of this interacts with process q, the
      state of q should not contradict the fact that p believes A to be
      true.  This problem is closely related to notions of knowledge and
      consistency in distributed systems [Halpern] [Lamport].  In our
      context, A will often be the assertion that a multicast has been
      received by q, or that q saw some sequence of events occur in the



Birman & Joseph                                                 [Page 4]

RFC 992                                                    November 1986


      same order as did p.  Thus, it is necessary to be able to specify
      the precise consistency constraints on a distributed software sys-
      tem, and system support should be available to facilitate the
      attainment of these constraints.

      [4] Serializability.  Many distributed systems are partitioned
      into data manager processes, which implement shared variables, and
      transaction manager processes, which issue requests to data
      managers [Bernstein].  If transaction managers can execute con-
      currently, it is desirable to ensure that transactions produce
      serializable outcomes [Eswaren] [Papadimitrou].  Serializability
      is increasingly viewed as an important property in "object-
      oriented" distributed systems that package services as abstract
      objects with which clients communicate by remote procedure calls
      (RPC).  On the other hand, there are systems for which serializa-
      bility is either too strong a constraint, or simply inappropriate.
      Thus, one needs a way to achieve serializability in applications
      where it will be needed, without imposing system-wide restrictions
      that would prevent the design of software subsystems for which
      serializability is not needed.

   Jointly, these problems render the design of fault-tolerant distri-
   buted software daunting in the absence of adequate support.  The
   correctness of any proposed design and of its implementation become
   serious, if not insurmountable, concerns.  In Sec. 7, we will show
   how the primitives of Sec. 6 provide simple ways to overcome all of
   these issues.

5. Existing alternatives
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -