📄 rfc992.txt
字号:
at all clear how to use ethernet style broadcast mechanisms to optim-
ize the performance of this sort of protocol, although it should be
possible. We view this as an interesting area for research.
A forthcoming paper will describe higher level software that we are
building on top of the basic fault-tolerant process group mechanism
described above.
9. Conclusions
The experience of implementing a substantial fault-tolerant system
left us with insights into the properties to be desired from a com-
munication subsystem. In particular, we became convinced that to
build a reliable distributed system, one must start with a reliable
communication subsystem. The multicast primitives described in this
memo present a simple interface, achieve a high level of concurrency,
can be used in both local and wide area networks, and are applicable
to software ranging from distributed database systems to the fault-
tolerant objects and bulletin boards provided by ISIS. Because they
are integrated with failure handling mechanisms and respect desired
event orderings, they introduce a desirable form of determinism into
Birman & Joseph [Page 14]
RFC 992 November 1986
distributed computation without compromising efficiency. A conse-
quence is that high-level algorithms are greatly simplified, reducing
the probability of error. We believe that this is a very promising
and practical approach to building large fault-tolerant distributed
systems, and it is the only one we know of that leads to a rigorous
form of confidence in the resulting software.
NOTES:
<1> A problem arises if a process p fails without receiving some mes-
sage after that message has already been delivered to some other pro-
cess q: q's VIEW when it received the message would show p to be
operational; hence, q will assume that p received the message,
although p is physically incapable of doing so. However, the state
of the system is now equivalent to one in which p did receive the
message, but failed before acting on it. In effect, there exists an
interpretation of the actual system state that is consistent with q's
assumption. Thus, GBCAST satisfies the sort of logical delivery pro-
perty cited in the introduction.
Birman & Joseph [Page 15]
RFC 992 November 1986
10. References
[RFC966] Deering, S. and Cheriton, D. Host groups: A multicast exten-
sion to the internet protocol. Stanford University, December
1985.
[RFC988] Deering, S. Host extensions for IP multicasting. Stanford
University, July 1986.
[Allchin] Allchin, J., McKendry, M. Synchronization and recovery of
actions. Proc. 2nd ACM SIGACT/SIGOPS Principles of Distributed
Computing, Montreal, Canada, 1983.
[Babaoglu] Babaoglu, O., Drummond, R. The streets of Byzantium: Network
architectures for fast reliable multicast. IEEE Trans. on
Software Engineering TSE-11, 6 (June 1985).
[Bernstein] Bernstein, P., Goodman, N. Concurrency control algorithms
for replicated database systems. ACM Computing Surveys 13, 2
(June 1981), 185-222.
[Birman-a] Birman, K. Replication and fault-tolerance in the ISIS sys-
tem. Proc. 10th ACM SIGOPS Symposium on Operating Systems Princi-
ples. Orcas Island, Washington, Dec. 1985, 79-86.
[Birman-b] Birman, K., Joseph, T. Reliable communication in the pres-
ence of failures. Dept. of Computer Science, Cornell Univ., TR
85-694, Aug. 1985. To appear in ACM TOCS (Feb. 1987).
[Birman-c] Birman, K., Joseph, T., Stephenson, P. Programming with
fault tolerant bulletin boards in asynchronous distributed sys-
tems. Dept. of Computer Science, Cornell Univ., TR 85-788, Aug.
1986.
[Birrell] Birrell, A., Nelson, B. Implementing remote procedure calls.
ACM Transactions on Computer Systems 2, 1 (Feb. 1984), 39-59.
[Chang] Chang, J., Maxemchuck, M. Reliable multicast protocols. ACM
TOCS 2, 3 (Aug. 1984), 251-273.
[Cheriton] Cheriton, D. The V Kernel: A software base for distributed
systems. IEEE Software 1 12, (1984), 19-43.
[Cooper] Cooper, E. Replicated procedure call. Proc. 3rd ACM Symposium
on Principles of Distributed Computing., August 1984, 220-232.
(May 1985).
[Cristian] Cristian, F. et al Atomic multicast: From simple diffusion to
Byzantine agreement. IBM Technical Report RJ 4540 (48668), Oct.
1984.
Birman & Joseph [Page 16]
RFC 992 November 1986
[Eswaren] Eswaren, K.P., et al The notion of consistency and predicate
locks in a database system. Comm. ACM 19, 11 (Nov. 1976), 624-
633.
[Hadzilacos] Hadzilacos, V. Byzantine agreement under restricted types
of failures (not telling the truth is different from telling of
lies). Tech. ARep. TR-19-83, Aiken Comp. Lab., Harvard University
(June 1983).
[Halpern] Halpern, J., and Moses, Y. Knowledge and common knowledge in
a distributed environment. Tech. Report RJ-4421, IBM San Jose
Research Laboratory, 1984.
[Joseph-a] Joseph, T. Low cost management of replicated data. Ph.D.
dissertation, Dept. of Computer Science, Cornell Univ., Ithaca
(Dec. 1985).
[Joseph-b] Joseph, T., Birman, K. Low cost management of replicated
data in fault-tolerant distributed systems. ACM TOCS 4, 1 (Feb
1986), 54-70.
[Lamport] Lamport, L. Time, clocks, and the ordering of events in a
distributed system. CACM 21, 7, July 1978, 558-565.
[Lazowska] Lazowska, E. et al The architecture of the EDEN system.
Proc. 8th Symposium on Operating Systems Principles, Dec. 1981,
148-159.
[Liskov] Liskov, B., Scheifler, R. Guardians and actions: Linguistic
support for robust, distributed programs. ACM TOPLAS 5, 3 (July
1983), 381-404.
[Moss] Moss, E. Nested transactions: An approach to reliable, distri-
buted computing. Ph.D. thesis, MIT Dept of EECS, TR 260, April
1981.
[Papadimitrou] Papadimitrou, C. The serializability of concurrent data-
base updates. JACM 26, 4 (Oct. 1979), 631-653.
[Popek] Popek, G. et al. Locus: A network transparent, high reliability
distributed system. Proc. 8th Symposium on Operating Systems
Principles, Dec. 1981, 169-177.
[Schlicting] Schlicting, R, Schneider, F. Fail-stop processors: An
approach to designing fault-tolerant distributed computing sys-
tems. ACM TOCS 1, 3, August 1983, 222-238.
[Schneider] Schneider, F., Gries, D., Schlicting, R. Reliable multicast
protocols. Science of computer programming 3, 2 (March 1984).
[Skeen-a] Skeen, D. Determining the last process to fail. ACM TOCS 3,
Birman & Joseph [Page 17]
RFC 992 November 1986
1, Feb. 1985, 15-30.
[Skeen-b] Skeen, D. A reliable multicast protocol. Unpublished.
[Spector] Spector, A., et al Distributed transactions for reliable sys-
tems. Proc. 10th ACM SIGOPS Symposium on Operating Systems Prin-
ciples, Dec. 1985, 127-146.
[Strong] Strong, H.R., Dolev, D. Byzantine agreement. Digest of papers,
Spring Compcon 83, San Francisco, CA, March 1983, 77-81.
Birman & Joseph [Page 18]
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -