Network Working Group                               Richard Schantz (BBN-TENEX)
Request for Comments: 672                                              Dec 1974
NIC #31440



                     A Multi-Site Data Collection Facility




        Preface:

        This RFC reproduces most of a working document
        prepared during the design and implementation of the
        protocols for the TIP-TENEX integrated system for
        handling TIP accounting. Bernie Cosell (BBN-TIP)
        and Bob Thomas (BBN-TENEX) have contributed to
        various aspects of this work. The system has been
        partially operational for about a month on selected
        hosts. We feel that the techniques described here
        have wide applicability beyond TIP accounting.


Section I

Protocols for a Multi-site Data Collection Facility


Introduction


     The development of computer networks has provided the
groundwork for distributed computation, in which a job or task is
composed of components from various computer systems. In a
single computer system, the unavailability or malfunction of any of
the job components (e.g. program, file, device, etc.) usually
necessitates job termination. With computer networks, it becomes
feasible to duplicate certain job components which previously had no
basis for duplication. (In a single system, it does not matter how
many times a process that performs a certain function is duplicated;
a system crash makes all of them unavailable.) It is such resource
duplication that enables us to utilize the network to achieve high
reliability and load leveling. In order to realize the potential of
resource duplication, it is necessary to have protocols which
provide for the orderly use of these resources. In this document,
we first discuss in general terms a problem of protocol definition
for interacting with a multiply defined resource (server). The
problem deals with providing a highly reliable data collection
facility, by supporting it at many sites throughout the network. In
the second section of this document, we describe in detail a
particular implementation of the protocol which handles the problem
of utilizing multiple data collector processes for collecting
accounting data generated by the network TIPs. This example also
illustrates the specialization of hosts to perform parts of a
computation they are best equipped to handle. The large network
hosts (TENEX systems) perform the accounting function for the small
network access TIPs.

     The situation to be discussed is the following: a data
generating process needs to use a data collection service which is
duplicately provided by processes on a number of network machines.
A request to a server involves sending the data to be collected.


An Initial Approach


     The data generator could proceed by selecting a particular
server and sending its request to that server. It might also take
the attitude that if the message reaches the destination host (the
communication subsystem will indicate this) the message will be
properly processed to completion. Failure of the request message
would then lead to selecting another server, until the request
succeeds or all servers have been tried.
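
The sequential strategy just described can be sketched as follows. This
is a minimal illustration, not part of the original design: the function
name, the `deliver` callback (standing in for the delivery indication
given by the communication subsystem), and the host names are all
hypothetical.

```python
from typing import Callable, Iterable, Optional


def send_with_failover(
    servers: Iterable[str],
    deliver: Callable[[str, bytes], bool],
    data: bytes,
) -> Optional[str]:
    """Try each server in turn; return the first host that accepted delivery."""
    for host in servers:
        # `deliver` is an assumed stand-in for the communication subsystem:
        # True means the message reached the destination host.
        if deliver(host, data):
            return host
    return None  # every server was tried and none accepted the request


# Usage: a fake subsystem in which only "host-b" is reachable.
reachable = {"host-b"}
chosen = send_with_failover(
    ["host-a", "host-b", "host-c"],
    lambda host, data: host in reachable,
    b"record",
)
```

Note that this sketch trusts delivery to the host as proof of processing,
which is exactly the weakness the next paragraph addresses.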





     Such a simple strategy is a poor one. It makes sense to
require that the servicing process send a positive acknowledgement
to the requesting process. If nothing else, the reply indicates
that the server process itself is still functioning. Waiting for
such a reply also implies that there is a strategy for selecting
another server if the reply is not forthcoming. Herein lies a
problem. If the expected reply is timed out, and then a new request
is sent to another server, we run the risk of receiving the
(delayed) original acknowledgement at a later time. This could
result in having the data entered into the collection system twice
(data duplication). If the request is re-transmitted to the same
server only, we face the possibility of not being able to access a
collector (data loss). In addition, for load leveling purposes, we
may wish to send new requests to some (or all) servers. We can then
use their reply (or lack of reply) as an indicator of load on that
particular instance of the service. Doing this without data
duplication requires more than a simple request and acknowledgement
protocol*.


Extension of the Protocol


     The general protocol developed to handle multiple collection
servers involves having the data generator send the data request to
some (or all) data collectors. Those willing to handle the request
reply with an "I've got it" message. They then await further
notification before finalizing the processing of the data. The data
generator sends a "go ahead" message to one of the replying
collectors, and a "discard" message to all other replying
collectors. The "go ahead" message is the signal to process the
data (i.e. collect permanently), while the "discard" message
indicates that the data is being collected elsewhere and should not
be retained.
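
The exchange described above can be sketched as a small in-memory
simulation. The class and method names, the request identifier, and the
rule of choosing the first replying collector are assumptions made for
illustration; the text only requires that exactly one replying collector
receive the "go ahead".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Collector:
    name: str
    willing: bool = True
    pending: Dict[int, bytes] = field(default_factory=dict)   # held, not final
    recorded: Dict[int, bytes] = field(default_factory=dict)  # collected for good

    def request(self, req_id: int, data: bytes) -> bool:
        """Hold the data and answer "I've got it" (True) if willing."""
        if not self.willing:
            return False
        self.pending[req_id] = data
        return True

    def go_ahead(self, req_id: int) -> None:
        """Finalize: move the held data into permanent storage."""
        if req_id in self.pending:
            self.recorded[req_id] = self.pending.pop(req_id)

    def discard(self, req_id: int) -> None:
        """Drop the held data; it is being collected elsewhere."""
        self.pending.pop(req_id, None)


def collect(collectors: List[Collector], req_id: int, data: bytes) -> Optional[str]:
    """Send the request to all collectors, then commit at exactly one."""
    repliers = [c for c in collectors if c.request(req_id, data)]
    if not repliers:
        return None
    chosen, others = repliers[0], repliers[1:]
    chosen.go_ahead(req_id)
    for other in others:
        other.discard(req_id)
    return chosen.name


# Usage: collector "a" declines, "b" and "c" both reply.
a, b, c = Collector("a", willing=False), Collector("b"), Collector("c")
winner = collect([a, b, c], 1, b"usage record")
```

Only the chosen collector ends up with the record; the others hold
nothing once the "discard" has been handled.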

     The question now arises as to whether the collector process
should acknowledge receipt of the "go ahead" message with a reply of
its own, and whether the generator process should then acknowledge
this acknowledgement, etc. We would like to send as few messages as
possible to achieve reliable communication. Therefore, when a state
--------------------

* If the servers are independent of each other to the extent that if
two or more servers all act on the same request, the end result is
the same as having a single server act on the request, then a simple
request/acknowledgement protocol is adequate. Such may be the case,
for example, if we subject the totality of collected data (i.e. all
data collected by all collectors for a certain period) to a
duplicate detection scan. If we could store enough context in each
entry to be able to determine duplicates, then having two or more
servers act on the data would be functionally equivalent to
processing by a single server.




is reached for which further acknowledgements lead to a previously
visited state, or when the cost of further acknowledgements outweighs
the increase in reliability they bring, further acknowledgements
become unnecessary.

     The initial question was whether the collector process should
acknowledge the "go ahead" message. Assume for the moment that it
should not send such an acknowledgement. The data generator could
verify, through the communication subsystem, the transmission of the
"go ahead" message to the host of the collector. If this message
did not arrive correctly, the generator has the option of
re-transmitting it or sending a "go ahead" to another collector
which has acknowledged receipt of the data. Either strategy
involves no risk of duplication. If the "go ahead" message arrives
correctly, and a collector acknowledgement to the "go ahead" message
is not required, then we incur a vulnerability to (collector host)
system crash from the time the "go ahead" message is accepted by the
host until the time the data is totally processed. Call the data
processing time P. Once the data generator has selected a
particular collector (on the basis of receiving its "I've got it"
message), we also incur a vulnerability to malfunction of this
collector process. The vulnerable period is from the time the
collector sends its "I've got it" message until the time the data is
processed. This amounts to two network transit times (2N) plus IMP
and host overhead for message delivery (O) plus data processing time
(P). [Total time=2N+P+O]. A malfunction (crash) in this period can
cause the loss of data. There is no potential for duplication.
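
As a worked instance of the interval above, the short sketch below
evaluates 2N+P+O for one set of timings. The function name and the
millisecond figures are hypothetical and chosen only to make the
arithmetic concrete; they are not measurements from the system.

```python
def vulnerable_interval(n: int, o: int, p: int) -> int:
    """Exposure to collector-process malfunction: 2N + O + P.

    n: one network transit time
    o: IMP and host overhead for message delivery
    p: data processing time
    All values share one unit (milliseconds in the usage below).
    """
    return 2 * n + o + p


# Usage with assumed figures: N = 100 ms, O = 50 ms, P = 500 ms.
total_ms = vulnerable_interval(n=100, o=50, p=500)
```

With these assumed figures the processing time P dominates the interval.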

     Now, assume that the data collector process must acknowledge
the "go ahead" message. The question then arises as to when such an
acknowledgement should be sent. The reasonable choices are either
immediately before final processing of the data (i.e. before the
data is permanently recorded) or immediately after final processing.
It can be argued that unless another acknowledgement is required (by
the generator to the collector) to this acknowledgement BEFORE the
actual data update, then the best time for the collector to
acknowledge the "go ahead" is after final processing. This is so
because receiving the acknowledgement conveys more information if it
is sent after processing, while not receiving it (timeout), in
either case, leaves us in an unknown state with respect to the data
update. Depending on the relative speeds of various network and
system components, the data may or may not be permanently entered.
Therefore if we interpret the timeout as a signal to have the data
processed at another site, we run the risk of duplication of data.
To avoid data duplication, the timeout strategy must only involve
re-sending the "go ahead" message to the same collector. This will
only help if the lack of reply is due to a lost network message.
Our vulnerability intervals to system and process malfunction remain
as before.
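
The acknowledged variant can be sketched as follows. It assumes, beyond
what the text states, that the collector remembers which requests it has
already processed, so that a re-sent "go ahead" is acknowledged again
without recording the data twice. All names are hypothetical, and a
return value of None models a timed-out acknowledgement.

```python
from typing import Callable, Dict, List, Optional, Set


class AckingCollector:
    """Collector that acknowledges "go ahead" only after final processing."""

    def __init__(self) -> None:
        self.pending: Dict[int, bytes] = {}
        self.recorded: List[bytes] = []
        self.done: Set[int] = set()  # assumed memory of finished requests

    def hold(self, req_id: int, data: bytes) -> None:
        self.pending[req_id] = data

    def go_ahead(self, req_id: int) -> bool:
        if req_id in self.done:
            return True   # repeated "go ahead": acknowledge, do not re-record
        if req_id not in self.pending:
            return False  # unknown request, nothing to acknowledge
        self.recorded.append(self.pending.pop(req_id))
        self.done.add(req_id)
        return True       # acknowledgement sent after processing


def confirm(
    send_go_ahead: Callable[[int], Optional[bool]],
    req_id: int,
    max_tries: int = 3,
) -> bool:
    """Re-send "go ahead" to the SAME collector until it is acknowledged."""
    for _ in range(max_tries):
        if send_go_ahead(req_id):
            return True
    return False  # state unknown; switching collectors would risk duplication


# Usage: a channel that loses the first acknowledgement in transit.
collector = AckingCollector()
collector.hold(7, b"usage record")
attempts: List[bool] = []


def lossy_channel(req_id: int) -> Optional[bool]:
    ack = collector.go_ahead(req_id)
    attempts.append(ack)
    return ack if len(attempts) > 1 else None


ok = confirm(lossy_channel, 7)
```

The record is entered once even though the "go ahead" was sent twice,
which is the property that makes retrying at the same collector safe.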

     It is our conjecture (to be analyzed further) that any further
acknowledgements to these acknowledgements will have virtually no
effect on reducing the period of vulnerability outlined above. As
such, the protocol with the fewest messages required is superior.



Data Dependent Aspects of the Protocol


     As discussed above, a main issue is which process should be the
last to respond (send an acknowledgement). If the data generator
sends the last message (i.e. "go ahead"), we can only check on its
correct arrival at the destination host. We must "take on faith"
the ability of the collector to correctly complete the transaction.
This strategy is geared toward avoiding data duplication. If on the
other hand, the protocol specifies that the collector is to send the
last message, with the timeout of such a message causing the data
generator to use another collector, then the protocol is geared
toward the best efforts of recording the data somewhere, at the
expense of possible duplication.

     Thus, the nature of the problem will dictate which of the
protocols is appropriate for a given situation. The next section
deals with the specifics of an implementation of a data collection
protocol to handle the problem of collecting TIP accounting data by
using the TENEX systems for running the collection server processes.
It is shown how the general protocol is optimized for the accounting
data collection.




Section II

Protocol for TIP-TENEX Accounting Server Information Exchange


Overview of the Facility


     When a user initially requests service from a TIP, the TIP will
perform a broadcast ICP to find an available RSEXEC which maintains
an authentication data base. The user must then complete a login
sequence in order to authenticate himself. If he is successful, the
RSEXEC will transmit his unique ID code to the TIP. Failure will
cause the RSEXEC to close the connection and the TIP to hang up on
the user. After the user is authenticated, the TIP will accumulate
accounting data for the user session. The data includes a count of
messages sent on behalf of the user, and the connect time for the
user. From time to time the TIP will transmit intermediate
accounting data to Accounting Server (ACTSER) processes scattered
throughout the network. These accounting servers will maintain
files containing intermediate raw accounting data. The raw
accounting data will periodically be collected and sorted to produce
an accounting data base. Providing a number of accounting servers
reduces the possibility of being unable to find a repository for the
intermediate data, which otherwise would be lost due to buffering
limitations in the TIPs. The multitude of accounting servers can
also serve to reduce the load on the individual hosts providing this
facility.



The rest of this document details the protocol that has been
developed to ensure delivery of TIP accounting data to one of the
available accounting servers for storage in the intermediate
accounting files.


Adapting the Protocol


The TIP to Accounting Server data exchange uses a protocol that
allows the TIP to select for data transmission one, some, or all
server hosts either sequentially or in parallel, yet ensures that
the data that becomes part of the accounting file does not contain
duplicate information. The protocol also minimizes the amount of
data buffering that must be done by the limited capacity TIPs. The
protocol is applicable to a wide class of data collection problems
which use a number of data generators and collectors. The following
describes how the protocol works for TIP accounting.