📄 rfc663.txt
字号:
Network Working Group Rajendra K. Kanodia
Request for Comments #663 MIT, Project MAC
NIC #31387 November 29, 1974
A LOST MESSAGE DETECTION AND RECOVERY PROTOCOL
1.0 INTRODUCTION
The current Host-to-Host protocol does not provide for the
following three aspects of network communication:
1. detection of messages lost in the transmission path
2. detection of errors in the data
3. procedures for recovery in the event of lost messages or
data errors.
In this memo we propose an extension to the Host-to-Host protocol
that will allow detection of lost messages and an orderly
recovery from this situation. If Host-to-Host protocol were to
be amended to allow for detection of errors in the data, it is
expected that the recovery procedures proposed here will apply.
With the present protocol, it may some times be possible to
detect loss of messages in the transmission path. However, often
a lost message (especially one on a control link) simply results
in an inconsistent state of a network connection. One frequent
(and frustrating) symptom of a message loss on a control link has
been the "lost allocate" problem which results in a "paralyzed"
connection. The NCP (Network Control Program) at the receiving
site believes that sender has sufficient allocation for a
connection, whereas the NCP of the sending host believes that it
has no allocation (due to either loss of or error in a message
that contained the allocate command). The result is that the
sending site can not transmit any more messages over the
connection. This problem was reported in the NWG-RFC #467 by
Burchfiel and Tomlinson. They also proposed an extension to the
Host-to-Host protocol which allows for resynchronization of the
connection status. Their proposed solution was opposed by Edwin
Meyer (NWG-RFC #492) and Wayne Hathaway (NWG-RFC #512) on the
grounds that it tended to mask the basic problem of loss of
messages and they suggested that the fundamental problem of
message loss should be solved rather than its symptoms. As an
alternative to the solution proposed in NWG-RFC #467, Wayne
Hathaway suggested that Host-to-Host protocol header could be
extended to include a "Sequence Control Byte" to allow detection
of lost messages. At about the same time Jon Postel suggested a
similar scheme using message numbers (NWG-RFC #516). A little
later David Walden proposed that four unused bits of the message
sequence number (in the IMP leader) be utilized for sequencing
- 1 -
messages (NWG-RFC #534). His scheme is similar to those proposed
by Postel and Hathaway; however it has the advantage that
Host-to-Host protocol mechanisms can be tied into the IMP-to-Host
protocol mechanisms.
The protocol extension proposed here uses the four bits of the
message sequence number in the message leader for detection of
lost messages. However, to facilitate recovery, it uses another
eight bit field (presently unused) in the 72 bit header of the
regular messages. In the next section of this paper we discuss
some of the basic ideas underlying our protocol. In section 3,
we provide a description of the protocol. It is our intention
that section 3 be a self-contained and complete description of
the protocol.
2.0 BASIC IDEAS
The purpose of this section is to provide a gentle introduction
to the central ideas on which this protocol is based. Roughly
speaking, our protocol can be divided into three major
components. First is the mechanism for detecting loss of
messages. Second is the exchange of information between the
sender and the receiver in the event of a message loss. For
reasons that will soon become obvious, we have termed this area
as "Exchange of Control Messages". The third component of our
protocol is the method of retransmission of lost messages. In
this section, we have reversed the order of discussion for the
second and third components, because the mechanisms for exchange
of control messages depend heavily upon the retransmission
methods.
A careful reader will find that several minor issues have been
left unresolved in this section. He (or she) should remember
that this section is not intended to be a complete description of
the protocol. Hopefully, we have resolved most of these issues
in the formal description of the protocol provided in the section
3.
2.1 DETECTION OF LOSS OF MESSAGES
The 32 bit Host-to-IMP and IMP-to-Host leaders contain a 12 bit
message-id in bit positions 17 to 28 (BBN Report #1822). The
Host-to-Host protocol (NIC 8246) uses 8 bits of the message-id
(bit positions 17 to 24) as a link number. The remaining 4 bits
of the message-id (bits 25 to 28) are presently unused. For the
purposes of the protocol to be presented here, we define these
- 2 -
four bits to be the message sequence number (MSN in short)
associated with the link. Thus message-id consists of an eight
bit link number and a four bit message sequence number. The four
bit MSN provides a sixteen element sequence number for each link.
A network connection has a sending host (referred to as "sender"
henceforth), a receiving host (referred to as receiver
henceforth), and a link on which messages are transmitted. In
our protocol the sender starts communication with the value of
MSN set to one (i.e. the first message on any link has one in its
MSN field.) For the next message on the same link the value of
MSN is increased by one. When the value of MSN becomes 15 the
next value chosen is one. This results in the following sequence
1, 2, ...., 13, 14, 15, 1, 2, ...., etc. The receiver can detect
loss of messages by examining this sequence. Each hole
corresponds to a lost message. Notice that the detection
mechanism will fail if a sequence of exactly 15 messages were to
be lost. For the time being, we shall assume that the
probability of loosing a sequence of exactly 15 messages is
negligible. However, we shall later provide a status exchange
mechanism (Section 2.6) that can be used to prevent this failure.
Notice that in the sequence described above we have omitted the
value zero. Following a suggestion made by Hathaway (NWG-RFC
#512) and Walden (NWG-RFC #534) the new protocol uses the value
zero to indicate to the receiving host that the sending host is
not using message sequence numbers. We, in fact, extend the
meaning associated with the MSN value zero to imply that the
sending host has not implemented the detection and error recovery
protocol being proposed here.
2.2 COMPATIBILITY
The discussion above brings us to the issue of compatibility
between the present and the new protocols. Let us define the
hosts with the present protocol to be type A and the hosts with
the new protocol to be type B. We have three situations:
1. Type A communicating with type A: there is no
difference from the present situation.
2. Type A communicating with type B: from the zero value
MSNs in messages sent by the type A host, the type B
host can detect the fact that the other host is a type A
host. Therefore the type B host can simulate the
behaviour of a type A host in its communication with the
other host, and the type A host will not be confused.
As we will see later that this simulation is really
simple and can be easily applied selectively.
3. Type B host communicating with type B: Both hosts can
detect the fact that the other host is a type B host and
- 3 -
use the message detection and error recovery protocol.
There is one difficulty here that we have not yet resolved. When
starting communication how does a type B host know whether the
other host is type A or type B? This difficulty can be resolved
by assuming that a type A host will not be confused by a non-zero
MSN in the messages that it receives. This assumption is not
unreasonable because a type A host can easily meet this
requirement by making a very simple change to its NCP (the
Network Control Program), if it does not already satisfy this
requirement. Another assumption that is crucial to our protocol,
is that the type A hosts always set the MSN field of messages
(they send out) to zero. As of this writing, the author believes
that no hosts are using the MSN field and therefore no
compatibility problem should arise.
2.3 RETRANSMISSION OF MESSAGES
Before getting down to the details of the actual protocol, we
will attempt here to explain the essential ideas underlying this
protocol by considering a somewhat simplified situation.
Consider a logical communication channel X, which has at its
disposal an inexhaustible supply of physical communication
channels C(1), C(2), C(3), ........, etc. (See footnote #1)
Channel X is to be used for transmission of messages. In
addition to carrying the data, these messages contain (1) the
channel name X, and (2) a Message Sequence Number (MSN). Let us
denote the sender on this channel by S and the receiver by R.
Let us also assume that at the start of the communication, R and
S are synchronized such that R is prepared to receive messages
for logical channel X on the physical channel C(1) and S is
prepared for sending these messages on C(1). S starts by pumping
a sequence of messages M(1), M(2), M(3), ........, M(n) into
channel C(1). Since these messages contain sequence numbers, R
is able to detect loss of messages on the channel C(1). Suppose
now that R discovers that message number K (where K <n) was lost
in the transmission path. Let us further assume that having
_________________________________________________________________
(1) One method of recovery may be to let the receiver save all
properly received messages and require the sender to retransmit
only those messages that were lost. This method requires the
receiver to have the ability to reassemble the messages to build
the data stream. A second method of recovery may be to abort and
restart the transmission at the error point. This method
requires that the receiving host be able to distinguish between
legitimate messages and messages to be ignored. For simplicity
we have chosen the second method and an inexhaustible supply of
physical channels serves to provide the distinction among
messages.
- 4 -
discovered loss of a message, R can communicate this fact to S by
sending an appropriate control message on another logical channel
that is explicitly reserved for transmission of control messages
from R to S. This channel, named Y, is assumed to be completely
reliable.
We now provide a rather simplistic recovery protocol for the
scenario sketched above. Having detected the loss of message M(K)
on channel X, R takes the following series of actions:
1- R stops reading messages on C(1),
2- R discards those messages that were received on C1 and
are placed after M(K) in the logical message sequence,
3- R prepares itself to read messages M(K), M(K+1), .....,
etc. on the physical channel C(2),
and 4- R sends a control message to S on control channel Y,
which will inform S to the effect that there was an
error on logical channel X while using physical channel
C(1) in message number K.
When S receives this control message on Y, it takes the following
action:
1- S stops sending messages on C(1),
and 2- begins transmission of messages starting with the
sequence number K, on the physical channel C(2).
This resynchronization protocol is executed every time R detects
an error. If physical channel C(CN) was being used at the time
of the error, then the next channel to be used is C(CN+1). We
can define a "receiver synchronization state" for the channel X,
as the triplet R(C, CN, MSN), where C is the name of the group of
physical channels, CN is the number of the physical channel in
use, and MSN is the number of message expected. (See footnote #1)
We can specify a message received on a given C-channel as M(MSN).
When R receives the message M(R.MSN) on the channel C(R.CN), the
synch-state changes from R(C, CN, MSN) to R(C, CN, MSN+1).
However if M.MSN for the message received is greater than R.MSN
then a message has been lost, and R changes the synch-state to
R(C, CN+1, MSN). What really happens may be described as
follows: upon detection of error in a logical channel X, we
merely discard the physical channel that was in use at the time
of error, and restart communication on a new physical channel at
the point where break occurred.
_________________________________________________________________
(1) Notice that we have prefixed this triplet by the letter R
(for the receiver.) We will prefix other similarly defined
quantities by different letters. For example M can be used for
messages. This notation permits us to write expressions like
M.MSN = R.MSN, where M.MSN stands for the message sequence number
of the message.
- 5 -
This scheme provides a reliable transmission path X, even though
the physical channels involved are unreliable. In this scheme we
have assumed that (1) a completely reliable channel Y is
available for exchange of control messages, and (2) that there is
a large supply of physical channels available for use of X. In
the paragraphs that follow we shall revise our protocol to use a
single physical channel and then apply this protocol to the
channel Y in such a way that Y would become "self-correcting."
Now suppose that channel X has only one physical channel (named
X') available for its use rather than the inexhaustible supply of
physical channels. Our protocol would still work, if we could
somehow simulate the effect of a large number of C-channels using
the single channel X'. One method of providing this simulation
is to include in each message the name of the C-channel on which
it is being sent, and send it on X'. Now the receiver must
examine each message received on X' to determine the C-channel on
which this message was sent. Our protocol still works except for
one minor difference, namely, the receiver must now discard
messages corresponding to C-channels that are no longer in use,
whereas in the previous system the C-channels no longer being
used were simply discarded. To be sure, X' can be multiplexed
among only a finite number of C-channels; however, we can provide
a sufficiently large number of C-channels so that during the life
time of the logical channel X, the probability of exhausting the
supply of C-channels would be very low. And even if we were to
exhaust the supply of C-channels, we could recycle them just as
we recycle the message sequence numbers.
A physical message received on X' can now be characterized by a
pair of C-channel number and a message sequence number, as M(CN,
MSN). The receiver synchronization state becomes a triplet R(X',
CN, MSN). This state tells us that R is ready to receive a
message for X on the physical channel X' and for this message
M.CN should be equal to R.CN and M.MSN should be equal to R.MSN.
All messages with M.CN less than R.CN will be ignored. If for
the next message received on X', M.CN = R.CN and M.MSN = R.MSN,
then R changes the synch state to R(X', CN, MSN+1). If M.CN =
R.CN but M.MSN > R.MSN then a message has been lost and the
synch-state R(X', CN, MSN) changes to R(X', CN+1, MSN). Notice
that we have not yet said anything about the situation M.CN >
R.CN. We will later describe a scheme for using this case to
provide for error correction on the control channel itself.
2.4 EXCHANGE OF CONTROL INFORMATION
So far we have discussed two schemes for the detection and
retransmission aspects of the lost-message problem. In this
- 6 -
section, we discuss methods by which the receiver communicates to
the sender the fact of loss of messages.
We continue with the scenario developed in the above section with
a small change. For the purposes of the discussion that is about
to follow we shall assume that there are actually two perfect
channels available for exchange of control messages. One channel
from S to R named S->R, and the other from R to S named R->S.
The purpose of S->R will become clear in a moment. In order to
let R communicate the fact of loss of messages to S, We provide a
control message called L__o_s_t__M_e_s_s_a_g_e__f_r_o_m__R_e_c_e_i_v_e_r (LMR) which is
of the following form: LMR(X, CN, MSN), where X is the name of
the channel, CN is the new C-channel number, and MSN is the
message sequence number of the lost message. If more than one
message has been lost, then R uses the MSN of the first message
only. When S receives this message, it can restart communication
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -