📄 rfc663.txt

📁 RFC 的详细文档！
💻 TXT
📖 第 1 页 / 共 3 页
字号:
12 3 下一页




Network Working Group                         Rajendra K. Kanodia
Request for Comments #663                        MIT, Project MAC
NIC #31387                                      November 29, 1974



        A LOST MESSAGE DETECTION AND RECOVERY PROTOCOL


1.0 INTRODUCTION


The current  Host-to-Host  protocol  does  not  provide  for  the
following three aspects of network communication:

     1. detection of messages lost in the transmission path
     2. detection of errors in the data
     3. procedures for recovery in the event of lost messages or
        data errors.


In this memo we propose an extension to the Host-to-Host protocol
that  will  allow  detection  of  lost  messages  and  an orderly
recovery from this situation.  If Host-to-Host protocol  were  to
be  amended  to  allow for detection of errors in the data, it is
expected that the recovery procedures proposed here  will  apply.


With  the  present  protocol,  it  may  some times be possible to
detect loss of messages in the transmission path.  However, often
a lost message (especially one on a control link) simply  results
in  an  inconsistent state of a network connection.  One frequent
(and frustrating) symptom of a message loss on a control link has
been the "lost allocate" problem which results in  a  "paralyzed"
connection.   The  NCP (Network Control Program) at the receiving
site  believes  that  sender  has  sufficient  allocation  for  a
connection,  whereas the NCP of the sending host believes that it
has no allocation (due to either loss of or error  in  a  message
that  contained  the  allocate  command).  The result is that the
sending  site  can  not  transmit  any  more  messages  over  the
connection.   This  problem  was  reported in the NWG-RFC #467 by
Burchfiel and Tomlinson.  They also proposed an extension to  the
Host-to-Host  protocol  which allows for resynchronization of the
connection status.  Their proposed solution was opposed by  Edwin
Meyer  (NWG-RFC  #492)  and  Wayne Hathaway (NWG-RFC #512) on the
grounds that it tended to mask  the  basic  problem  of  loss  of
messages  and  they  suggested  that  the  fundamental problem of
message loss should be solved rather than its  symptoms.   As  an
alternative  to  the  solution  proposed  in  NWG-RFC #467, Wayne
Hathaway suggested that Host-to-Host  protocol  header  could  be
extended  to include a "Sequence Control Byte" to allow detection
of lost messages.  At about the same time Jon Postel suggested  a
similar  scheme  using  message numbers (NWG-RFC #516).  A little
later David Walden proposed that four unused bits of the  message
sequence  number  (in  the IMP leader) be utilized for sequencing


                    - 1 -










messages (NWG-RFC #534).  His scheme is similar to those proposed
by Postel and  Hathaway;   however  it  has  the  advantage  that
Host-to-Host protocol mechanisms can be tied into the IMP-to-Host
protocol mechanisms.


The  protocol  extension  proposed here uses the four bits of the
message sequence number in the message leader  for  detection  of
lost  messages.  However, to facilitate recovery, it uses another
eight bit field (presently unused) in the 72 bit  header  of  the
regular  messages.   In the next section of this paper we discuss
some of the basic ideas underlying our protocol.  In  section  3,
we  provide  a  description of the protocol.  It is our intention
that section 3 be a self-contained and  complete  description  of
the protocol.



2.0 BASIC IDEAS


The  purpose  of this section is to provide a gentle introduction
to the central ideas on which this protocol  is  based.   Roughly
speaking,   our   protocol   can  be  divided  into  three  major
components.   First  is  the  mechanism  for  detecting  loss  of
messages.   Second  is  the  exchange  of information between the
sender and the receiver in the event  of  a  message  loss.   For
reasons  that  will soon become obvious, we have termed this area
as "Exchange of Control Messages".  The third  component  of  our
protocol  is  the  method of retransmission of lost messages.  In
this section, we have reversed the order of  discussion  for  the
second  and third components, because the mechanisms for exchange
of  control  messages  depend  heavily  upon  the  retransmission
methods.


A  careful  reader  will find that several minor issues have been
left unresolved in this section.  He (or  she)  should   remember
that this section is not intended to be a complete description of
the  protocol.   Hopefully, we have resolved most of these issues
in the formal description of the protocol provided in the section
3.


2.1 DETECTION OF LOSS OF MESSAGES


The 32 bit Host-to-IMP and IMP-to-Host leaders contain a  12  bit
message-id  in  bit  positions  17 to 28 (BBN Report #1822).  The
Host-to-Host protocol (NIC 8246) uses 8 bits  of  the  message-id
(bit  positions 17 to 24) as a link number.  The remaining 4 bits
of the message-id (bits 25 to 28) are presently unused.  For  the
purposes  of  the  protocol to be presented here, we define these


                    - 2 -










four bits to be  the  message  sequence  number  (MSN  in  short)
associated  with  the link.  Thus message-id consists of an eight
bit link number and a four bit message sequence number.  The four
bit MSN provides a sixteen element sequence number for each link.
A network connection has a sending host (referred to as  "sender"
henceforth),   a   receiving   host   (referred  to  as  receiver
henceforth), and a link on which messages  are  transmitted.   In
our  protocol  the  sender starts communication with the value of
MSN set to one (i.e. the first message on any link has one in its
MSN field.)  For the next message on the same link the  value  of
MSN  is  increased  by one.  When the value of MSN becomes 15 the
next value chosen is one.  This results in the following sequence
1, 2, ...., 13, 14, 15, 1, 2, ...., etc.  The receiver can detect
loss  of  messages  by  examining  this  sequence.    Each   hole
corresponds  to  a  lost  message.   Notice  that  the  detection
mechanism will fail if a sequence of exactly 15 messages were  to
be   lost.   For  the  time  being,  we  shall  assume  that  the
probability of loosing a  sequence  of  exactly  15  messages  is
negligible.   However,  we  shall later provide a status exchange
mechanism (Section 2.6) that can be used to prevent this failure.


Notice that in the sequence described above we have  omitted  the
value  zero.   Following  a  suggestion made by Hathaway (NWG-RFC
#512) and Walden (NWG-RFC #534) the new protocol uses  the  value
zero  to  indicate to the receiving host that the sending host is
not using message sequence numbers.   We,  in  fact,  extend  the
meaning  associated  with  the  MSN  value zero to imply that the
sending host has not implemented the detection and error recovery
protocol being proposed here.



2.2 COMPATIBILITY


The discussion above brings us  to  the  issue  of  compatibility
between  the  present  and  the new protocols.  Let us define the
hosts with the present protocol to be type A and the  hosts  with
the new protocol to be type B.  We have three situations:

     1. Type  A  communicating  with  type  A:   there   is   no
        difference from the present situation.
     2. Type A communicating with type B: from  the  zero  value
        MSNs  in  messages  sent  by the type A host, the type B
        host can detect the fact that the other host is a type A
        host.  Therefore  the  type  B  host  can  simulate  the
        behaviour of a type A host in its communication with the
        other  host,  and  the type A host will not be confused.
        As we will see later  that  this  simulation  is  really
        simple and can be easily applied selectively.
     3. Type B host communicating with type B:  Both  hosts  can
        detect the fact that the other host is a type B host and


                    - 3 -










        use  the  message detection and error recovery protocol.


There is one difficulty here that we have not yet resolved.  When
starting communication how does a type B host  know  whether  the
other  host is type A or type B?  This difficulty can be resolved
by assuming that a type A host will not be confused by a non-zero
MSN in the messages that it receives.   This  assumption  is  not
unreasonable   because  a  type  A  host  can  easily  meet  this
requirement by making a  very  simple  change  to  its  NCP  (the
Network  Control  Program),  if  it does not already satisfy this
requirement.  Another assumption that is crucial to our protocol,
is that the type A hosts always set the  MSN  field  of  messages
(they send out) to zero.  As of this writing, the author believes
that   no  hosts  are  using  the  MSN  field  and  therefore  no
compatibility problem should arise.


2.3 RETRANSMISSION OF MESSAGES


Before getting down to the details of  the  actual  protocol,  we
will  attempt here to explain the essential ideas underlying this
protocol  by  considering  a   somewhat   simplified   situation.
Consider  a  logical  communication  channel  X, which has at its
disposal  an  inexhaustible  supply  of  physical   communication
channels  C(1),  C(2),  C(3),  ........,  etc.  (See footnote #1)
Channel X is  to  be  used  for  transmission  of  messages.   In
addition  to  carrying  the  data, these messages contain (1) the
channel name X, and (2) a Message Sequence Number (MSN).  Let  us
denote  the  sender  on  this channel by S and the receiver by R.
Let us also assume that at the start of the communication, R  and
S  are  synchronized  such that R is prepared to receive messages
for logical channel X  on the physical  channel  C(1)  and  S  is
prepared for sending these messages on C(1).  S starts by pumping
a  sequence  of  messages  M(1),  M(2), M(3), ........, M(n) into
channel C(1).  Since these messages contain sequence  numbers,  R
is  able to detect loss of messages on the channel C(1).  Suppose
now that R discovers that message number K (where K <n) was  lost
in  the  transmission  path.   Let  us further assume that having
_________________________________________________________________

(1) One method of recovery may be to let the  receiver  save  all
properly  received  messages and require the sender to retransmit
only those messages that were lost.   This  method  requires  the
receiver  to have the ability to reassemble the messages to build
the data stream.  A second method of recovery may be to abort and
restart  the  transmission  at  the  error  point.   This  method
requires  that  the receiving host be able to distinguish between
legitimate messages and messages to be ignored.   For  simplicity
we  have  chosen the second method and an inexhaustible supply of
physical  channels  serves  to  provide  the  distinction   among
messages.


                    - 4 -










discovered loss of a message, R can communicate this fact to S by
sending an appropriate control message on another logical channel
that is explicitly reserved for transmission of control  messages
from  R to S.  This channel, named Y, is assumed to be completely
reliable.


We now provide a rather  simplistic  recovery  protocol  for  the
scenario sketched above. Having detected the loss of message M(K)
on channel X, R takes the following series of actions:

      1- R stops reading messages on C(1),
      2- R discards those messages that were received on C1  and
are  placed after M(K) in the logical message sequence,
      3- R prepares itself to read messages M(K), M(K+1), .....,
etc.  on the physical channel C(2),
  and 4- R sends a control message to S on  control  channel  Y,
which  will  inform  S  to the effect that there was an
error on logical channel X while using physical channel
C(1) in message number K.

When S receives this control message on Y, it takes the following
action:

      1- S stops sending messages on C(1),
  and 2- begins  transmission  of  messages  starting  with  the
sequence number K, on the physical channel C(2).

This  resynchronization protocol is executed every time R detects
an error.  If physical channel C(CN) was being used at  the  time
of  the  error,  then the next channel to be used is C(CN+1).  We
can define a "receiver synchronization state"  for the channel X,
as the triplet R(C, CN, MSN), where C is the name of the group of
physical channels, CN is the number of the  physical  channel  in
use, and MSN is the number of message expected. (See footnote #1)
We can specify a message received on a given C-channel as M(MSN).
When R receives the message M(R.MSN) on the channel C(R.CN),  the
synch-state  changes  from  R(C,  CN,  MSN)  to  R(C, CN, MSN+1).
However if M.MSN for the message received is greater  than  R.MSN
then  a  message  has been lost, and R changes the synch-state to
R(C, CN+1,  MSN).   What  really  happens  may  be  described  as
follows:  upon  detection  of  error  in  a logical channel X, we
merely discard the physical channel that was in use at  the  time
of  error, and restart communication on a new physical channel at
the point where break occurred.
_________________________________________________________________

(1) Notice that we have prefixed this triplet  by  the  letter  R
(for  the  receiver.)   We  will  prefix  other similarly defined
quantities by different letters.  For example M can be  used  for
messages.   This  notation  permits  us to write expressions like
M.MSN = R.MSN, where M.MSN stands for the message sequence number
of the message.


                    - 5 -










This scheme provides a reliable transmission path X, even  though
the physical channels involved are unreliable.  In this scheme we
have  assumed  that  (1)  a  completely  reliable  channel  Y  is
available for exchange of control messages, and (2) that there is
a large supply of physical channels  available for use of X.   In
the  paragraphs that follow we shall revise our protocol to use a
single physical channel and  then  apply  this  protocol  to  the
channel Y in such a way that Y would become "self-correcting."


Now  suppose  that channel X has only one physical channel (named
X') available for its use rather than the inexhaustible supply of
physical channels.  Our protocol would still work,  if  we  could
somehow simulate the effect of a large number of C-channels using
the  single  channel X'.  One method of providing this simulation
is to include in each message the name of the C-channel on  which
it  is  being  sent,  and  send  it on X'.  Now the receiver must
examine each message received on X' to determine the C-channel on
which this message was sent.  Our protocol still works except for
one minor difference,  namely,  the  receiver  must  now  discard
messages  corresponding  to C-channels that are no longer in use,
whereas in the previous system the  C-channels  no  longer  being
used  were  simply  discarded.  To be sure, X' can be multiplexed
among only a finite number of C-channels; however, we can provide
a sufficiently large number of C-channels so that during the life
time of the logical channel X, the probability of exhausting  the
supply  of  C-channels would be very low.  And even if we were to
exhaust the supply of C-channels, we could recycle them  just  as
we recycle the message sequence numbers.


A  physical  message received on X' can now be characterized by a
pair of C-channel number and a message sequence number, as  M(CN,
MSN).  The receiver synchronization state becomes a triplet R(X',
CN,  MSN).   This  state  tells  us  that R is ready to receive a
message for X on the physical channel X'  and  for  this  message
M.CN  should be equal to R.CN and M.MSN should be equal to R.MSN.
All messages with M.CN less than R.CN will be  ignored.   If  for
the  next  message received on X', M.CN = R.CN and M.MSN = R.MSN,
then R changes the synch state to R(X', CN, MSN+1).   If  M.CN  =
R.CN  but  M.MSN  >  R.MSN  then a message has been lost and  the
synch-state R(X', CN, MSN) changes to R(X', CN+1,  MSN).   Notice
that  we  have  not  yet said anything about the situation M.CN >
R.CN.  We will later describe a scheme for  using  this  case  to
provide for error correction on the control channel itself.



2.4 EXCHANGE OF CONTROL INFORMATION


So  far  we  have  discussed  two  schemes  for the detection and
retransmission aspects of  the  lost-message  problem.   In  this


                    - 6 -










section, we discuss methods by which the receiver communicates to
the sender the fact of loss of messages.


We continue with the scenario developed in the above section with
a small change.  For the purposes of the discussion that is about
to  follow  we  shall  assume that there are actually two perfect
channels available for exchange of control messages.  One channel
from S to R named S->R, and the other from R  to  S  named  R->S.
The  purpose  of S->R will become clear in a moment.  In order to
let R communicate the fact of loss of messages to S, We provide a
control message called L__o_s_t__M_e_s_s_a_g_e__f_r_o_m__R_e_c_e_i_v_e_r (LMR) which  is
of  the  following  form: LMR(X, CN, MSN), where X is the name of
the channel, CN is the new  C-channel  number,  and  MSN  is  the
message  sequence  number  of the lost message.  If more than one
message has been lost, then R uses the MSN of the  first  message
only.  When S receives this message, it can restart communication
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -