Network Working Group                              Rajendra K. Kanodia
Request for Comments #663                             MIT, Project MAC
NIC #31387                                           November 29, 1974


           A LOST MESSAGE DETECTION AND RECOVERY PROTOCOL

1.0 INTRODUCTION

The current Host-to-Host protocol does not provide for the
following three aspects of network communication:

   1. detection of messages lost in the transmission path

   2. detection of errors in the data

   3. procedures for recovery in the event of lost messages or
      data errors.

In this memo we propose an extension to the Host-to-Host protocol
that will allow detection of lost messages and an orderly
recovery from this situation.  If Host-to-Host protocol were to
be amended to allow for detection of errors in the data, it is
expected that the recovery procedures proposed here will apply.

With the present protocol, it may sometimes be possible to
detect loss of messages in the transmission path.  However, often
a lost message (especially one on a control link) simply results
in an inconsistent state of a network connection.  One frequent
(and frustrating) symptom of a message loss on a control link has
been the "lost allocate" problem which results in a "paralyzed"
connection.  The NCP (Network Control Program) at the receiving
site believes that the sender has sufficient allocation for a
connection, whereas the NCP of the sending host believes that it
has no allocation (due to either loss of or error in a message
that contained the allocate command).  The result is that the
sending site can not transmit any more messages over the
connection.  This problem was reported in the NWG-RFC #467 by
Burchfiel and Tomlinson.  They also proposed an extension to the
Host-to-Host protocol which allows for resynchronization of the
connection status.  Their proposed solution was opposed by Edwin
Meyer (NWG-RFC #492) and Wayne Hathaway (NWG-RFC #512) on the
grounds that it tended to mask the basic problem of loss of
messages and they suggested that the fundamental problem of
message loss should be solved rather than its symptoms.
As an alternative to the solution proposed in NWG-RFC #467, Wayne
Hathaway suggested that Host-to-Host protocol header could be
extended to include a "Sequence Control Byte" to allow detection
of lost messages.  At about the same time Jon Postel suggested a
similar scheme using message numbers (NWG-RFC #516).  A little
later David Walden proposed that four unused bits of the message
sequence number (in the IMP leader) be utilized for sequencing
messages (NWG-RFC #534).  His scheme is similar to those proposed
by Postel and Hathaway; however it has the advantage that
Host-to-Host protocol mechanisms can be tied into the IMP-to-Host
protocol mechanisms.

The protocol extension proposed here uses the four bits of the
message sequence number in the message leader for detection of
lost messages.  However, to facilitate recovery, it uses another
eight bit field (presently unused) in the 72 bit header of the
regular messages.  In the next section of this paper we discuss
some of the basic ideas underlying our protocol.  In section 3,
we provide a description of the protocol.  It is our intention
that section 3 be a self-contained and complete description of
the protocol.

2.0 BASIC IDEAS

The purpose of this section is to provide a gentle introduction
to the central ideas on which this protocol is based.  Roughly
speaking, our protocol can be divided into three major
components.  First is the mechanism for detecting loss of
messages.  Second is the exchange of information between the
sender and the receiver in the event of a message loss.  For
reasons that will soon become obvious, we have termed this area
as "Exchange of Control Messages".  The third component of our
protocol is the method of retransmission of lost messages.  In
this section, we have reversed the order of discussion for the
second and third components, because the mechanisms for exchange
of control messages depend heavily upon the retransmission
methods.

A careful reader will find that several minor issues have been
left unresolved in this section.
He (or she) should remember that this section is not intended to
be a complete description of the protocol.  Hopefully, we have
resolved most of these issues in the formal description of the
protocol provided in section 3.

2.1 DETECTION OF LOSS OF MESSAGES

The 32 bit Host-to-IMP and IMP-to-Host leaders contain a 12 bit
message-id in bit positions 17 to 28 (BBN Report #1822).  The
Host-to-Host protocol (NIC 8246) uses 8 bits of the message-id
(bit positions 17 to 24) as a link number.  The remaining 4 bits
of the message-id (bits 25 to 28) are presently unused.  For the
purposes of the protocol to be presented here, we define these
four bits to be the message sequence number (MSN in short)
associated with the link.  Thus message-id consists of an eight
bit link number and a four bit message sequence number.  The four
bit MSN provides a sixteen element sequence number for each link.

A network connection has a sending host (referred to as "sender"
henceforth), a receiving host (referred to as "receiver"
henceforth), and a link on which messages are transmitted.  In
our protocol the sender starts communication with the value of
MSN set to one (i.e. the first message on any link has one in its
MSN field.)  For the next message on the same link the value of
MSN is increased by one.  When the value of MSN becomes 15 the
next value chosen is one.  This results in the following sequence
1, 2, ...., 13, 14, 15, 1, 2, ...., etc.  The receiver can detect
loss of messages by examining this sequence.  Each hole
corresponds to a lost message.  Notice that the detection
mechanism will fail if a sequence of exactly 15 messages were to
be lost.  For the time being, we shall assume that the
probability of losing a sequence of exactly 15 messages is
negligible.  However, we shall later provide a status exchange
mechanism (Section 2.6) that can be used to prevent this failure.

Notice that in the sequence described above we have omitted the
value zero.
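
The sequencing rule described above can be illustrated with a
short sketch.  It is only an illustration under stated
assumptions and not part of the proposed protocol: the function
names (split_message_id, next_msn, lost_count) are hypothetical,
and the 12 bit message-id is assumed to be held as an integer
with the link number in its high-order eight bits and the MSN in
its low-order four bits.

```python
MSN_BITS = 4   # width of the message sequence number field
MSN_MAX = 15   # largest MSN; zero is left out of the cycle


def split_message_id(message_id):
    """Split a 12 bit message-id into (link number, MSN).

    Assumption: link number in the high 8 bits, MSN in the low 4.
    """
    if not 0 <= message_id < (1 << 12):
        raise ValueError("message-id must fit in 12 bits")
    return message_id >> MSN_BITS, message_id & MSN_MAX


def next_msn(msn):
    """Advance through the cycle 1, 2, ..., 15, 1, 2, ..."""
    return 1 if msn == MSN_MAX else msn + 1


def lost_count(expected, received):
    """Count the holes between the expected MSN and the one received.

    The cycle has 15 elements, so the count is taken modulo 15.
    A loss of exactly 15 messages is indistinguishable from no loss.
    """
    return (received - expected) % MSN_MAX
```

For example, lost_count(15, 2) counts two holes (MSN 15 and
MSN 1), while a run of exactly 15 lost messages brings the MSN
back to the expected value and yields zero, which is the failure
case noted above.
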
Following a suggestion made by Hathaway (NWG-RFC #512) and Walden
(NWG-RFC #534) the new protocol uses the value zero to indicate
to the receiving host that the sending host is not using message
sequence numbers.  We, in fact, extend the meaning associated
with the MSN value zero to imply that the sending host has not
implemented the detection and error recovery protocol being
proposed here.

2.2 COMPATIBILITY

The discussion above brings us to the issue of compatibility
between the present and the new protocols.  Let us define the
hosts with the present protocol to be type A and the hosts with
the new protocol to be type B.  We have three situations:

   1. Type A communicating with type A: there is no difference
      from the present situation.

   2. Type A communicating with type B: from the zero value MSNs
      in messages sent by the type A host, the type B host can
      detect the fact that the other host is a type A host.
      Therefore the type B host can simulate the behaviour of a
      type A host in its communication with the other host, and
      the type A host will not be confused.  As we will see
      later, this simulation is really simple and can be easily
      applied selectively.

   3. Type B host communicating with type B: Both hosts can
      detect the fact that the other host is a type B host and
      use the message detection and error recovery protocol.

There is one difficulty here that we have not yet resolved.  When
starting communication how does a type B host know whether the
other host is type A or type B?  This difficulty can be resolved
by assuming that a type A host will not be confused by a non-zero
MSN in the messages that it receives.  This assumption is not
unreasonable because a type A host can easily meet this
requirement by making a very simple change to its NCP (the
Network Control Program), if it does not already satisfy this
requirement.  Another assumption that is crucial to our protocol
is that the type A hosts always set the MSN field of messages
(they send out) to zero.
As of this writing, the author believes that no hosts are using
the MSN field and therefore no compatibility problem should
arise.

2.3 RETRANSMISSION OF MESSAGES

Before getting down to the details of the actual protocol, we
will attempt here to explain the essential ideas underlying this
protocol by considering a somewhat simplified situation.

Consider a logical communication channel X, which has at its
disposal an inexhaustible supply of physical communication
channels C(1), C(2), C(3), ........, etc. (See footnote #1)
Channel X is to be used for transmission of messages.  In
addition to carrying the data, these messages contain (1) the
channel name X, and (2) a Message Sequence Number (MSN).  Let us
denote the sender on this channel by S and the receiver by R.
Let us also assume that at the start of the communication, R and
S are synchronized such that R is prepared to receive messages
for logical channel X on the physical channel C(1) and S is
prepared for sending these messages on C(1).  S starts by pumping
a sequence of messages M(1), M(2), M(3), ........, M(n) into
channel C(1).  Since these messages contain sequence numbers, R
is able to detect loss of messages on the channel C(1).  Suppose
now that R discovers that message number K (where K < n) was lost
in the transmission path.  Let us further assume that having
discovered loss of a message, R can communicate this fact to S by
sending an appropriate control message on another logical channel
that is explicitly reserved for transmission of control messages
from R to S.  This channel, named Y, is assumed to be completely
reliable.

_________________________________________________________________
(1) One method of recovery may be to let the receiver save all
properly received messages and require the sender to retransmit
only those messages that were lost.  This method requires the
receiver to have the ability to reassemble the messages to build
the data stream.  A second method of recovery may be to abort and
restart the transmission at the error point.  This method
requires that the receiving host be able to distinguish between
legitimate messages and messages to be ignored.  For simplicity
we have chosen the second method and an inexhaustible supply of
physical channels serves to provide the distinction among
messages.

We now provide a rather simplistic recovery protocol for the
scenario sketched above.  Having detected the loss of message
M(K) on channel X, R takes the following series of actions:

   1- R stops reading messages on C(1),

   2- R discards those messages that were received on C(1) and
      are placed after M(K) in the logical message sequence,

   3- R prepares itself to read messages M(K), M(K+1), .....,
      etc. on the physical channel C(2), and

   4- R sends a control message to S on control channel Y,
      which will inform S to the effect that there was an
      error on logical channel X while using physical channel
      C(1) in message number K.

When S receives this control message on Y, it takes the following
action:

   1- S stops sending messages on C(1), and

   2- begins transmission of messages starting with the
      sequence number K, on the physical channel C(2).

This resynchronization protocol is executed every time R detects
an error.  If physical channel C(CN) was being used at the time
of the error, then the next channel to be used is C(CN+1).  We
can define a "receiver synchronization state" for the channel X,
as the triplet R(C, CN, MSN), where C is the name of the group of
physical channels, CN is the number of the physical channel in
use, and MSN is the number of message expected.
(See footnote #2)  We can specify a message received on a given
C-channel as M(MSN).  When R receives the message M(R.MSN) on the
channel C(R.CN), the synch-state changes from R(C, CN, MSN) to
R(C, CN, MSN+1).  However if M.MSN for the message received is
greater than R.MSN then a message has been lost, and R changes
the synch-state to R(C, CN+1, MSN).  What really happens may be
described as follows: upon detection of error in a logical
channel X, we merely discard the physical channel that was in use
at the time of error, and restart communication on a new physical
channel at the point where break occurred.

_________________________________________________________________
(2) Notice that we have prefixed this triplet by the letter R
(for the receiver.)  We will prefix other similarly defined
quantities by different letters.  For example M can be used for
messages.  This notation permits us to write expressions like
M.MSN = R.MSN, where M.MSN stands for the message sequence number
of the message.

This scheme provides a reliable transmission path X, even though
the physical channels involved are unreliable.  In this scheme we
have assumed that (1) a completely reliable channel Y is
available for exchange of control messages, and (2) that there is
a large supply of physical channels available for use of X.  In
the paragraphs that follow we shall revise our protocol to use a
single physical channel and then apply this protocol to the
channel Y in such a way that Y would become "self-correcting."

Now suppose that channel X has only one physical channel (named
X') available for its use rather than the inexhaustible supply of
physical channels.  Our protocol would still work, if we could
somehow simulate the effect of a large number of C-channels using
the single channel X'.  One method of providing this simulation
is to include in each message the name of the C-channel on which
it is being sent, and send it on X'.
Now the receiver must examine each message received on X' to
determine the C-channel on which this message was sent.  Our
protocol still works except for one minor difference, namely, the
receiver must now discard messages corresponding to C-channels
that are no longer in use, whereas in the previous system the
C-channels no longer being used were simply discarded.  To be
sure, X' can be multiplexed among only a finite number of
C-channels; however, we can provide a sufficiently large number
of C-channels so that during the lifetime of the logical channel
X, the probability of exhausting the supply of C-channels would
be very low.  And even if we were to exhaust the supply of
C-channels, we could recycle them just as we recycle the message
sequence numbers.

A physical message received on X' can now be characterized by a
pair of C-channel number and a message sequence number, as M(CN,
MSN).  The receiver synchronization state becomes a triplet R(X',
CN, MSN).  This state tells us that R is ready to receive a
message for X on the physical channel X' and for this message
M.CN should be equal to R.CN and M.MSN should be equal to R.MSN.
All messages with M.CN less than R.CN will be ignored.  If for
the next message received on X', M.CN = R.CN and M.MSN = R.MSN,
then R changes the synch state to R(X', CN, MSN+1).  If M.CN =
R.CN but M.MSN > R.MSN then a message has been lost and the
synch-state R(X', CN, MSN) changes to R(X', CN+1, MSN).  Notice
that we have not yet said anything about the situation M.CN >
R.CN.  We will later describe a scheme for using this case to
provide for error correction on the control channel itself.

2.4 EXCHANGE OF CONTROL INFORMATION

So far we have discussed two schemes for the detection and
retransmission aspects of the lost-message problem.  In this
section, we discuss methods by which the receiver communicates to
the sender the fact of loss of messages.

We continue with the scenario developed in the above section with
a small change.
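
Before the control channels are introduced, the single-channel
receiver rules given at the end of section 2.3 can be summarized
in a small sketch.  It is an illustration only, under these
assumptions: the names SynchState and receive are hypothetical,
CN and MSN are treated as plain integers with no wrap-around (as
in the simplified scenario), and the cases the text has not yet
specified are reported without changing the state.

```python
from dataclasses import dataclass


@dataclass
class SynchState:
    """Receiver synchronization state R(X', CN, MSN) for one channel."""
    cn: int = 1    # C-channel number currently in use
    msn: int = 1   # message sequence number expected next


def receive(state, m_cn, m_msn):
    """Apply one received message M(CN, MSN) to the receiver state."""
    if m_cn < state.cn:
        # Message belongs to a C-channel that is no longer in use.
        return "ignored"
    if m_cn == state.cn and m_msn == state.msn:
        # Expected message: R(X', CN, MSN) -> R(X', CN, MSN+1).
        state.msn += 1
        return "accepted"
    if m_cn == state.cn and m_msn > state.msn:
        # A hole was seen: R(X', CN, MSN) -> R(X', CN+1, MSN).
        state.cn += 1
        return "lost"
    # M.CN > R.CN (and any other case) is not yet specified here.
    return "unspecified"
```

After a "lost" result the receiver waits for message MSN again,
now on C-channel CN+1, and later arrivals that still carry the
old CN fall into the "ignored" case.
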
For the purposes of the discussion that is about to follow we
shall assume that there are actually two perfect channels
available for exchange of control messages.  One channel from S
to R named S->R, and the other from R to S named R->S.  The
purpose of S->R will become clear in a moment.  In order to let R
communicate the fact of loss of messages to S, we provide a
control message called Lost Message from Receiver (LMR) which is
of the following form: LMR(X, CN, MSN), where X is the name of
the channel, CN is the new C-channel number, and MSN is the
message sequence number of the lost message.  If more than one
message has been lost, then R uses the MSN of the first message
only.  When S receives this message, it can restart communication