NWG/RFC# 636 JDB BPC RST DCW3 MLK 23-OCT-75 22:27 30490
TIP/TENEX Reliability Improvements

RFC 636                                   J. Burchfiel - BBN-TENEX
                                          B. Cosell - BBN-NET
NIC 30490                                 R. Tomlinson - BBN-TENEX
                                          D. Walden - BBN-NET
                                          10 June 1974

                  TIP/TENEX Reliability Improvements

During the past months we have felt strong pressure to improve the reliability of TIP/TENEX network connections, as improvement in the reliability of users' connections between TENEXs and TIPs would have a major impact on the appearance of overall network reliability, due to the large number and high visibility of TENEXs and TIPs. Despite the emphasis on TIP/TENEX interaction, all work done applies equally well to interactions between Hosts of any type.

The remainder of this RFC gives a sketch of our plan for improving the reliability of connections between TIPs and TENEXs. Major portions of this plan have already been implemented (TIP version 322; TENEX version 1.32) and are now undergoing final test prior to release throughout the network. Completion of the implementation of the plan is expected in the next quarter.

Our plan for improving the reliability of TIP/TENEX connections is concerned with obtaining and maintaining TIP/TENEX connections, gracefully recovering from lost connections, and providing clear messages to the user whenever the state of his connection changes.

When a TIP user attempts to open a connection to any Host, the Host may be down. In this case it would be helpful to provide the user with information about the extent of the Host's unavailability. To facilitate this, we modified the IMP program to accept and utilize information from a Host about when the Host will be back up and for what reason it is down. TENEX is to be modified to supply such information before it goes down or, through manual means, after it has gone down. When the TIP user then attempts to connect to the down TENEX, the IMP local to the TENEX returns the information about why and for how long TENEX will be down.
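The down-Host path can be sketched as a small model: the Host (or an operator, manually) hands its reason and expected return time to its local IMP, the IMP keeps that record, and a later connection attempt gets it back so the TIP can phrase a message. This is a minimal sketch under stated assumptions: the names `HostDownInfo`, `Imp`, `host_going_down`, `connect_attempt` and `tip_report` are hypothetical, times are plain strings, and the RFC does not specify the format in which the IMP actually stores or returns this information.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostDownInfo:
    # Hypothetical record of what a Host tells its IMP about being down.
    reason: Optional[str] = None    # e.g. "hardware maintenance"
    back_day: Optional[str] = None  # e.g. "Tuesday"
    back_gmt: Optional[str] = None  # e.g. "16:30"


class Imp:
    """Toy IMP that remembers down-Host information (not the real IMP program)."""

    def __init__(self) -> None:
        self._down: Dict[str, HostDownInfo] = {}

    def host_going_down(self, host: str, info: HostDownInfo) -> None:
        # Supplied by the Host before it goes down, or entered manually afterwards.
        self._down[host] = info

    def host_back_up(self, host: str) -> None:
        self._down.pop(host, None)

    def connect_attempt(self, host: str) -> Optional[HostDownInfo]:
        # None means the Host is up; otherwise return why and for how long it is down.
        return self._down.get(host)


def tip_report(info: HostDownInfo) -> str:
    """Phrase the TIP's user message from whatever the IMP returned."""
    msg = "Host unavailable"
    if info.reason:
        msg += f" because of {info.reason}"
    if info.back_day and info.back_gmt:
        msg += f" -- expected available {info.back_day} at {info.back_gmt} GMT"
    return msg


if __name__ == "__main__":
    imp = Imp()
    imp.host_going_down("TENEX", HostDownInfo("hardware maintenance", "Tuesday", "16:30"))
    info = imp.connect_attempt("TENEX")
    if info is not None:
        print(tip_report(info))
```

Because every field is optional, the TIP falls back to a bare "Host unavailable" when a Host went down without supplying anything.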
The TIP is to be modified to report this sort of information to the user; e.g., "Host unavailable because of hardware maintenance -- expected available Tuesday at 16:30 GMT".

The TIP's logger is presently not reentrant. Thus, no single TIP user can be allowed to tie up the logger for too long at a time, and the TIP therefore enforces a timeout of arbitrary length (about 60 seconds) on logger use. However, a heavily loaded Host cannot be guaranteed always to respond within 60 seconds to a TIP login request, and at present TIP users sometimes cannot get connected to a heavily loaded TENEX. To correct this problem, the TIP logger will be made reentrant and the timeout on logger use will be eliminated.

One notorious soft spot in the Host/Host protocol which degrades the reliability of connections is the Host/Host protocol incremental allocate mechanism. Low-frequency software bugs, intermittent hardware bugs, etc., can lead to the incremental allocates associated with a connection getting out of synchronization. When this happens it usually appears to the user as if the connection just "hung up". A slight addition to the Host/Host protocol to allow connection allocates to be resynchronized has been designed and implemented for both the TIP and TENEX.

TENEX has a number of internal consistency checks (called "bughalts") which occasionally cause TENEX to halt. Frequently, after diagnosis by system personnel, TENEX can be made to proceed without loss from the viewpoint of local users. A mechanism is being provided which allows TENEX to proceed in this case from the point of view of TIP users of TENEX as well.

The appropriate mechanism entails the following: TENEX will not drop its ready line during a bughalt (from which TENEX can usually proceed successfully), nor will it clear its NCP tables and abort all connections.
Instead, after a bughalt TENEX will: discard the message it is currently receiving, as the IMP has returned an Incomplete Transmission to the source for this message; reinitialize the interface to the IMP; and resynchronize, on all connections possible, Host/Host protocol allocate inconsistencies due to lost messages, RFNMs, etc. The latter is done with the same mechanism described above. This procedure is not guaranteed to save all data -- a tiny bit may be lost -- but this is of secondary importance to maintaining the connection over the TENEX bughalt.

The TIP user must be kept fully informed as TENEX halts and then continues. Therefore, the TIP has been modified to report "Host not responding -- connection suspended" when it senses that TENEX has halted (it does this by properly interpreting messages returned by the destination IMP). When TENEX resumes service after proceeding from a bughalt, the above procedure notifies the TIP that service is restored, and the TIP has been modified to report "Service resumed" to all users of that Host.

On the other hand, the service interruption may not be proceedable, and TENEX may have to do a total system reload and restart. In this case TENEX will clear its NCP connection tables and send a Host/Host protocol reset command to all other Hosts. On receiving this reset command, the TIP will report "Host reset -- connection closed" to all users of that Host with suspended connections. The TIP user can then re-login to the TENEX or to some other Host.

Of course, the user may not have the patience to wait for service to resume after a TENEX bughalt. Instead, he may unilaterally choose to connect to some other Host, ignoring the previously suspended connection. If TENEX is then able to proceed, its NCP will still think its connection to the TIP is good and suitable for use.
Thus, we have a connection which the TIP thinks is closed and TENEX thinks is open, a phenomenon known as the "half-closed connection". An automatic procedure for cleanly completing the closing of such a connection has been specified and implemented for the TIP and TENEX.

Since TENEX will maintain connections across service interruptions, the TIP user will be required to follow the security procedure of telling the TIP to "forget" his suspended connection before abandoning his terminal. The command @H 0 (for example) will guarantee that his connection will not be reestablished on resumption of service. Otherwise, his job would be left at the mercy of anyone who acquires that terminal.

An appendix follows which describes the Host/Host protocol changes made. These changes are backward compatible (with the exception that Hosts which have not implemented these changes will sometimes receive unrecognizable Host/Host protocol commands, which they presumably discard without suffering harm). These protocol changes are ad hoc in nature, but in light of their backward compatibility and potential utility, ARPA okayed their addition to the TIP and TENEX NCPs without (we believe) any implication that other Hosts have to implement them (although we would encourage their widespread implementation).

              Appendix - Ad Hoc Change to Host-Host Protocol

A.1 Introduction

The current Host-Host protocol (NIC #8246) contains no provisions for resynchronizing the status information kept at the two ends of each connection. In particular, if either host suffers a service interruption, or if a control message is lost or corrupted in an interface or in the subnet, the status information at the two ends of the connection will be inconsistent. Since the current protocol provides no way to correct this condition, the NCPs at the two ends stay "confused" forever.
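As a concrete picture of how the two ends can stay "confused", here is a toy model of one connection's allocation bookkeeping. It assumes a simplified scheme in which the receiver grants message and bit allocation with an ALL command and the sender spends it as it transmits, and a simplified receiver policy that only grants more when it believes nothing is outstanding. If one ALL is lost after the receiver has recorded it, the receiver believes allocation is outstanding while the sender believes it has none, and neither side will act again. The class and method names are hypothetical, and `resync` is a deliberately simplified stand-in (zero both views, then grant afresh), not the RAR/RAS/RAP exchange this appendix defines.

```python
from dataclasses import dataclass


@dataclass
class View:
    msgs: int = 0  # message allocation this end believes is outstanding
    bits: int = 0  # bit allocation this end believes is outstanding


class Connection:
    """Toy model of one simplex connection's allocate bookkeeping."""

    def __init__(self, grant_msgs: int = 1, grant_bits: int = 1000) -> None:
        self.rcv = View()  # receiving NCP's view
        self.snd = View()  # sending NCP's view
        self.grant_msgs = grant_msgs
        self.grant_bits = grant_bits

    def receiver_allocates(self, lost: bool = False) -> None:
        # The receiver counts the grant as soon as it sends ALL ...
        self.rcv.msgs += self.grant_msgs
        self.rcv.bits += self.grant_bits
        # ... but the sender only learns of it if the ALL actually arrives.
        if not lost:
            self.snd.msgs += self.grant_msgs
            self.snd.bits += self.grant_bits

    def send_message(self, nbits: int) -> bool:
        # The sender may only transmit within the allocation it believes it holds.
        if self.snd.msgs < 1 or self.snd.bits < nbits:
            return False
        self.snd.msgs -= 1
        self.snd.bits -= nbits
        # Assume the data message is delivered, so the receiver debits too.
        self.rcv.msgs -= 1
        self.rcv.bits -= nbits
        return True

    def receiver_wants_to_allocate(self) -> bool:
        # Simplified policy: grant more only when nothing is outstanding.
        return self.rcv.msgs == 0

    def hung(self) -> bool:
        # Sender waits for an ALL; receiver waits for data. Nobody moves.
        return self.snd.msgs == 0 and not self.receiver_wants_to_allocate()

    def resync(self) -> None:
        # Hypothetical stand-in for resynchronization: reset both views, re-grant.
        self.rcv = View()
        self.snd = View()
        self.receiver_allocates()


if __name__ == "__main__":
    conn = Connection()
    conn.receiver_allocates()           # ALL delivered
    conn.send_message(1000)             # sender uses all of it
    conn.receiver_allocates(lost=True)  # this ALL is lost in transit
    print("hung after lost ALL:", conn.hung())
    conn.resync()
    print("hung after resync:", conn.hung())
```

The point of the toy is the asymmetry in `hung()`: each end decides what to do from its own counters only, so once they disagree nothing local can repair them. Only an exchange that resets both views restarts the flow, and it does so for this one connection without touching any other, unlike RST.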
An occasional frustrating symptom of this effect is the "lost allocate" phenomenon, where the receiving NCP believes that it has bit and message allocations outstanding, while the sending NCP believes that it does not have any allocation. As a result, information flow over that connection can never be restarted. Use of the Host-Host RST (reset) command is inappropriate here, as it destroys all connections between the two hosts. What is needed is a way to resynchronize only the affected connection without disturbing any others.

A second troublesome symptom of inconsistency in status information is the "half-closed" connection: after a service interruption or network partitioning, one NCP may believe that a connection is still open, while the other believes that the connection is closed (does not exist). When such an inconsistency is discovered, the "open" end of the connection should be closed.

A.2 The RAR, RAS and RAP commands

To achieve resynchronization of allocation, we add the following three commands to the host-host protocol.

                   8 bits   8 bits
                -------------------
                !        !        !
             16 !  RAR   !  link  !
                !        !        !
                -------------------
             Reset Allocation by Receiver

                   8 bits   8 bits
                -------------------
                !        !        !