but the problem is easily understood to be of a general nature. In fact, we recently had another network-wide failure that was traced to a hardware error that resulted in erroneous routing messages, after we had installed a software checksum on all inter-IMP transmissions. The problem we had was due to a single broken instruction in the part of the IMP program that builds the routing message. As a result, the routing messages from that IMP were random data, and the neighboring IMPs interpreted these messages as routing update information. When this happened, traffic flow through the Network was completely disrupted and no useful work could be done until the failed IMP was halted.

This kind of problem, the introduction of incorrect routing information into the Network, can happen in three ways:

* The routing message is changed in transmission. The inter-IMP checksum should catch this. The bad routing messages we saw in the Network had good checksums.
* The routing message is changed as it is constructed, say by a memory or processor failure, or before it is transmitted. This is what we termed above an intra-IMP failure.

McQuillan [Page 5]
RFC 528 SOFTWARE CHECKSUMMING IN THE IMP 20 June 1973

* The routing program is incorrect for hardware or software reasons.

We have attempted to solve the last two kinds of problems by extending the concept of software checksums. The routing program has been modified to build a software checksum for the routing message as it builds the message, just as if it came from a Host. It is important that this checksum refer to the intended contents of the routing message, not the actual contents. That is, the program which generates the routing message builds its own software checksum as it proceeds, not by reading what has been stored in the routing message area, but by adding up the intended contents for each entry as it computes them. The process which sends out routing messages then always verifies the checksum before transmitting them.
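The distinction between intended and actual contents can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the RFC does not give the checksum algorithm here, so a simple wrapping 16-bit sum stands in for it, and the names `build_routing_message`, `verify_before_send`, and the `store` callback (which models a possibly faulty memory write) are hypothetical.

```python
MASK16 = 0xFFFF  # assumed 16-bit word size, for illustration only


def add16(total, word):
    """Add one word to a running sum, wrapping at 16 bits."""
    return (total + word) & MASK16


def build_routing_message(intended_entries, store):
    """Build a message plus a checksum of the INTENDED contents.

    `store` is a hypothetical function modelling the memory write;
    a faulty memory may return something other than what was written.
    """
    message = []
    checksum = 0
    for value in intended_entries:
        # Sum the value as computed, not as read back from memory.
        checksum = add16(checksum, value)
        message.append(store(value))
    return message, checksum


def verify_before_send(message, checksum):
    """Recompute the sum over the stored message and compare."""
    actual = 0
    for word in message:
        actual = add16(actual, word)
    return actual == checksum


def good_store(value):
    """A healthy memory: stores exactly what it is given."""
    return value & MASK16


def stuck_bit_store(value):
    """A faulty memory with one bit stuck at 1."""
    return (value | 0x0004) & MASK16


entries = [1, 2, 3, 8]
ok_message, ok_sum = build_routing_message(entries, good_store)
bad_message, bad_sum = build_routing_message(entries, stuck_bit_store)
print(verify_before_send(ok_message, ok_sum))    # True
print(verify_before_send(bad_message, bad_sum))  # False
```

Had the checksum been computed by reading the stored message back, the corrupted message would carry a matching checksum and the fault would go unnoticed, which is the case the text warns against.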
This scheme should detect all intra-IMP failures.

Finally, the routing program itself can be checksummed to detect any changes in the code. The programs which copy in received routing messages, compute new routing tables, and send out routing messages each calculate the checksum of the code before executing it. If the program finds a discrepancy in the checksum of the program it is about to run, it immediately requests a program reload from an adjacent IMP. These checksums include the checksum computation itself, the routing program and any constants referenced. This modification should prevent a hardware failure at one IMP from affecting the Network at large by stopping the IMP before it does any damage in terms of spreading bad routing. A version of the IMP program with this added protection for routing was released on May 22.

In the first few months of 1973, there have been several other efforts aimed at improving the reliability of the Network, in addition to software checksumming in the IMPs. At the same time that we were discovering inter-IMP failures with the software checksum packets, we began to notice a different kind of problem with intra-IMP failures. In these cases we were primarily faced with memory problems, and they often affected the IMP program itself, rather than the packets flowing through the IMP. Our first attack on this problem was to build a PDP-1 program to verify the running IMP and TIP programs at a site against the correct core images held at the PDP-1. The program interrogates the IMP with DDT messages, and prints out a list of discrepancies. Using this program, we have already found memory failures at one site.

4. TIP Modifications

The hardware difficulties which we began to experience during the first few months of 1973 had two effects on Host-to-Host communication.
First, the intermittent modem interface failures, of the type seen at Belvoir, Aberdeen, and ETAC, meant that messages were occasionally lost by the network. This loss is reported to the transmitting Host by the "Incomplete Transmission" message generated by the source IMP; the Host must then decide whether to retransmit or to take some other action. Second, the higher than normal incidence of machine failures meant that the network sometimes "partitioned" so that there was no path between the two communicating Hosts. (It should be noted that, contrary to the original design, two sites are currently connected to the network by only a single path; other similar connections are planned. For any such sites, any failure along the single path will be seen as a partition.)

Since a TIP acts as a Host for its users, its resilience when these types of failures occur has a major effect on user satisfaction. Prior to this time the TIP program "aborted" the user's connection if it received an Incomplete Transmission indication from the IMP program. In March the TIP program (and the programs of several other Hosts) was changed to retransmit messages for which the Incomplete Transmission indication was returned; some Hosts (e.g. MULTICs) have done this from the start. This modification has turned out to be relatively simple, and we urge other Hosts to consider implementing some sort of error recovery software. On the other hand, it has not seemed reasonable to continue attempting to transmit when the program receives a "Destination Unreachable" indication, since this could arise either from a network partition or from a failure at the destination site. The interactive user is, of course, free to try again manually.

A different situation pertains to tape transfers involving TIPs with the magnetic tape option. In these cases, the user would like to start the process and then ignore it until the transfer is finished.
Network partitions, even if infrequent, may occur when tape transfers many hours in length are in progress. Therefore, we made a significant modification to the TIP magnetic tape option to include a sequencing mechanism in the tape transfer protocol which permits automatic recovery and transmission continuation after most kinds of network transients. With this mechanism in effect, and assuming a tape is mounted at the "other end", the complete transfer of a tape is possible with a single command given at either end. If the connection goes dead in mid-transfer, the TIP magnetic tape software will attempt to reopen the connection until successful and then continue the transfer from where it was left off. In addition to modifying the TIP magnetic tape option as specified above, we also modified the TENEX program which is able to communicate with the TIP magnetic tape option so that it remained compatible. These changes were installed in April.

5. Future Plans

We have been considering some of the issues of network reliability discussed above in connection with the development of the new High Speed Modular IMP. This design effort and the experiences with the current IMP system are, of course, linked together, and we have already decided on several approaches to be taken in the new line of IMPs:

* The IMP will have a hardware CRC checksum generator which returns the checksum on a specified range of memory.
* The IMP will use this facility to generate and check an end-to-end checksum on messages. This checksum will therefore be more comprehensive and better for error detection than the current software checksum. It will insure a high degree of reliability for Host transmissions.
* In addition, the IMP will perform a verification of a packet checksum at each hop to provide diagnostic information. This check will be on an optional basis, whenever the system has available resources for the check.
* The code for the new IMP system will be read-only (this is impractical for the present 516 and 316 IMPs), and the program will periodically checksum itself using the hardware CRC generator. We hope to design the program so that it can be reloaded in segments in the event of a detected error in the code, with no service interruption.
* Finally, we are looking into the structure of an optional IMP-Host/Host-IMP checksum to complete the Host/Host end-to-end checksum. Under such an arrangement, the IMP and Host could agree to verify the checksums on the messages transferred over the interface between them, and the appropriate signalling mechanisms would be provided to handle errors. With this technique in effect, two Hosts could be certain that their messages were delivered error-free or else they would be notified of an error, and could then retransmit their message if desired.

More details on any such modifications to the IMP and to the IMP-Host interface will be published when appropriate.

[This RFC was put into machine readable form for entry]
[into the online RFC archives by Via Genie 12/1999]
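As an illustration of the segment-wise self-check proposed in the Future Plans list, the following Python sketch checksums a program image in fixed-size segments and restores only the segments whose checksums no longer match. It is a sketch under stated assumptions, not the planned IMP design: `zlib.crc32` stands in for the hardware CRC generator, the 256-byte `SEGMENT_SIZE` is arbitrary, and the function names and the idea of holding a known-good copy locally are hypothetical.

```python
import zlib

SEGMENT_SIZE = 256  # hypothetical segment length in bytes


def segment_crcs(image, segment_size=SEGMENT_SIZE):
    """Return one CRC per fixed-size segment of the program image."""
    return [
        zlib.crc32(image[start:start + segment_size])
        for start in range(0, len(image), segment_size)
    ]


def damaged_segments(image, reference_crcs, segment_size=SEGMENT_SIZE):
    """List the indices of segments whose CRC differs from the reference."""
    current = segment_crcs(image, segment_size)
    return [
        index
        for index, (now, ref) in enumerate(zip(current, reference_crcs))
        if now != ref
    ]


def reload_segments(image, good_copy, segments, segment_size=SEGMENT_SIZE):
    """Replace only the listed segments from a known-good copy."""
    repaired = bytearray(image)
    for index in segments:
        start = index * segment_size
        repaired[start:start + segment_size] = good_copy[start:start + segment_size]
    return bytes(repaired)


# Usage: flip one bit in a 1024-byte image and repair just that segment.
good = bytes(range(256)) * 4
reference = segment_crcs(good)

corrupted = bytearray(good)
corrupted[600] ^= 0x01          # byte 600 lies in segment 2
corrupted = bytes(corrupted)

bad = damaged_segments(corrupted, reference)
print(bad)                                             # [2]
print(reload_segments(corrupted, good, bad) == good)   # True
```

Checking per segment rather than over the whole image is what makes a partial reload possible: only the damaged range has to be fetched again, while the rest of the program stays in place.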