   but the problem is easily understood to be of a general nature.  In
   fact, we recently had another network-wide failure that was traced to
   a hardware error that resulted in erroneous routing messages, after
   we had installed a software checksum on all inter-IMP transmissions.
   The problem we had was due to a single broken instruction in the
   part of the IMP program that builds the routing message.  As a
   result, the routing messages from that IMP were random data, and the
   neighboring IMPs interpreted these messages as routing update
   information.  When this happened, traffic flow through the Network
   was completely disrupted and no useful work could be done until the
   failed IMP was halted.

   This kind of problem, the introduction of incorrect routing
   information into the Network, can happen in three ways:

      *  The routing message is changed in transmission.  The inter-IMP
         checksum should catch this.  The bad routing messages we saw in
         the Network had good checksums.

      *  The routing message is changed as it is constructed, say by a
         memory or processor failure, or before it is transmitted.  This
         is what we termed above an intra-IMP failure.





McQuillan                                                       [Page 5]

RFC 528             SOFTWARE CHECKSUMMING IN THE IMP        20 June 1973


      *  The routing program is incorrect for hardware or software
         reasons.

   We have attempted to solve the last two kinds of problems by
   extending the concept of software checksums.  The routing program has
   been modified to build a software checksum for the routing message as
   it builds the message, just as if it came from a Host.  It is
   important that this checksum refer to the intended contents of the
   routing message, not the actual contents.  That is, the program which
   generates the routing message builds its own software checksum as it
   proceeds, not by reading what has been stored in the routing message
   area, but by adding up the intended contents for each entry as it
   computes them.  The process which sends out routing messages then
   always verifies the checksum before transmitting them.  This scheme
   should detect all intra-IMP failures.
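
   As an illustration only, the idea of checksumming the intended
   contents rather than the stored contents might be sketched in Python
   as follows.  The additive 16-bit sum, the function names, and the
   stuck-bit fault are all assumptions made for the example; they are
   not taken from the IMP program.

```python
MASK = 0xFFFF  # assumed 16-bit word size


def build_routing_message(intended_entries, store):
    """Build a routing message and a checksum of the *intended* contents.

    `store` models the write into the message area; a faulty store
    returns something other than the value it was given.
    """
    message = []
    checksum = 0
    for value in intended_entries:
        # Sum the value just computed, not the word read back from memory.
        checksum = (checksum + value) & MASK
        message.append(store(value) & MASK)
    return message, checksum


def verify_before_send(message, checksum):
    """Recompute the sum over the stored words and compare."""
    return (sum(message) & MASK) == checksum


def good_store(value):
    return value


def faulty_store(value):
    # Hypothetical memory fault: bit 2 is stuck at one.
    return value | 0x0004
```

   With good_store the verification succeeds.  With faulty_store the
   stored words no longer add up to the checksum of the intended
   values, so the sender would decline to transmit the message.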

   Finally, the routing program itself can be checksummed to detect any
   changes in the code.  The programs which copy in received routing
   messages, compute new routing tables, and send out routing messages
   each calculate the checksum of the code before executing it.  If the
   program finds a discrepancy in the checksum of the program it is
   about to run, it immediately requests a program reload from an
   adjacent IMP.  These checksums include the checksum computation
   itself, the routing program and any constants referenced.  This
   modification should prevent a hardware failure at one IMP from
   affecting the Network at large by stopping the IMP before it does any
   damage in terms of spreading bad routing.  A version of the IMP
   program with this added protection for routing was released on May
   22.
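
   A minimal Python sketch of this check-before-execute pattern is
   given below.  It assumes that the code, its constants, and the
   checksum routine all lie in one contiguous range of memory, and it
   uses a simple additive sum; run_if_intact and request_reload are
   hypothetical names, not routines of the IMP program.

```python
MASK = 0xFFFF  # assumed 16-bit word size


def checksum_words(words):
    """Additive 16-bit checksum over a sequence of memory words."""
    return sum(words) & MASK


def run_if_intact(memory, start, end, expected, routine, request_reload):
    """Run `routine` only if memory[start:end] still has the expected sum.

    On a mismatch the routine is not run and `request_reload` is called,
    standing in for the reload request sent to an adjacent IMP.
    """
    if checksum_words(memory[start:end]) != expected:
        request_reload()
        return False
    routine()
    return True
```

   The point of the ordering is that a damaged routine is never
   entered: the comparison happens first, and only a matching checksum
   lets the routing code run.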

   In the first few months of 1973, there have been several other
   efforts aimed at improving the reliability of the Network, in
   addition to software checksumming in the IMPs.  At the same time that
   we were discovering inter-IMP failures with the software checksum
   packets, we began to notice a different kind of problem with intra-
   IMP failures.  In these cases we were primarily faced with memory
   problems, and they often affected the IMP program itself, rather than
   the packets flowing through the IMP.  Our first attack on this
   problem was to build a PDP-1 program to verify the running IMP and
   TIP programs at a site against the correct core images held at the
   PDP-1.  The program interrogates the IMP with DDT messages, and
   prints out a list of discrepancies.  Using this program, we have
   already found memory failures at one site.
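
   The comparison performed by such a verification program can be
   sketched in Python as shown below.  The read_word callback stands in
   for a DDT examine request to the running IMP; its form, the
   dictionary representation of the core image, and the report layout
   are assumptions for the example only.

```python
def find_discrepancies(reference, read_word):
    """Compare a running program to its reference image, word by word.

    `reference` maps address -> expected word.  `read_word(address)`
    stands in for a DDT examine request sent to the IMP.
    Returns (address, expected, actual) tuples in address order.
    """
    report = []
    for address in sorted(reference):
        actual = read_word(address)
        if actual != reference[address]:
            report.append((address, reference[address], actual))
    return report


def format_report(report):
    """Render discrepancies in octal, one per line."""
    return [f"{addr:06o}: expected {exp:06o}, found {act:06o}"
            for addr, exp, act in report]
```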

4. TIP Modifications

   The hardware difficulties which we began to experience during the
   first few months of 1973 had two effects on Host-to-Host
   communication.  First, the intermittent modem interface failures, of
   the type seen at Belvoir, Aberdeen, and ETAC, meant that messages
   were occasionally lost by the network.  This loss is reported to the
   transmitting Host by the "Incomplete Transmission" message generated
   by the source IMP; the Host must then decide whether to retransmit or
   to take some other action.  Second, the higher than normal incidence
   of machine failures meant that the network sometimes "partitioned" so
   that there was no path between the two communicating Hosts. (It
   should be noted that, contrary to the original design, two sites are
   currently connected to the network by only a single path; other
   similar connections are planned.  For any such sites, any failure
   along the single path will be seen as a partition.) Since a TIP acts
   as a Host for its users, its resilience when these types of failures
   occur has a major effect on user satisfaction.

   Prior to this time the TIP program "aborted" the user's connection if
   it received an Incomplete Transmission indication from the IMP
   program.  In March the TIP program (and the programs of several other
   Hosts) was changed to retransmit messages for which the Incomplete
   Transmission indication was returned; some Hosts (e.g. MULTICS) have
   done this from the start.  This modification has turned out to be
   relatively simple, and we urge other Hosts to consider implementing
   some sort of error recovery software.  On the other hand, it has not
   seemed reasonable to continue attempting to transmit when the program
   receives a "Destination Unreachable" indication, since this could
   arise either from a network partition or from a failure at the
   destination site.  The interactive user is, of course, free to try
   again manually.
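
   The recovery policy described above might look like the following
   Python sketch.  The send callback, the status strings, and the
   max_attempts limit are hypothetical; the text does not say how many
   retransmissions a Host should attempt.

```python
DELIVERED = "delivered"
INCOMPLETE = "incomplete transmission"
UNREACHABLE = "destination unreachable"


def send_with_recovery(send, message, max_attempts=5):
    """Retransmit on Incomplete Transmission; stop on anything else.

    `send(message)` is a hypothetical call that returns one of the three
    status strings above.  Returns (final_status, attempts_made).
    """
    status = INCOMPLETE
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        status = send(message)
        if status != INCOMPLETE:
            # Delivered, or Destination Unreachable: do not retry.
            break
    return status, attempts
```

   A Destination Unreachable reply ends the loop at once, which leaves
   the decision to try again with the user.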

   A different situation pertains to tape transfers involving TIPs with
   the magnetic tape option.  In these cases, the user would like to
   start the process and then ignore it until the transfer is finished.
   Network partitions, even if infrequent, may occur when tape transfers
   many hours in length are in progress.  Therefore, we made a
   significant modification to the TIP magnetic tape option to include a
   sequencing mechanism in the tape transfer protocol which permits
   automatic recovery and transmission continuation after most kinds of
   network transients.  With this mechanism in effect, and assuming a
   tape is mounted at the "other end", the complete transfer of a tape
   is possible with a single command given at either end.  If the
   connection goes dead in mid-transfer, the TIP magnetic tape software
   will attempt to reopen the connection until successful and then
   continue the transfer from where it was left off.  In addition to
   modifying the TIP magnetic tape option as specified above, we also
   modified the TENEX program which is able to communicate with the TIP
   magnetic tape option so that it remained compatible.  These changes
   were installed in April.
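
   One way a sequencing mechanism can support resumption after a dead
   connection is sketched below in Python.  This is not the TIP tape
   protocol itself: the open_link and send interfaces, the per-record
   acknowledgment, and the max_reopens limit are assumptions chosen to
   keep the example small.

```python
class ConnectionLost(Exception):
    """Raised by a link when the connection goes dead."""


def transfer_tape(records, open_link, max_reopens=100):
    """Send every record, resuming after the last acknowledged one.

    `open_link()` returns an object whose `send(seq, record)` returns
    the acknowledged sequence number or raises ConnectionLost.  Both
    are hypothetical stand-ins for the real connection.
    Returns the number of times the connection had to be reopened.
    """
    next_seq = 0
    reopens = 0
    link = open_link()
    while next_seq < len(records):
        try:
            acked = link.send(next_seq, records[next_seq])
        except ConnectionLost:
            if reopens == max_reopens:
                raise
            reopens += 1
            link = open_link()  # resume from next_seq, not from zero
            continue
        if acked == next_seq:
            next_seq += 1
    return reopens
```

   Because the sender advances only on an acknowledgment, a record
   whose acknowledgment was lost is sent again after the reopen; the
   receiver is assumed to use the sequence number to discard the
   duplicate.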

5. Future Plans

   We have been considering some of the issues of network reliability
   discussed above in connection with the development of the new High
   Speed Modular IMP.  This design effort and the experiences with the
   current IMP system are, of course, linked together, and we have
   already decided on several approaches to be taken in the new line of
   IMPs:

      *  The IMP will have a hardware CRC checksum generator which
         returns the checksum on a specified range of memory.

      *  The IMP will use this facility to generate and check an end-
         to-end checksum on messages.  This checksum will therefore be
         more comprehensive and better for error detection than the
         current software checksum.  It will ensure a high degree of
         reliability for Host transmissions.

      *  In addition, the IMP will perform a verification of a packet
         checksum at each hop to provide diagnostic information.  This
         check will be on an optional basis, whenever the system has
         available resources for the check.

      *  The code for the new IMP system will be read-only (this is
         impractical for the present 516 and 316 IMPs), and the program
         will periodically checksum itself using the hardware CRC
         generator.  We hope to design the program so that it can be
         reloaded in segments in the event of a detected error in the
         code, with no service interruption.

      *  Finally, we are looking into the structure of an optional
         IMP-Host/Host-IMP checksum to complete the Host/Host end-to-end
         checksum.  Under such an arrangement, the IMP and Host could
         agree to verify the checksums on the messages transferred over
         the interface between them, and the appropriate signalling
         mechanisms would be provided to handle errors.  With this
         technique in effect, two Hosts could be certain that their
         messages were delivered error-free or else they would be
         notified of an error, and could then retransmit their message
         if desired.

         More details on any such modifications to the IMP and to the
         IMP-Host interface will be published when appropriate.
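
   The relation between the end-to-end checksum and the per-hop check
   in the list above can be illustrated with the Python sketch below.
   The CRC polynomial of the planned hardware is not given here, so
   zlib.crc32 is used purely as a stand-in; crc_of_range, deliver, and
   the hop functions are hypothetical names.

```python
import zlib


def crc_of_range(memory, start, end):
    """CRC over memory[start:end], standing in for the hardware generator."""
    return zlib.crc32(bytes(memory[start:end]))


def deliver(message, hops):
    """Carry a message and its CRC across `hops`; a hop may alter the bytes.

    Returns (end_to_end_ok, first_bad_hop).  The per-hop check is
    diagnostic only: it records where damage first appeared, while the
    end-to-end check decides whether the message is accepted.
    """
    crc = zlib.crc32(message)
    first_bad_hop = None
    for index, hop in enumerate(hops):
        message = hop(message)
        if first_bad_hop is None and zlib.crc32(message) != crc:
            first_bad_hop = index
    return zlib.crc32(message) == crc, first_bad_hop
```

   If the per-hop check is skipped for lack of resources, the
   end-to-end comparison still rejects the damaged message; only the
   information about where the damage occurred is lost.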


             [This RFC was put into machine readable form for entry]
               [into the online RFC archives by Via Genie 12/1999]
