RFC:  816

FAULT ISOLATION AND RECOVERY

David D. Clark
MIT Laboratory for Computer Science
Computer Systems and Communications Group
July, 1982

1.  Introduction

Occasionally, a network or a gateway will go down, and the sequence of hops which the packet takes from source to destination must change. Fault isolation is that action which hosts and gateways collectively take to determine that something is wrong; fault recovery is the identification and selection of an alternative route which will serve to reconnect the source to the destination. In fact, the gateways perform most of the functions of fault isolation and recovery. There are, however, a few actions which hosts must take if they wish to provide a reasonable level of service. This document describes the portion of fault isolation and recovery which is the responsibility of the host.

2.  What Gateways Do

Gateways collectively implement an algorithm which identifies the best route between all pairs of networks. They do this by exchanging packets which contain each gateway's latest opinion about the operational status of its neighbor networks and gateways. Assuming that this algorithm is operating properly, one can expect the gateways to go through a period of confusion immediately after some network or gateway has failed, but one can assume that once a period of negotiation has passed, the gateways are equipped with a consistent and correct model of the connectivity of the internet. At present this period of negotiation may actually take several minutes, and many TCP implementations time out within that period, but it is a design goal of the eventual algorithm that the gateways should be able to reconstruct the topology quickly enough that a TCP connection can survive a failure of the route.

3.  Host Algorithm for Fault Recovery

Since the gateways always attempt to have a consistent and correct model of the internetwork topology, the host strategy for fault recovery is very simple. Whenever the host feels that something is wrong, it asks the gateway for advice, and, assuming the advice is forthcoming, it believes the advice completely. The advice will be wrong only during the transient period of negotiation which immediately follows an outage, but will otherwise be reliably correct.

In fact, it is never necessary for a host to explicitly ask a gateway for advice, because the gateway will provide it as appropriate. When a host sends a datagram to some distant net, the host should be prepared to receive back either of two advisory messages which the gateway may send. The ICMP "redirect" message indicates that the gateway to which the host sent the datagram is no longer the best gateway to reach the net in question. The gateway will have forwarded the datagram, but the host should revise its routing table to have a different immediate address for this net. The ICMP "destination unreachable" message indicates that, as a result of an outage, it is currently impossible to reach the addressed net or host in any manner. On receipt of this message, a host can either abandon the connection immediately without any further retransmission, or resend slowly to see if the fault is corrected in reasonable time.

If a host could assume that these two ICMP messages would always arrive when something was amiss in the network, then no other action on the part of the host would be required in order to maintain its tables in an optimal condition. Unfortunately, there are two circumstances under which the messages will not arrive properly. First, during the transient following a failure, error messages may arrive that do not correctly represent the state of the world. Thus, hosts must take an isolated error message with some scepticism. (This transient period is discussed more fully below.) Second, if the host has been sending datagrams to a particular gateway, and that gateway itself crashes, then all the other gateways in the internet will reconstruct the topology, but the gateway in question will still be down, and therefore cannot provide any advice back to the host. As long as the host continues to direct datagrams at this dead gateway, the datagrams will simply vanish off the face of the earth, and nothing will come back in return. Hosts must detect this failure.

If some gateway many hops away fails, this is not of concern to the host, for then the discovery of the failure is the responsibility of the immediate neighbor gateways, which will perform this action in a manner invisible to the host. The problem only arises if the very first gateway, the one to which the host is immediately sending the datagrams, fails. We thus identify one single task which the host must perform as its part of fault isolation in the internet: the host must use some strategy to detect that a gateway to which it is sending datagrams is dead.

Let us assume for the moment that the host implements some algorithm to detect failed gateways; we will return later to discuss what this algorithm might be. First, let us consider what the host should do when it has determined that a gateway is down. In fact, with the exception of one small problem, the action the host should take is extremely simple. The host should select some other gateway, and try sending the datagram to it. Assuming that gateway is up, this will either produce correct results, or some ICMP advice.
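The host's recovery duties described so far reduce to a small amount of state. The Python sketch below assumes a per-net first-hop table plus an ordered list of known gateways; `HostRouting` and its method names are hypothetical, and ICMP messages are represented as method calls rather than parsed packets. It applies redirect advice, records unreachable reports, and, once some detection algorithm has judged a gateway dead, moves traffic to the next known gateway. Like the text, it believes the advice it receives completely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostRouting:
    """Hypothetical host-side first-hop state; all names are illustrative."""

    gateways: list[str]  # known neighbor gateways, in the order they will be tried
    routes: dict[str, str] = field(default_factory=dict)  # destination net -> first-hop gateway
    dead: set[str] = field(default_factory=set)  # gateways currently judged down
    unreachable: set[str] = field(default_factory=set)  # nets reported unreachable

    def first_hop(self, net: str) -> str:
        """Gateway to use for `net`; falls back to the first live known gateway."""
        gw = self.routes.get(net)
        if gw is None or gw in self.dead:
            live = [g for g in self.gateways if g not in self.dead]
            if not live:
                raise RuntimeError("no live neighbor gateway known")
            gw = live[0]
            self.routes[net] = gw
        return gw

    def on_redirect(self, net: str, better_gateway: str) -> None:
        # ICMP redirect: the datagram was already forwarded; only the table changes.
        self.routes[net] = better_gateway
        self.dead.discard(better_gateway)
        if better_gateway not in self.gateways:
            self.gateways.append(better_gateway)

    def on_unreachable(self, net: str) -> None:
        # ICMP destination unreachable: record it; the caller decides whether
        # to abandon the connection or keep resending slowly.
        self.unreachable.add(net)

    def on_gateway_dead(self, gateway: str) -> None:
        # The detection algorithm judged `gateway` dead: forget routes through it,
        # so the next first_hop() call tries another known gateway.
        self.dead.add(gateway)
        for net in [n for n, g in self.routes.items() if g == gateway]:
            del self.routes[net]
```

For example, with gateways `gw-a` and `gw-b`, a redirect for `net-10` toward `gw-c` makes `first_hop("net-10")` return `gw-c`; after `on_gateway_dead("gw-c")`, traffic to that net falls back to `gw-a`.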
Since we assume that, ignoring the temporary periods immediately following an outage, any gateway is capable of giving correct advice, once the host has received advice from any gateway, that host is in as good a condition as it can hope to be.

There is always the unpleasant possibility that when the host tries a different gateway, that gateway too will be down. Therefore, whatever algorithm the host uses to detect a dead gateway must be applied continuously, as the host tries in turn every gateway that it knows about.

The only difficult part of this algorithm is to specify the means by which the host maintains the table of all of the gateways to which it has immediate access. Currently, the specification of the internet protocol does not architect any message by which a host can ask to be supplied with such a table. The reason is that different networks may provide very different mechanisms by which this table can be filled in. For example, if the net is a broadcast net, such as an ethernet or a ringnet, every gateway may simply broadcast such a table from time to time, and the host need do nothing but listen to obtain the required information. Alternatively, the network may provide the mechanism of logical addressing, by which a whole set of machines can be provided with a single group address, to which a request can be sent for assistance. Failing those two schemes, the host can build up its table of neighbor gateways by remembering all the gateways from which it has ever received a message. Finally, in certain cases, it may be necessary for this table, or at least the initial entries in the table, to be constructed manually by a manager or operator at the site.
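The neighbor-gateway table itself can be kept as an ordered set of gateways, each tagged with where the entry came from. The sketch below is a hypothetical Python illustration, not an interface from any specification; `NeighborGatewayTable` and the source labels ("manual", "broadcast", "received") are our own. The text leaves the trial order open, so the sketch assumes gateways are tried in the order they were first learned, which puts operator-supplied seed entries first. The group-address query mechanism is not modeled.

```python
from __future__ import annotations


class NeighborGatewayTable:
    """Hypothetical table of gateways this host can reach directly.

    Entries come from sources the text describes: manual configuration by an
    operator, gateways' periodic broadcasts, and any gateway a message has
    ever been received from. (Group-address queries are not modeled.)
    """

    def __init__(self, manual_seed: list[str] | None = None) -> None:
        # dict preserves insertion order, which doubles as the trial order.
        self._source: dict[str, str] = {}
        for gw in manual_seed or []:
            self._source[gw] = "manual"

    def learn(self, gateway: str, source: str) -> None:
        """Remember a gateway, e.g. source="broadcast" or source="received"."""
        # Keep the first source recorded; re-hearing a gateway changes nothing.
        self._source.setdefault(gateway, source)

    def source(self, gateway: str) -> str | None:
        """Where the entry for `gateway` came from, or None if unknown."""
        return self._source.get(gateway)

    def candidates(self, exclude: set[str] | None = None) -> list[str]:
        """Gateways to try in turn, skipping any currently believed dead."""
        skip = exclude or set()
        return [gw for gw in self._source if gw not in skip]
```

Passing the host's current set of dead gateways as `exclude` yields the remaining gateways to try in turn, which is the continuous retry the text calls for.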
In cases where the network in question provides absolutely no support for this kind of host query, at least some manual intervention will be required to get started, so that the host can find out about at least one gateway.

4.  Host Algorithms for Fault Isolation

We now return to the question raised above. What strategy should the host use to detect that it is talking to a dead gateway, so that it can know to switch to some other gateway in the list? In fact, there are several algorithms which can be used. All are reasonably simple to implement, but they have very different implications for the overhead on the host, the gateway, and the network. Thus, to a certain extent, the algorithm picked must depend on the details of the network and of the host.

1.  NETWORK LEVEL DETECTION

Many networks, particularly the Arpanet, perform precisely the required function internal to the network. If a host sends a datagram to a dead gateway on the Arpanet, the network will return a "host dead" message, which is precisely the information the host needs in order to switch to another gateway. Some early implementations of Internet on the Arpanet threw these messages away. That is an exceedingly poor idea.

2.  CONTINUOUS POLLING

The ICMP protocol provides an echo mechanism by which a host may solicit a response from a gateway. A host could simply send this message at a reasonable rate, to assure itself continuously that the gateway was still up. This works, but, since the message must be sent fairly often to detect a fault in a reasonable time, it can imply an unbearable overhead on the host itself, the network, and the gateway. This strategy is prohibited except where a specific analysis has indicated that the overhead is tolerable.

3.  TRIGGERED POLLING

If the use of polling could be restricted to only those times when something seemed to be wrong, then the overhead would be bearable.
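This excerpt breaks off at the start of this item, so the trigger and thresholds in the following sketch are our own assumptions rather than the RFC's. Suspicion is raised each time a datagram sent through a gateway has to be retransmitted, and only after `suspicion_threshold` such events does the host send a few ICMP echo requests. `TriggeredPoller`, `gateway_seems_dead`, and the injected `send_echo` callback are hypothetical names; the callback keeps the sketch free of raw-socket code. Where a network-level report such as the Arpanet "host dead" message is available, the host should act on it directly and needs no polling at all.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical probe: send one ICMP echo request to `gateway` and return True
# if a reply arrives within `timeout_s` seconds.
EchoFn = Callable[[str, float], bool]


def gateway_seems_dead(gateway: str, send_echo: EchoFn,
                       probes: int = 3, timeout_s: float = 2.0) -> bool:
    """Declare the gateway dead only if every one of `probes` echoes goes unanswered."""
    for _ in range(probes):
        if send_echo(gateway, timeout_s):
            return False
    return True


class TriggeredPoller:
    """Polls a gateway only after repeated signs of trouble (assumed trigger)."""

    def __init__(self, send_echo: EchoFn, suspicion_threshold: int = 2) -> None:
        self.send_echo = send_echo
        self.suspicion_threshold = suspicion_threshold
        self.suspicion: dict[str, int] = {}

    def on_retransmission(self, gateway: str) -> bool:
        """Called when a datagram sent via `gateway` had to be retransmitted.

        Returns True if the gateway should now be treated as dead.
        """
        count = self.suspicion.get(gateway, 0) + 1
        if count < self.suspicion_threshold:
            self.suspicion[gateway] = count
            return False  # not suspicious enough yet: no polling traffic at all
        self.suspicion.pop(gateway, None)
        return gateway_seems_dead(gateway, self.send_echo)

    def on_progress(self, gateway: str) -> None:
        """Traffic via `gateway` is getting through: clear any suspicion."""
        self.suspicion.pop(gateway, None)
```

Requiring every probe to fail reflects the text's caution about isolated error messages: a single lost echo does not cause a switch. When `on_retransmission` returns True, the host would mark the gateway dead and move on to the next gateway in its table.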