%perform this retransmission, however, unless it can guarantee that the
%request was not acted on by any instance. (This rules out most
%unreliable transports.) If it cannot guarantee this, or if
%retransmission fails, the XRL will get an error -- either
% REPLY\_TIMED\_OUT, SEND\_FAILED, NO\_FINDER, or RESOLVE\_FAILED.
%
% XXX This is very MIT a la Gabriel's Worse-is-Better :-) I like it,
% but would rather not commit to this just now
%

The XRL errors of NO\_FINDER, RESOLVE\_FAILED, and to some extent
NO\_SUCH\_METHOD generally represent serious problems with the router.
SEND\_FAILED represents a serious problem with the target, for example
that an instance of the target has died; this problem may or may not
be transient. The SEND\_FAILED\_TRANSIENT and REPLY\_TIMED\_OUT errors
are potentially common errors, and should be handled by the
application. However, the likelihood of SEND\_FAILED\_TRANSIENT can
often be reduced by limiting the rate at which requests are sent, in
which case the application may treat it as a ``fatal'' error.

NO\_FINDER, RESOLVE\_FAILED, NO\_SUCH\_METHOD, and
SEND\_FAILED\_TRANSIENT are all indications that the XRL was not
communicated to its target. They are therefore called \emph{send
failures}. The other two errors, REPLY\_TIMED\_OUT and SEND\_FAILED,
may be generated even if the target received the request. They are
therefore called \emph{receive failures}.

If a peer dies, we will receive notification of this explicitly and
will deal with it as specified in section \ref{pfailure}. Thus most
XRL transport errors SHOULD NOT be taken as an indication that the
peer is definitely dead. If an application cares that the peer has
died or restarted, it SHOULD register with the finder to receive
notifications of process restarts.
Thus, a process SHOULD assume that
an XRL transport problem will be transient until it receives an
explicit confirmation that the destination has failed, particularly
when the XRL interface is unreliable.

In addition to an XRL interface being reliable or unreliable, the way
the application uses an XRL interface can be pipelined or
non-pipelined. In the pipelined case, multiple requests can be
outstanding simultaneously; in the non-pipelined case at most one
request can be outstanding at a time.

It is useful for us to categorize XRL interfaces along these two axes:
reliable/unreliable and pipelined/non-pipelined.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection*{Unreliable, Non-pipelined}

If an XRL send failure occurs, the sending application MAY choose to
retransmit the XRL, or ignore the failure as it sees fit.
If an XRL receive failure occurs, the sending application MAY also choose
to retransmit the XRL, or ignore the failure as it sees fit. However, if
the application chooses to re-send the XRL, the interface MUST be written
in such a way that the receipt of a duplicate request will not damage the
system. (XXX Isn't this true anyway? Network duplicates?)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection*{Reliable, Non-pipelined}

If a SEND\_FAILED\_TRANSIENT error occurs, the sending application MAY
retransmit the XRL.

SEND\_FAILED, NO\_FINDER, and most RESOLVE\_FAILED and
NO\_SUCH\_METHOD errors are unrecoverable. The application should
cause this XRL interface to go dormant, in the expectation that it
will authoritatively discover from the finder that the target has
died.

REPLY\_TIMED\_OUT cannot happen on reliable interfaces.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection*{Unreliable, Pipelined}

The same issues apply as with unreliable, non-pipelined, but the
situation is more complicated. An interface that uses unreliable
transport and pipelining is one that explicitly permits loss \emph{and
re-ordering} of requests.
It is up to the application to choose
whether to retransmit XRLs that return SEND\_FAILED\_TRANSIENT or
REPLY\_TIMED\_OUT, but the application must only do so if it is
certain that the re-ordering caused by retransmission will not be a
problem.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection*{Reliable, Pipelined}

The XRL library ensures that pipelined messages sent to a reliable target
are delivered in order. In particular, if a request $R$ to a given target
gets an error, then no \emph{outstanding} requests to that target
\emph{registered later than $R$} will successfully complete -- they will
all get the same error, and none of them will be delivered to the receiving
application. Once the error is delivered, this error state is wiped out,
and later requests to the target may succeed -- perhaps because the target
was restarted.

Again, SEND\_FAILED, NO\_FINDER, and most RESOLVE\_FAILED and
NO\_SUCH\_METHOD errors are unrecoverable.
The application SHOULD cause this XRL interface to go dormant, in the
expectation that it will authoritatively discover from the finder that
the target has died.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Execution Error}

A XORP router is partitioned into many processes; most of the
operating-system-specific interactions are performed by the FEA. In a
router the most frequent operation will be the adding and deleting of
routes. Consider BGP adding a route. First the BGP process will send the
route to the RIB, then the route may be sent to the FEA. If the addition
of the route from the RIB to the FEA fails, then there is no way of
propagating this failure back to the BGP process due to the
asynchronous nature of XRLs. If adding/deleting a route fails, a very
drastic way of propagating this failure back to the BGP process would
be for either or both the FEA and RIB processes to exit, in which case
the process failure responses already described would be used and BGP
would exit.
Process exit is an extreme response to failing
to add a route, but at least the error handling code for process exit
exists already. It is important, though, not
to mask implementation problems by ignoring errors. In the rest
of this section we will outline how to deal with a number of common
errors.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Adding/Deleting route failures}

As stated above, a highly likely error is a failure when adding or
deleting routes. Typically the interaction will occur between the RIB
and FEA. When an error occurs it should be logged by the FEA and the
cause returned to the RIB. The RIB can be configured with policy on
how to react to different errors.

Adding a route will typically fail because a route already exists.
Firstly, if a route already exists it is either the same as or different
from the one that we attempted to add. Secondly, either the FEA
installed the route or a third party installed it. Therefore when
adding a route fails the FEA should report whether the existing route is
the same as or different from the one we attempted to add, as well as who
installed the route originally. The RIB on receiving the error state
from the FEA can decide as a matter of policy how to proceed. If an
attempt to add a route fails because a different route exists the RIB
could choose to delete the old route and add the new route.

The most common reason for a route deletion to fail would be that the
route is no longer present. The FEA should log that it has been asked
to delete a route that doesn't exist. The RIB should decide if this
problem should be considered fatal.

%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Route Add Failure due to Resource Starvation}

When a routing process sends a route to the RIB, the asynchronous
nature of XRL handling means that the RIB will typically accept the
route before it has finished processing the addition, and certainly
before it attempts to pass the route to the FEA, and hence on into the
forwarding engine.
It is possible for the route addition to fail due
to memory exhaustion in either the RIB or in the forwarding engine
itself. Should this occur, it is important for the routing protocol
to be made aware of the event, because the routing information will
now be out of synchronization with the forwarding information.

If the forwarding engine refuses the route due to resource starvation,
the FEA will receive the failure. The FEA will then indicate
asynchronously to the RIB that the failure occurred. The RIB will in
turn delete all state from all routing protocols that contributed
versions of this route, and asynchronously pass the failure up to
those routing protocols. Each of those routing protocols will then
handle the failure in a protocol-specific manner.

If the failure occurs due to resource starvation in the RIB, a similar
process will be initiated. It is not currently clear how to reliably
notify a routing protocol in the case when the router is running out
of memory for user-space processes.

In the case of BGP, if a route fails to be added due to resource
starvation, the simplest mechanism is to take down the peering that
originated the route. The normal peer reinitialization mechanism
(after some time delay) will ensure that all the routes are
re-instantiated after the resource starvation problem goes away.

In the case of RIP, if a route fails to be added due to resource
starvation, the simplest mechanism is to send our peers an infinite
metric route for this particular prefix and to delete the state for
this prefix. The normal RIP periodic update will ensure that the
route is re-instantiated after the resource starvation problem goes
away.

In the case of link-state protocols such as OSPF and IS-IS, there is
no good way to deal with this situation.
A reasonable solution might
be to take down all adjacencies to avoid causing a blackhole, then to
bring up the adjacencies again but not propagate any link-state
advertisements to our neighbors (so they won't route via us) until all
the link-state advertisements have been received and we've
successfully installed all the routes in the kernel.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% APPENDIX
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\appendix
\section{Modification History}

\begin{itemize}
  \item June 9, 2003: Initial version 0.3 completed.
  \item August 28, 2003: Updated the version to 0.4, and the date.
  \item November 6, 2003: Updated the version to 0.5, and the date.
  \item July 8, 2004: Updated the version to 1.0, and the date.
  \item April 13, 2005: Updated the version to 1.1, and the date.
  \item March 8, 2006: Added a footnote about the policy manager process.
    Updated the version to 1.2, and the date.
  \item August 2, 2006: Added ``Modification History'' appendix.
    Updated the version to 1.3, and the date.
  \item March 20, 2007: Updated the version to 1.4, and the date.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BIBLIOGRAPHY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliography{../tex/xorp}
\bibliographystyle{plain}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}