%
% $XORP: xorp/docs/design_arch/error_handling.tex,v 1.42 2007/03/15 00:43:12 pavlin Exp $
%

\documentclass[11pt]{article}

%\usepackage[dvips]{changebar}

\usepackage{subfigure}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{times}
\usepackage{latexsym}
\usepackage{epsfig}
\usepackage{graphicx}
\usepackage{xspace}
\usepackage{color}
\usepackage{amsmath}
\usepackage{rotating}
\usepackage{moreverb}
\usepackage{listings}
\usepackage{alltt}
\usepackage{stmaryrd}
%\usepackage[dvipdf]{graphics}
%\usepackage[dvips]{graphicx}
%\usepackage{xorp}

\definecolor{gray}{rgb}{0.5,0.5,0.5}
\newcommand{\etc}{\emph{etc.}\xspace}
\newcommand{\ie}{\emph{i.e.,}\xspace}
\newcommand{\eg}{\emph{e.g.,}\xspace}
%\newcommand{\comment}[1]{{\color{gray}[\textsf{#1}]}}
%\newcommand{\comment}[1]{}

\newcommand{\xorp} {{\em XORP}\@\xspace}
\newcommand{\module} {{\em module}\@\xspace}
\newcommand{\modules} {{\em modules}\@\xspace}
\newcommand{\finder} {{\em Finder}\@\xspace}
\newcommand{\xorpsh} {{\em Xorpsh}\@\xspace}
\newcommand{\cm} {{\em CM}\@\xspace}
\newcommand{\xrl} {{\em XRL}\@\xspace}
\newcommand{\xt} {{\em XRL Target}\@\xspace}
\newcommand{\rtrmgr} {{\em rtrmgr}\@\xspace}

% Changebar stuff
% \newenvironment{colorcode}{\color{blue}}{}
% \renewcommand{\cbstart}{\begin{colorcode}}
% \renewcommand{\cbend}{\end{colorcode}}

% \pagestyle{empty}

\begin{document}

\title{XORP Error Handling \\
\vspace{1ex}
Version 1.4}
\author{ XORP Project					\\
	 International Computer Science Institute	\\
	 Berkeley, CA 94704, USA			\\
         {\it http://www.xorp.org/}			\\
	 {\it feedback@xorp.org}
}
\date{March 20, 2007}

\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

A \xorp router is made up of a number of processes that communicate
via XRLs \cite{xorp:xrl} (a messaging system developed for \xorp). In
this document we will focus on how to deal with errors that are
generated directly or indirectly by \xrl calls, and discuss how to
handle process failures and the subsequent restart of failed
processes.
Of course, in an ideal world processes would not fail, but
when they do fail, our goals are to keep as much router
functionality working as possible, to avoid permanent inconsistencies
at all costs, and for the remainder of the functionality to be
restored as quickly as possible.

Many \xorp processes share routing state that must remain
synchronised. For example, the BGP process sends the result of its
routing decisions to the RIB process, which passes these routes on to
the FEA and hence to the forwarding engine's Forwarding Information
Base (FIB). If the RIB process fails, then BGP would lose the ability
to manipulate the FIB, and forwarding would not match the BGP routing
table.  Thus, BGP should withdraw all routes that it has advertised to
its peers, or alternatively it might drop all peerings until the RIB
has successfully restarted.

A critical component of the system is the router manager process
(\rtrmgr), which is responsible for starting and stopping routing
processes. When a \xorp process starts or terminates, that
process's XRL client library ensures that the \finder is notified. If
a process has an interest in the status of another process it can
register interest with the \finder.

In a \xorp router, as with any complex system, errors can occur. These
errors can range from a \xorp process simply failing, to an attempt to
install a route into the forwarding engine that already exists. Errors
need to be dealt with in a consistent manner. The types
of error that may occur are categorized below.

The first type of error is {\em Process Failure}.

The second type of error is {\em Communication Error}. At the most
basic level, an attempt to send an \xrl has failed. The process that
was the recipient of the \xrl may have failed or be slow to respond.
The message that was being sent may have been lost in transit.

The third type of error, {\em Execution Error}, is when an XRL call
returns an error due to some underlying interaction failure. A simple
example of this type of error is a ``route add'' failing.
The attempt to
add a route may fail for many reasons. The identical route may already
be present or a different route may be installed. The error may occur
due to a bug in the router code, because routing state has been
manipulated by non-\xorp processes, or due to resource starvation in
the forwarding engine.

The fourth type of error, {\em Type Error}, is when an XRL call fails
because the arguments passed to an XRL are invalid. This error will
most likely be due to a version mismatch between \xorp processes. If
all the processes in a \xorp router have been built from the same
source tree this error should not occur. As we are building an
extensible router, however, a process built from a
different source tree may encounter compatibility problems.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\label{pfailure}Process Failure}

A \xorp router is made up of a number of distinct processes. There are
dependencies between these processes. We define the critical
dependencies and what action to take on detecting failure.

The most critical component of a \xorp router is the \rtrmgr/\finder
process. One of the functions of this component is to start and
restart processes. If process A is dependent on the status (\eg alive,
dead, restarted) of process B, then process A registers this interest
with the \finder. This dependency on the \rtrmgr/\finder for managing
and monitoring process liveness state means that a \xorp router cannot
survive the failure of this process. If we attempted to survive a
\finder restart, it is conceivable that, in the same time window,
another monitored process could restart, in which case the restart
of the monitored process could be missed by the \finder. To guard
against this possible race, a \xorp process that detects the loss of
the \finder must exit.
There is one exception to this rule: the
\xorpsh process, which is discussed in Section~\ref{xorpsh}.

For each process in a \xorp router, we describe how it should behave
when another process in the system fails. Processes can explicitly
register interest in the status of other processes through the
\finder. If process A is dependent on the state of process B then
process A must register interest in process B.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Implementing process failure detection}

The \finder process will send keepalive messages to all processes at
thirty-second intervals. If a process does not respond to a keepalive
it is considered dead. The keepalive messages are sent over a reliable
transport such as TCP. A process dying should therefore be easy to
detect.

The \rtrmgr might also be able to detect that a process has died (but
not if it is simply not responding), as it will normally receive a
SIGCHLD signal.  On discovering that a process has died, the \rtrmgr
will send a hint to the \finder, which will immediately try to send a
keepalive.  Again, if the process has died it should be easy to detect.

If a process is not responding to keepalives but it is still alive, it
will be marked as dead and all interested processes will be notified.
Most importantly, the \rtrmgr will be notified and it will kill the
running process and start a new process.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Actions to take on detecting process failure}

Table \ref{failure_table} indicates what action a process should take on
detecting failure in other processes\footnote{Note that currently this
document does not describe the policy manager. Such a description will be
included in the future. For all practical purposes, the policy manager is
as important as the rtrmgr/finder, even though it is running as a
separate process.}. The ``(G)'' denotes that the process should attempt
to exit gracefully. Figure \ref{failure_fig} shows the relationship
between the various processes.
The thick arrows should be modelled as a
signal sent from a dying process to its dependent processes.

\begin{figure}
  \begin{center}
    \includegraphics[width=0.9\textwidth]{figs/error_dependency.eps}
    \caption{Process relationship on failure}
    \label{failure_fig}
  \end{center}
\end{figure}

\begin{table}[ht]
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}
\hline
Process fails   &                 &          &      &      &      &         &      &      &      &         \\
\hline
                & \rtrmgr/        & FEA      & MFEA & RIB  & IGMP & PIM     & BGP  & RIP  & OSPF & \xorpsh \\
                & \finder         &          &      &      &      &         &      &      &      &         \\
\hline
\rtrmgr/        & /               & Withdraw & Exit & Exit & Exit & Exit    & Exit & Exit & Exit & Report  \\
\finder         &                 & All      &      &      &      &         &      &      &      & Problem \\
                &                 & Unicast  &      &      &      &         &      &      &      & Wait    \\
                &                 & Routes   &      &      &      &         &      &      &      &         \\
                &                 & Exit     &      &      &      &         &      &      &      &         \\
\hline
FEA(*)          &  Restart        & /        & Exit & Exit & Exit & Exit    & Exit & Exit & Exit & -       \\
\hline
MFEA(*)         &  Restart        & -        & /    & -    & Exit & Exit    & -    & -    & -    & -       \\
\hline
RIB             &  Restart        & Withdraw & -    & /    & Exit & Exit    & Exit & Exit & Exit & -       \\
                &                 & All      &      &      & (G)  & (G)     & (G)  & (G)  & (G)  &         \\
                &                 & Unicast  &      &      &      &         &      &      &      &         \\
                &                 & Routes   &      &      &      &         &      &      &      &         \\
\hline
IGMP            &  Restart        & -        & -    & -    & /    & Delete  & -    & -    & -    & -       \\
                &                 &          &      &      &      & Local   &      &      &      &         \\
                &                 &          &      &      &      & Members &      &      &      &         \\
                &                 &          &      &      &      & After   &      &      &      &         \\
                &                 &          &      &      &      & Timeout &      &      &      &         \\
\hline
PIM             &  Restart        & -        & -    & -    & -    & /       & -    & -    & -    & -       \\
\hline
BGP             &  Restart        & -        & -    & -    & -    & -       & /    & -    & -    & -       \\
\hline
RIP             &  Restart        & -        & -    & -    & -    & -       & -    & /    & -    & -       \\
\hline
OSPF            &  Restart        & -        & -    & -    & -    & -       & -    & -    & /    & -       \\
\hline
\xorpsh         &  Restart        & -        & -    & -    & -    & -       & -    & -    & -    & /       \\
\hline
\end{tabular}
\end{center}
{\small Note(*): Typically, the MFEA would be part of the FEA process}
\caption{\label{failure_table}Action to take on detecting process failure}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\rtrmgr/\finder - Router manager}
