Provided that one can get the proper advice from one's higher level
protocols, it is possible to implement such a strategy. For example,
one could program the TCP level so that whenever it retransmitted a
segment more than once, it sent a hint down to the IP layer which
triggered polling. This strategy does not have excessive overhead, but
does have the problem that the host may be somewhat slow to respond to
an error, since only after polling has started will the host be able to
confirm that something has gone wrong, and by then the TCP above may
have already timed out.
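As a rough illustration of the hint described above, the following sketch counts retransmissions of a segment and tells the IP layer to begin polling once the segment has been retransmitted more than once. The class and method names (IpLayer, TcpSegmentTracker, hint_service_suspect) are invented for this example and do not come from any real TCP implementation.

```python
class IpLayer:
    """Hypothetical IP-layer stub that only records whether polling is on."""

    def __init__(self):
        self.polling = False

    def hint_service_suspect(self):
        # Advice from the layer above: start polling the gateway in use.
        self.polling = True


class TcpSegmentTracker:
    """Counts retransmissions of one segment and passes a hint downward."""

    def __init__(self, ip_layer):
        self.ip = ip_layer
        self.retransmissions = 0

    def on_retransmit(self):
        self.retransmissions += 1
        # "More than once": the hint fires on the second retransmission.
        if self.retransmissions > 1:
            self.ip.hint_service_suspect()

    def on_ack(self):
        # The segment got through, so the count starts over.
        self.retransmissions = 0
```

Note that polling begins only after the second retransmission, which is the source of the slow response mentioned above.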
Both forms of polling suffer from a minor flaw. Hosts as well as
gateways respond to ICMP echo messages. Thus, polling cannot be used to
detect the error that a foreign address thought to be a gateway is
actually a host. Such a confusion can arise if the physical addresses
of machines are rearranged.
4. Triggered Reselection
There is a strategy which makes use of a hint from a higher level,
as did the previous strategy, but which avoids polling altogether.
Whenever a higher level complains that the service seems to be
defective, the Internet layer can pick the next gateway from the list of
available gateways, and switch to it. Assuming that this gateway is up,
no real harm can come of this decision, even if it was wrong, for the
worst that will happen is a redirect message which instructs the host to
return to the gateway originally being used. If, on the other hand, the
original gateway was indeed down, then this immediately provides a new
route, so the period of time until recovery is shortened. This last
strategy seems particularly clever, and is probably the most generally
suitable for those cases where the network itself does not provide fault
isolation. (Regrettably, I have forgotten who suggested this idea to me.
It is not my invention.)
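A minimal sketch of triggered reselection follows, under the simplifying assumption that the host keeps one current gateway for all traffic (a real host would normally track redirects per destination). The names GatewaySelector, on_complaint and on_redirect are hypothetical.

```python
class GatewaySelector:
    """Keeps a list of candidate gateways and the one currently in use."""

    def __init__(self, gateways):
        if not gateways:
            raise ValueError("need at least one gateway")
        self.gateways = list(gateways)
        self.index = 0

    @property
    def current(self):
        return self.gateways[self.index]

    def on_complaint(self):
        """A higher level reports defective service: take the next gateway."""
        self.index = (self.index + 1) % len(self.gateways)
        return self.current

    def on_redirect(self, gateway):
        """A redirect names the gateway to use; switch (back) to it."""
        if gateway not in self.gateways:
            self.gateways.append(gateway)
        self.index = self.gateways.index(gateway)
        return self.current


selector = GatewaySelector(["gw-a", "gw-b", "gw-c"])
selector.on_complaint()       # TCP complained: now using gw-b
selector.on_redirect("gw-a")  # gw-b was the wrong choice: back to gw-a
```

If the switch was mistaken, the cost is one redirect; if the original gateway was down, the new route is in place without any polling.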
5. Higher Level Fault Detection
The previous discussion has concentrated on fault detection and
recovery at the IP layer. This section considers what the higher layers
such as TCP should do.
TCP has a single fault recovery action; it repeatedly retransmits a
segment until either it gets an acknowledgement or its connection timer
expires. As discussed above, it may use retransmission as an event to
trigger a request for fault recovery to the IP layer. In the other
direction, information may flow up from IP, reporting such things as
ICMP Destination Unreachable or error messages from the attached
network. The only subtle question about TCP and faults is what TCP
should do when such an error message arrives or its connection timer
expires.
The TCP specification discusses the timer. In the description of
the open call, the timeout is described as an optional value that the
client of TCP may specify; if any segment remains unacknowledged for
this period, TCP should abort the connection. The default for the
timeout is 30 seconds. Early TCPs were often implemented with a fixed
timeout interval, but this did not work well in practice, as the
following discussion may suggest.
Clients of TCP can be divided into two classes: those running on
immediate behalf of a human, such as Telnet, and those supporting a
program, such as a mail sender. Humans require a sophisticated response
to errors. Depending on exactly what went wrong, they may want to
abandon the connection at once, or wait for a long time to see if things
get better. Programs do not have this human impatience, but also lack
the power to make complex decisions based on details of the exact error
condition. For them, a simple timeout is reasonable.
Based on these considerations, at least two modes of operation are
needed in TCP. One, for programs, abandons the connection without
exception if the TCP timer expires. The other mode, suitable for
people, never abandons the connection on its own initiative, but reports
to the layer above when the timer expires. Thus, the human user can see
error messages coming from all the relevant layers, TCP and ICMP, and
can request TCP to abort as appropriate. This second mode requires that
TCP be able to send an asynchronous message up to its client to report
the timeout, and it requires that error messages arriving at lower
layers similarly flow up through TCP.
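The two modes might be sketched as follows. This is an illustration only: the notify callback stands in for whatever asynchronous upcall mechanism a real TCP would offer its client, and all names here are assumptions.

```python
from enum import Enum


class TimeoutMode(Enum):
    ABORT = "abort"    # for programs: give up when the timer expires
    REPORT = "report"  # for people: report the timeout and let the user decide


class Connection:
    def __init__(self, mode, notify):
        self.mode = mode
        self.notify = notify  # hypothetical upcall: notify(kind, detail)
        self.open = True

    def on_timer_expired(self):
        if self.mode is TimeoutMode.ABORT:
            self.open = False
            self.notify("aborted", "connection timer expired")
        else:
            # Never abandon the connection on TCP's own initiative.
            self.notify("timeout", "connection timer expired")

    def on_lower_layer_error(self, detail):
        # Errors from lower layers flow up through TCP as advice.
        self.notify("advisory", detail)

    def abort(self):
        # Explicit request from the client, e.g. an impatient human.
        self.open = False
```

In REPORT mode the client sees both the advisory messages and the timeout, and calls abort() only when the user asks for it.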
At levels above TCP, fault detection is also required. Either of
the following can happen. First, the foreign client of TCP can fail,
even though TCP is still running, so data is still acknowledged and the
timer never expires. Alternatively, the communication path can fail,
without the TCP timer going off, because the local client has no data to
send. Both of these have caused trouble.
Sending mail provides an example of the first case. When sending
mail using SMTP, there is an SMTP level acknowledgement that is returned
when a piece of mail is successfully delivered. Several early mail
receiving programs would crash just at the point where they had received
all of the mail text (so TCP did not detect a timeout due to outstanding
unacknowledged data) but before the mail was acknowledged at the SMTP
level. This failure would cause early mail senders to wait forever for
the SMTP level acknowledgement. The obvious cure was to set a timer at
the SMTP level, but the first attempt to do this did not work, for there
was no simple way to select the timer interval. If the interval
selected was short, it expired in normal operation when sending a
large file to a slow host. An interval of many minutes was needed to
prevent false timeouts, but that meant that failures were detected only
very slowly. The current solution in several mailers is to pick a
timeout interval proportional to the size of the message.
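A size-dependent timer of this kind could be computed as in the sketch below. The constants are placeholders chosen for the example, not values taken from any mailer, and the fixed base allowance is an added assumption so that very small messages still get a usable interval.

```python
def smtp_ack_timeout(message_bytes, base_seconds=60.0, seconds_per_kilobyte=1.0):
    """Seconds to wait for the SMTP-level acknowledgement.

    The interval grows linearly with message size; both constants are
    illustrative and would need tuning for real hosts and links.
    """
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    return base_seconds + seconds_per_kilobyte * (message_bytes / 1024)
```

A short message then fails fast when the receiver crashes, while a large file sent to a slow host is not cut off prematurely.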
Server telnet provides an example of the other kind of failure. It
can easily happen that the communications link fails while there is
no traffic flowing, perhaps because the user is thinking. Eventually,
the user will attempt to type something, at which time he will discover
that the connection is dead and abort it. But the host end of the
connection, having nothing to send, will not discover anything wrong,
and will remain waiting forever. In some systems there is no way for a
user in a different process to destroy or take over such a hanging
process, so there is no way to recover.
One solution to this would be to have the host server telnet query
the user end now and then, to see if it is still up. (Telnet does not
have an explicit query feature, but the host could negotiate some
unimportant option, which should produce either agreement or
disagreement in return.) The only problem with this is that a
reasonable sample interval, if applied to every user on a large system,
can generate an unacceptable amount of traffic and system overhead. A
smart server telnet would use this query only when something seems
wrong, perhaps when there had been no user activity for some time.
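One way to picture such a smart server is the small state machine below, which sends a query only after a period of user inactivity. The names and thresholds are hypothetical, and the caller is assumed to send the actual option negotiation whenever tick() returns "probe". Any reply, whether agreement or disagreement, counts as activity.

```python
class IdleProber:
    """Decides when an idle connection should be queried or declared dead."""

    def __init__(self, idle_threshold, reply_timeout):
        self.idle_threshold = idle_threshold  # seconds of silence before a query
        self.reply_timeout = reply_timeout    # seconds to wait for any reply
        self.last_activity = 0.0
        self.probe_sent_at = None

    def on_user_activity(self, now):
        # User input or a reply to the query: the far end is alive.
        self.last_activity = now
        self.probe_sent_at = None

    def tick(self, now):
        """Return 'idle', 'probe', 'wait' or 'dead' for the current time."""
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.reply_timeout:
                return "dead"
            return "wait"
        if now - self.last_activity >= self.idle_threshold:
            self.probe_sent_at = now
            return "probe"
        return "idle"
```

Because active users never reach the idle threshold, the query traffic is confined to connections that already look suspicious.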
In both these cases, the general conclusion is that client level
error detection is needed, and that the details of the mechanism are
very dependent on the application. Application programmers must be made
aware of the problem of failures, and must understand that error
detection at the TCP or lower level cannot solve the whole problem for
them.
6. Knowing When to Give Up
It is not obvious, when error messages such as ICMP Destination
Unreachable arrive, whether TCP should abandon the connection. The
reason that error messages are difficult to interpret is that, as
discussed above, after a failure of a gateway or network, there is a
transient period during which the gateways may have incorrect
information, so that irrelevant or incorrect error messages may
sometimes return. An isolated ICMP Destination Unreachable may arrive
at a host, for example, if a packet is sent during the period when the
gateways are trying to find a new route. To abandon a TCP connection
based on such a message arriving would be to ignore the valuable feature
of the Internet that for many internal failures it reconstructs its
function without any disruption of the end points.
But if failure messages do not imply a failure, what are they for?
In fact, error messages serve several important purposes. First, if
they arrive in response to opening a new connection, they probably are
caused by opening the connection improperly (e.g., to a non-existent
address) rather than by a transient network failure. Second, they
provide valuable information, after the TCP timeout has occurred, as to
the probable cause of the failure. Finally, certain messages, such as
ICMP Parameter Problem, imply a possible implementation problem. In
general, error messages give valuable information about what went wrong,
but are not to be taken as absolutely reliable. A general alerting
mechanism, such as the TCP timeout discussed above, provides a good
indication that whatever is wrong is a serious condition, but without
the advisory messages to augment the timer, there is no way for the
client to know how to respond to the error. The combination of the
timer and the advice from the error messages provides a reasonable set of
facts for the client layer to have. It is important that error messages
from all layers be passed up to the client module in a useful and
consistent way.
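The division of labor between the timer and the advisory messages might be sketched as follows. This is only an illustration of the policy discussed in this section; the names and return values are invented for the example.

```python
class FaultAdvisor:
    """Collects advisory error messages and pairs them with the timer."""

    def __init__(self):
        self.established = False  # set to True once the connection is open
        self.advice = []

    def on_error_message(self, message):
        # Always remember the message; it may explain a later timeout.
        self.advice.append(message)
        if not self.established:
            # While opening, an error probably means the open was improper.
            return "report-open-failure"
        # Once established, an isolated error may be transient: carry on.
        return "keep-going"

    def on_timer_expired(self):
        # The timer says the condition is serious; the advice says why.
        return {"event": "timeout", "probable_causes": list(self.advice)}
```

The client layer thus receives one report that carries both the fact of the timeout and whatever hints arrived beforehand.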