Network Working Group John Nagle
Request For Comments: 896 6 January 1984
Ford Aerospace and Communications Corporation
Congestion Control in IP/TCP Internetworks
This memo discusses some aspects of congestion control in IP/TCP
Internetworks. It is intended to stimulate thought and further
discussion of this topic. While some specific suggestions are
made for improved congestion control implementation, this memo
does not specify any standards.
Introduction
Congestion control is a recognized problem in complex networks.
We have discovered that the Department of Defense's Internet Pro-
tocol (IP), a pure datagram protocol, and Transmission Control
Protocol (TCP), a transport layer protocol, when used together,
are subject to unusual congestion problems caused by interactions
between the transport and datagram layers. In particular, IP
gateways are vulnerable to a phenomenon we call "congestion col-
lapse", especially when such gateways connect networks of widely
different bandwidth. We have developed solutions that prevent
congestion collapse.
These problems are not generally recognized because these proto-
cols are used most often on networks built on top of ARPANET IMP
technology. ARPANET IMP based networks traditionally have uni-
form bandwidth and identical switching nodes, and are sized with
substantial excess capacity. This excess capacity, together with
the ability of the IMP system to throttle the transmissions of
hosts, has for most IP/TCP hosts and networks been adequate to
handle congestion. With the recent split of the ARPANET into two
interconnected networks and the growth of other networks with
differing properties connected to the ARPANET, however, reliance on the
benign properties of the IMP system is no longer enough to allow
hosts to communicate rapidly and reliably. Improved handling of
congestion is now mandatory for successful network operation
under load.
Ford Aerospace and Communications Corporation, and its parent
company, Ford Motor Company, operate the only private IP/TCP
long-haul network in existence today. This network connects four
facilities (one in Michigan, two in California, and one in England),
some with extensive local networks. This net is cross-tied
to the ARPANET but uses its own long-haul circuits; traffic
between Ford facilities flows over private leased circuits,
including a leased transatlantic satellite connection. All
switching nodes are pure IP datagram switches with no node-to-
node flow control, and all hosts run software either written or
heavily modified by Ford or Ford Aerospace. Bandwidth of links
in this network varies widely, from 1200 to 10,000,000 bits per
second. In general, we have not been able to afford the luxury
of excess long-haul bandwidth that the ARPANET possesses, and our
long-haul links are heavily loaded during peak periods. Transit
times of several seconds are thus common in our network.
RFC 896 Congestion Control in IP/TCP Internetworks 1/6/84
Because of our pure datagram orientation, heavy loading, and wide
variation in bandwidth, we have had to solve problems that the
ARPANET / MILNET community is just beginning to recognize. Our
network is sensitive to suboptimal behavior by host TCP implemen-
tations, both on and off our own net. We have devoted consider-
able effort to examining TCP behavior under various conditions,
and have solved some widely prevalent problems with TCP. We
present here two problems and their solutions. Many TCP imple-
mentations have these problems; if throughput is worse through an
ARPANET / MILNET gateway for a given TCP implementation than
throughput across a single net, there is a high probability that
the TCP implementation has one or both of these problems.
Congestion collapse
Before we proceed with a discussion of the two specific problems
and their solutions, a description of what happens when these
problems are not addressed is in order. In heavily loaded pure
datagram networks with end to end retransmission, as switching
nodes become congested, the round trip time through the net
increases and the count of datagrams in transit within the net
also increases. This is normal behavior under load. As long as
there is only one copy of each datagram in transit, congestion is
under control. Once retransmission of datagrams not yet
delivered begins, there is potential for serious trouble.
Host TCP implementations are expected to retransmit packets
several times at increasing time intervals until some upper limit
on the retransmit interval is reached. Normally, this mechanism
is enough to prevent serious congestion problems. Even with the
better adaptive host retransmission algorithms, though, a sudden
load on the net can cause the round-trip time to rise faster than
the sending hosts' measurements of round-trip time can be updated.
Such a load occurs when a new bulk transfer, such as a file
transfer, begins and starts filling a large window. Should the
round-trip time exceed the maximum retransmission interval for
any host, that host will begin to introduce more and more copies
of the same datagrams into the net. The network is now in seri-
ous trouble. Eventually all available buffers in the switching
nodes will be full and packets must be dropped. The round-trip
time for packets that are delivered is now at its maximum. Hosts
are sending each packet several times, and eventually some copy
of each packet arrives at its destination. This is congestion
collapse.
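The way duplicates multiply once the round-trip time outruns the retransmission timer can be illustrated with a little arithmetic. The sketch below is not from the memo: it assumes a fixed round-trip time, a retransmit interval that is multiplied by a constant on each attempt up to a cap, and no packet loss. The function name and parameters are hypothetical.

```python
def copies_sent(rtt, initial_rto, max_rto, backoff=2.0):
    """Count copies of one datagram sent before its first ACK can return.

    Assumed model (not from the memo): the first copy goes out at t=0,
    the ACK for it returns at t=rtt, and the host retransmits at
    increasing intervals capped at max_rto until that ACK arrives.
    """
    copies = 1
    t = 0.0
    interval = initial_rto
    while True:
        t += interval              # time of the next retransmission
        if t >= rtt:               # the ACK arrives first, so stop
            return copies
        copies += 1
        interval = min(interval * backoff, max_rto)


# Round trip shorter than the timer: exactly one copy is in the net.
print(copies_sent(rtt=0.5, initial_rto=1.0, max_rto=4.0))   # 1
# Round trip of 10 s against a timer capped at 4 s: four copies.
print(copies_sent(rtt=10.0, initial_rto=1.0, max_rto=4.0))  # 4
```

Under these assumptions, every host whose timer cap is below the round-trip time multiplies its own traffic, which is the feedback the paragraph above describes.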
This condition is stable. Once the saturation point has been
reached, if the algorithm for selecting packets to be dropped is
fair, the network will continue to operate in a degraded condi-
tion. In this condition every packet is being transmitted
several times and throughput is reduced to a small fraction of
normal. We have pushed our network into this condition experi-
mentally and observed its stability. It is possible for round-
trip time to become so large that connections are broken because
the hosts involved time out.
Congestion collapse and pathological congestion are not normally
seen in the ARPANET / MILNET system because these networks have
substantial excess capacity. Where connections do not pass
through IP gateways, the IMP-to-host flow control mechanisms
usually prevent congestion collapse, especially since TCP
implementations tend to be well adjusted for the time constants
associated with the pure ARPANET case. However, other than ICMP Source
Quench messages, nothing fundamentally prevents congestion col-
lapse when TCP is run over the ARPANET / MILNET and packets are
being dropped at gateways. Worth noting is that a few badly-
behaved hosts can by themselves congest the gateways and prevent
other hosts from passing traffic. We have observed this problem
repeatedly with certain hosts (with whose administrators we have
communicated privately) on the ARPANET.
Adding additional memory to the gateways will not solve the prob-
lem. The more memory added, the longer round-trip times must
become before packets are dropped. Thus, the onset of congestion
collapse will be delayed but when collapse occurs an even larger
fraction of the packets in the net will be duplicates and
throughput will be even worse.
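The effect of extra gateway memory on round-trip time can be estimated directly. The following is a rough sketch, assuming a single first-in-first-out queue draining onto one outgoing link; the buffer sizes are made-up illustrative values, while the 1200 bits per second link speed is the low end of the range quoted earlier in this memo.

```python
def max_queue_delay_seconds(buffer_bytes, link_bits_per_second):
    """Time for a full buffer to drain onto the link.

    This is the queueing delay seen by a packet that arrives just
    before the buffer overflows, i.e. just before drops begin.
    """
    return buffer_bytes * 8 / link_bits_per_second


# Doubling the (hypothetical) buffer doubles the delay before any drop.
print(max_queue_delay_seconds(3000, 1200))  # 20.0
print(max_queue_delay_seconds(6000, 1200))  # 40.0
```

A larger buffer therefore only raises the round-trip time at which dropping starts, giving hosts more time to inject duplicate copies first.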
The two problems
Two key problems with the engineering of TCP implementations have
been observed; we call these the small-packet problem and the
source-quench problem. The second is being addressed by several
implementors; the first is generally believed (incorrectly) to be
solved. We have discovered that once the small-packet problem
has been solved, the source-quench problem becomes much more
tractable. We thus present the small-packet problem and our
solution to it first.
The small-packet problem
There is a special problem associated with small packets. When
TCP is used for the transmission of single-character messages
originating at a keyboard, the typical result is that 41 byte
packets (one byte of data, 40 bytes of header) are transmitted
for each byte of useful data. This 4000% overhead is annoying
but tolerable on lightly loaded networks. On heavily loaded net-
works, however, the congestion resulting from this overhead can
result in lost datagrams and retransmissions, as well as exces-
sive propagation time caused by congestion in switching nodes and
gateways. In practice, throughput may drop so low that TCP con-
nections are aborted.
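The overhead figure quoted above follows from simple arithmetic, sketched here. The function name is hypothetical; the 40-byte header and the one-byte payload are the values given in the text, and the 512-byte payload is an arbitrary comparison point.

```python
def header_overhead_percent(data_bytes, header_bytes=40):
    """Header bytes sent per byte of useful data, as a percentage."""
    return 100 * header_bytes / data_bytes


print(header_overhead_percent(1))    # 4000.0  (41-byte packet, 1 data byte)
print(header_overhead_percent(512))  # 7.8125  (arbitrary larger payload)
```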
This classic problem is well-known and was first addressed in the
Tymnet network in the late 1960s. The solution used there was to
impose a limit on the count of datagrams generated per unit time.
This limit was enforced by delaying transmission of small packets
until a short (200-500ms) time had elapsed, in hope that another
character or two would become available for addition to the same
packet before the timer ran out. An additional feature to
enhance user acceptability was to inhibit the time delay when a
control character, such as a carriage return, was received.
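The timer scheme just described might be sketched as follows. This is an illustration only, not Tymnet's actual implementation: it assumes the timer starts when the first character of a new packet is buffered, it ignores any maximum packet size, and the function name and the (time, character) input format are invented for the example. Times are integer milliseconds.

```python
def timer_coalesce(keystrokes, delay_ms, flush_chars=("\r",)):
    """Group keystrokes into packets using a fixed delay timer.

    keystrokes: list of (time_ms, char) pairs sorted by time.
    Returns a list of (send_time_ms, data) pairs.
    """
    packets = []
    pending = ""
    deadline = None
    for t, ch in keystrokes:
        # The timer ran out before this keystroke: send what was held.
        if pending and t >= deadline:
            packets.append((deadline, pending))
            pending = ""
        # The first character of a new packet starts the timer.
        if not pending:
            deadline = t + delay_ms
        pending += ch
        # A control character bypasses the delay and is sent at once.
        if ch in flush_chars:
            packets.append((t, pending))
            pending = ""
    if pending:
        packets.append((deadline, pending))
    return packets


# Two characters share a packet; the third misses the 500 ms timer.
print(timer_coalesce([(0, "a"), (200, "b"), (600, "c")], 500))
# [(500, 'ab'), (1100, 'c')]
# A carriage return sends the buffered line without waiting.
print(timer_coalesce([(0, "l"), (200, "s"), (400, "\r")], 500))
# [(400, 'ls\r')]
```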
This technique has been used in NCP Telnet, X.25 PADs, and TCP
Telnet. It has the advantage of being well-understood, and is not
too difficult to implement. Its flaw is that it is hard to come
up with a time limit that will satisfy everyone. A time limit
short enough to provide highly responsive service over a 10M bits
per second Ethernet will be too short to prevent congestion col-
lapse over a heavily loaded net with a five second round-trip
time; and conversely, a time limit long enough to handle the
heavily loaded net will produce frustrated users on the Ethernet.
The solution to the small-packet problem
Clearly an adaptive approach is desirable. One would expect a
proposal for an adaptive inter-packet time limit based on the
round-trip delay observed by TCP. While such a mechanism could
certainly be implemented, it is unnecessary. A simple and
elegant solution has been discovered.
The solution is to inhibit the sending of new TCP segments when
new outgoing data arrives from the user if any previously
transmitted data on the connection remains unacknowledged. This
inhibition is to be unconditional; no timers, tests for size of
data received, or other conditions are required. Implementation
typically requires one or two lines inside a TCP program.
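A minimal sketch of this rule, stripped of everything else a real TCP does, might look like the class below. It is an illustration under stated assumptions, not code from any actual TCP: the class and method names are hypothetical, window limits, segment size limits, and retransmission are all omitted, and any arriving acknowledgment is treated as the trigger for sending buffered data.

```python
class NagleSender:
    """Toy sender that applies only the inhibition rule described above."""

    def __init__(self):
        self.buffered = b""   # data accepted from the user, not yet sent
        self.unacked = 0      # bytes sent but not yet acknowledged
        self.segments = []    # segments handed to the network, in order

    def _send_buffered(self):
        if self.buffered:
            self.segments.append(self.buffered)
            self.unacked += len(self.buffered)
            self.buffered = b""

    def write(self, data):
        """User write: send only if nothing is outstanding."""
        self.buffered += data
        if self.unacked == 0:      # the "one or two lines" of the rule
            self._send_buffered()

    def ack(self, nbytes):
        """Incoming acknowledgment: send whatever accumulated meanwhile."""
        self.unacked -= nbytes
        self._send_buffered()


s = NagleSender()
s.write(b"a")        # connection idle, so "a" goes out immediately
s.write(b"b")        # "a" is unacknowledged, so "b" is held
s.write(b"c")        # still held, now alongside "b"
s.ack(1)             # the ACK for "a" releases "bc" as one segment
print(s.segments)    # [b'a', b'bc']
```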
At first glance, this solution seems to imply drastic changes in
the behavior of TCP. This is not so. It all works out right in
the end. Let us see why this is so.
When a user process writes to a TCP connection, TCP receives some
data. It may hold that data for future sending or may send a
packet immediately. If it refrains from sending now, it will
typically send the data later when an incoming packet arrives and
changes the state of the system. The state changes in one of two
ways; the incoming packet acknowledges old data the distant host
has received, or announces the availability of buffer space in
the distant host for new data. (This last is referred to as
"updating the window"). Each time data arrives on a connec-
tion, TCP must reexamine its current state and perhaps send some
packets out. Thus, when we omit sending data on arrival from the
user, we are simply deferring its transmission until the next
message arrives from the distant host. A message must always
arrive soon unless the connection was previously idle or communi-
cations with the other end have been lost. In the first case,
the idle connection, our scheme will result in a packet being
sent whenever the user writes to the TCP connection. Thus we do
not deadlock in the idle condition. In the second case, where
the distant host has failed, sending more data is futile anyway.
Note that we have done nothing to inhibit normal TCP retransmis-
sion logic, so lost messages are not a problem.
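The deferral argument can be checked with a small timeline simulation. The sketch below makes several simplifying assumptions that are not in the memo: each write delivers one character, the acknowledgment for a packet arrives exactly one round-trip time after it is sent, nothing is lost, and window and segment size limits are ignored. The function name is hypothetical. The 200 ms typing interval, the 50 ms Ethernet round trip, and the five second loaded-net round trip are figures used elsewhere in this memo.

```python
def nagle_packets(write_times_ms, rtt_ms):
    """Return (send_time_ms, characters) for each packet sent.

    write_times_ms: sorted times at which the user writes one character.
    """
    packets = []
    ack_due = None    # arrival time of the ACK for outstanding data
    buffered = 0      # characters held back while data is outstanding
    for t in write_times_ms:
        # Process ACKs arriving at or before this write. Each ACK
        # releases the held characters as one packet.
        while ack_due is not None and ack_due <= t:
            arrival = ack_due
            ack_due = None
            if buffered:
                packets.append((arrival, buffered))
                buffered = 0
                ack_due = arrival + rtt_ms
        buffered += 1
        if ack_due is None:        # idle connection: send at once
            packets.append((t, buffered))
            buffered = 0
            ack_due = t + rtt_ms
    if buffered:                   # held data goes out on the next ACK
        packets.append((ack_due, buffered))
    return packets


typing = [0, 200, 400, 600, 800]
# Fast net: every character finds the connection idle.
print(nagle_packets(typing, rtt_ms=50))
# [(0, 1), (200, 1), (400, 1), (600, 1), (800, 1)]
# Slow net: later characters ride together on the first ACK.
print(nagle_packets(typing, rtt_ms=5000))
# [(0, 1), (5000, 4)]
```

The same rule thus adapts to the round-trip time without any timer: packet count per round trip stays at one while the data per packet grows with the delay.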
Examination of the behavior of this scheme under various condi-
tions demonstrates that the scheme does work in all cases. The
first case to examine is the one we wanted to solve, that of the
character-oriented Telnet connection. Let us suppose that the
user is sending TCP a new character every 200ms, and that the
connection is via an Ethernet with a round-trip time including
software processing of 50ms. Without any mechanism to prevent
small-packet congestion, one packet will be sent for each charac-
ter, and response will be optimal. Overhead will be 4000%, but
this is acceptable on an Ethernet. The classic timer scheme,
with a limit of 2 packets per second, will cause two or three
characters to be sent per packet. Response will thus be degraded
even though on a high-bandwidth Ethernet this is unnecessary.
Overhead will drop to 1500%, but on an Ethernet this is a bad
tradeoff. With our scheme, every character the user types will
find TCP with an idle connection, and the character will be sent