Network Working Group                                          John Nagle
Request For Comments: 896                                  6 January 1984
               Ford Aerospace and Communications Corporation

               Congestion Control in IP/TCP Internetworks

This memo discusses some aspects of congestion control in IP/TCP internetworks. It is intended to stimulate thought and further discussion of this topic. While some specific suggestions are made for improved congestion control implementation, this memo does not specify any standards.

Introduction

Congestion control is a recognized problem in complex networks. We have discovered that the Department of Defense's Internet Protocol (IP), a pure datagram protocol, and Transmission Control Protocol (TCP), a transport layer protocol, when used together, are subject to unusual congestion problems caused by interactions between the transport and datagram layers. In particular, IP gateways are vulnerable to a phenomenon we call "congestion collapse", especially when such gateways connect networks of widely different bandwidth. We have developed solutions that prevent congestion collapse.

These problems are not generally recognized because these protocols are used most often on networks built on top of ARPANET IMP technology. ARPANET IMP-based networks traditionally have uniform bandwidth and identical switching nodes, and are sized with substantial excess capacity. This excess capacity, and the ability of the IMP system to throttle the transmissions of hosts, has for most IP/TCP hosts and networks been adequate to handle congestion. With the recent split of the ARPANET into two interconnected networks and the growth of other networks with differing properties connected to the ARPANET, however, reliance on the benign properties of the IMP system is no longer enough to allow hosts to communicate rapidly and reliably.
Improved handling of congestion is now mandatory for successful network operation under load.

Ford Aerospace and Communications Corporation, and its parent company, Ford Motor Company, operate the only private IP/TCP long-haul network in existence today. This network connects four facilities (one in Michigan, two in California, and one in England), some with extensive local networks. This net is cross-tied to the ARPANET but uses its own long-haul circuits; traffic between Ford facilities flows over private leased circuits, including a leased transatlantic satellite connection. All switching nodes are pure IP datagram switches with no node-to-node flow control, and all hosts run software either written or heavily modified by Ford or Ford Aerospace. Bandwidth of links in this network varies widely, from 1,200 to 10,000,000 bits per second. In general, we have not been able to afford the luxury of excess long-haul bandwidth that the ARPANET possesses, and our long-haul links are heavily loaded during peak periods. Transit times of several seconds are thus common in our network.

Because of our pure datagram orientation, heavy loading, and wide variation in bandwidth, we have had to solve problems that the ARPANET/MILNET community is just beginning to recognize. Our network is sensitive to suboptimal behavior by host TCP implementations, both on and off our own net. We have devoted considerable effort to examining TCP behavior under various conditions, and have solved some widely prevalent problems with TCP. We present here two problems and their solutions. Many TCP implementations have these problems; if throughput is worse through an ARPANET/MILNET gateway for a given TCP implementation than throughput across a single net, there is a high probability that the TCP implementation has one or both of these problems.
Congestion collapse

Before we proceed with a discussion of the two specific problems and their solutions, a description of what happens when these problems are not addressed is in order. In heavily loaded pure datagram networks with end-to-end retransmission, as switching nodes become congested, the round trip time through the net increases and the count of datagrams in transit within the net also increases. This is normal behavior under load. As long as there is only one copy of each datagram in transit, congestion is under control. Once retransmission of datagrams not yet delivered begins, there is potential for serious trouble.

Host TCP implementations are expected to retransmit packets several times at increasing time intervals until some upper limit on the retransmit interval is reached. Normally, this mechanism is enough to prevent serious congestion problems. Even with the better adaptive host retransmission algorithms, though, a sudden load on the net can cause the round-trip time to rise faster than the sending host's measurements of round-trip time can be updated. Such a load occurs when a new bulk transfer, such as a file transfer, begins and starts filling a large window. Should the round-trip time exceed the maximum retransmission interval for any host, that host will begin to introduce more and more copies of the same datagrams into the net. The network is now in serious trouble. Eventually all available buffers in the switching nodes will be full and packets must be dropped. The round-trip time for packets that are delivered is now at its maximum. Hosts are sending each packet several times, and eventually some copy of each packet arrives at its destination. This is congestion collapse.

This condition is stable. Once the saturation point has been reached, if the algorithm for selecting packets to be dropped is fair, the network will continue to operate in a degraded condition.
In this condition every packet is being transmitted several times and throughput is reduced to a small fraction of normal. We have pushed our network into this condition experimentally and observed its stability. It is possible for round-trip time to become so large that connections are broken because the hosts involved time out.

Congestion collapse and pathological congestion are not normally seen in the ARPANET/MILNET system because these networks have substantial excess capacity. Where connections do not pass through IP gateways, the IMP-to-host flow control mechanisms usually prevent congestion collapse, especially since TCP implementations tend to be well adjusted for the time constants associated with the pure ARPANET case. However, other than ICMP Source Quench messages, nothing fundamentally prevents congestion collapse when TCP is run over the ARPANET/MILNET and packets are being dropped at gateways. Worth noting is that a few badly-behaved hosts can by themselves congest the gateways and prevent other hosts from passing traffic. We have observed this problem repeatedly with certain hosts (with whose administrators we have communicated privately) on the ARPANET.

Adding additional memory to the gateways will not solve the problem. The more memory added, the longer round-trip times must become before packets are dropped. Thus, the onset of congestion collapse will be delayed, but when collapse occurs an even larger fraction of the packets in the net will be duplicates and throughput will be even worse.

The two problems

Two key problems with the engineering of TCP implementations have been observed; we call these the small-packet problem and the source-quench problem. The second is being addressed by several implementors; the first is generally believed (incorrectly) to be solved. We have discovered that once the small-packet problem has been solved, the source-quench problem becomes much more tractable.
We thus present the small-packet problem and our solution to it first.

The small-packet problem

There is a special problem associated with small packets. When TCP is used for the transmission of single-character messages originating at a keyboard, the typical result is that 41 byte packets (one byte of data, 40 bytes of header) are transmitted for each byte of useful data. This 4000% overhead is annoying but tolerable on lightly loaded networks. On heavily loaded networks, however, the congestion resulting from this overhead can result in lost datagrams and retransmissions, as well as excessive propagation time caused by congestion in switching nodes and gateways. In practice, throughput may drop so low that TCP connections are aborted.

This classic problem is well-known and was first addressed in the Tymnet network in the late 1960s. The solution used there was to impose a limit on the count of datagrams generated per unit time. This limit was enforced by delaying transmission of small packets until a short (200-500ms) time had elapsed, in hope that another character or two would become available for addition to the same packet before the timer ran out. An additional feature to enhance user acceptability was to inhibit the time delay when a control character, such as a carriage return, was received.

This technique has been used in NCP Telnet, X.25 PADs, and TCP Telnet. It has the advantage of being well-understood, and is not too difficult to implement. Its flaw is that it is hard to come up with a time limit that will satisfy everyone. A time limit short enough to provide highly responsive service over a 10M bits per second Ethernet will be too short to prevent congestion collapse over a heavily loaded net with a five second round-trip time; and conversely, a time limit long enough to handle the heavily loaded net will produce frustrated users on the Ethernet.
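The 4000% figure above is simple arithmetic: the 40 bytes of header are the minimum 20-byte IP header plus the minimum 20-byte TCP header, carried for a single byte of payload. A quick sketch of the calculation:

```python
# Per-packet header cost of TCP/IP, as described above.
IP_HEADER = 20   # minimum IPv4 header, bytes
TCP_HEADER = 20  # minimum TCP header, bytes

def overhead_percent(payload_bytes: int) -> float:
    """Header bytes as a percentage of useful payload bytes."""
    return 100.0 * (IP_HEADER + TCP_HEADER) / payload_bytes

print(overhead_percent(1))    # one keystroke per packet -> 4000.0%
print(overhead_percent(536))  # a full 536-byte segment  -> about 7.5%
```

The same arithmetic gives the 1500% figure quoted later for the timer scheme: with two or three characters per packet, 40 header bytes divide over roughly 2.7 payload bytes.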
The solution to the small-packet problem

Clearly an adaptive approach is desirable. One would expect a proposal for an adaptive inter-packet time limit based on the round-trip delay observed by TCP. While such a mechanism could certainly be implemented, it is unnecessary. A simple and elegant solution has been discovered.

The solution is to inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged. This inhibition is to be unconditional; no timers, tests for size of data received, or other conditions are required. Implementation typically requires one or two lines inside a TCP program.

At first glance, this solution seems to imply drastic changes in the behavior of TCP. This is not so. It all works out right in the end. Let us see why this is so.

When a user process writes to a TCP connection, TCP receives some data. It may hold that data for future sending or may send a packet immediately. If it refrains from sending now, it will typically send the data later when an incoming packet arrives and changes the state of the system. The state changes in one of two ways; the incoming packet acknowledges old data the distant host has received, or announces the availability of buffer space in the distant host for new data. (This last is referred to as "updating the window".) Each time data arrives on a connection, TCP must reexamine its current state and perhaps send some packets out. Thus, when we omit sending data on arrival from the user, we are simply deferring its transmission until the next message arrives from the distant host. A message must always arrive soon unless the connection was previously idle or communications with the other end have been lost. In the first case, the idle connection, our scheme will result in a packet being sent whenever the user writes to the TCP connection. Thus we do not deadlock in the idle condition.
In the second case, where the distant host has failed, sending more data is futile anyway. Note that we have done nothing to inhibit normal TCP retransmission logic, so lost messages are not a problem.

Examination of the behavior of this scheme under various conditions demonstrates that the scheme does work in all cases. The first case to examine is the one we wanted to solve, that of the character-oriented Telnet connection. Let us suppose that the user is sending TCP a new character every 200ms, and that the connection is via an Ethernet with a round-trip time including software processing of 50ms. Without any mechanism to prevent small-packet congestion, one packet will be sent for each character, and response will be optimal. Overhead will be 4000%, but this is acceptable on an Ethernet. The classic timer scheme, with a limit of 2 packets per second, will cause two or three characters to be sent per packet. Response will thus be degraded even though on a high-bandwidth Ethernet this is unnecessary. Overhead will drop to 1500%, but on an Ethernet this is a bad tradeoff. With our scheme, every character the user types will find TCP with an idle connection, and the character will be sent
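The inhibition rule described above can be sketched as a send decision. This is a minimal illustrative model, not any particular TCP implementation; the class and field names here are invented for the sketch, and real TCPs track unacknowledged data with sequence numbers rather than a byte count:

```python
# Minimal sketch of the send-inhibition rule: never send new data
# while previously transmitted data remains unacknowledged.
# TcpConnection and its fields are hypothetical illustration only.
from dataclasses import dataclass, field

@dataclass
class TcpConnection:
    unacked_bytes: int = 0                                  # sent, not yet ACKed
    send_buffer: bytearray = field(default_factory=bytearray)

    def write(self, data: bytes) -> list[bytes]:
        """User writes data; returns any segments sent immediately."""
        self.send_buffer.extend(data)
        return self._try_send()

    def ack_received(self) -> list[bytes]:
        """An incoming ACK clears unacked data and may release a segment."""
        self.unacked_bytes = 0
        return self._try_send()

    def _try_send(self) -> list[bytes]:
        # The whole rule lives in this one test: inhibit while unacked.
        if self.unacked_bytes > 0 or not self.send_buffer:
            return []
        segment = bytes(self.send_buffer)   # coalesce everything buffered
        self.send_buffer.clear()
        self.unacked_bytes = len(segment)
        return [segment]
```

On an idle connection the first write goes out at once, so there is no deadlock; writes made while an acknowledgment is outstanding accumulate and leave as one combined segment when the acknowledgment arrives, which is exactly the self-clocking behavior argued for above.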