Network Working Group                                        V. Jacobson
Request for Comments: 1185                                           LBL
                                                               R. Braden
                                                                     ISI
                                                                L. Zhang
                                                                    PARC
                                                            October 1990


                   TCP Extension for High-Speed Paths

Status of This Memo

   This memo describes an Experimental Protocol extension to TCP for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "IAB
   Official Protocol Standards" for the standardization state and status
   of this protocol.  Distribution of this memo is unlimited.

Summary

   This memo describes a small extension to TCP to support reliable
   operation over very high-speed paths, using sender timestamps
   transmitted using the TCP Echo option proposed in RFC-1072.

1. INTRODUCTION

   TCP uses positive acknowledgments and retransmissions to provide
   reliable end-to-end delivery over a full-duplex virtual circuit
   called a connection [Postel81].  A connection is defined by its two
   end points; each end point is a "socket", i.e., a (host,port) pair.
   To protect against data corruption, TCP uses an end-to-end checksum.
   Duplication and reordering are handled using a fine-grained sequence
   number space, with each octet receiving a distinct sequence number.

   The TCP protocol [Postel81] was designed to operate reliably over
   almost any transmission medium regardless of transmission rate,
   delay, corruption, duplication, or reordering of segments.  In
   practice, proper TCP implementations have demonstrated remarkable
   robustness in adapting to a wide range of network characteristics.
   For example, TCP implementations currently adapt to transfer rates in
   the range of 100 bps to 10**7 bps and round-trip delays in the range
   1 ms to 100 seconds.

   However, the introduction of fiber optics is resulting in ever-higher
   transmission speeds, and the fastest paths are moving out of the
   domain for which TCP was originally engineered.  This memo and
   RFC-1072 [Jacobson88] propose modest extensions to TCP to extend the



Jacobson, Braden & Zhang                                        [Page 1]

RFC 1185               TCP over High-Speed Paths            October 1990


   domain of its application to higher speeds.

   There is no one-line answer to the question: "How fast can TCP go?"
   The issues are reliability and performance, and these depend upon the
   round-trip delay and the maximum time that segments may be queued in
   the Internet, as well as upon the transmission speed.  We must think
   through these relationships very carefully if we are to successfully
   extend TCP's domain.

   TCP performance depends not upon the transfer rate itself, but rather
   upon the product of the transfer rate and the round-trip delay.  This
   "bandwidth*delay product" measures the amount of data that would
   "fill the pipe"; it is the buffer space required at sender and
   receiver to obtain maximum throughput on the TCP connection over the
   path.  RFC-1072 proposed a set of TCP extensions to improve TCP
   efficiency for "LFNs" (long fat networks), i.e., networks with large
   bandwidth*delay products.
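
   As a rough illustration of the bandwidth*delay product, the following
   Python sketch computes the number of bytes needed to "fill the pipe".
   The function name and the two sample paths are assumptions chosen
   for illustration only; they do not come from this memo or from
   RFC-1072.

```python
def bandwidth_delay_product(bits_per_sec: float, rtt_secs: float) -> float:
    """Bytes in flight needed to keep a path of this rate and RTT full."""
    return bits_per_sec / 8 * rtt_secs


# Hypothetical paths, chosen only for illustration.
satellite = bandwidth_delay_product(1.5e6, 0.5)   # 1.5 Mbps, 500 ms RTT
lan = bandwidth_delay_product(10e6, 0.001)        # 10 Mbps, 1 ms RTT
```

   The slower satellite path needs far more buffer space than the faster
   LAN path, which is the sense in which performance depends on the
   product rather than on the rate alone.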

   On the other hand, high transfer rate can threaten TCP reliability by
   violating the assumptions behind the TCP mechanism for duplicate
   detection and sequencing.  The present memo specifies a solution for
   this problem, extending TCP reliability to transfer rates well beyond
   the foreseeable upper limit of bandwidth.

   An especially serious kind of error may result from an accidental
   reuse of TCP sequence numbers in data segments.  Suppose that an "old
   duplicate segment", e.g., a duplicate data segment that was delayed
   in Internet queues, was delivered to the receiver at the wrong moment
   so that its sequence numbers fell somewhere within the current
   window.  There would be no checksum failure to warn of the error, and
   the result could be an undetected corruption of the data.  Reception
   of an old duplicate ACK segment at the transmitter could be only
   slightly less serious: it is likely to lock up the connection so that
   no further progress can be made and a RST is required to
   resynchronize the two ends.
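
   The failure described above can be sketched with a simplified
   receive-window test in Python.  This is not the full acceptance test
   of the TCP specification (it ignores segment length, for example);
   the names rcv_nxt and rcv_wnd and all numeric values are assumptions
   made for illustration.

```python
SEQ_MOD = 2**32  # TCP sequence numbers are 32 bits wide


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


# An old segment with sequence number 5000 is delayed in the network
# while the receiver is expecting sequence number 4000.
old_seq = 5000

# The sender then transmits 2**32 more bytes, so the receiver's next
# expected sequence number wraps around to the same numeric value.
rcv_nxt = (4000 + 2**32) % SEQ_MOD

# The delayed segment now looks like new data inside the window.
accepted = in_window(old_seq, rcv_nxt, rcv_wnd=8192)
```

   Nothing in the sequence number itself distinguishes the old segment
   from a current one, so the check accepts it.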

   Duplication of sequence numbers might happen in either of two ways:

   (1)  Sequence number wrap-around on the current connection

        A TCP sequence number contains 32 bits.  At a high enough
        transfer rate, the 32-bit sequence space may be "wrapped"
        (cycled) within the time that a segment may be delayed in
        queues.  Section 2 discusses this case and proposes a mechanism
        to reject old duplicates on the current connection.

   (2)  Segment from an earlier connection incarnation

        Suppose a connection terminates, either by a proper close
        sequence or due to a host crash, and the same connection (i.e.,
        using the same pair of sockets) is immediately reopened.  A
        delayed segment from the terminated connection could fall within
        the current window for the new incarnation and be accepted as
        valid.  This case is discussed in Section 3.

   TCP reliability depends upon the existence of a bound on the lifetime
   of a segment: the "Maximum Segment Lifetime" or MSL.  An MSL is
   generally required by any reliable transport protocol, since every
   sequence number field must be finite, and therefore any sequence
   number may eventually be reused.  In the Internet protocol suite, the
   MSL bound is enforced by an IP-layer mechanism, the "Time-to-Live" or
   TTL field.

   Watson's Delta-T protocol [Watson81] includes network-layer
   mechanisms for precise enforcement of an MSL.  In contrast, the IP
   mechanism for MSL enforcement is loosely defined and even more
   loosely implemented in the Internet.  Therefore, it is unwise to
   depend upon active enforcement of MSL for TCP connections, and it is
   unrealistic to imagine setting MSLs smaller than the current values
   (e.g., 120 seconds specified for TCP).  The timestamp algorithm
   described in the following section gives a way out of this dilemma
   for high-speed networks.


2.  SEQUENCE NUMBER WRAP-AROUND

   2.1  Background

      Avoiding reuse of sequence numbers within the same connection is
      simple in principle: enforce a segment lifetime shorter than the
      time it takes to cycle the sequence space, whose size is
      effectively 2**31.

      More specifically, if the maximum effective bandwidth at which TCP
      is able to transmit over a particular path is B bytes per second,
      then the following constraint must be satisfied for error-free
      operation:

          2**31 / B  > MSL (secs)                                    [1]

      The following table shows the value for Twrap = 2**31/B in
      seconds, for some important values of the bandwidth B:

           Network       B*8          B         Twrap
                      bits/sec   bytes/sec      secs
           _______    _______      ______       ______

           ARPANET       56kbps       7KBps    3*10**5 (~3.6 days)

           DS1          1.5Mbps     190KBps    10**4 (~3 hours)

           Ethernet      10Mbps    1.25MBps    1700 (~30 mins)

           DS3           45Mbps     5.6MBps    380

           FDDI         100Mbps    12.5MBps    170

           Gigabit        1Gbps     125MBps    17


      It is clear why wrap-around of the sequence space was not a
      problem for 56kbps packet switching or even 10Mbps Ethernets.  On
      the other hand, at DS3 and FDDI speeds, Twrap is comparable to the
      2 minute MSL assumed by the TCP specification [Postel81].  Moving
      towards gigabit speeds, Twrap becomes too small for reliable
      enforcement by the Internet TTL mechanism.
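
      The Twrap column of the table, and the comparison against
      constraint [1], can be reproduced with a short Python sketch.  The
      byte rates are those listed in the table; the function and
      variable names are assumptions made for illustration.

```python
def twrap_secs(bytes_per_sec: float) -> float:
    """Time to cycle the effective 2**31-byte sequence space."""
    return 2**31 / bytes_per_sec


MSL_SECS = 120  # the 2 minute MSL assumed by the TCP specification

# Byte rates B as listed in the table.
rates = {
    "ARPANET": 7e3,
    "DS1": 190e3,
    "Ethernet": 1.25e6,
    "DS3": 5.6e6,
    "FDDI": 12.5e6,
    "Gigabit": 125e6,
}

for name, b in rates.items():
    t = twrap_secs(b)
    print(f"{name:9s} Twrap = {t:10.0f} s  satisfies [1]: {t > MSL_SECS}")
```

      Constraint [1] holds with a large margin at ARPANET speed, with
      only a small margin at DS3 and FDDI speeds, and fails at gigabit
      speed.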

      The 16-bit window field of TCP limits the effective bandwidth B to
      2**16/RTT, where RTT is the round-trip time in seconds
      [McKenzie89].  If the RTT is large enough, this limits B to a
      value that meets the constraint [1] for a large MSL value.  For
      example, consider a transcontinental backbone with an RTT of 60ms
      (set by the laws of physics).  With the bandwidth*delay product
      limited to 64KB by the TCP window size, B is then limited to
      1.1MBps, no matter how high the theoretical transfer rate of the
      path.  This corresponds to cycling the sequence number space in
      Twrap= 2000 secs, which is safe in today's Internet.
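
      The arithmetic behind this example is sketched below in Python.
      The function name is an assumption; the 60ms RTT and the 2**16
      byte window are the figures used in the text.

```python
def window_limited_rate(rtt_secs: float, window_bytes: int = 2**16) -> float:
    """Upper bound on B (bytes/sec) imposed by the window: window / RTT."""
    return window_bytes / rtt_secs


b = window_limited_rate(0.060)   # 60 ms transcontinental RTT
twrap = 2**31 / b                # seconds to cycle the sequence space
```

      Here b comes to roughly 1.1MBps and twrap to just under 2000
      seconds, matching the figures quoted above.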

      Based on this reasoning, an earlier RFC [McKenzie89] has cautioned
      that expanding the TCP window space as proposed in RFC-1072 will
      lead to sequence wrap-around and hence to possible data
      corruption.  We believe that this is mis-identifying the culprit,
      which is not the larger window but rather the high bandwidth.

           For example, consider a (very large) FDDI LAN with a diameter
           of 10km.  Using the speed of light, we can compute the RTT
           across the ring as (2*10**4)/(3*10**8) = 67 microseconds, and
           the delay*bandwidth product is then 833 bytes.  A TCP
           connection across this LAN using a window of only 833 bytes
           will run at the full 100Mbps and can wrap the sequence space
           in about 3 minutes, very close to the MSL of TCP. Thus, high
           speed alone can cause a reliability problem with sequence
           number wrap-around, even without extended windows.
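
      The figures in the FDDI example can be checked with the following
      Python sketch.  It uses the same simplification as the text
      (propagation at the speed of light, 3*10**8 m/s); the variable
      names are assumptions.

```python
diameter_m = 10e3        # 10 km ring diameter
speed_of_light = 3e8     # m/s, as used in the text
rate_bytes = 100e6 / 8   # 100 Mbps FDDI expressed in bytes/sec

rtt = 2 * diameter_m / speed_of_light   # round trip across the ring
bdp = rate_bytes * rtt                  # bytes needed to fill the pipe
twrap = 2**31 / rate_bytes              # seconds to wrap the sequence space
```

      A window of well under 1KB therefore suffices to run at full
      speed, yet the sequence space still wraps in under three minutes.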

      An "obvious" fix for the problem of cycling the sequence space is
      to increase the size of the TCP sequence number field.  For
      example, the sequence number field (and also the acknowledgment
      field) could be expanded to 64 bits.  However, the proposals for
      making such a change while maintaining compatibility with current
      TCP have tended towards complexity and ugliness.

      This memo proposes a simple solution to the problem, using the TCP
      echo options defined in RFC-1072.  Section 2.2 which follows
      describes the original use of these options to carry timestamps in
      order to measure RTT accurately.  Section 2.3 proposes a method of
      using these same timestamps to reject old duplicate segments that
      could corrupt an open TCP connection.  Section 3 discusses the
      application of this mechanism to avoiding old duplicates from
      previous incarnations.

   2.2  TCP Timestamps

      RFC-1072 defined two TCP options, Echo and Echo Reply.  Echo
      carries a 32-bit number, and the receiver of the option must
      return this same value to the source host in an Echo Reply option.

      RFC-1072 furthermore describes the use of these options to contain
      32-bit timestamps, for measuring the RTT.  A TCP sending data
      would include Echo options containing the current clock value.
      The receiver would echo these timestamps in returning segments
      (generally, ACK segments).  The difference between a timestamp
      from an Echo Reply option and the current time would then measure
      the RTT at the sender.
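
      A minimal Python sketch of this exchange follows.  It models only
      the 32-bit values carried in the options, not the option encoding
      itself; the class and function names, the injected clock, and the
      millisecond clock values are all assumptions made for
      illustration.

```python
TS_MOD = 2**32  # Echo and Echo Reply carry a 32-bit number


class EchoSender:
    """Sender side of the timestamp echo, with an injected clock."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the current clock value

    def make_echo(self) -> int:
        """Value to carry in the Echo option of an outgoing segment."""
        return self.clock() % TS_MOD

    def rtt_from_reply(self, echo_reply: int) -> int:
        """RTT sample: current time minus the echoed timestamp."""
        return (self.clock() - echo_reply) % TS_MOD


def receiver_reflect(echo_value: int) -> int:
    """The receiver returns the Echo value unchanged in an Echo Reply."""
    return echo_value


# Hypothetical clock in milliseconds: reads 1000 on send, 1080 on reply.
ticks = iter([1000, 1080])
sender = EchoSender(clock=lambda: next(ticks))
stamp = sender.make_echo()
sample = sender.rtt_from_reply(receiver_reflect(stamp))
```

      The subtraction is taken modulo 2**32 so that a sample remains
      correct when the clock value wraps between send and reply.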

      This mechanism was designed to solve the following problem: almost
      all TCP implementations base their RTT measurements on a sample of
      only one packet per window.  If we look at RTT estimation as a
      signal processing problem (which it is), a data signal at some
      frequency (the packet rate) is being sampled at a lower frequency
      (the window rate).  Unfortunately, this lower sampling frequency
      violates Nyquist's criterion and may introduce "aliasing" artifacts
      into the estimated RTT [Hamming77].

      A good RTT estimator with a conservative retransmission timeout
      calculation can tolerate the aliasing when the sampling frequency
      is "close" to the data frequency.   For example, with a window of
      8 packets, the sample rate is 1/8 the data frequency -- less than
      an order of magnitude different.  However, when the window is tens
      or hundreds of packets, the RTT estimator may be seriously in
      error, resulting in spurious retransmissions.
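
      The aliasing effect can be illustrated with a deliberately
      artificial Python sketch.  The per-packet RTT pattern below (a
      sinusoid whose period equals the window size) and the window of 64
      packets are assumptions chosen to make the effect obvious; they
      are not measurements.

```python
import math


def rtt_of_packet(n: int, period: int) -> float:
    """Hypothetical per-packet RTT (ms): 100 ms base, +/-20 ms swing."""
    return 100 + 20 * math.sin(2 * math.pi * n / period)


window = 64                    # packets per window (assumed)
packets = range(window * 8)    # eight windows of traffic

# RTT as seen when every packet is timed, versus one sample per window.
every_packet = [rtt_of_packet(n, period=window) for n in packets]
one_per_window = [rtt_of_packet(n, period=window) for n in packets[::window]]

spread_all = max(every_packet) - min(every_packet)
spread_sampled = max(one_per_window) - min(one_per_window)
```

      Timing every packet reveals the full 40 ms swing, while the
      once-per-window samples all land at the same phase and see an
      essentially constant RTT, so an estimator fed only those samples
      would miss the variation entirely.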

      A solution to the aliasing problem that actually simplifies the
      sender substantially (since the RTT code is typically the single
      biggest protocol cost for TCP) is as follows: the sender will
      place a timestamp in each segment and the receiver will reflect
      these timestamps back in ACK segments.  Then a single subtract
      gives the sender an accurate RTT measurement for every ACK segment
      (which will correspond to every other data segment, with a
