Network Working Group                                        V. Jacobson
Request for Comments: 1072                                           LBL
                                                               R. Braden
                                                                     ISI
                                                            October 1988


                  TCP Extensions for Long-Delay Paths


Status of This Memo

   This memo proposes a set of extensions to the TCP protocol to provide
   efficient operation over a path with a high bandwidth*delay product.
   These extensions are not proposed as an Internet standard at this
   time.  Instead, they are intended as a basis for further
   experimentation and research on transport protocol performance.
   Distribution of this memo is unlimited.

1. INTRODUCTION

   Recent work on TCP performance has shown that TCP can work well over
   a variety of Internet paths, ranging from 800 Mbit/sec I/O channels
   to 300 bit/sec dial-up modems [Jacobson88].  However, there is still
   a fundamental TCP performance bottleneck for one transmission regime:
   paths with high bandwidth and long round-trip delays.  The
   significant parameter is the product of bandwidth (bits per second)
   and round-trip delay (RTT in seconds); this product is the number of
   bits it takes to "fill the pipe", i.e., the amount of unacknowledged
   data that TCP must handle in order to keep the pipeline full.  TCP
   performance problems arise when this product is large, e.g.,
   significantly exceeds 10**5 bits.  We will refer to an Internet path
   operating in this region as a "long, fat pipe", and a network
   containing this path as an "LFN" (pronounced "elephan(t)").

   High-capacity packet satellite channels (e.g., DARPA's Wideband Net)
   are LFN's.  For example, a T1-speed satellite channel has a
   bandwidth*delay product of 10**6 bits or more; this corresponds to
   100 outstanding TCP segments of 1200 bytes each!  Proposed future
   terrestrial fiber-optical paths will also fall into the LFN class;
   for example, a cross-country delay of 30 ms at a DS3 bandwidth
   (45Mbps) also exceeds 10**6 bits.
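
   As an illustrative check of the figures above, the following sketch
   computes a bandwidth*delay product and the number of segments needed
   to fill the pipe.  The function names are hypothetical, and integer
   milliseconds are used for the delay only to keep the arithmetic
   exact.

```python
def bdp_bits(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth*delay product in bits (integer arithmetic)."""
    return bandwidth_bps * rtt_ms // 1000


def segments_in_flight(pipe_bits: int, segment_bytes: int) -> int:
    """Whole segments of segment_bytes that fit in pipe_bits."""
    return pipe_bits // (segment_bytes * 8)


# DS3 example from the text: 45 Mbit/sec with a 30 ms delay.
ds3 = bdp_bits(45_000_000, 30)
print(ds3)                               # 1350000
print(ds3 > 10**6)                       # True

# A 10**6-bit pipe filled with 1200-byte segments: roughly 100 segments.
print(segments_in_flight(10**6, 1200))   # 104
```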

   Clever algorithms alone will not give us good TCP performance over
   LFN's; it will be necessary to actually extend the protocol.  This
   RFC proposes a set of TCP extensions for this purpose.

   There are three fundamental problems with the current TCP over LFN



Jacobson & Braden                                               [Page 1]

RFC 1072          TCP Extensions for Long-Delay Paths       October 1988


   paths:


   (1)  Window Size Limitation

        The TCP header uses a 16 bit field to report the receive window
        size to the sender.  Therefore, the largest window that can be
        used is 2**16 = 65K bytes.  (In practice, some TCP
        implementations will "break" for windows exceeding 2**15,
        because of their failure to do unsigned arithmetic).

        To circumvent this problem, we propose a new TCP option to allow
        windows larger than 2**16. This option will define an implicit
        scale factor, to be used to multiply the window size value found
        in a TCP header to obtain the true window size.


   (2)  Cumulative Acknowledgments

        Any packet losses in an LFN can have a catastrophic effect on
        throughput.  This effect is exaggerated by the simple cumulative
        acknowledgment of TCP.  Whenever a segment is lost, the
        transmitting TCP will (eventually) time out and retransmit the
        missing segment. However, the sending TCP has no information
        about segments that may have reached the receiver and been
        queued because they were not at the left window edge, so it may
        be forced to retransmit these segments unnecessarily.

        We propose a TCP extension to implement selective
        acknowledgements.  By sending selective acknowledgments, the
        receiver of data can inform the sender about all segments that
        have arrived successfully, so the sender need retransmit only
        the segments that have actually been lost.

        Selective acknowledgments have been included in a number of
        experimental Internet protocols -- VMTP [Cheriton88], NETBLT
        [Clark87], and RDP [Velten84].  There is some empirical evidence
        in favor of selective acknowledgments -- simple experiments with
        RDP have shown that disabling the selective acknowledgment
        facility greatly increases the number of retransmitted segments
        over a lossy, high-delay Internet path [Partridge87].  A
        simulation study of a simple form of selective acknowledgments
        added to the ISO transport protocol TP4 also showed promise of
        performance improvement [NBS85].


   (3)  Round Trip Timing

        TCP implements reliable data delivery by measuring the RTT,
        i.e., the time interval between sending a segment and receiving
        an acknowledgment for it, and retransmitting any segments that
        are not acknowledged within some small multiple of the average
        RTT.  Experience has shown that accurate, current RTT estimates
        are necessary to adapt to changing traffic conditions and,
        without them, a busy network is subject to an instability known
        as "congestion collapse" [Nagle84].

        In part because TCP segments may be repacketized upon
        retransmission, and in part because of complications due to the
        cumulative TCP acknowledgment, measuring a segment's RTT may
        involve a non-trivial amount of computation in some
        implementations.  To minimize this computation, some
        implementations time only one segment per window.  While this
        yields an adequate approximation to the RTT for small windows
        (e.g., a 4 to 8 segment Arpanet window), for an LFN (e.g., 100
        segment Wideband Network windows) it results in an unacceptably
        poor RTT estimate.

        In the presence of errors, the problem becomes worse.  Zhang
        [Zhang86], Jain [Jain86] and Karn [Karn87] have shown that it is
        not possible to accumulate reliable RTT estimates if
        retransmitted segments are included in the estimate.  Since a
        full window of data will have been transmitted prior to a
        retransmission, all of the segments in that window will have to
        be ACKed before the next RTT sample can be taken.  This means at
        least an additional window's worth of time between RTT
        measurements and, as the error rate approaches one per window of
        data (e.g., 10**-6 errors per bit for the Wideband Net), it
        becomes effectively impossible to obtain an RTT measurement.

        We propose a TCP "echo" option that allows each segment to carry
        its own timestamp.  This will allow every segment, including
        retransmissions, to be timed at negligible computational cost.
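
   The idea behind the echo option can be sketched as follows.  This is
   a toy model only, not the option's wire format: the class, its
   method names, and the millisecond clock are all assumptions made for
   illustration.  The point it shows is that an echoed timestamp
   identifies which transmission an acknowledgment refers to, so a
   retransmitted segment can be timed like any other.

```python
class EchoTimer:
    """Toy model: the sender stamps each segment and the receiver
    echoes the stamp back in its acknowledgment (hypothetical API)."""

    def __init__(self):
        self.samples = []   # one RTT sample per acknowledged segment

    def stamp(self, now_ms: int) -> int:
        # Value carried in the outgoing segment.
        return now_ms

    def on_ack(self, echoed_ms: int, now_ms: int) -> int:
        # The echoed stamp belongs to exactly one transmission, so the
        # sample is unambiguous even for a retransmitted segment.
        rtt = now_ms - echoed_ms
        self.samples.append(rtt)
        return rtt


timer = EchoTimer()
first = timer.stamp(0)       # original transmission, lost
retx = timer.stamp(900)      # retransmission of the same data
print(timer.on_ack(retx, 1500))   # 600
```

   Without the echoed value, the sender could not tell whether the
   acknowledgment at time 1500 answered the transmission at time 0 or
   the one at time 900.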


   In designing new TCP options, we must pay careful attention to
   interoperability with existing implementations.  The only TCP option
   defined to date is an "initial option", i.e., it may appear only on a
   SYN segment.  It is likely that most implementations will properly
   ignore any options in the SYN segment that they do not understand, so
   new initial options should not cause a problem.  On the other hand,
   we fear that receiving unexpected non-initial options may cause some
   TCP's to crash.

   Therefore, in each of the extensions we propose, non-initial options
   may be sent only if an exchange of initial options has indicated that
   both sides understand the extension.  This approach will also allow a
   TCP to determine, when the connection opens, how big a TCP header it
   will be sending.

2. TCP WINDOW SCALE OPTION

   The obvious way to implement a window scale factor would be to define
   a new TCP option that could be included in any segment specifying a
   window.  The receiver would include it in every acknowledgment
   segment, and the sender would interpret it.  Unfortunately, this
   simple approach would not work.  The sender must reliably know the
   receiver's current scale factor, but a TCP option in an
   acknowledgement segment will not be delivered reliably (unless the
   ACK happens to be piggy-backed on data).

   However, SYN segments are always sent reliably, suggesting that each
   side may communicate its window scale factor in an initial TCP
   option.  This approach has a disadvantage: the scale must be
   established when the connection is opened, and cannot be changed
   thereafter.  However, other alternatives would be much more
   complicated, and we therefore propose a new initial option called
   Window Scale.

2.1  Window Scale Option

      This three-byte option may be sent in a SYN segment by a TCP (1)
      to indicate that it is prepared to do both send and receive window
      scaling, and (2) to communicate a scale factor to be applied to
      its receive window.  The scale factor is encoded logarithmically,
      as a power of 2 (presumably to be implemented by binary shifts).

      Note: the window in the SYN segment itself is never scaled.

      TCP Window Scale Option:

      Kind: 3

             +---------+---------+---------+
             | Kind=3  |Length=3 |shift.cnt|
             +---------+---------+---------+

      Here shift.cnt is the number of bits by which the receiver right-
      shifts the true receive-window value, to scale it into a 16-bit
      value to be sent in the TCP header (this scaling is explained below).
      The value shift.cnt may be zero (offering to scale, while applying
      a scale factor of 1 to the receive window).

      This option is an offer, not a promise; both sides must send
      Window Scale options in their SYN segments to enable window
      scaling in either direction.
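
      A minimal sketch of building and parsing the three-byte option
      laid out above.  The function and constant names are
      hypothetical; only the Kind, Length, and shift.cnt byte layout
      comes from the option diagram.

```python
WINDOW_SCALE_KIND = 3
WINDOW_SCALE_LENGTH = 3


def encode_window_scale(shift_cnt: int) -> bytes:
    """Build the option bytes: Kind=3, Length=3, shift.cnt."""
    if not 0 <= shift_cnt <= 255:
        raise ValueError("shift.cnt must fit in one byte")
    return bytes([WINDOW_SCALE_KIND, WINDOW_SCALE_LENGTH, shift_cnt])


def decode_window_scale(option: bytes) -> int:
    """Return shift.cnt from a three-byte Window Scale option."""
    if (len(option) != WINDOW_SCALE_LENGTH
            or option[0] != WINDOW_SCALE_KIND
            or option[1] != WINDOW_SCALE_LENGTH):
        raise ValueError("not a Window Scale option")
    return option[2]


print(encode_window_scale(7).hex())                  # 030307
print(decode_window_scale(encode_window_scale(0)))   # 0
```

      A shift.cnt of zero is a valid offer: it enables scaling while
      leaving the sender's own receive window unscaled.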

2.2  Using the Window Scale Option

      A model implementation of window scaling is as follows, using the
      notation of RFC-793 [Postel81]:

      *    The send-window (SND.WND) and receive-window (RCV.WND) sizes
           in the connection state block and in all sequence space
           calculations are expanded from 16 to 32 bits.

      *    Two window shift counts are added to the connection state:
           snd.scale and rcv.scale.  These are shift counts to be
           applied to the incoming and outgoing windows, respectively.
           The precise algorithm is shown below.

      *    All outgoing SYN segments are sent with the Window Scale
           option, containing a value shift.cnt = R that the TCP would
           like to use for its receive window.

      *    Snd.scale and rcv.scale are initialized to zero, and are
           changed only during processing of a received SYN segment.  If
           the SYN segment contains a Window Scale option with shift.cnt
           = S, set snd.scale to S and set rcv.scale to R; otherwise,
           both snd.scale and rcv.scale are left at zero.

      *    The window field (SEG.WND) in the header of every incoming
           segment, with the exception of SYN segments, will be left-
           shifted by snd.scale bits before updating SND.WND:

              SND.WND = SEG.WND << snd.scale

           (assuming the other conditions of RFC793 are met, and using
           the "C" notation "<<" for left-shift).

      *    The window field (SEG.WND) of every outgoing segment, with
           the exception of SYN segments, will have been right-shifted
           by rcv.scale bits:

              SEG.WND = RCV.WND >> rcv.scale.
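
      The model above can be sketched in a few lines.  This is an
      illustration under stated assumptions, not a reference
      implementation: the class and method names are hypothetical,
      Python integers are unbounded (so the 16-to-32-bit expansion is
      implicit), and clamping the outgoing field to 16 bits is an
      assumption added here rather than something the model states.

```python
class WindowScaleState:
    """Per-connection window scaling state (hypothetical names)."""

    def __init__(self, wanted_shift: int):
        self.wanted_shift = wanted_shift  # R, sent in our SYN
        self.snd_scale = 0                # applied to incoming windows
        self.rcv_scale = 0                # applied to outgoing windows
        self.snd_wnd = 0                  # SND.WND
        self.rcv_wnd = 0                  # RCV.WND

    def on_syn_received(self, peer_shift):
        """peer_shift is S from the peer's option, or None if absent."""
        if peer_shift is not None:
            self.snd_scale = peer_shift
            self.rcv_scale = self.wanted_shift
        # Otherwise both scales stay at zero: no scaling either way.

    def on_segment_received(self, seg_wnd: int, is_syn: bool = False):
        # SND.WND = SEG.WND << snd.scale, except on SYN segments.
        shift = 0 if is_syn else self.snd_scale
        self.snd_wnd = seg_wnd << shift

    def outgoing_window_field(self, is_syn: bool = False) -> int:
        # SEG.WND = RCV.WND >> rcv.scale, except on SYN segments.
        shift = 0 if is_syn else self.rcv_scale
        # Assumption: clamp so the value fits the 16-bit header field.
        return min(self.rcv_wnd >> shift, 0xFFFF)


conn = WindowScaleState(wanted_shift=3)
conn.on_syn_received(peer_shift=2)
conn.on_segment_received(seg_wnd=1000)
print(conn.snd_wnd)                    # 4000
conn.rcv_wnd = 1 << 18
print(conn.outgoing_window_field())    # 32768
```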


      TCP determines if a data segment is "old" or "new" by testing if
      its sequence number is within 2**31 bytes of the left edge of the
      window.  If not, the data is "old" and discarded.  To ensure that
      new data is never mistakenly considered old and vice-versa, the
      left edge of the sender's window has to be at most 2**31 away
      from the right edge of the receiver's window.  Similarly with the
      sender's right edge and receiver's left edge.  Since the right and
      left edges of either the sender's or receiver's window differ by
      the window size, and since the sender and receiver windows can be
      out of phase by at most the window size, the above constraints
      imply that 2 * the max window size must be less than 2**31, or

           max window < 2**30

      Since the max window is 2**S (where S is the scaling shift count)
      times at most 2**16 - 1 (the maximum unscaled window), the maximum
      window is guaranteed to be < 2**30 if S <= 14.  Thus, the shift
      count must be limited to 14.  (This allows windows of 2**30 = 1
      Gbyte.)  If a Window Scale option is received with a shift.cnt
      value exceeding 14, the TCP should log the error but use 14
      instead of the specified value.
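
      The bound can be checked numerically.  The sketch below also
      shows one way to apply the "log the error but use 14" rule; the
      function names and the use of Python's logging module are
      assumptions made for illustration.

```python
import logging

MAX_SHIFT = 14
MAX_UNSCALED_WINDOW = 2**16 - 1


def effective_shift(received_shift: int) -> int:
    """Return the shift count to use for a received shift.cnt."""
    if received_shift > MAX_SHIFT:
        logging.warning("Window Scale shift.cnt %d exceeds %d; using %d",
                        received_shift, MAX_SHIFT, MAX_SHIFT)
        return MAX_SHIFT
    return received_shift


def max_window(shift: int) -> int:
    """Largest window expressible with the given shift count."""
    return MAX_UNSCALED_WINDOW << shift


print(max_window(14) < 2**30)   # True
print(max_window(15) < 2**30)   # False
print(effective_shift(20))      # 14
```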
