📄 rfc1144.txt

📁 RFC 的详细文档！
💻 TXT
📖 第 1 页 / 共 5 页
字号:
   From the above, it's clear that one design goal of the compression
   should be to limit the bandwidth demand of typing and ack traffic to at
   most 300 bps.  A typical maximum typing speed is around five characters



   ----------------------------
     3. See the excellent discussion of two-wire dialup line capacity in
   [1], chap. 11.  In particular, there is widespread misunderstanding of
   the capabilities of `echo-cancelling' modems (such as those conforming
   to CCITT V.32):  Echo-cancellation can offer each side of a two-wire
   line the full line bandwidth but, since the far talker's signal adds to
   the local `noise', not the full line capacity.  The 22Kbps Shannon limit
   is a hard-limit on data rate through a two-wire telephone connection.


   Jacobson                                                        [Page 2]

   RFC 1144               Compressing TCP/IP Headers          February 1990


   per second/4/ which leaves a budget 30 - 5 = 25 characters for headers
   or five bytes of header per character typed./5/  Five byte headers solve
   problems (1) and (3) directly and, indirectly, problem (2):  A packet
   size of 100--200 bytes will easily amortize the cost of a five byte
   header and offer a user 95--98% of the line bandwidth for data.  These
   short packets mean little interference between interactive and bulk data
   traffic (see sec. 5.2).

   Another design goal is that the compression protocol be based solely on
   information guaranteed to be known to both ends of a single serial link.
   Consider the topology shown in fig. 1 where communicating hosts A and B
   are on separate local area nets (the heavy black lines) and the nets are
   connected by two serial links (the open lines between gateways C--D and
   E--F)./6/ One compression possibility would be to convert each TCP/IP
   conversation into a semantically equivalent conversation in a protocol
   with smaller headers, e.g., to an X.25 call.  But, because of routing
   transients or multipathing, it's entirely possible that some of the A--B
   traffic will follow the A-C-D-B path and some will follow the A-E-F-B
   path.  Similarly, it's possible that A->B traffic will flow A-C-D-B and
   B->A traffic will flow B-F-E-A. None of the gateways can count on seeing
   all the packets in a particular TCP conversation and a compression
   algorithm that works for such a topology cannot be tied to the TCP
   connection syntax.

   A physical link treated as two, independent, simplex links (one each
   direction) imposes the minimum requirements on topology, routing and
   pipelining.  The ends of each simplex link only have to agree on the
   most recent packet(s) sent on that link.  Thus, although any compression
   scheme involves shared state, this state is spatially and temporally

   ----------------------------
     4. See [13].  Typing bursts or multiple character keystrokes such as
   cursor keys can exceed this average rate by factors of two to four.
   However the bandwidth demand stays approximately constant since the TCP
   Nagle algorithm[8] aggregates traffic with a <200ms interarrival time
   and the improved header-to-data ratio compensates for the increased
   data.
     5. A similar analysis leads to essentially the same header size limit
   for bulk data transfer ack packets.  Assuming that the MTU has been
   selected for `unobtrusive' background file transfers (i.e., chosen so
   the packet time is 200--400 ms --- see sec. 5), there can be at most 5
   data packets per second in the `high bandwidth' direction.  A reasonable
   TCP implementation will ack at most every other data packet so at 5
   bytes per ack the reverse channel bandwidth is 2.5 * 5 = 12.5 bytes/sec.
     6. Note that although the TCP endpoints are A and B, in this example
   compression/decompression must be done at the gateway serial links,
   i.e., between C and D and between E and F. Since A and B are using IP,
   they cannot know that their communication path includes a low speed
   serial link.  It is clearly a requirement that compression not break the
   IP model, i.e., that compression function between intermediate systems
   and not just between end systems.


   Jacobson                                                        [Page 3]

   RFC 1144               Compressing TCP/IP Headers          February 1990


   local and adheres to Dave Clark's principle of fate sharing[4]:  The two
   ends can only disagree on the state if the link connecting them is
   inoperable, in which case the disagreement doesn't matter.



   3  The compression algorithm


   3.1  The basic idea

   Figure 2 shows a typical (and minimum length) TCP/IP datagram header./7/
   The header size is 40 bytes:  20 bytes of IP and 20 of TCP.
   Unfortunately, since the TCP and IP protocols were not designed by a
   committee, all these header fields serve some useful purpose and it's
   not possible to simply omit some in the name of efficiency.

   However, TCP establishes connections and, typically, tens or hundreds of
   packets are exchanged on each connection.  How much of the per-packet
   information is likely to stay constant over the life of a connection?
   Half---the shaded fields in fig. 3.  So, if the sender and receiver keep
   track of active connections/8/ and the receiver keeps a copy of the
   header from the last packet it saw from each connection, the sender gets
   a factor-of-two compression by sending only a small (<= 8 bit)
   connection identifier together with the 20 bytes that change and letting
   the receiver fill in the 20 fixed bytes from the saved header.

   One can scavenge a few more bytes by noting that any reasonable
   link-level framing protocol will tell the receiver the length of a
   received message so total length (bytes 2 and 3) is redundant.  But then
   the header checksum (bytes 10 and 11), which protects individual hops
   from processing a corrupted IP header, is essentially the only part of
   the IP header being sent.  It seems rather silly to protect the
   transmission of information that isn't being transmitted.  So, the
   receiver can check the header checksum when the header is actually sent
   (i.e., in an uncompressed datagram) but, for compressed datagrams,
   regenerate it locally at the same time the rest of the IP header is
   being regenerated./9/


   ----------------------------
     7. The TCP and IP protocols and protocol headers are described in [10]
   and [11].
     8. The 96-bit tuple <src address, dst address, src port, dst port>
   uniquely identifies a TCP connection.
     9. The IP header checksum is not an end-to-end checksum in the sense
   of [14]:  The time-to-live update forces the IP checksum to be
   recomputed at each hop.  The author has had unpleasant personal
   experience with the consequences of violating the end-to-end argument in
   [14] and this protocol is careful to pass the end-to-end TCP checksum
   through unmodified.  See sec. 4.


   Jacobson                                                        [Page 4]

   RFC 1144               Compressing TCP/IP Headers          February 1990


   This leaves 16 bytes of header information to send.  All of these bytes
   are likely to change over the life of the conversation but they do not
   all change at the same time.  For example, during an FTP data transfer
   only the packet ID, sequence number and checksum change in the
   sender->receiver direction and only the packet ID, ack, checksum and,
   possibly, window, change in the receiver->sender direction.  With a copy
   of the last packet sent for each connection, the sender can figure out
   what fields change in the current packet then send a bitmask indicating
   what changed followed by the changing fields./10/

   If the sender only sends fields that differ, the above scheme gets the
   average header size down to around ten bytes.  However, it's worthwhile
   looking at how the fields change:  The packet ID typically comes from a
   counter that is incremented by one for each packet sent.  I.e., the
   difference between the current and previous packet IDs should be a
   small, positive integer, usually <256 (one byte) and frequently = 1.
   For packets from the sender side of a data transfer, the sequence number
   in the current packet will be the sequence number in the previous packet
   plus the amount of data in the previous packet (assuming the packets are
   arriving in order).  Since IP packets can be at most 64K, the sequence
   number change must be < 2^16 (two bytes).  So, if the differences in the
   changing fields are sent rather than the fields themselves, another
   three or four bytes per packet can be saved.

   That gets us to the five-byte header target.  Recognizing a couple of
   special cases will get us three byte headers for the two most common
   cases---interactive typing traffic and bulk data transfer---but the
   basic compression scheme is the differential coding developed above.
   Given that this intellectual exercise suggests it is possible to get
   five byte headers, it seems reasonable to flesh out the missing details
   and actually implement something.


   3.2  The ugly details

   3.2.1  Overview

   Figure 4 shows a block diagram of the compression software.  The
   networking system calls a SLIP output driver with an IP packet to be

   ----------------------------
    10. This is approximately Thinwire-I from [5].  A slight modification
   is to do a `delta encoding' where the sender subtracts the previous
   packet from the current packet (treating each packet as an array of 16
   bit integers), then sends a 20-bit mask indicating the non-zero
   differences followed by those differences.  If distinct conversations
   are separated, this is a fairly effective compression scheme (e.g.,
   typically 12-16 byte headers) that doesn't involve the compressor
   knowing any details of the packet structure.  Variations on this theme
   have been used, successfully, for a number of years (e.g., the Proteon
   router's serial link protocol[3]).


   Jacobson                                                        [Page 5]

   RFC 1144               Compressing TCP/IP Headers          February 1990


   sent over the serial line.  The packet goes through a compressor which
   checks if the protocol is TCP. Non-TCP packets and `uncompressible' TCP
   packets (described below) are just marked as TYPE_IP and passed to a
   framer.  Compressible TCP packets are looked up in an array of packet
   headers.  If a matching connection is found, the incoming packet is
   compressed, the (uncompressed) packet header is copied into the array,
   and a packet of type COMPRESSED_TCP is sent to the framer.  If no match
   is found, the oldest entry in the array is discarded, the packet header
   is copied into that slot, and a packet of type UNCOMPRESSED_TCP is sent
   to the framer.  (An UNCOMPRESSED_TCP packet is identical to the original
   IP packet except the IP protocol field is replaced with a connection
   number---an index into the array of saved, per-connection packet
   headers.  This is how the sender (re-)synchronizes the receiver and
   `seeds' it with the first, uncompressed packet of a compressed packet
   sequence.)

   The framer is responsible for communicating the packet data, type and
   boundary (so the decompressor can learn how many bytes came out of the
   compressor).  Since the compression is a differential coding, the framer
   must not re-order packets (this is rarely a concern over a single serial
   link).  It must also provide good error detection and, if connection
   numbers are compressed, must provide an error indication to the
   decompressor (see sec. 4)./11/

   The decompressor does a `switch' on the type of incoming packets:  For
   TYPE_IP, the packet is simply passed through.  For UNCOMPRESSED_TCP, the
   connection number is extracted from the IP protocol field and
   IPPROTO_TCP is restored, then the connection number is used as an index
   into the receiver's array of saved TCP/IP headers and the header of the
   incoming packet is copied into the indexed slot.  For COMPRESSED_TCP,
   the connection number is used as an array index to get the TCP/IP header
   of the last packet from that connection, the info in the compressed
   packet is used to update that header, then a new packet is constructed
   containing the now-current header from the array concatenated with the
   data from the compressed packet.

   Note that the communication is simplex---no information flows in the
   decompressor-to-compressor direction.  In particular, this implies that
   the decompressor is relying on TCP retransmissions to correct the saved
   state in the event of line errors (see sec. 4).





   ----------------------------
    11. Link level framing is outside the scope of this document.  Any
   framing that provides the facilities listed in this paragraph should be
   adequate for the compression protocol.  However, the author encourages
   potential implementors to see [9] for a proposed, standard, SLIP
   framing.


   Jacobson                                                        [Page 6]

   RFC 1144               Compressing TCP/IP Headers          February 1990


   3.2.2  Compressed packet format

   Figure 5 shows the format of a compressed TCP/IP packet.  There is a
   change mask that identifies which of the fields expected to change
   per-packet actually changed, a connection number so the receiver can
   locate the saved copy of the last packet for this TCP connection, the
   unmodified TCP checksum so the end-to-end data integrity check will
   still be valid, then for each bit set in the change mask, the amount the
   associated field changed.  (Optional fields, controlled by the mask, are
   enclosed in dashed lines in the figure.)  In all cases, the bit is set
   if the associated field is present and clear if the field is absent./12/

   Since the delta's in the sequence number, etc., are usually small,
   particularly if the tuning guidelines in section 5 are followed, all the
   numbers are encoded in a variable length scheme that, in practice,
   handles most traffic with eight bits:  A change of one through 255 is
   represented in one byte.  Zero is improbable (a change of zero is never
   sent) so a byte of zero signals an extension:  The next two bytes are
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -