Paxson, et. al. Informational [Page 22]
RFC 2525 TCP Implementation Problems March 1999
How to detect
If source code is available, that is generally the easiest way to
detect this problem. Search for each modification to the cwnd
variable; (at least) one of these will be for congestion
avoidance, and inspection of the related code should immediately
identify the problem if present.
The problem can also be detected by closely examining packet
traces taken near the sender. During congestion avoidance, cwnd
will increase by an additional segment upon the receipt of
(typically) eight acknowledgements without a loss. This increase
is in addition to the one segment increase per round trip time (or
two round trip times if the receiver is using delayed ACKs).
Furthermore, graphs of the sequence number vs. time, taken from
packet traces, are normally linear during congestion avoidance.
When viewing packet traces of transfers from senders exhibiting
this problem, the graphs appear quadratic instead of linear.
Finally, the traces will show that, with sufficiently large
windows, nearly every loss event results in a timeout.
How to fix
This problem may be corrected by removing the "+ MSS/8" term from
the congestion avoidance code that increases cwnd each time an ACK
of new data is received.
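
To illustrate the effect of the extra term, the following is a
minimal Python sketch and is not taken from any real TCP stack. The
names ca_increment and simulate, the 512-byte MSS, the starting window
of 8 segments, and the one-ACK-per-segment model are all assumptions
made for illustration.

```python
MSS = 512  # bytes; an assumed segment size for this illustration


def ca_increment(cwnd, extra_term=False):
    """Return cwnd (bytes) after one ACK of new data in congestion avoidance.

    The correct per-ACK increase is MSS*MSS/cwnd, which sums to roughly
    one MSS per round trip. With extra_term=True the faulty "+ MSS/8"
    term is added as well.
    """
    increase = MSS * MSS / cwnd
    if extra_term:
        increase += MSS / 8
    return cwnd + increase


def simulate(round_trips, extra_term=False, start_segments=8):
    """Model round trips in which every full segment in flight is ACKed once."""
    cwnd = float(start_segments * MSS)
    for _ in range(round_trips):
        acks = int(cwnd // MSS)  # assumed: one ACK per full segment
        for _ in range(acks):
            cwnd = ca_increment(cwnd, extra_term)
    return cwnd / MSS  # window size in segments


if __name__ == "__main__":
    for rtts in (5, 10, 20):
        print(rtts, round(simulate(rtts), 1), round(simulate(rtts, True), 1))
```

Under these assumptions the correct window grows by a little under
one segment per round trip, while the faulty window grows by an amount
that itself increases with the window size. That is the
faster-than-linear growth described under "How to detect".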
2.7.
Name of Problem
Initial RTO too low
Classification
Performance
Description
When a TCP first begins transmitting data, it lacks the RTT
measurements necessary to have computed an adaptive retransmission
timeout (RTO). RFC 1122, 4.2.3.1, states that a TCP SHOULD
initialize RTO to 3 seconds. A TCP that uses a lower value
exhibits "Initial RTO too low".
Significance
In environments with large RTTs (where "large" means any value
larger than the initial RTO), TCPs will experience very poor
performance.
Implications
Whenever RTO < RTT, very poor performance can result as packets
are unnecessarily retransmitted (because RTO will expire before an
ACK for the packet can arrive) and the connection enters slow
start and congestion avoidance. Generally, the algorithms for
computing RTO avoid this problem by adding a positive term to the
estimated RTT. However, when a connection first begins it must
use some estimate for RTO, and if it picks a value less than RTT,
the above problems will arise.
Furthermore, when the initial RTO < RTT, it can take a long time
for the TCP to correct the problem by adapting the RTT estimate,
because the use of Karn's algorithm (mandated by RFC 1122,
4.2.3.1) will discard many of the candidate RTT measurements made
after the first timeout, since they will be measurements of
retransmitted segments.
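
The interaction between the initial RTO, exponential backoff, and
Karn's algorithm can be sketched as follows. This is a hedged
illustration, not any particular implementation: the class name
RtoEstimator is hypothetical, and the smoothing gains (1/8 and 1/4)
and the 4 * rttvar term are the commonly used Jacobson-style
constants, assumed here because the text above does not specify them.

```python
INITIAL_RTO = 3.0  # seconds, the RFC 1122 SHOULD value cited above


class RtoEstimator:
    """Hypothetical RTO estimator with Karn's algorithm.

    The gains (1/8 and 1/4) and the 4 * rttvar term are assumed
    Jacobson-style constants, used only for illustration.
    """

    def __init__(self, initial_rto=INITIAL_RTO):
        self.srtt = None      # smoothed RTT; None until the first sample
        self.rttvar = None    # smoothed mean deviation of the RTT
        self.rto = initial_rto

    def on_timeout(self):
        """Exponential backoff: double the RTO when the timer expires."""
        self.rto *= 2

    def on_ack(self, rtt_sample, was_retransmitted):
        """Feed one candidate RTT measurement (in seconds)."""
        if was_retransmitted:
            # Karn's algorithm: the ACK is ambiguous, so discard the sample
            # and keep the current (possibly backed-off) RTO.
            return
        if self.srtt is None:
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_sample)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample
        # The positive term added to the RTT estimate keeps RTO above RTT.
        self.rto = self.srtt + 4 * self.rttvar
```

Constructing the estimator with a low value such as
RtoEstimator(initial_rto=0.2) on a path with a larger RTT shows the
slow adaptation described above: each early segment is retransmitted,
so its sample is discarded and only the backoff raises the RTO.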
Relevant RFCs
RFC 1122 states that TCPs SHOULD initialize RTO to 3 seconds and
MUST implement Karn's algorithm.
Trace file demonstrating it
The following trace file was taken using tcpdump at host A, the
data sender. The advertised window and SYN options have been
omitted for clarity.
07:52:39.870301 A > B: S 2786333696:2786333696(0)
07:52:40.548170 B > A: S 130240000:130240000(0) ack 2786333697
07:52:40.561287 A > B: P 1:513(512) ack 1
07:52:40.753466 A > B: . 1:513(512) ack 1
07:52:41.133687 A > B: . 1:513(512) ack 1
07:52:41.458529 B > A: . ack 513
07:52:41.458686 A > B: . 513:1025(512) ack 1
07:52:41.458797 A > B: P 1025:1537(512) ack 1
07:52:41.541633 B > A: . ack 513
07:52:41.703732 A > B: . 513:1025(512) ack 1
07:52:42.044875 B > A: . ack 513
07:52:42.173728 A > B: . 513:1025(512) ack 1
07:52:42.330861 B > A: . ack 1537
07:52:42.331129 A > B: . 1537:2049(512) ack 1
07:52:42.331262 A > B: P 2049:2561(512) ack 1
07:52:42.623673 A > B: . 1537:2049(512) ack 1
07:52:42.683203 B > A: . ack 1537
07:52:43.044029 B > A: . ack 1537
07:52:43.193812 A > B: . 1537:2049(512) ack 1
Note that, judging from the SYN/SYN-ACK exchange, the RTT is over
600 msec. However, from the elapsed time between the third and
fourth lines (the first packet being sent and then retransmitted),
it is apparent the RTO was initialized to under 200 msec. The next
line shows that this value has doubled to roughly 400 msec (correct
exponential backoff of RTO), but that still does not suffice to
avoid an unnecessary retransmission.
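
The intervals quoted above can be checked directly from the trace
timestamps. The sketch below does only that arithmetic; the helper
name seconds is hypothetical, and the timestamps are copied from the
first five lines of the trace.

```python
def seconds(stamp):
    """Convert an HH:MM:SS.ffffff tcpdump timestamp to seconds since midnight."""
    hours, minutes, secs = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)


syn = seconds("07:52:39.870301")            # A sends its SYN
syn_ack = seconds("07:52:40.548170")        # B's SYN-ACK arrives at A
first_send = seconds("07:52:40.561287")     # first transmission of 1:513
first_rexmit = seconds("07:52:40.753466")   # first retransmission of 1:513
second_rexmit = seconds("07:52:41.133687")  # second retransmission of 1:513

rtt_ms = (syn_ack - syn) * 1000
first_rto_ms = (first_rexmit - first_send) * 1000
second_rto_ms = (second_rexmit - first_rexmit) * 1000

print(f"handshake RTT:        {rtt_ms:.0f} ms")
print(f"first timeout gap:    {first_rto_ms:.0f} ms")
print(f"second timeout gap:   {second_rto_ms:.0f} ms")
```

The handshake RTT comes to about 678 msec, while the two timeout gaps
are about 192 msec and 380 msec, both well short of the RTT.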
Finally, an ACK from B arrives for the first segment. Two more
duplicate ACKs for 513 arrive later, indicating that the original
and both retransmissions all arrived at B. (Indeed, a concurrent
trace at B showed that no packets were lost during the entire
connection.) The first ACK opens the congestion window to two
packets, which are sent back-to-back, but at 07:52:41.703732 RTO
again expires after a little over 200 msec, leading to an
unnecessary retransmission, and the pattern repeats. By the end
of the trace excerpt above, 1536 bytes have been successfully
transmitted from A to B, over an interval of more than 2 seconds,
reflecting terrible performance.
Trace file demonstrating correct behavior
The following trace file was taken using tcpdump at host C, the
data sender. The advertised window and SYN options have been
omitted for clarity.
17:30:32.090299 C > D: S 2031744000:2031744000(0)
17:30:32.900325 D > C: S 262737964:262737964(0) ack 2031744001
17:30:32.900326 C > D: . ack 1
17:30:32.910326 C > D: . 1:513(512) ack 1
17:30:34.150355 D > C: . ack 513
17:30:34.150356 C > D: . 513:1025(512) ack 1
17:30:34.150357 C > D: . 1025:1537(512) ack 1
17:30:35.170384 D > C: . ack 1025
17:30:35.170385 C > D: . 1537:2049(512) ack 1
17:30:35.170386 C > D: . 2049:2561(512) ack 1
17:30:35.320385 D > C: . ack 1537
17:30:35.320386 C > D: . 2561:3073(512) ack 1
17:30:35.320387 C > D: . 3073:3585(512) ack 1
17:30:35.730384 D > C: . ack 2049
The initial SYN/SYN-ACK exchange shows that RTT is more than 800
msec, and for some subsequent packets it rises above 1 second, but
C's retransmit timer does not ever expire.
References
This problem is documented in [Paxson97].
How to detect
This problem is readily detected by inspecting a packet trace of
the startup of a TCP connection made over a long-delay path. It
can be diagnosed from either a sender-side or receiver-side trace.
Long-delay paths can often be found by locating remote sites on
other continents.
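
As a rough aid to this inspection, the following Python sketch scans
a sender-side trace for a retransmission of the first data segment
and reports the gap before it. It is an assumption-laden illustration:
the function name inferred_initial_rto is hypothetical, the regular
expression only understands the simplified tcpdump lines shown in the
traces in this section, and only the very first data segment is
examined.

```python
import re

# Matches the simplified tcpdump data lines shown in this section, e.g.
# "07:52:40.561287 A > B: P 1:513(512) ack 1". Pure ACKs do not match.
DATA_LINE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) (\S+) > (\S+): \S+ (\d+):(\d+)\((\d+)\)"
)


def inferred_initial_rto(trace_lines, sender):
    """Return the gap in seconds between the first data segment from
    `sender` and its first retransmission, or None if it was never resent."""
    first_seen = None  # (sequence range, send time) of the first data segment
    for line in trace_lines:
        m = DATA_LINE.match(line)
        if not m or m.group(4) != sender or int(m.group(8)) == 0:
            continue  # not a data-carrying segment from the sender
        when = int(m.group(1)) * 3600 + int(m.group(2)) * 60 + float(m.group(3))
        seq_range = (m.group(6), m.group(7))
        if first_seen is None:
            first_seen = (seq_range, when)
        elif seq_range == first_seen[0]:
            return when - first_seen[1]
    return None
```

A returned value well under 3 seconds on a long-delay path, with no
evidence of loss, suggests the initial RTO was set too low. A result
of None means the first segment was not retransmitted within the
trace.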
How to fix
As this problem arises from a faulty initialization, one hopes
fixing it requires a one-line change to the TCP source code.
2.8.
Name of Problem
Failure of window deflation after loss recovery
Classification
Congestion control / performance
Description
The fast recovery algorithm allows TCP senders to continue to
transmit new segments during loss recovery. First, fast
retransmission is initiated after a TCP sender receives three
duplicate ACKs. At this point, a retransmission is sent and cwnd
is halved. The fast recovery algorithm then allows additional
segments to be sent when sufficient additional duplicate ACKs
arrive. Some implementations of fast recovery compute when to
send additional segments by artificially incrementing cwnd, first
by three segments to account for the three duplicate ACKs that
triggered fast retransmission, and subsequently by 1 MSS for each
new duplicate ACK that arrives. When cwnd allows, the sender
transmits new data segments.
When an ACK arrives that covers new data, cwnd is to be reduced by
the amount by which it was artificially increased. However, some
TCP implementations fail to "deflate" the window, causing an
inappropriate amount of data to be sent into the network after
recovery. One cause of this problem is the "header prediction"
code, which is used to handle incoming segments that require
little work. In some implementations of TCP, the header
prediction code does not check to make sure cwnd has not been
artificially inflated, and therefore does not reduce the
artificially increased cwnd when appropriate.
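
The inflation and deflation steps described above can be sketched in
Python as follows. This is an illustrative model counted in whole
segments, not code from any TCP implementation: the class name
FastRecoverySender and the deflate flag (used to model the faulty
behavior) are hypothetical, and the floor of two segments on the
halved window is an assumption borrowed from the usual description of
fast recovery.

```python
class FastRecoverySender:
    """Hypothetical sender-side bookkeeping, counted in whole segments."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = None      # set when loss is detected
        self.dup_acks = 0
        self.in_recovery = False

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            # Fast retransmit: halve the window (assumed floor of 2), then
            # inflate it by the three duplicate ACKs that triggered it.
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + self.DUP_ACK_THRESHOLD
            self.in_recovery = True
        elif self.in_recovery:
            self.cwnd += 1  # further inflation: one segment per duplicate ACK

    def on_new_ack(self, deflate=True):
        """Handle an ACK covering new data; deflate=False models the bug."""
        if self.in_recovery and deflate:
            self.cwnd = self.ssthresh  # remove the artificial inflation
        self.in_recovery = False
        self.dup_acks = 0
```

Starting from an assumed cwnd of 7 segments, five duplicate ACKs
inflate the window to 8 segments. A correct sender then deflates to 3
segments on the next ACK of new data, whereas calling
on_new_ack(deflate=False) leaves the window at 8 segments, which is
what permits the burst after recovery.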
Significance
TCP senders that exhibit this problem will transmit a burst of
data immediately after recovery, which can degrade performance, as
well as network stability. Effectively, the sender does not
reduce the size of cwnd as much as it should (to half its value
when loss was detected), if at all. This can harm the performance
of the TCP connection itself, as well as competing TCP flows.
Implications
A TCP sender exhibiting this problem does not reduce cwnd
appropriately in times of congestion, and therefore may contribute
to congestive collapse.
Relevant RFCs
RFC 2001 outlines the fast retransmit/fast recovery algorithms.
[Brakmo95] outlines this implementation problem and offers a fix.
Trace file demonstrating it
The following trace file was taken using tcpdump at host A, the
data sender. The advertised window (which never changed) has been
omitted for clarity, except for the first packet sent by each
host.
08:22:56.825635 A.7505 > B.7505: . 29697:30209(512) ack 1 win 4608
08:22:57.038794 B.7505 > A.7505: . ack 27649 win 4096
08:22:57.039279 A.7505 > B.7505: . 30209:30721(512) ack 1
08:22:57.321876 B.7505 > A.7505: . ack 28161
08:22:57.322356 A.7505 > B.7505: . 30721:31233(512) ack 1
08:22:57.347128 B.7505 > A.7505: . ack 28673
08:22:57.347572 A.7505 > B.7505: . 31233:31745(512) ack 1
08:22:57.347782 A.7505 > B.7505: . 31745:32257(512) ack 1
08:22:57.936393 B.7505 > A.7505: . ack 29185
08:22:57.936864 A.7505 > B.7505: . 32257:32769(512) ack 1
08:22:57.950802 B.7505 > A.7505: . ack 29697 win 4096
08:22:57.951246 A.7505 > B.7505: . 32769:33281(512) ack 1
08:22:58.169422 B.7505 > A.7505: . ack 29697
08:22:58.638222 B.7505 > A.7505: . ack 29697
08:22:58.643312 B.7505 > A.7505: . ack 29697
08:22:58.643669 A.7505 > B.7505: . 29697:30209(512) ack 1
08:22:58.936436 B.7505 > A.7505: . ack 29697
08:22:59.002614 B.7505 > A.7505: . ack 29697
08:22:59.003026 A.7505 > B.7505: . 33281:33793(512) ack 1
08:22:59.682902 B.7505 > A.7505: . ack 33281
08:22:59.683391 A.7505 > B.7505: P 33793:34305(512) ack 1
08:22:59.683748 A.7505 > B.7505: P 34305:34817(512) ack 1 ***
08:22:59.684043 A.7505 > B.7505: P 34817:35329(512) ack 1
08:22:59.684266 A.7505 > B.7505: P 35329:35841(512) ack 1
08:22:59.684567 A.7505 > B.7505: P 35841:36353(512) ack 1
08:22:59.684810 A.7505 > B.7505: P 36353:36865(512) ack 1
08:22:59.685094 A.7505 > B.7505: P 36865:37377(512) ack 1
The first 12 lines of the trace show incoming ACKs clocking out a
window of data segments. At this point in the transfer, cwnd is 7
segments. The next 4 lines of the trace show 3 duplicate ACKs
arriving from the receiver, followed by a retransmission from the
sender. At this point, cwnd is halved (to 3 segments) and
artificially incremented by the three duplicate ACKs that have
arrived, making cwnd 6 segments. The next two lines show 2 more
duplicate ACKs arriving, each of which increases cwnd by 1
segment. So, after these two duplicate ACKs arrive the cwnd is 8
segments and the sender has permission to send 1 new segment
(since there are 7 segments outstanding). The next line in the
trace shows this new segment being transmitted. The next packet
shown in the trace is an ACK from host B that covers the first 7
outstanding segments (all but the new segment sent during
recovery). This should cause cwnd to be reduced to 3 segments and
2 segments to be transmitted (since there is already 1 outstanding
segment in the network). However, as shown