entries should be initialized to be the MTU of the associated
first-hop data link, and must never be changed by the PMTU Discovery
process. (PMTU Discovery only creates or changes entries for
per-host routes). Until a Datagram Too Big message is received, the
PMTU associated with the initially-chosen route is presumed to be
accurate.
When a Datagram Too Big message is received, the ICMP layer
determines a new estimate for the Path MTU (either from a non-zero
Next-Hop MTU value in the packet, or using the method described in
section 5). If a per-host route for this path does not exist, then
one is created (almost as if a per-host ICMP Redirect is being
processed; the new route uses the same first-hop router as the
current route). If the PMTU estimate associated with the per-host
route is higher than the new estimate, then the value in the routing
entry is changed.
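The update rule above can be sketched in Python as follows. This is a minimal illustration, not the RFC's own code. The `Route` and `RoutingTable` names are hypothetical, destinations are plain strings, and the caller is assumed to have already derived the new estimate, either from a non-zero Next-Hop MTU or by the method of section 5.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Route:
    first_hop: str          # first-hop router for this route
    pmtu: int               # current PMTU estimate
    per_host: bool = False


class RoutingTable:
    """Hypothetical, simplified routing table keyed by destination address."""

    def __init__(self, default: Route) -> None:
        # Non-per-host entry: PMTU Discovery never modifies it.
        self.default = default
        self.host_routes: Dict[str, Route] = {}

    def lookup(self, dst: str) -> Route:
        return self.host_routes.get(dst, self.default)

    def on_datagram_too_big(self, dst: str, new_estimate: int) -> bool:
        """Apply a Datagram Too Big message; return True if the PMTU decreased."""
        route = self.host_routes.get(dst)
        if route is None:
            # Create a per-host route through the same first-hop router,
            # much as a per-host ICMP Redirect would.
            route = Route(self.default.first_hop, self.default.pmtu, per_host=True)
            self.host_routes[dst] = route
        # Only lower the estimate; a higher value leaves the entry unchanged.
        if route.pmtu > new_estimate:
            route.pmtu = new_estimate
            return True
        return False
```

Because the default entry is copied rather than modified, other destinations reached through the same first hop keep the first-hop MTU.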
The packetization layers must be notified about decreases in the
PMTU. Any packetization layer instance (for example, a TCP
connection) that is actively using the path must be notified if the
PMTU estimate is decreased.
Note: even if the Datagram Too Big message contains an
Original Datagram Header that refers to a UDP packet, the TCP
layer must be notified if any of its connections use the given
Mogul & Deering [page 10]
RFC 1191 Path MTU Discovery November 1990
path.
Also, the instance that sent the datagram that elicited the Datagram
Too Big message should be notified that its datagram has been
dropped, even if the PMTU estimate has not changed, so that it may
retransmit the dropped datagram.
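The two kinds of notification can be sketched with a hypothetical `PmtuNotifier` that records events in a list instead of calling into real TCP or UDP code. Every instance using the path hears about a decrease, and the sender of the dropped datagram is told about the drop even when the estimate is unchanged.

```python
from typing import Dict, List, Set, Tuple


class PmtuNotifier:
    """Hypothetical dispatcher from the ICMP layer to packetization-layer instances."""

    def __init__(self) -> None:
        self.users: Dict[str, Set[str]] = {}   # destination -> instances using the path
        self.events: List[Tuple] = []          # recorded notifications, for illustration

    def register(self, instance: str, dst: str) -> None:
        self.users.setdefault(dst, set()).add(instance)

    def on_too_big(self, dst: str, sender: str, decreased: bool, new_pmtu: int) -> None:
        if decreased:
            # Notify every instance using the path, whatever protocol the
            # original datagram belonged to (e.g. TCP as well as UDP).
            for instance in sorted(self.users.get(dst, set())):
                self.events.append(("pmtu_decreased", instance, new_pmtu))
        # Always tell the sender its datagram was dropped so it can retransmit.
        self.events.append(("dropped", sender))
```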
Note: The notification mechanism can be analogous to the
mechanism used to provide notification of an ICMP Source
Quench message. In some implementations (such as
4.2BSD-derived systems), the existing notification mechanism
is not able to identify the specific connection involved, and
so an additional mechanism is necessary.
Alternatively, an implementation can avoid the use of an
asynchronous notification mechanism for PMTU decreases by
postponing notification until the next attempt to send a
datagram larger than the PMTU estimate. In this approach,
when an attempt is made to SEND a datagram with the DF bit
set, and the datagram is larger than the PMTU estimate, the
SEND function should fail and return a suitable error
indication. This approach may be more suitable to a
connectionless packetization layer (such as one using UDP),
which (in some implementations) may be hard to "notify" from
the ICMP layer. In this case, the normal timeout-based
retransmission mechanisms would be used to recover from the
dropped datagrams.
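The synchronous alternative can be sketched as a SEND routine that refuses an oversized datagram with the DF bit set. The function name is hypothetical, and the 28-octet header allowance assumes an IPv4 header without options plus a UDP header. `errno.EMSGSIZE` is used as the "suitable error indication" only because it is the conventional POSIX "message too long" code; the RFC does not name a specific error.

```python
import errno


def send_datagram(payload: bytes, pmtu_estimate: int, df: bool = True,
                  header_len: int = 28) -> int:
    """Hypothetical SEND: fail on oversized DF datagrams instead of notifying later."""
    total = header_len + len(payload)  # assumed IPv4 (20) + UDP (8) header octets
    if df and total > pmtu_estimate:
        # The caller learns of the PMTU decrease at this point and may
        # repacketize; already-dropped datagrams recover by normal timeouts.
        raise OSError(errno.EMSGSIZE, "datagram larger than PMTU estimate")
    return total  # stand-in for handing the datagram to the IP layer
```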
It is important to understand that the notification of the
packetization layer instances using the path about the change in the
PMTU is distinct from the notification of a specific instance that a
packet has been dropped. The latter should be done as soon as
practical (i.e., asynchronously from the point of view of the
packetization layer instance), while the former may be delayed until
a packetization layer instance wants to create a packet.
Retransmission should be done only for those packets that are
known to be dropped, as indicated by a Datagram Too Big message.
6.3. Purging stale PMTU information
Internetwork topology is dynamic; routes change over time. The PMTU
discovered for a given destination may be wrong if a new route comes
into use. Thus, PMTU information cached by a host can become stale.
Because a host using PMTU Discovery always sets the DF bit, if the
stale PMTU value is too large, this will be discovered almost
immediately once a datagram is sent to the given destination. No
such mechanism exists for realizing that a stale PMTU value is too
small, so an implementation should "age" cached values. When a PMTU
value has not been decreased for a while (on the order of 10
minutes), the PMTU estimate should be set to the first-hop data-link
MTU, and the packetization layers should be notified of the change.
This will cause the complete PMTU Discovery process to take place
again.
Note: an implementation should provide a means for changing
the timeout duration, including setting it to "infinity". For
example, hosts attached to an FDDI network which is then
attached to the rest of the Internet via a slow serial line
are never going to discover a new non-local PMTU, so they
should not have to put up with dropped datagrams every 10
minutes.
An upper layer MUST not retransmit datagrams in response to an
increase in the PMTU estimate, since this increase never comes in
response to an indication of a dropped datagram.
One approach to implementing PMTU aging is to add a timestamp field
to the routing table entry. This field is initialized to a
"reserved" value, indicating that the PMTU has never been changed.
Whenever the PMTU is decreased in response to a Datagram Too Big
message, the timestamp is set to the current time.
Once a minute, a timer-driven procedure runs through the routing
table, and for each entry whose timestamp is not "reserved" and is
older than the timeout interval:
- The PMTU estimate is set to the MTU of the associated first
hop.
- Packetization layers using this route are notified of the
increase.
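The timestamp scheme above might look like the following sketch. `RESERVED`, `Entry`, `on_decrease`, and `age_pmtu` are hypothetical names. The 600-second default reflects the "order of 10 minutes" suggested earlier, and passing `math.inf` gives the "infinity" setting recommended in the note. Resetting the timestamp to `RESERVED` after an increase is an assumption made here so that an entry is not re-aged on every pass.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional

RESERVED = None  # sentinel: the PMTU has never been decreased


@dataclass
class Entry:
    first_hop_mtu: int
    pmtu: int
    timestamp: Optional[float] = RESERVED


def on_decrease(entry: Entry, new_pmtu: int, now: float) -> None:
    # Called when a Datagram Too Big message lowers the estimate.
    entry.pmtu = new_pmtu
    entry.timestamp = now


def age_pmtu(table: Dict[str, Entry], now: float, timeout: float = 600.0,
             notify: Callable[[str, int], None] = lambda dst, mtu: None) -> None:
    """Run periodically (e.g. once a minute); timeout=math.inf disables aging."""
    for dst, entry in table.items():
        if entry.timestamp is not RESERVED and now - entry.timestamp > timeout:
            entry.pmtu = entry.first_hop_mtu   # restart discovery from the first hop
            entry.timestamp = RESERVED         # assumption: avoid re-aging each pass
            notify(dst, entry.pmtu)            # packetization layers see the increase
```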
PMTU estimates may disappear from the routing table if the per-host
routes are removed; this can happen in response to an ICMP Redirect
message, or because certain routing-table daemons delete old routes
after several minutes. Also, on a multi-homed host a topology change
may result in the use of a different source interface. When this
happens, if the packetization layer is not notified then it may
continue to use a cached PMTU value that is now too small. One
solution is to notify the packetization layer of a possible PMTU
change whenever a Redirect message causes a route change, and
whenever a route is simply deleted from the routing table.
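One way to realize that solution is sketched below with hypothetical names. The routing table calls back into the packetization layer whenever a Redirect changes a route or a route is deleted, and the listener discards its cached PMTU so that it re-reads the value before its next send.

```python
from typing import Callable, Dict, List, Optional


class HostRoutes:
    """Hypothetical per-host route table that reports possible PMTU changes."""

    def __init__(self, first_hop_mtu: int) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.pmtu: Dict[str, int] = {}
        self.listeners: List[Callable[[str], None]] = []

    def get_pmtu(self, dst: str) -> int:
        # Without a per-host route, fall back to the first-hop MTU.
        return self.pmtu.get(dst, self.first_hop_mtu)

    def _possible_change(self, dst: str) -> None:
        for listener in self.listeners:
            listener(dst)

    def on_redirect(self, dst: str) -> None:
        # The Redirect replaces the per-host route, losing its PMTU estimate.
        self.pmtu.pop(dst, None)
        self._possible_change(dst)

    def delete(self, dst: str) -> None:
        self.pmtu.pop(dst, None)
        self._possible_change(dst)


class CachedPmtu:
    """Hypothetical packetization-layer cache, invalidated on notification."""

    def __init__(self, routes: HostRoutes, dst: str) -> None:
        self.routes, self.dst = routes, dst
        self.value: Optional[int] = None
        routes.listeners.append(self.invalidate)

    def invalidate(self, dst: str) -> None:
        if dst == self.dst:
            self.value = None

    def get(self) -> int:
        if self.value is None:
            self.value = self.routes.get_pmtu(self.dst)
        return self.value
```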
Note: a more sophisticated method for detecting PMTU increases
is described in section 7.1.
6.4. TCP layer actions
The TCP layer must track the PMTU for the destination of a
connection; it should not send datagrams that would be larger than
this. A simple implementation could ask the IP layer for this value
(using the GET_MAXSIZES interface described in [1]) each time it
created a new segment, but this could be inefficient. Moreover, TCP
implementations that follow the "slow-start" congestion-avoidance
algorithm [4] typically calculate and cache several other values
derived from the PMTU. It may be simpler to receive asynchronous
notification when the PMTU changes, so that these variables may be
updated.
A TCP implementation must also store the MSS value received from its
peer (which defaults to 536), and not send any segment larger than
this MSS, regardless of the PMTU. In 4.xBSD-derived implementations,
this requires adding an additional field to the TCP state record.
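A sketch of the TCP side, assuming a hypothetical per-connection record. The segment size is derived from the PMTU less 40 octets (IP and TCP headers without options, the same allowance this section uses for the maximum window calculation) and is capped by the peer's MSS; an asynchronous PMTU change simply refreshes the cached value.

```python
DEFAULT_PEER_MSS = 536  # assumed when the peer sends no MSS option


class TcpPmtuState:
    """Hypothetical per-connection fields for PMTU-aware TCP."""

    def __init__(self, pmtu: int, peer_mss: int = DEFAULT_PEER_MSS) -> None:
        self.peer_mss = peer_mss   # extra field: the MSS received from the peer
        self.pmtu = pmtu
        self.mss = self._derive_mss()

    def _derive_mss(self) -> int:
        # 40 octets = 20-octet IP header + 20-octet TCP header, no options.
        return min(self.pmtu - 40, self.peer_mss)

    def on_pmtu_change(self, new_pmtu: int) -> None:
        # Asynchronous notification: recompute values cached from the PMTU.
        self.pmtu = new_pmtu
        self.mss = self._derive_mss()
```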
Finally, when a Datagram Too Big message is received, it implies that
a datagram was dropped by the router that sent the ICMP message. It
is sufficient to treat this as any other dropped segment, and wait
until the retransmission timer expires to cause retransmission of the
segment. If the PMTU Discovery process requires several steps to
estimate the right PMTU, this could delay the connection by many
round-trip times.
Alternatively, the retransmission could be done in immediate response
to a notification that the Path MTU has changed, but only for the
specific connection specified by the Datagram Too Big message. The
datagram size used in the retransmission should, of course, be no
larger than the new PMTU.
Note: One MUST not retransmit in response to every Datagram
Too Big message, since a burst of several oversized segments
will give rise to several such messages and hence several
retransmissions of the same data. If the new estimated PMTU
is still wrong, the process repeats, and there is an
exponential growth in the number of superfluous segments sent!
This means that the TCP layer must be able to recognize when a
Datagram Too Big notification actually decreases the PMTU that
it has already used to send a datagram on the given
connection, and should ignore any other notifications.
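The rule in this note can be sketched by remembering the PMTU a connection has actually used and acting only on notifications that fall below it. The `Connection` class and its counter are hypothetical; a real implementation would queue a retransmission of segments no larger than the new PMTU rather than merely count it.

```python
class Connection:
    """Hypothetical TCP connection state for filtering Datagram Too Big notices."""

    def __init__(self, pmtu_in_use: int) -> None:
        self.pmtu_in_use = pmtu_in_use   # PMTU used for segments already sent
        self.retransmissions = 0

    def on_too_big(self, new_pmtu: int) -> bool:
        """Return True only if this notice lowers the PMTU already in use."""
        if new_pmtu >= self.pmtu_in_use:
            # Duplicate from a burst of oversized segments, or stale: ignore.
            return False
        self.pmtu_in_use = new_pmtu
        self.retransmissions += 1  # retransmit once, sized for the new PMTU
        return True
```

A burst of three oversized segments that elicits three identical messages therefore causes one retransmission, not three.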
Modern TCP implementations incorporate "congestion avoidance" and
"slow-start" algorithms to improve performance [4]. Unlike a
retransmission caused by a TCP retransmission timeout, a
retransmission caused by a Datagram Too Big message should not change
the congestion window. It should, however, trigger the slow-start
mechanism (i.e., only one segment should be retransmitted until
acknowledgements begin to arrive again).
TCP performance can be reduced if the sender's maximum window size is
not an exact multiple of the segment size in use (this is not the
congestion window size, which is always a multiple of the segment
size). In many systems (such as those derived from 4.2BSD), the
segment size is often set to 1024 octets, and the maximum window size
(the "send space") is usually a multiple of 1024 octets, so the
proper relationship holds by default. If PMTU Discovery is used,
however, the segment size may not be a submultiple of the send space,
and it may change during a connection; this means that the TCP layer
may need to change the transmission window size when PMTU Discovery
changes the PMTU value. The maximum window size should be set to the
greatest multiple of the segment size (PMTU - 40) that is less than
or equal to the sender's buffer space size.
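A worked version of that rule, with a hypothetical helper name. For example, a PMTU of 1500 gives 1460-octet segments, so a 4096-octet send space yields a maximum window of 2 x 1460 = 2920 octets, while a PMTU of 1064 gives 1024-octet segments and keeps the full 4096.

```python
def max_send_window(pmtu: int, send_space: int) -> int:
    """Greatest multiple of the segment size (PMTU - 40) not exceeding send_space."""
    segment = pmtu - 40
    # Integer division drops any partial segment; yields 0 if send_space < segment.
    return (send_space // segment) * segment
```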
PMTU Discovery does not affect the value sent in the TCP MSS option,
because that value is used by the other end of the connection, which
may be using an unrelated PMTU value.
6.5. Issues for other transport protocols
Some transport protocols (such as ISO TP4 [3]) are not allowed to
repacketize when doing a retransmission. That is, once an attempt is
made to transmit a datagram of a certain size, its contents cannot be
split into smaller datagrams for retransmission. In such a case, the
original datagram should be retransmitted without the DF bit set,
allowing it to be fragmented as necessary to reach its destination.
Subsequent datagrams, when transmitted for the first time, should be
no larger than allowed by the Path MTU, and should have the DF bit
set.
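For a protocol that cannot repacketize, the two cases might be sketched as below. `Datagram`, `first_transmission`, `retransmit_unsplittable`, and the fixed `header_len` are hypothetical simplifications; real TP4 framing is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datagram:
    payload: bytes
    df: bool


def first_transmission(data: bytes, pmtu: int, header_len: int) -> List[Datagram]:
    # New data: split to fit the Path MTU and set DF so PMTU Discovery continues.
    chunk = pmtu - header_len
    return [Datagram(data[i:i + chunk], df=True) for i in range(0, len(data), chunk)]


def retransmit_unsplittable(original: Datagram) -> Datagram:
    # Contents may not be repacketized, so clear DF and let routers fragment it.
    return Datagram(original.payload, df=False)
```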
The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [11] that, in many cases, sends datagrams that must be
fragmented even for the first-hop link. This might improve
performance in certain cases, but it is known to cause reliability
and performance problems, especially when the client and server are
separated by routers.
We recommend that NFS implementations use PMTU Discovery whenever
routers are involved. Most NFS implementations allow the RPC
datagram size to be changed at mount-time (indirectly, by changing
the effective file system block size), but might require some
modification to support changes later on.
Also, since a single NFS operation cannot be split across several UDP
datagrams, certain operations (primarily, those operating on file
names and directories) require a minimum datagram size that may be