Network Working Group                                           J. Mogul
Request for Comments: 3229                                    Compaq WRL
Category: Standards Track                               B. Krishnamurthy
                                                              F. Douglis
                                                                    AT&T
                                                             A. Feldmann
                                                   Univ. of Saarbruecken
                                                               Y. Goland
                                                             A. van Hoff
                                                                 Marimba
                                                          D. Hellerstein
                                                                ERS/USDA
                                                            January 2002
Delta encoding in HTTP
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document describes how delta encoding can be supported as a
compatible extension to HTTP/1.1.
Many HTTP (Hypertext Transfer Protocol) requests cause the retrieval
of slightly modified instances of resources for which the client
already has a cache entry. Research has shown that such modifying
updates are frequent, and that the modifications are typically much
smaller than the actual entity. In such cases, HTTP would make more
efficient use of network bandwidth if it could transfer a minimal
description of the changes, rather than the entire new instance of
the resource. This is called "delta encoding."
Mogul, et al. Standards Track [Page 1]
RFC 3229 Delta encoding in HTTP January 2002
Table of Contents
1 Introduction.................................................... 3
1.1 Related research and proposals........................... 4
2 Goals........................................................... 5
3 Terminology..................................................... 6
4 The HTTP message-generation sequence............................ 8
4.1 Relationship between deltas and ranges................... 11
5 Basic mechanisms................................................ 13
5.1 Background: an overview of HTTP cache validation......... 13
5.2 Requesting the transmission of deltas.................... 14
5.3 Choice of delta algorithm and format..................... 16
5.4 Identification of delta-encoded responses................ 16
5.5 Guaranteeing cache safety................................ 17
5.6 Transmission of delta-encoded responses.................. 18
5.7 Examples of requests combining Range and delta encoding.. 19
6 Encoding algorithms and formats................................. 22
7 Management of base instances.................................... 23
7.1 Multiple entity tags in the If-None-Match header......... 24
7.2 Hints for managing the client cache...................... 25
8 Deltas and intermediate caches.................................. 27
9 Digests for data integrity...................................... 28
10 Specification.................................................. 28
10.1 Protocol parameter specifications....................... 28
10.2 IANA Considerations..................................... 30
10.3 Basic requirements for delta-encoded responses.......... 30
10.4 Status code specifications.............................. 30
10.4.1 226 IM Used...................................... 31
10.5 Header specifications................................... 31
10.5.1 Delta-Base....................................... 31
10.5.2 IM............................................... 32
10.5.3 A-IM............................................. 33
10.6 Caching rules for 226 responses......................... 35
10.7 Rules for deltas in the presence of content-codings..... 36
10.7.1 Rules for generating deltas in the presence of
content-codings.................................. 37
10.7.2 Rules for applying deltas in the presence of
content-codings.................................. 37
10.7.3 Examples for using A-IM, IM, and content-codings. 38
10.8 New Cache-Control directives............................ 40
10.8.1 Retain directive................................. 40
10.8.2 IM directive..................................... 40
10.9 Use of compression with delta encoding.................. 41
10.10 Delta encoding and multipart/byteranges................ 42
11 Quantifying the protocol overhead.............................. 42
12 Security Considerations........................................ 44
13 Acknowledgements............................................... 44
14 Intellectual Property Rights................................... 44
15 References..................................................... 44
16 Authors' addresses............................................. 47
17 Full Copyright Statement....................................... 49
1 Introduction
The World Wide Web is a distributed system, and so often benefits
from caching to reduce retrieval delays. Retrieval of a Web resource
(such as a document, image, icon, or applet) over the Internet or
other wide-area networks usually takes enough time that the delay is
over the human threshold of perception. That delay is often
measured in seconds. Caching can eliminate or significantly
reduce retrieval delays.
Many Web resources change over time, so a practical caching approach
must include a coherency mechanism, to avoid presenting stale
information to the user. Originally, the Hypertext Transfer Protocol
(HTTP) provided little support for caching, but under operational
pressures, it quickly evolved to support a simple mechanism for
maintaining cache coherency.
In HTTP/1.0 [2], the server may supply a "Last-Modified" timestamp
with a response. If a client stores this response in a cache entry,
and then later wishes to re-use the response, it may transmit a
request message with an "If-Modified-Since" field containing that
timestamp; this is known as a conditional retrieval. Upon receiving
a conditional request, the server may either reply with a full
response, or, if the resource has not changed, it may send an
abbreviated reply, indicating that the client's cache entry is still
valid. HTTP/1.0 also includes a means for the server to indicate,
via an "Expires" timestamp, that a response will be valid until that
time; if so, a client may use a cached copy of the response until
that time, without first validating it using a conditional retrieval.
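
The conditional retrieval described above can be sketched from the
server's point of view. This is a minimal illustration, not part of
the specification: the function name respond and its parameters are
hypothetical, and a real server would also handle "Expires" and other
header fields.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, body, if_modified_since=None):
    """Return (status, headers, body) for a GET on a single resource.

    `last_modified` is an aware UTC datetime with one-second precision.
    `if_modified_since` is the raw header value sent by the client, if any.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without zone information as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Resource unchanged: abbreviated reply, no body.
                return 304, headers, b""
    # Resource changed (or unconditional request): full response.
    return 200, headers, body
```

A client that stored the "Last-Modified" value from an earlier 200
response can pass it back as if_modified_since; it receives either 304
with an empty body or 200 with the entire current value, which is the
all-or-none behavior discussed next.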
HTTP/1.1 [10] adds many new features to improve cache coherency and
performance. However, it preserves the all-or-none model for
responses to conditional retrievals: either the server indicates that
the resource value has not changed at all, or it must transmit the
entire current value.
Common sense suggests (and traces confirm), however, that even when a
Web resource does change, the new instance is often substantially
similar to the old one. If the difference, or "delta", between the
two instances could be sent to the client instead of the entire new
instance, a client holding a cached copy of the old instance could
apply the delta to construct the new version. In a world of finite
bandwidth, the reduction in response size and delay could be
significant.
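
To make the idea concrete, the following sketch computes a delta
between two instances and applies it to the cached copy. It is an
illustration only: the copy/insert operation list is a hypothetical
format invented for this example, not one of the delta formats this
document defines, and Python's difflib stands in for a real
differencing algorithm.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe `new` as copy-from-old and insert-literal operations."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes the client already holds: send only the offsets.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes the client lacks: send them literally.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Reconstruct the new instance from the cached one plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

For two instances that differ in a few bytes, the literal data carried
by the "insert" operations is far smaller than the new instance
itself, which is the source of the savings.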
One can think of deltas as a way to squeeze as much benefit as
possible from client and proxy caches. Rather than treating an
entire response as the "cache line", with deltas we can treat
arbitrary pieces of a cached response as the replaceable unit, and
avoid transferring pieces that have not changed.
This document proposes a set of compatible extensions to HTTP/1.1
that allow clients and servers to use delta encoding with minimal
overhead.
We assume that the reader is familiar with the HTTP/1.1
specification.
1.1 Related research and proposals
The idea of delta encoding to reduce communication or storage costs
is not new. For example, the MPEG-1 video compression standard
transmits occasional still-image frames, but most of the frames sent
are encoded (to oversimplify) as changes from an adjacent frame. The
SCCS and RCS [27] systems for software version control represent
intermediate versions as deltas; SCCS starts with an original version
and encodes subsequent ones with forward deltas, whereas RCS encodes
previous versions as reverse deltas from their successors.
Jacobson's technique for compressing IP and TCP headers over slow
links [17] uses a clever, highly specialized form of delta encoding.
In spite of this history, it appears to have taken several years
before anyone thought of applying delta encoding to HTTP, perhaps
because the development of HTTP caching has been somewhat haphazard.
The first published suggestion for delta encoding appears to have
been by Williams et al. in a paper about HTTP cache removal policies
[30], but these authors did not elaborate on their design until later
[29].
The WebExpress project [15] appears to be the first published
description of an implementation of delta encoding for HTTP (which
they call "differencing"). WebExpress is aimed specifically at
wireless environments, and includes a number of orthogonal
optimizations. Also, the WebExpress design does not propose changing
the HTTP protocol itself, but rather uses a pair of interposed
proxies to convert the HTTP message stream into an optimized form.
The results reported for WebExpress differencing are impressive, but
are limited to a few selected benchmarks.
Banga et al. [1] describe the use of optimistic deltas, in which a
layer of interposed proxies on either end of a slow link collaborate
to reduce latency. If the client-side proxy has a cached copy of a
resource, the server-side proxy can simply send a delta (or a 304
[Not Modified] response). If only the server-side proxy has a cached
copy, it may optimistically send its (possibly stale) copy to the
client-side proxy, followed (if necessary) by a delta once the
server-side proxy has validated its own cache entry with the origin
server. The use of optimistic deltas, unlike delta encoding,
actually increases the number of bytes sent over the network, in an
attempt to improve latency by anticipating a "Not Modified" response
from the origin server. The optimistic delta paper, like the
WebExpress paper, did not propose a change to the HTTP protocol
itself, and reported results only for a small set of selected URLs.
Mogul et al. [23] collected lengthy traces, at two different sites,
of the full contents of HTTP messages, to quantify the potential
benefits of delta-encoded responses. They showed that delta encoding
can provide remarkable improvements in response-size and response-
delay for an important subset of HTTP content types. They proposed a
set of HTTP extensions, but without the level of detail required for
a specification. Douglis et al. [8] used the same sets of full-
content traces to quantify the rate at which resources change in the
Web.
The HTTP Distribution and Replication Protocol (DRP), proposed to W3C
by Marimba, Netscape, Sun, Novell, and At Home, aims to provide a
collection of new features for HTTP, to support "the efficient
replication of data over HTTP" [13]. One aspect of the DRP proposal
is the use of "differential downloading," which is essentially a form
of delta encoding. The original DRP proposal uses a different
approach from the one described here, but a forthcoming revision of
DRP will conform to the proposal in this document.
Tridgell and Mackerras [28] describe the "rsync" algorithm, which
accomplishes something similar to delta encoding. In rsync, the
client breaks a cache entry into a series of fixed-sized blocks,
computes a digest value for each block, and sends the series of
digest values to the server as part of its request. The origin
server does the same block-based computation, and returns only those
blocks whose digest values differ. We believe that it might be
possible to support rsync using the "instance manipulation" framework
described later in this document, but this has not been worked out in
any detail.
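
The block-digest exchange, as summarized in the paragraph above, can
be sketched as follows. This is a simplified illustration under
stated assumptions: the function names, the 4-byte block size, and
the use of SHA-256 are all choices made for this example. The actual
rsync algorithm additionally uses a rolling checksum so that it can
match blocks at arbitrary offsets; that refinement is omitted here.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, for illustration only


def block_digests(data: bytes, block: int = BLOCK):
    """Client side: one digest per fixed-size block of the cache entry."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def server_reply(current: bytes, client_digests, block: int = BLOCK):
    """Server side: return the total length of the current instance
    and only those blocks whose digests differ from the client's."""
    changed = {}
    for idx, digest in enumerate(block_digests(current, block)):
        if idx >= len(client_digests) or client_digests[idx] != digest:
            changed[idx] = current[idx * block:(idx + 1) * block]
    return len(current), changed


def client_rebuild(cached: bytes, length: int, changed, block: int = BLOCK):
    """Client side: merge the returned blocks with the cached blocks."""
    out = bytearray()
    for idx in range((length + block - 1) // block):
        out += changed.get(idx, cached[idx * block:(idx + 1) * block])
    return bytes(out[:length])
```

With a cached entry b"aaaabbbbccccdddd" and a current instance
b"aaaaBBBBccccddddee", only blocks 1 and 4 are returned; the other
three blocks are taken from the cache.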
2 Goals
The goals of this proposal are:
1. Reduce the mean size of HTTP responses, thereby improving
latency and network utilization.
2. Avoid any extra network round trips.
3. Minimize the amount of per-request and per-response overheads.
4. Support a variety of encoding algorithms and formats.