Network Working Group
Request for Comments: 3040
Category: Informational
I. Cooper, Equinix, Inc.
I. Melve, UNINETT
G. Tomlinson, CacheFlow Inc.
January 2001
Internet Web Replication and Caching Taxonomy
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This memo specifies standard terminology and the taxonomy of web
replication and caching infrastructure as deployed today. It
introduces standard concepts and the protocols used today within this
application domain. Currently deployed solutions employing these
technologies are presented to establish a standard taxonomy. Known
problems with caching proxies are covered in the document titled
"Known HTTP Proxy/Caching Problems", and are not part of this
document. This document presents open protocols and points to
published material for each protocol.
Table of Contents
1. Introduction
2. Terminology
2.1 Base Terms
2.2 First order derivative terms
2.3 Second order derivatives
2.4 Topological terms
2.5 Automatic use of proxies
3. Distributed System Relationships
3.1 Replication Relationships
3.1.1 Client to Replica
3.1.2 Inter-Replica
3.2 Proxy Relationships
3.2.1 Client to Non-Interception Proxy
Cooper, et al. Informational [Page 1]
RFC 3040 Internet Web Replication & Caching Taxonomy January 2001
3.2.2 Client to Surrogate to Origin Server
3.2.3 Inter-Proxy
3.2.3.1 (Caching) Proxy Meshes
3.2.3.2 (Caching) Proxy Arrays
3.2.4 Network Element to Caching Proxy
4. Replica Selection
4.1 Navigation Hyperlinks
4.2 Replica HTTP Redirection
4.3 DNS Redirection
5. Inter-Replica Communication
5.1 Batch Driven Replication
5.2 Demand Driven Replication
5.3 Synchronized Replication
6. User Agent to Proxy Configuration
6.1 Manual Proxy Configuration
6.2 Proxy Auto Configuration (PAC)
6.3 Cache Array Routing Protocol (CARP) v1.0
6.4 Web Proxy Auto-Discovery Protocol (WPAD)
7. Inter-Proxy Communication
7.1 Loosely coupled Inter-Proxy Communication
7.1.1 Internet Cache Protocol (ICP)
7.1.2 Hyper Text Caching Protocol
7.1.3 Cache Digest
7.1.4 Cache Pre-filling
7.2 Tightly Coupled Inter-Cache Communication
7.2.1 Cache Array Routing Protocol (CARP) v1.0
8. Network Element Communication
8.1 Web Cache Control Protocol (WCCP)
8.2 Network Element Control Protocol (NECP)
8.3 SOCKS
9. Security Considerations
9.1 Authentication
9.1.1 Man in the middle attacks
9.1.2 Trusted third party
9.1.3 Authentication based on IP number
9.2 Privacy
9.2.1 Trusted third party
9.2.2 Logs and legal implications
9.3 Service security
9.3.1 Denial of service
9.3.2 Replay attack
9.3.3 Stupid configuration of proxies
9.3.4 Copyrighted transient copies
9.3.5 Application level access
10. Acknowledgements
References
Authors' Addresses
Full Copyright Statement
1. Introduction
Since its introduction in 1990, the World-Wide Web has evolved from a
simple client server model into a complex distributed architecture.
This evolution has been driven largely by the scaling problems
associated with exponential growth. Distinct paradigms and solutions
have emerged to satisfy specific requirements. Two core
infrastructure components being employed to meet the demands of this
growth are replication and caching. In many cases, there is a need
for web caches and replicated services to be able to coexist.
This memo specifies standard terminology and the taxonomy of web
replication and caching infrastructure deployed in the Internet
today. The principal goal of this document is to establish a common
understanding of, and reference point for, this application domain.
It is also expected that this document will be used in the creation
of a standard architectural framework for efficient, reliable, and
predictable service in a web which includes both replicas and caches.
Some of the protocols which this memo examines are specified only by
company technical white papers or work in progress documents. Such
references are included to demonstrate the existence of such
protocols, their experimental deployment in the Internet today, or to
aid the reader in their understanding of this technology area.
There are many protocols, both open and proprietary, employed in web
replication and caching today. The open protocols include DNS [8],
Cache Digests [21][10], CARP [14], HTTP [1], ICP [2], PAC [12], SOCKS
[7], WPAD [13], and WCCP [18][19]. These
protocols, and their use within the caching and replication
environments, are discussed below.
2. Terminology
The following terminology provides definitions of common terms used
within the web replication and caching community. Base terms are
taken, where possible, from the HTTP/1.1 specification [1] and are
included here for reference. First- and second-order derivatives are
constructed from these base terms to help define the relationships
that exist within this area.
Terms in common usage that conflict with the definitions in RFC 2616
and this document are highlighted.
2.1 Base Terms
The majority of these terms are taken as-is from RFC 2616 [1], and
are included here for reference.
client (taken from [1])
A program that establishes connections for the purpose of sending
requests.
server (taken from [1])
An application program that accepts connections in order to
service requests by sending back responses. Any given program may
be capable of being both a client and a server; our use of these
terms refers only to the role being performed by the program for a
particular connection, rather than to the program's capabilities
in general. Likewise, any server may act as an origin server,
proxy, gateway, or tunnel, switching behavior based on the nature
of each request.
proxy (taken from [1])
An intermediary program which acts as both a server and a client
for the purpose of making requests on behalf of other clients.
Requests are serviced internally or by passing them on, with
possible translation, to other servers. A proxy MUST implement
both the client and server requirements of this specification. A
"transparent proxy" is a proxy that does not modify the request or
response beyond what is required for proxy authentication and
identification. A "non-transparent proxy" is a proxy that
modifies the request or response in order to provide some added
service to the user agent, such as group annotation services,
media type transformation, protocol reduction, or anonymity
filtering. Except where either transparent or non-transparent
behavior is explicitly stated, the HTTP proxy requirements apply
to both types of proxies.
Note: The term "transparent proxy" refers to a semantically
transparent proxy as described in [1], not what is commonly
understood within the caching community. We recommend that the term
"transparent proxy" always be qualified with a prefix to avoid
confusion (e.g., "network transparent proxy"). However, see the
definition of "interception proxy" below.
The above condition requiring implementation of both the server and
client requirements of HTTP/1.1 is only appropriate for a non-network
transparent proxy.
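As an illustration of the "transparent proxy" definition above, the
following Python sketch forwards a set of headers while adding only
the identification that HTTP/1.1 [1] requires of a proxy, namely a
Via entry. The function name, the dict-based header representation,
and the host name "proxy.example" are assumptions made for this
example. Proxy authentication handling is omitted.

```python
from typing import Dict


def forward_transparently(headers: Dict[str, str],
                          received_by: str,
                          protocol: str = "1.1") -> Dict[str, str]:
    """Return a copy of `headers` with this hop appended to Via.

    Nothing else is changed, which is what makes the (hypothetical)
    proxy semantically transparent in the sense of [1].
    """
    forwarded = dict(headers)          # never mutate the caller's headers
    hop = f"{protocol} {received_by}"  # e.g. "1.1 proxy.example"
    for name in forwarded:
        if name.lower() == "via":      # header names are case-insensitive
            forwarded[name] = f"{forwarded[name]}, {hop}"
            break
    else:
        forwarded["Via"] = hop         # first proxy on the path
    return forwarded
```

A non-transparent proxy would, in addition, rewrite other headers or
the message body to provide its added service.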
cache (taken from [1])
A program's local store of response messages and the subsystem
that controls its message storage, retrieval, and deletion. A
cache stores cacheable responses in order to reduce the response
time and network bandwidth consumption on future, equivalent
requests. Any client or server may include a cache, though a
cache cannot be used by a server that is acting as a tunnel.
Note: The term "cache" used alone often means "caching proxy".
Note: There are additional motivations for caching, for example
reducing server load (as a further means to reduce response time).
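The "cache" definition above names two parts: a local store of
response messages, and the subsystem controlling storage, retrieval,
and deletion. A minimal Python sketch of that split might look as
follows. The class names, the (method, URI) key, and the in-memory
dict are assumptions for illustration. Real caches also consider
request headers, freshness, and capacity limits.

```python
from typing import Dict, Optional, Tuple


class Response:
    """A stored response message (hypothetical minimal shape)."""

    def __init__(self, status: int, headers: Dict[str, str], body: bytes):
        self.status = status
        self.headers = headers
        self.body = body


class ResponseCache:
    """Local store plus the operations that control it."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Response] = {}

    def store(self, method: str, uri: str, response: Response) -> None:
        # Keep a copy of the response for future, equivalent requests.
        self._store[(method.upper(), uri)] = response

    def retrieve(self, method: str, uri: str) -> Optional[Response]:
        # Return the stored response, or None on a cache miss.
        return self._store.get((method.upper(), uri))

    def delete(self, method: str, uri: str) -> None:
        # Remove the entry if present; a missing entry is not an error.
        self._store.pop((method.upper(), uri), None)
```

A hit on retrieve() avoids a round trip toward the origin server,
which is where the savings in response time and bandwidth come from.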
cacheable (taken from [1])
A response is cacheable if a cache is allowed to store a copy of
the response message for use in answering subsequent requests.
The rules for determining the cacheability of HTTP responses are
defined in section 13 of [1]. Even if a resource is cacheable, there may
be additional constraints on whether a cache can use the cached
copy for a particular request.
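To make the idea of "allowed to store" concrete, the Python sketch
below checks two Cache-Control response directives from [1],
"no-store" and "private". It is a deliberately simplified assumption
and not the full rule set of [1]: status codes, request methods,
expiration, and validation are ignored, and the naive comma split
does not handle quoted directive values. The function names are
hypothetical.

```python
from typing import Dict, Set


def cache_control_directives(headers: Dict[str, str]) -> Set[str]:
    """Collect lower-cased Cache-Control directive names."""
    directives: Set[str] = set()
    for name, value in headers.items():
        if name.lower() != "cache-control":
            continue
        for part in value.split(","):
            # Drop any "=value" suffix, e.g. "max-age=60" -> "max-age".
            token = part.strip().split("=", 1)[0].strip().lower()
            if token:
                directives.add(token)
    return directives


def may_store(headers: Dict[str, str], shared: bool = True) -> bool:
    """Return False if a checked directive forbids storing the response."""
    directives = cache_control_directives(headers)
    if "no-store" in directives:
        return False               # no cache may store the response
    if shared and "private" in directives:
        return False               # only a single user's cache may store it
    return True
```

The second half of the definition still applies: a stored response
that passes this check may not be usable for a particular later
request.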
gateway (taken from [1])
A server which acts as an intermediary for some other server.
Unlike a proxy, a gateway receives requests as if it were the
origin server for the requested resource; the requesting client
may not be aware that it is communicating with a gateway.
tunnel (taken from [1])
An intermediary program which is acting as a blind relay between
two connections. Once active, a tunnel is not considered a party
to the HTTP communication, though the tunnel may have been
initiated by an HTTP request. The tunnel ceases to exist when
both ends of the relayed connections are closed.
replication
"Creating and maintaining a duplicate copy of a database or file
system on a different computer, typically a server." - Free
Online Dictionary of Computing (FOLDOC)
inbound/outbound (taken from [1])
Inbound and outbound refer to the request and response paths for
messages: "inbound" means "traveling toward the origin server",
and "outbound" means "traveling toward the user agent".
network element
A network device that introduces multiple paths between source and
destination, transparent to HTTP.
2.2 First order derivative terms
The following terms are constructed taking the above base terms as
foundation.
origin server (taken from [1])
The server on which a given resource resides or is to be created.
user agent (taken from [1])
The client which initiates a request. These are often browsers,
editors, spiders (web-traversing robots), or other end user tools.
caching proxy
A proxy with a cache, acting as a server to clients, and a client
to servers.
Caching proxies are often referred to as "proxy caches" or simply
"caches". The term "proxy" is also frequently misused when
referring to caching proxies.
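The dual role in the "caching proxy" definition can be sketched in
Python as below: the object is a server to its clients (handle) and
a client to servers (the injected forward callable). The class name,
the URI-only cache key, and the callable signature are assumptions
for this example. Cacheability checks, freshness, and validation are
left out.

```python
from typing import Callable, Dict


class CachingProxy:
    """A proxy with a cache (hypothetical minimal form)."""

    def __init__(self, forward: Callable[[str], bytes]) -> None:
        self._forward = forward          # client side: sends requests inbound
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def handle(self, uri: str) -> bytes:
        """Server side: answer a client request for `uri`."""
        if uri in self._cache:
            self.hits += 1               # serviced internally from the cache
            return self._cache[uri]
        self.misses += 1
        body = self._forward(uri)        # pass the request on to a server
        self._cache[uri] = body          # keep a copy for later requests
        return body
```

With this shape, a second request for the same URI is answered
without contacting the server again.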
surrogate
A gateway co-located with an origin server, or at a different
point in the network, delegated the authority to operate on behalf
of, and typically working in close co-operation with, one or more
origin servers. Responses are typically delivered from an
internal cache.
Surrogates may derive cache entries from the origin server or from
another of the origin server's delegates. In some cases a
surrogate may tunnel such requests.
Close co-operation between origin servers and surrogates, where it
exists, enables modification of some protocol requirements,
including the Cache-Control directives in [1]. Such modifications
have yet to be fully specified.
Devices commonly known as "reverse proxies" and "(origin) server
accelerators" are both more properly defined as surrogates.
reverse proxy
See "surrogate".
server accelerator
See "surrogate".
2.3 Second order derivatives
The following terms further build on first order derivatives:
master origin server
An origin server on which the definitive version of a resource
resides.
replica origin server
An origin server holding a replica of a resource, but which may
act as an authoritative reference for client requests.
content consumer
The user or system that initiates inbound requests, through use of
a user agent.
browser
A special instance of a user agent that acts as a content