Wessels & Claffy Informational [Page 6]
RFC 2187 ICP September 1997
Note that the cache.isp.com cache need not explicitly specify the
customer cache as a peer, nor is the type of relationship encoded
within the ICP query itself. The access control entries regulate the
relationships between this cache and its neighbors. For our example,
the ISP would use:
acl Customer src proxy.customer.org
http_access allow Customer
icp_access allow Customer
This defines an access control entry named `Customer' which specifies
a source IP address of the customer cache machine. The customer
cache would then be allowed to make any request to both the HTTP and
ICP ports (including cache misses). This configuration implies that
the ISP cache is a parent of the customer.
If the ISP wanted to enforce a sibling relationship, it would need to
deny access to cache misses. This would be done as follows:
miss_access deny Customer
Of course the ISP should also communicate this to the customer, so
that the customer will change his configuration from parent to
sibling. Otherwise, if the customer requests an object not in the
ISP cache, an error message is generated.
5. Applying the Protocol
The following sections describe the ICP implementation in the
Harvest[3] (research version) and Squid Web cache[5] packages. In
terms of version numbers, this means version 1.4pl2 for Harvest and
version 1.1.10 for Squid.
The basic sequence of events in an ICP transaction is as follows:
1. Local cache receives an HTTP[1] request from a cache client.
2. The local cache sends ICP queries (section 5.1).
3. The peer cache(s) receive the queries and send ICP replies
(section 5.2).
4. The local cache receives the ICP replies and decides where to
forward the request (section 5.3).
5.1. Sending ICP Queries
5.1.1. Determine whether to use ICP at all
Not every HTTP request requires an ICP query to be sent. Obviously,
cache hits will not need ICP because the request is satisfied
immediately. For origin servers very close to the cache, we do not
want to use any neighbor caches. In Squid and Harvest, the
administrator specifies what constitutes a `local' server with the
`local_domain' and `local_ip' configuration options. The cache
always contacts a local server directly, never querying a peer cache.
There are other classes of requests that the cache (or the
administrator) may prefer to forward directly to the origin server.
In Squid and Harvest, one such class includes all non-GET request
methods. A Squid cache can also be configured to not use peers for
URLs matching the `hierarchy_stoplist'.
In order for an HTTP request to yield an ICP transaction, it must:
o not be a cache hit
o not be to a local server
o be a GET request, and
o not match the `hierarchy_stoplist' configuration.
We call this a "hierarchical" request. A "non-hierarchical" request
is one that doesn't generate any ICP traffic. To avoid processing
requests that are likely to lower cache efficiency, one can configure
the cache to not consult the hierarchy for URLs that contain certain
strings (e.g. `cgi-bin').
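
The four conditions above can be sketched as a single predicate. This
is only an illustration, not code from Squid or Harvest: the Request
class, the function name, and the suffix and substring matching rules
are all assumptions made for this example.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Request:
    """Hypothetical request record used only by this sketch."""
    method: str
    url: str
    is_cache_hit: bool = False


def is_hierarchical(req, local_domains=(), stoplist=()):
    """Return True if the request would generate ICP traffic."""
    # 1. Cache hits are satisfied locally.
    if req.is_cache_hit:
        return False
    # 2. Local servers are contacted directly. Suffix matching on the
    #    host name is an assumed stand-in for `local_domain'.
    host = (urlsplit(req.url).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in local_domains):
        return False
    # 3. Only GET requests use the hierarchy.
    if req.method.upper() != "GET":
        return False
    # 4. Substring matching is an assumed stand-in for
    #    `hierarchy_stoplist'.
    if any(s in req.url for s in stoplist):
        return False
    return True
```

A POST request, or a GET for a URL containing a stoplist string, would
be classed as non-hierarchical and forwarded without querying peers.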
5.1.2. Determine which peers to query
By default, a cache sends an ICP_OP_QUERY message to each peer,
unless any one of the following is true:
o Restrictions prevent querying a peer for this request, based on
the configuration directive `cache_host_domain', which specifies
a set of DNS domains (from the URLs) for which the peer should
or should not be queried. In Squid, a more flexible directive
(`cache_host_acl') supports restrictions on other parts of the
request (method, port number, source, etc.).
o The peer is a sibling, and the HTTP request includes a "Pragma:
no-cache" header. This is because the sibling would be asked to
transit the request, which is not allowed.
o The peer is configured to never be sent ICP queries (i.e. with
the `no-query' option).
If the determination yields only one queryable ICP peer, and the
Squid configuration directive `single_parent_bypass' is set, then one
can bypass waiting for the single ICP response and just send the HTTP
request directly to the peer cache.
The Squid configuration option `source_ping' configures a Squid cache
to send a ping to the original source simultaneously with its ICP
queries, in case the origin is closer than any of the caches.
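
The three exclusion rules above might be sketched as a filter over the
configured peers. The Peer class and its fields are hypothetical, and
the domain restriction is simplified to an allow-list, whereas
`cache_host_domain' can also express domains for which a peer should
not be queried.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical peer record used only by this sketch."""
    name: str
    kind: str                    # "parent" or "sibling"
    no_query: bool = False       # mirrors the `no-query' option
    allowed_domains: tuple = ()  # empty tuple means "any domain"


def peers_to_query(peers, host, no_cache=False):
    """Return the peers that would be sent an ICP_OP_QUERY."""
    selected = []
    for p in peers:
        # Peer is configured to never receive ICP queries.
        if p.no_query:
            continue
        # A sibling cannot be asked to transit a "Pragma: no-cache"
        # request.
        if p.kind == "sibling" and no_cache:
            continue
        # Simplified domain restriction (allow-list only).
        if p.allowed_domains and not any(
            host == d or host.endswith("." + d) for d in p.allowed_domains
        ):
            continue
        selected.append(p)
    return selected
```

If the returned list has exactly one entry, a cache configured with
`single_parent_bypass' could skip the query and forward the HTTP
request to that peer directly.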
5.1.3. Calculate the expected number of ICP replies
Harvest and Squid want to maximize the chance to get a HIT reply from
one of the peers. Therefore, the cache waits for all ICP replies to
be received. Normally, we expect to receive an ICP reply for each
query sent, except:
o When the peer is believed to be down. If the peer is down, Squid
and Harvest continue to send it ICP queries, but do not expect
the peer to reply. When an ICP reply is again received from the
peer, its status will be changed to up.
The determination of up/down status has varied a little bit as
the Harvest and Squid software evolved. Both Harvest and Squid
mark a peer down when it fails to reply to 20 consecutive ICP
queries. Squid also marks a peer down when a TCP connection
fails, and up again when a diagnostic TCP connection succeeds.
o When sending to a multicast address. In this case we'll
probably expect to receive more than one reply, and have no way
to definitively determine how many to expect. We discuss
multicast issues in section 7 below.
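
A minimal sketch of the up/down bookkeeping follows, assuming the
20-query threshold given above. The class and method names are
invented for this example, the TCP-based checks that Squid adds are
omitted, and multicast peers are not modelled because their reply
count cannot be known in advance.

```python
DEAD_AFTER = 20  # consecutive unanswered ICP queries, per the text


class PeerState:
    """Hypothetical per-peer liveness record."""

    def __init__(self, name):
        self.name = name
        self.unanswered = 0
        self.up = True

    def note_no_reply(self):
        # Called when a query to this peer went unanswered.
        self.unanswered += 1
        if self.unanswered >= DEAD_AFTER:
            self.up = False

    def reply_received(self):
        # Any ICP reply resets the counter and marks the peer up again.
        self.unanswered = 0
        self.up = True


def expected_replies(queried_peers):
    # Queries still go to peers marked down, but no reply is expected
    # from them.
    return sum(1 for p in queried_peers if p.up)
```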
5.1.4. Install timeout event
Because ICP uses UDP as its underlying transport, ICP queries and replies
may sometimes be dropped by the network. The cache installs a
timeout event in case not all of the expected replies arrive. By
default Squid and Harvest use a two-second timeout. If object
retrieval has not commenced when the timeout occurs, a source is
selected as described in section 5.3.9 below.
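
The deadline logic can be illustrated without any networking. In the
sketch below, a standard-library queue stands in for the UDP socket,
and the function name and the tuple layout of a reply are assumptions.
A real cache would also begin retrieval on the first HIT rather than
wait for every reply; the sketch shows only the timeout.

```python
import queue
import time

ICP_TIMEOUT = 2.0  # seconds, the default stated above


def collect_replies(inbox, expected, timeout=ICP_TIMEOUT):
    """Gather up to `expected` replies, giving up at the deadline."""
    deadline = time.monotonic() + timeout
    replies = []
    while len(replies) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout: proceed with whatever has arrived
        try:
            replies.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return replies
```

Using a single deadline, rather than a fresh timeout per reply, keeps
the total wait bounded even when replies trickle in slowly.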
5.2. Receiving ICP Queries and Sending Replies
When an ICP_OP_QUERY message is received, the cache examines it and
decides which reply message is to be sent. It will send one of the
following reply opcodes, tested for use in the order listed:
5.2.1. ICP_OP_ERR
The URL is extracted from the payload and parsed. If parsing fails,
an ICP_OP_ERR message is returned.
5.2.2. ICP_OP_DENIED
The access controls are checked. If the peer is not allowed to make
this request, ICP_OP_DENIED is returned. Squid counts the number of
ICP_OP_DENIED messages sent to each peer. If more than 95% of more
than 100 replies have been denied, then no reply is sent at all.
This prevents misconfigured caches from endlessly sending unnecessary
ICP messages back and forth.
5.2.3. ICP_OP_HIT
If the cache reaches this point without already matching one of the
previous opcodes, it means the request is allowed and we must
determine if it will be HIT or MISS, so we check if the URL exists in
the local cache. If so, and if the cached entry is fresh for at
least the next 30 seconds, we can return an ICP_OP_HIT message. The
stale/fresh determination uses the local refresh (or TTL) rules.
Note that a race condition exists for ICP_OP_HIT replies to sibling
peers. The ICP_OP_HIT means that a subsequent HTTP request for the
named URL would result in a cache hit. We assume that the HTTP
request will come very quickly after the ICP_OP_HIT. However, there
is a slight chance that the object might be purged from this cache
before the HTTP request is received. If this happens, and the
replying peer has applied Squid's `miss_access' configuration, then
the user will receive a very confusing access denied message.
5.2.3.1. ICP_OP_HIT_OBJ
Before returning the ICP_OP_HIT message, we see if we can send an
ICP_OP_HIT_OBJ message instead. We can use ICP_OP_HIT_OBJ if:
o The ICP_OP_QUERY message had the ICP_FLAG_HIT_OBJ flag set.
o The entire object (plus URL) will fit in an ICP message. The
maximum ICP message size is 16 Kbytes, but an application may
choose to set a smaller maximum value for ICP_OP_HIT_OBJ
replies.
Normally ICP replies are sent immediately after the query is
received, but the ICP_OP_HIT_OBJ message cannot be sent until the
object data is available to copy into the reply message. For Squid
and Harvest this means the object must be "swapped in" from disk if
it is not already in memory. Therefore, on average, an
ICP_OP_HIT_OBJ reply will have higher latency than ICP_OP_HIT.
5.2.4. ICP_OP_MISS_NOFETCH
At this point we have a cache miss. ICP has two types of miss
replies. If the cache does not want the peer to request the object
from it, it sends an ICP_OP_MISS_NOFETCH message.
5.2.5. ICP_OP_MISS
Finally, an ICP_OP_MISS reply is returned as the default. If the
replying cache is a parent of the querying cache, the ICP_OP_MISS
indicates an invitation to fetch the URL through the replying cache.
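
The order in which the reply opcodes of section 5.2 are tested can be
summarized in one function. This is a sketch under stated assumptions,
not the Squid or Harvest code: the CachedObject class, the parameter
names, the use of urlsplit as the URL parser, and the approximate size
check (which ignores the ICP header) are all invented for this
example.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

ICP_MAX_SIZE = 16 * 1024  # maximum ICP message size stated above
MIN_FRESH_SECONDS = 30    # entry must stay fresh at least this long


@dataclass
class CachedObject:
    """Hypothetical cache entry used only by this sketch."""
    body: bytes
    fresh_for: float  # seconds of freshness left under local rules


def choose_reply(url, cache, allowed,
                 hit_obj_requested=False, refuse_fetch=False):
    """Return the reply opcode name for a received ICP_OP_QUERY."""
    # 5.2.1: the URL must parse.
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return "ICP_OP_ERR"
    # 5.2.2: access controls.
    if not allowed:
        return "ICP_OP_DENIED"
    # 5.2.3: a hit requires the entry to be fresh for 30 more seconds.
    obj = cache.get(url)
    if obj is not None and obj.fresh_for >= MIN_FRESH_SECONDS:
        # 5.2.3.1: send the object itself if asked and if it fits.
        if hit_obj_requested and len(url) + len(obj.body) < ICP_MAX_SIZE:
            return "ICP_OP_HIT_OBJ"
        return "ICP_OP_HIT"
    # 5.2.4: a miss, and the peer should not fetch through this cache.
    if refuse_fetch:
        return "ICP_OP_MISS_NOFETCH"
    # 5.2.5: the default.
    return "ICP_OP_MISS"
```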
5.3. Receiving ICP Replies
Some ICP replies will be ignored; specifically, when any of the
following are true:
o The reply message originated from an unknown peer.
o The object named by the URL does not exist.
o The object is already being fetched.
5.3.1. ICP_OP_DENIED
If more than 95% of more than 100 replies from a peer cache have been
ICP_OP_DENIED, then such a high denial rate most likely indicates a
configuration error, either locally or at the peer. For this reason,
no further queries will be sent to the peer for the duration of the
cache process.
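
One possible reading of the threshold is sketched below: the peer is
disabled once more than 100 replies have been counted and more than
95% of them were denials. The class name and the sticky `disabled'
flag are assumptions; integer arithmetic is used to avoid rounding at
the boundary.

```python
class DenialCounter:
    """Hypothetical per-peer counter of ICP_OP_DENIED replies."""

    def __init__(self):
        self.replies = 0
        self.denied = 0
        self.disabled = False  # stays set for the life of the process

    def record(self, opcode):
        self.replies += 1
        if opcode == "ICP_OP_DENIED":
            self.denied += 1
        # More than 100 replies, and more than 95% of them denied.
        if self.replies > 100 and self.denied * 100 > self.replies * 95:
            self.disabled = True
```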
5.3.2. ICP_OP_HIT
Object retrieval commences immediately from the replying peer.
5.3.3. ICP_OP_HIT_OBJ
The object data is extracted from the ICP message and the retrieval
is complete. If there is some problem with the ICP_OP_HIT_OBJ
message (e.g. missing data) the reply will be treated like a standard
ICP_OP_HIT.
5.3.4. ICP_OP_SECHO
Object retrieval commences immediately from the origin server because
the ICP_OP_SECHO reply arrived prior to any ICP_OP_HIT reply. If an
ICP_OP_HIT had arrived prior, this ICP_OP_SECHO reply would be
ignored because the retrieval has already started.
5.3.5. ICP_OP_DECHO
An ICP_OP_DECHO reply is handled like an ICP_OP_MISS. Non-ICP peers
must always be configured as parents; a non-ICP sibling makes no
sense. One serious problem with the ICP_OP_DECHO feature is that
since it bounces messages off the peer's UDP echo port, it does not
indicate that the peer cache is actually running -- only that network
connectivity exists between the pair.
5.3.6. ICP_OP_MISS
If the peer is a sibling, the ICP_OP_MISS reply is ignored.
Otherwise, the peer may be "remembered" for future use in case no HIT
replies are received later (section 5.3.9).
Harvest and Squid remember the first parent to return an ICP_OP_MISS
message. With Squid, the parents may be weighted so that the "first
parent to miss" may not actually be the first reply received. We
call this the FIRST_PARENT_MISS. Remember that sibling misses are
entirely ignored; we only care about misses from parents. The parent
miss RTTs can be weighted because sometimes the closest parent is
not the one people want to use.
Also, recent versions of Squid may remember the parent with the
lowest RTT to the origin server, using the ICP_FLAG_SRC_RTT option.
We call this the CLOSEST_PARENT_MISS.
5.3.7. ICP_OP_MISS_NOFETCH
This reply is essentially ignored. A cache must not forward a
request to a peer that returns ICP_OP_MISS_NOFETCH.
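
The reply handling of sections 5.3.2 through 5.3.7 can be sketched as
a small per-request tracker. All names here are hypothetical. Parent
weighting, CLOSEST_PARENT_MISS, and the ICP_OP_DENIED accounting of
section 5.3.1 are left out, and ICP_OP_HIT_OBJ is recorded like a hit
even though its data arrives in the reply itself.

```python
ORIGIN = "origin"  # marker meaning "fetch from the origin server"


class ReplyTracker:
    """Hypothetical state for one outstanding hierarchical request."""

    def __init__(self, peer_kinds):
        self.peer_kinds = peer_kinds  # name -> "parent" or "sibling"
        self.source = None            # set once retrieval has started
        self.first_parent_miss = None

    def handle(self, sender, opcode):
        # Replies arriving after retrieval has started are ignored.
        if self.source is not None:
            return
        # An echo from the origin server arrived before any hit.
        if opcode == "ICP_OP_SECHO":
            self.source = ORIGIN
            return
        # Replies from unknown peers are ignored.
        kind = self.peer_kinds.get(sender)
        if kind is None:
            return
        if opcode in ("ICP_OP_HIT", "ICP_OP_HIT_OBJ"):
            self.source = sender
        elif opcode in ("ICP_OP_MISS", "ICP_OP_DECHO"):
            # Sibling misses are ignored; remember the first parent.
            if kind == "parent" and self.first_parent_miss is None:
                self.first_parent_miss = sender
        # ICP_OP_MISS_NOFETCH selects nothing.
```

If no source has been chosen when all replies are in or the timeout
fires, the remembered parent is a candidate for forwarding the
request.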