RFC 2187 ICP September 1997
5.3.8. ICP_OP_ERR
Silently ignored.
5.3.9. When all peers MISS.
For ICP_OP_HIT and ICP_OP_SECHO the request is forwarded immediately.
For ICP_OP_HIT_OBJ there is no need to forward the request. For all
other reply opcodes, we wait until the expected number of replies
have been received. When we have all of the expected replies, or
when the query timeout occurs, it is time to forward the request.
Since MISS replies were received from all peers, we must select
either a parent cache or the origin server.
o If the peers are using the ICP_FLAG_SRC_RTT feature, we forward
the request to the peer with the lowest RTT to the origin
server. If the local cache is also measuring RTTs to origin
servers, and is closer than any of the parents, the request is
forwarded directly to the origin server.
o If there is a FIRST_PARENT_MISS parent available, the request
will be forwarded there.
o If the ICP query/reply exchange did not produce any appropriate
parents, the request will be sent directly to the origin server
(unless firewall restrictions prevent it).
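The three rules above can be read as an ordered decision. The following is a minimal Python sketch under that assumption; it is not Squid's code. The `Parent` record, the `first_miss` marker (standing in for the FIRST_PARENT_MISS parent), the `"DIRECT"` return value, and the `direct_allowed` switch for firewall restrictions are all hypothetical names introduced for illustration. Applying `direct_allowed` to the RTT rule as well is our own assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Parent:
    name: str
    src_rtt_ms: Optional[int] = None  # RTT to the origin reported via ICP_FLAG_SRC_RTT, if any
    first_miss: bool = False          # hypothetical marker for the FIRST_PARENT_MISS parent


def choose_next_hop(parents: List[Parent],
                    local_rtt_ms: Optional[int] = None,
                    direct_allowed: bool = True) -> Optional[str]:
    """Return a parent name, "DIRECT", or None when nothing is usable."""
    # Rule 1: prefer the parent with the lowest reported RTT to the origin,
    # unless the local cache measured a lower RTT itself.
    with_rtt = [p for p in parents if p.src_rtt_ms is not None]
    if with_rtt:
        best = min(with_rtt, key=lambda p: p.src_rtt_ms)
        if direct_allowed and local_rtt_ms is not None and local_rtt_ms < best.src_rtt_ms:
            return "DIRECT"
        return best.name
    # Rule 2: fall back to the FIRST_PARENT_MISS parent, if there is one.
    for p in parents:
        if p.first_miss:
            return p.name
    # Rule 3: no appropriate parent; go direct unless a firewall forbids it.
    return "DIRECT" if direct_allowed else None
```

For example, with parents reporting RTTs of 40 ms and 25 ms, the second parent is chosen unless the local cache has measured something lower than 25 ms.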
5.4. ICP Options
The following options were added to Squid to support some new
features while maintaining backward compatibility with the Harvest
implementation.
5.4.1. ICP_FLAG_HIT_OBJ
This flag is off by default and will be set in an ICP_OP_QUERY
message only if these three criteria are met:
o It is enabled in the cache configuration file with `udp_hit_obj
on'.
o The peer must be using ICP version 2.
o The HTTP request must not include the "Pragma: no-cache" header.
Wessels & Claffy Informational [Page 13]
5.4.2. ICP_FLAG_SRC_RTT
This flag is off by default and will be set in an ICP_OP_QUERY
message only if these two criteria are met:
o It is enabled in the cache configuration file with `query_icmp
on'.
o The peer must be using ICP version 2.
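The criteria in sections 5.4.1 and 5.4.2 amount to a small predicate over the configuration, the peer, and the HTTP request. The sketch below assumes the flag bit values defined in RFC 2186 (0x80000000 and 0x40000000). The function name, its parameters, and the dictionary representation of the request headers are illustrative assumptions only.

```python
ICP_FLAG_HIT_OBJ = 0x80000000  # bit values as defined in RFC 2186
ICP_FLAG_SRC_RTT = 0x40000000


def query_options(udp_hit_obj: bool, query_icmp: bool,
                  peer_icp_version: int, request_headers: dict) -> int:
    """Compute the Options field for an ICP_OP_QUERY (hypothetical helper)."""
    options = 0
    # Both flags require the peer to speak ICP version 2.
    if peer_icp_version != 2:
        return options
    headers = {k.lower(): v for k, v in request_headers.items()}
    no_cache = "no-cache" in headers.get("pragma", "").lower()
    # HIT_OBJ: enabled in the configuration and no "Pragma: no-cache".
    if udp_hit_obj and not no_cache:
        options |= ICP_FLAG_HIT_OBJ
    # SRC_RTT: enabled in the configuration.
    if query_icmp:
        options |= ICP_FLAG_SRC_RTT
    return options
```

Note that "Pragma: no-cache" suppresses only ICP_FLAG_HIT_OBJ; the source lists no such condition for ICP_FLAG_SRC_RTT.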
6. Firewalls
Operating a Web cache behind a firewall or in a private network poses
some interesting problems. The hard part is figuring out whether the
cache is able to connect to the origin server. Harvest and Squid
provide an `inside_firewall' configuration directive to list DNS
domains on the near side of a firewall. Everything else is assumed
to be on the far side of a firewall. Squid also has a `firewall_ip'
directive so that inside hosts can be specified by IP addresses as
well.
In a simple configuration, a Squid cache behind a firewall will have
only one parent cache (which is on the firewall itself). In this
case, Squid must use that parent for all servers beyond the firewall,
so there is no need to utilize ICP.
In a more complex configuration, there may be a number of peer caches
also behind the firewall. Here, ICP may be used to check for cache
hits in the peers. Occasionally, when ICP is being used, there may
not be any replies received. If the cache were not behind a
firewall, the request would be forwarded directly to the origin
server. But in this situation, the cache must pick a parent cache,
either randomly or due to configuration information. For example,
Squid allows a parent cache to be designated as a default choice when
no others are available.
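The choice described above, for the case where no ICP replies arrive, might be sketched as follows. This is an illustration only: the suffix matching used for the `inside_firewall' domain list is our assumption about how such a list could be applied, and the function names and the `"DIRECT"` return value are hypothetical.

```python
def is_inside(host: str, inside_domains: list) -> bool:
    """True if host falls under one of the listed inside domains (assumed suffix match)."""
    host = host.lower().rstrip(".")
    for domain in inside_domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


def pick_when_no_replies(host: str, inside_domains: list, default_parent: str) -> str:
    # Origin servers on the near side of the firewall can be reached directly.
    if is_inside(host, inside_domains):
        return "DIRECT"
    # Everything else must go through a parent; here, the configured default.
    return default_parent
```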
7. Multicast
For efficient distribution, a cache may deliver ICP queries to a
multicast address, and neighbor caches may join the multicast group
to receive such queries.
Current practice is that caches send ICP replies only to unicast
addresses, for several reasons:
o Multicasting ICP replies would not reduce the number of packets
sent.
o It prevents other group members from receiving unexpected
replies.
o The reply should follow unicast routing paths to indicate
(unicast) connectivity between the receiver and the sender since
the subsequent HTTP request will be unicast routed.
Trust is an important aspect of inter-cache relationships. A Web
cache should not automatically trust any cache which replies to a
multicast ICP query. Caches should ignore ICP messages from
addresses not specifically configured as neighbors. Otherwise, one
could easily pollute a cache mesh by running an illegitimate cache
and having it join a group, return ICP_OP_HIT for all requests, and
then deliver bogus content.
When sending to multicast groups, cache administrators must be
careful to use the minimum multicast TTL required to reach all group
members. Joining a multicast group requires no special privileges
and there is no way to prevent anyone from joining "your" group. Two
groups of caches utilizing the same multicast address could overlap,
which would cause a cache to receive ICP replies from unknown
neighbors. The unknown neighbors would not be used to retrieve the
object data, but the cache would constantly receive ICP replies that
it must always ignore.
To prevent an overlapping cache mesh, caches should thus limit the
scope of their ICP queries with appropriate TTLs; an application such
as mtrace[6] can determine appropriate multicast TTLs.
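On systems with the BSD sockets API, the multicast TTL of outgoing datagrams is set per socket with the IP_MULTICAST_TTL option. The sketch below shows one way to do this from Python's standard `socket` module; the function name is hypothetical, and the TTL value itself must come from the administrator's knowledge of the mesh.

```python
import socket
import struct


def make_icp_multicast_socket(ttl: int) -> socket.socket:
    """Create a UDP socket whose multicast datagrams carry the given TTL."""
    if not 0 <= ttl <= 255:
        raise ValueError("multicast TTL must fit in one octet")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Pack the TTL as a single unsigned octet, which is the portable form.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock
```

Queries sent from such a socket to the group address are not forwarded beyond the chosen number of hops.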
As mentioned in section 5.1.3, we need to estimate the number of
expected replies for an ICP_OP_QUERY message. For unicast we expect
one reply for each query if the peer is up. However, for multicast
we generally expect more than one reply, but have no way of knowing
exactly how many replies to expect. Squid regularly (every 15
minutes) sends out test ICP_OP_QUERY messages to only the multicast
group peers. As with a real ICP query, a timeout event is installed
and the replies are counted until the timeout occurs. We have found
that the received count varies considerably. Therefore, the number
of replies to expect is calculated as a moving average, rounded down
to the nearest integer.
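The text does not say which kind of moving average is used. As an illustration, the sketch below assumes a simple moving average over the most recent probe results; the class name and the window size of 10 are hypothetical choices, not values taken from Squid.

```python
from collections import deque


class ExpectedReplies:
    """Track reply counts from periodic multicast probes (illustrative sketch)."""

    def __init__(self, window: int = 10):
        # Only the most recent `window` probe results are kept.
        self.counts = deque(maxlen=window)

    def record_probe(self, replies_seen: int) -> None:
        # Called when the probe's timeout fires, with the number of replies counted.
        self.counts.append(replies_seen)

    def expected(self) -> int:
        # Mean of the retained counts, rounded down to the nearest integer.
        if not self.counts:
            return 0
        return sum(self.counts) // len(self.counts)
```

Rounding down errs toward forwarding the request a little early rather than waiting for a reply that may never come.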
8. Lessons Learned
8.1. Differences Between ICP and HTTP
ICP is notably different from HTTP. HTTP supports a rich and
sophisticated set of features. In contrast, ICP was designed to be
simple, small, and efficient. HTTP request and reply headers consist
of lines of ASCII text delimited by a CRLF pair, whereas ICP uses a
fixed size header and represents numbers in binary. The only thing
ICP and HTTP have in common is the URL.
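To make the contrast concrete, the sketch below packs an ICP_OP_QUERY into its binary form, assuming the 20-octet header layout and the query payload (a 4-octet Requester Host Address followed by a NULL-terminated URL) given in RFC 2186. The function name is hypothetical, and both address fields are zero-filled.

```python
import struct

ICP_OP_QUERY = 1   # opcode value from RFC 2186
ICP_VERSION = 2

# opcode, version, message length, request number, options, option data,
# sender host address: 20 octets in network byte order.
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_query(reqnum: int, url: str, options: int = 0) -> bytes:
    # Payload: Requester Host Address (zero-filled) + NULL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             reqnum, options, 0, b"\x00" * 4)
    return header + payload
```

The URL is the only textual element in the message; everything else is a fixed-width binary field.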
Note that the ICP message does not even include the HTTP request
method. The original implementation assumed that only GET requests
would be cachable and there would be no need to locate non-GET
requests in neighbor caches. Thus, the current version of ICP does
not accommodate non-GET requests, although the next version of this
protocol will likely include a field for the request method.
HTTP defines features that are important for caching but not
expressible with the current ICP protocol. Among these are Pragma:
no-cache, If-Modified-Since, and all of the Cache-Control features of
HTTP/1.1. An ICP_OP_HIT_OBJ message may deliver an object which may
not obey all of the request header constraints. These differences
between ICP and HTTP are the reason we discourage the use of the
ICP_OP_HIT_OBJ feature.
8.2. Parents, Siblings, Hits and Misses
Note that the ICP message does not have a field to indicate the
intent of the querying cache. That is, nowhere in the ICP request or
reply does it say that the two caches have a sibling or parent
relationship. A sibling cache can only respond with HIT or MISS, not
"you can retrieve this from me" or "you can not retrieve this from
me." The querying cache must apply the HIT or MISS reply to its
local configuration to prevent it from resolving misses through a
sibling cache. This constraint is awkward, because this aspect of
the relationship can be configured only in the cache originating the
requests, and indirectly via the access controls configured in the
queried cache as described earlier in section 4.2.
8.3. Different Roles of ICP
There are two different understandings of what exactly the role of
ICP is in a cache mesh. One understanding is that ICP's role is only
object location, specifically, to provide hints about whether or not
a named object exists in a neighbor cache. An implied assumption is
that cache hits are highly desirable, and ICP is used to maximize the
chance of getting them. If an ICP message is lost due to congestion,
then nothing significant is lost; the request will be satisfied
regardless.
ICP is increasingly being tasked to fill a more complex role:
conveying cache usage policy. For example, many organizations (e.g.
universities) will install a Web cache on the border of their
network. Such organizations may be happy to establish sibling
relationships with other, nearby caches, subject to the following
terms:
o Any of the organization's customers or users may request any
object (cached or not).
o Anyone may request an object already in the cache.
o Anyone may request any object from the organization's servers
behind the cache.
o All other requests are denied; specifically, the organization
will not provide transit for requests in which neither the
client nor the server falls within its domain.
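Taken together, these terms reduce to a single condition on three facts about a request. The following sketch states it in Python; the function and parameter names are hypothetical.

```python
def sibling_policy_allows(client_is_local: bool,
                          object_is_cached: bool,
                          server_is_local: bool) -> bool:
    """Apply the four terms above to one request (illustrative sketch)."""
    # The first three terms each grant access on their own; a request that
    # satisfies none of them is transit and is denied.
    return client_is_local or object_is_cached or server_is_local
```

Only `object_is_cached` corresponds to what an ICP HIT or MISS reports; the other two inputs depend on who is asking and where the origin server is.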
To successfully convey policy, the ICP exchange must very accurately
predict the result (hit, miss) of a subsequent HTTP request. The
result may often depend on other request fields, such as Cache-
Control. So it's not possible for ICP to accurately predict the
result without more, or perhaps all, of the HTTP request.
8.4. Protocol Design Flaws of ICPv2
We recognize certain flaws with the original design of ICP, and make
note of them so that future versions can avoid the same mistakes.
o The NULL-terminated URL in the payload field requires stepping
through the message an octet at a time to find some of the
fields (e.g. the beginning of object data in an ICP_OP_HIT_OBJ
message).
o Two fields (Sender Host Address and Requester Host Address) are
IPv4 specific. However, neither of these fields is used in
practice; they are normally zero-filled. If IP addresses have a
role in the ICP message, there needs to be an address family
descriptor for each address, and clients need to be able to say
whether they want to hear IPv6 responses or not.
o Options are limited to 32 option flags and 32 bits of option
data. This should be more like TCP, with an option descriptor
followed by option data.
o Although currently used as the cache key, the URL string no
longer serves this role adequately. Some HTTP responses now
vary according to the requestor's User-Agent and other headers.
A cache key must incorporate all non-transport headers present
in the client's request. All non-hop-by-hop request headers
should be sent in an ICP query.
o ICPv2 uses different opcode values for queries and responses.
ICP should use the same opcode for both sides of a two-sided
transaction, with a "query/response" indicator telling which
side is which.
o ICPv2 does not include any authentication fields.
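The first flaw in this list is easy to see in code. The sketch below locates the object data in an ICP_OP_HIT_OBJ message, assuming the RFC 2186 layout: a 20-octet header, a NULL-terminated URL, a 16-bit Object Size in network byte order, and then the object data. The function name is hypothetical.

```python
import struct

ICP_HEADER_LEN = 20  # fixed header size from RFC 2186


def split_hit_obj(message: bytes):
    """Return (url, object_data) from an ICP_OP_HIT_OBJ message."""
    payload = message[ICP_HEADER_LEN:]
    # No length prefix on the URL: the terminator has to be found by
    # scanning the payload. index() raises ValueError if it is missing.
    end = payload.index(b"\x00")
    url = payload[:end].decode("ascii")
    if len(payload) < end + 3:
        raise ValueError("message too short for Object Size")
    # Object Size follows the terminator directly and is not aligned.
    (size,) = struct.unpack_from("!H", payload, end + 1)
    data = payload[end + 3:end + 3 + size]
    if len(data) != size:
        raise ValueError("object data shorter than Object Size")
    return url, data
```

Had the URL carried a length prefix, the offsets of Object Size and the object data could be computed without the scan.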
9. Security Considerations
Security is an issue with ICP over UDP because of its connectionless
nature. Below we consider various vulnerabilities and methods of
attack, and their implications.
Our first line of defense is to check the source IP address of the
ICP message, e.g. as given by recvfrom(2). ICP query messages should
be processed if the access control rules allow the querying address
access to the cache. However, ICP reply messages must only be
accepted from known neighbors; a cache must ignore replies from
unknown addresses.
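A minimal sketch of this first line of defense follows, using Python's standard `ipaddress` module. The function name and the representation of the configuration (a list of neighbor addresses and a list of networks allowed to send queries) are assumptions made for illustration.

```python
import ipaddress


def should_process(is_reply: bool, source_ip: str,
                   neighbor_ips: list, allowed_query_networks: list) -> bool:
    """Decide whether to process an ICP message based on its source address."""
    addr = ipaddress.ip_address(source_ip)
    if is_reply:
        # Replies are accepted only from explicitly configured neighbors.
        return addr in {ipaddress.ip_address(n) for n in neighbor_ips}
    # Queries are processed if the access control rules cover the sender.
    return any(addr in ipaddress.ip_network(net) for net in allowed_query_networks)
```

Here `source_ip` would be the address returned by recvfrom(2); as discussed next, the check is only as trustworthy as that address.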
Because we trust the validity of an address in an IP packet, ICP is
susceptible to IP address spoofing. In this document we address some
consequences of IP address spoofing. Normally, spoofed addresses can
only be detected by routers, not by hosts. However, the IP
Authentication Header[7,8] can be used underneath ICP to provide
cryptographic authentication of the entire IP packet containing the
ICP message, thus eliminating the risk of IP address spoofing.