Internet-Draft                                                8 Jul 1997

   Also, recent versions of Squid may remember the parent with the
   lowest RTT to the origin server, using the ICP_FLAG_SRC_RTT option.
   We call this the CLOSEST_PARENT_MISS.

5.3.7.  ICP_OP_MISS_NOFETCH

   This reply is essentially ignored.  A cache must not forward a
   request to a peer that returns ICP_OP_MISS_NOFETCH.

5.3.8.  ICP_OP_ERR

   Silently ignored.

5.3.9.  When all peers MISS.

   For ICP_OP_HIT and ICP_OP_SECHO the request is forwarded
   immediately.  For ICP_OP_HIT_OBJ there is no need to forward the
   request.  For all other reply opcodes, we wait until the expected
   number of replies has been received.  When we have all of the
   expected replies, or when the query timeout occurs, it is time to
   forward the request.

   Since MISS replies were received from all peers, we must either
   select a parent cache or the origin server.

   o  If the peers are using the ICP_FLAG_SRC_RTT feature, we forward
      the request to the peer with the lowest RTT to the origin server.
      If the local cache is also measuring RTTs to origin servers, and
      is closer than any of the parents, the request is forwarded
      directly to the origin server.

   o  If there is a FIRST_PARENT_MISS parent available, the request
      will be forwarded there.

   o  If the ICP query/reply exchange did not produce any appropriate
      parents, the request will be sent directly to the origin server
      (unless firewall restrictions prevent it).

5.4.  ICP Options

   The following options were added to Squid to support some new
   features while maintaining backward compatibility with the Harvest
   implementation.

Wessels & Claffy                                               [Page 13]

5.4.1.  ICP_FLAG_HIT_OBJ

   This flag is off by default and will be set in an ICP_OP_QUERY
   message only if these three criteria are met:

   o  It is enabled in the cache configuration file with
      `udp_hit_obj on'.

   o  The peer must be using ICP version 2.

   o  The HTTP request must not include the "Pragma: no-cache" header.

5.4.2.  ICP_FLAG_SRC_RTT

   This flag is off by default and will be set in an ICP_OP_QUERY
   message only if these two criteria are met:

   o  It is enabled in the cache configuration file with
      `query_icmp on'.

   o  The peer must be using ICP version 2.

6.  Firewalls

   Operating a Web cache behind a firewall or in a private network
   poses some interesting problems.  The hard part is figuring out
   whether the cache is able to connect to the origin server.  Harvest
   and Squid provide an `inside_firewall' configuration directive to
   list DNS domains on the near side of a firewall.  Everything else is
   assumed to be on the far side of a firewall.  Squid also has a
   `firewall_ip' directive so that inside hosts can be specified by IP
   addresses as well.

   In a simple configuration, a Squid cache behind a firewall will have
   only one parent cache (which is on the firewall itself).  In this
   case, Squid must use that parent for all servers beyond the
   firewall, so there is no need to utilize ICP.

   In a more complex configuration, there may be a number of peer
   caches also behind the firewall.  Here, ICP may be used to check for
   cache hits in the peers.  Occasionally, when ICP is being used,
   there may not be any replies received.  If the cache were not behind
   a firewall, the request would be forwarded directly to the origin
   server.  But in this situation, the cache must pick a parent cache,
   either randomly or due to configuration information.  For example,
   Squid allows a parent cache to be designated as a default choice
   when no others are available.

7.  Multicast

   For efficient distribution, a cache may deliver ICP queries to a
   multicast address, and neighbor caches may join the multicast group
   to receive such queries.

   Current practice is that caches send ICP replies only to unicast
   addresses, for several reasons:

   o  Multicasting ICP replies would not reduce the number of packets
      sent.

   o  It prevents other group members from receiving unexpected
      replies.

   o  The reply should follow unicast routing paths to indicate
      (unicast) connectivity between the receiver and the sender since
      the subsequent HTTP request will be unicast routed.

   Trust is an important aspect of inter-cache relationships.  A Web
   cache should not automatically trust any cache which replies to a
   multicast ICP query.  Caches should ignore ICP messages from
   addresses not specifically configured as neighbors.  Otherwise, one
   could easily pollute a cache mesh by running an illegitimate cache
   and having it join a group, return ICP_OP_HIT for all requests, and
   then deliver bogus content.

   When sending to multicast groups, cache administrators must be
   careful to use the minimum multicast TTL required to reach all group
   members.  Joining a multicast group requires no special privileges
   and there is no way to prevent anyone from joining "your" group.
   Two groups of caches utilizing the same multicast address could
   overlap, which would cause a cache to receive ICP replies from
   unknown neighbors.  The unknown neighbors would not be used to
   retrieve the object data, but the cache would constantly receive ICP
   replies that it must always ignore.

   To prevent an overlapping cache mesh, caches should thus limit the
   scope of their ICP queries with appropriate TTLs; an application
   such as mtrace[6] can determine appropriate multicast TTLs.

   As mentioned in section 5.1.3, we need to estimate the number of
   expected replies for an ICP_OP_QUERY message.  For unicast we expect
   one reply for each query if the peer is up.  However, for multicast
   we generally expect more than one reply, but have no way of knowing
   exactly how many replies to expect.  Squid regularly (every 15
   minutes) sends out test ICP_OP_QUERY messages to only the multicast
   group peers.  As with a real ICP query, a timeout event is installed
   and the replies are counted until the timeout occurs.  We have found
   that the received count varies considerably.
   Therefore, the number of replies to expect is calculated as a moving
   average, rounded down to the nearest integer.

8.  Lessons Learned

8.1.  Differences Between ICP and HTTP

   ICP is notably different from HTTP.  HTTP supports a rich and
   sophisticated set of features.  In contrast, ICP was designed to be
   simple, small, and efficient.  HTTP request and reply headers
   consist of lines of ASCII text delimited by a CRLF pair, whereas ICP
   uses a fixed size header and represents numbers in binary.  The only
   thing ICP and HTTP have in common is the URL.

   Note that the ICP message does not even include the HTTP request
   method.  The original implementation assumed that only GET requests
   would be cachable and there would be no need to locate non-GET
   requests in neighbor caches.  Thus, the current version of ICP does
   not accommodate non-GET requests, although the next version of this
   protocol will likely include a field for the request method.

   HTTP defines features that are important for caching but not
   expressible with the current ICP protocol.  Among these are
   Pragma: no-cache, If-Modified-Since, and all of the Cache-Control
   features of HTTP/1.1.  An ICP_OP_HIT_OBJ message may deliver an
   object which may not obey all of the request header constraints.
   These differences between ICP and HTTP are the reason we discourage
   the use of the ICP_OP_HIT_OBJ feature.

8.2.  Parents, Siblings, Hits and Misses

   Note that the ICP message does not have a field to indicate the
   intent of the querying cache.  That is, nowhere in the ICP request
   or reply does it say that the two caches have a sibling or parent
   relationship.  A sibling cache can only respond with HIT or MISS,
   not "you can retrieve this from me" or "you can not retrieve this
   from me."  The querying cache must apply the HIT or MISS reply to
   its local configuration to prevent it from resolving misses through
   a sibling cache.
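
   As an illustration only, the rule above might be sketched in Python
   as follows.  The PeerType enum and the may_forward() function are
   hypothetical names chosen for this sketch; they are not part of ICP
   or of Squid.  Opcodes are represented by their names rather than by
   wire values, and only the reply cases discussed in this document
   are modelled.

```python
from enum import Enum


class PeerType(Enum):
    """Locally configured relationship to a neighbor (hypothetical name)."""
    PARENT = "parent"
    SIBLING = "sibling"


def may_forward(peer_type: PeerType, reply_opcode: str) -> bool:
    """Decide whether a request may be forwarded to the replying peer.

    The ICP reply carries no parent/sibling information, so the querying
    cache combines the reply with its own configuration (assumed here to
    be available as peer_type).
    """
    if reply_opcode == "ICP_OP_HIT":
        # A hit may be fetched from a parent or a sibling.
        return True
    if reply_opcode == "ICP_OP_MISS":
        # Misses are resolved only through parents, never siblings.
        return peer_type is PeerType.PARENT
    # ICP_OP_MISS_NOFETCH, ICP_OP_ERR and anything else: this sketch
    # does not select the peer.
    return False
```

   With this rule a sibling is used only for hits, while a parent may
   also be used to resolve a miss.
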
   This constraint is awkward, because this aspect of the relationship
   can be configured only in the cache originating the requests, and
   indirectly via the access controls configured in the queried cache
   as described earlier in section 4.2.

8.3.  Different Roles of ICP

   There are two different understandings of what exactly the role of
   ICP is in a cache mesh.  One understanding is that ICP's role is
   only object location, specifically, to provide hints about whether
   or not a named object exists in a neighbor cache.  An implied
   assumption is that cache hits are highly desirable, and ICP is used
   to maximize the chance of getting them.  If an ICP message is lost
   due to congestion, then nothing significant is lost; the request
   will be satisfied regardless.

   ICP is increasingly being tasked to fill a more complex role:
   conveying cache usage policy.  For example, many organizations
   (e.g. universities) will install a Web cache on the border of their
   network.  Such organizations may be happy to establish sibling
   relationships with other, nearby caches, subject to the following
   terms:

   o  Any of the organization's customers or users may request any
      object (cached or not).

   o  Anyone may request an object already in the cache.

   o  Anyone may request any object from the organization's servers
      behind the cache.

   o  All other requests are denied; specifically, the organization
      will not provide transit for requests in which neither the client
      nor the server falls within its domain.

   To successfully convey policy the ICP exchange must very accurately
   predict the result (hit, miss) of a subsequent HTTP request.  The
   result may often depend on other request fields, such as
   Cache-Control.  So it's not possible for ICP to accurately predict
   the result without more, or perhaps all, of the HTTP request.

8.4.  Protocol Design Flaws of ICPv2

   We recognize certain flaws with the original design of ICP, and make
   note of them so that future versions can avoid the same mistakes.

   o  The NULL-terminated URL in the payload field requires stepping
      through the message an octet at a time to find some of the fields
      (i.e. the beginning of object data in an ICP_OP_HIT_OBJ message).

   o  Two fields (Sender Host Address and Requester Host Address) are
      IPv4 specific.  However, neither of these fields is used in
      practice; they are normally zero-filled.  If IP addresses have a
      role in the ICP message, there needs to be an address family
      descriptor for each address, and clients need to be able to say
      whether they want to hear IPv6 responses or not.

   o  Options are limited to 32 option flags and 32 bits of option
      data.  This should be more like TCP, with an option descriptor
      followed by option data.

   o  Although currently used as the cache key, the URL string no
      longer serves this role adequately.  Some HTTP responses now vary
      according to the requestor's User-Agent and other headers.  A
      cache key must incorporate all non-transport headers present in
      the client's request.  All non-hop-by-hop request headers should
      be sent in an ICP query.

   o  ICPv2 uses different opcode values for queries and responses.
      ICP should use the same opcode for both sides of a two-sided
      transaction, with a "query/response" indicator telling which side
      is which.

   o  ICPv2 does not include any authentication fields.

9.  Security Considerations

   Security is an issue with ICP over UDP because of its connectionless
   nature.  Below we consider various vulnerabilities and methods of
   attack, and their implications.

   Our first line of defense is to check the source IP address of the
   ICP message, e.g. as given by recvfrom(2).  ICP query messages
   should be processed if the access control rules allow the querying
   address access to the cache.
   However, ICP reply messages must only be accepted from known
   neighbors; a cache must ignore replies from unknown addresses.

   Because we trust the validity of an address in an IP packet, ICP is
   susceptible to IP address spoofing.  In this document we address
   some consequences of IP address spoofing.  Normally, spoofed
   addresses can only be detected by routers, not by hosts.  However,
   the IP Authentication Header[7,8] can be used underneath ICP to
   provide cryptographic authentication of the entire IP packet
   containing the ICP