Wessels & Claffy                                                [Page 6]

Internet-Draft                                                8 Jul 1997

relationship. If there are no access lines present, the cache allows the request by default. Note that the cache.isp.com cache need not explicitly specify the customer cache as a peer, nor is the type of relationship encoded within the ICP query itself. The access control entries regulate the relationships between this cache and its neighbors.

For our example, the ISP would use:

      acl Customer src proxy.customer.org
      http_access allow Customer
      icp_access allow Customer

This defines an access control entry named `Customer' which matches requests whose source IP address is that of the customer cache machine. The customer cache would then be allowed to make any request to both the HTTP and ICP ports (including cache misses). This configuration implies that the ISP cache is a parent of the customer.

If the ISP wanted to enforce a sibling relationship, it would need to deny access to cache misses. This would be done as follows:

      miss_access deny Customer

Of course, the ISP should also communicate this to the customer, so that the customer will change his configuration from parent to sibling. Otherwise, if the customer requests an object not in the ISP cache, an error message is generated.

5.  Applying the Protocol

The following sections describe the ICP implementation in the Harvest[3] (research version) and Squid Web cache[5] packages. In terms of version numbers, this means version 1.4pl2 for Harvest and version 1.1.10 for Squid.

The basic sequence of events in an ICP transaction is as follows:

   1. The local cache receives an HTTP[1] request from a cache client.
   2. The local cache sends ICP queries (section 5.1).
   3. The peer cache(s) receive the queries and send ICP replies (section 5.2).
   4. The local cache receives the ICP replies and decides where to forward the request (section 5.3).

5.1.  Sending ICP Queries

5.1.1.  Determine whether to use ICP at all

Not every HTTP request requires an ICP query to be sent. Obviously, cache hits do not need ICP because the request is satisfied immediately. For origin servers very close to the cache, we do not want to use any neighbor caches. In Squid and Harvest, the administrator specifies what constitutes a `local' server with the `local_domain' and `local_ip' configuration options. The cache always contacts a local server directly, never querying a peer cache.

There are other classes of requests that the cache (or the administrator) may prefer to forward directly to the origin server. In Squid and Harvest, one such class includes all non-GET request methods. A Squid cache can also be configured not to use peers for URLs matching the `hierarchy_stoplist'.

In order for an HTTP request to yield an ICP transaction, it must:

   o  not be a cache hit,
   o  not be to a local server,
   o  be a GET request, and
   o  not match the `hierarchy_stoplist' configuration.

We call this a "hierarchical" request. A "non-hierarchical" request is one that does not generate any ICP traffic. To avoid processing requests that are likely to lower cache efficiency, one can configure the cache not to consult the hierarchy for URLs that contain certain strings (e.g. `cgi-bin').

5.1.2.  Determine which peers to query

By default, a cache sends an ICP_OP_QUERY message to each peer, unless any one of the following is true:

   o  Restrictions prevent querying a peer for this request, based on the configuration directive `cache_host_domain', which specifies a set of DNS domains (from the URLs) for which the peer should or should not be queried. In Squid, a more flexible directive (`cache_host_acl') supports restrictions on other parts of the request (method, port number, source, etc.).

   o  The peer is a sibling, and the HTTP request includes a "Pragma: no-cache" header.
      This is because the sibling would be asked to transit the request, which is not allowed.

   o  The peer is configured never to be sent ICP queries (i.e. with the `no-query' option).

If the determination yields only one queryable ICP peer, and the Squid configuration directive `single_parent_bypass' is set, then one can bypass waiting for the single ICP response and just send the HTTP request directly to the peer cache.

The Squid configuration option `source_ping' configures a Squid cache to send a ping to the origin server simultaneously with its ICP queries, in case the origin is closer than any of the caches.

5.1.3.  Calculate the expected number of ICP replies

Harvest and Squid want to maximize the chance of getting a HIT reply from one of the peers. Therefore, the cache waits for all ICP replies to be received. Normally, we expect to receive an ICP reply for each query sent, except:

   o  When the peer is believed to be down. If the peer is down, Squid and Harvest continue to send it ICP queries, but do not expect the peer to reply. When an ICP reply is again received from the peer, its status is changed to up.

      The determination of up/down status has varied a little as the Harvest and Squid software evolved. Both Harvest and Squid mark a peer down when it fails to reply to 20 consecutive ICP queries. Squid also marks a peer down when a TCP connection fails, and up again when a diagnostic TCP connection succeeds.

   o  When sending to a multicast address. In this case we will probably expect to receive more than one reply, and have no way to determine definitively how many to expect. We discuss multicast issues in section 7 below.

5.1.4.  Install timeout event

Because ICP uses UDP as its underlying transport, ICP queries and replies may sometimes be dropped by the network. The cache installs a timeout event in case not all of the expected replies arrive. By default, Squid and Harvest use a two-second timeout.
If object retrieval has not commenced when the timeout occurs, a source is selected as described in section 5.3.9 below.

5.2.  Receiving ICP Queries and Sending Replies

When an ICP_OP_QUERY message is received, the cache examines it and decides which reply message to send. It sends one of the following reply opcodes, tested in the order listed:

5.2.1.  ICP_OP_ERR

The URL is extracted from the payload and parsed. If parsing fails, an ICP_OP_ERR message is returned.

5.2.2.  ICP_OP_DENIED

The access controls are checked. If the peer is not allowed to make this request, ICP_OP_DENIED is returned. Squid counts the number of ICP_OP_DENIED messages sent to each peer. If more than 100 replies have been sent to a peer and more than 95% of them were ICP_OP_DENIED, then no reply is sent at all. This prevents misconfigured caches from endlessly sending unnecessary ICP messages back and forth.

5.2.3.  ICP_OP_HIT

If the cache reaches this point without matching one of the previous opcodes, the request is allowed, and the cache must determine whether it is a HIT or a MISS by checking whether the URL exists in the local cache. If so, and if the cached entry will remain fresh for at least the next 30 seconds, it can return an ICP_OP_HIT message. The stale/fresh determination uses the local refresh (or TTL) rules.

Note that a race condition exists for ICP_OP_HIT replies to sibling peers. The ICP_OP_HIT means that a subsequent HTTP request for the named URL would result in a cache hit. We assume that the HTTP request will come very quickly after the ICP_OP_HIT. However, there is a slight chance that the object might be purged from this cache before the HTTP request is received. If this happens, and the replying peer has applied Squid's `miss_access' configuration, then the user will receive a very confusing access denied message.

5.2.3.1.  ICP_OP_HIT_OBJ

Before returning the ICP_OP_HIT message, the cache checks whether it can send an ICP_OP_HIT_OBJ message instead. It can use ICP_OP_HIT_OBJ if:

   o  the ICP_OP_QUERY message had the ICP_FLAG_HIT_OBJ flag set, and
   o  the entire object (plus URL) will fit in an ICP message. The maximum ICP message size is 16 Kbytes, but an application may choose to set a smaller maximum value for ICP_OP_HIT_OBJ replies.

Normally, ICP replies are sent immediately after the query is received, but the ICP_OP_HIT_OBJ message cannot be sent until the object data is available to copy into the reply message. For Squid and Harvest, this means the object must be "swapped in" from disk if it is not already in memory. Therefore, on average, an ICP_OP_HIT_OBJ reply will have higher latency than ICP_OP_HIT.

5.2.4.  ICP_OP_MISS_NOFETCH

At this point we have a cache miss. ICP has two types of miss replies. If the cache does not want the peer to request the object from it, it sends an ICP_OP_MISS_NOFETCH message.

5.2.5.  ICP_OP_MISS

Finally, an ICP_OP_MISS reply is returned as the default. If the replying cache is a parent of the querying cache, the ICP_OP_MISS indicates an invitation to fetch the URL through the replying cache.

5.3.  Receiving ICP Replies

Some ICP replies will be ignored; specifically, when any of the following is true:

   o  The reply message originated from an unknown peer.
   o  The object named by the URL does not exist.
   o  The object is already being fetched.

5.3.1.  ICP_OP_DENIED

If more than 100 replies have been received from a peer cache and more than 95% of them were ICP_OP_DENIED, such a high denial rate most likely indicates a configuration error, either locally or at the peer. For this reason, no further queries will be sent to the peer for the duration of the cache process.

5.3.2.  ICP_OP_HIT

Object retrieval commences immediately from the replying peer.

5.3.3.  ICP_OP_HIT_OBJ

The object data is extracted from the ICP message and the retrieval is complete. If there is some problem with the ICP_OP_HIT_OBJ message (e.g. missing data), the reply will be treated like a standard ICP_OP_HIT.

5.3.4.  ICP_OP_SECHO

Object retrieval commences immediately from the origin server because the ICP_OP_SECHO reply arrived before any ICP_OP_HIT replies. If an ICP_OP_HIT had arrived earlier, this ICP_OP_SECHO reply would be ignored because the retrieval has already started.

5.3.5.  ICP_OP_DECHO

An ICP_OP_DECHO reply is handled like an ICP_OP_MISS. Non-ICP peers must always be configured as parents; a non-ICP sibling makes no sense. One serious problem with the ICP_OP_DECHO feature is that, since it bounces messages off the peer's UDP echo port, it does not indicate that the peer cache is actually running, only that network connectivity exists between the pair.

5.3.6.  ICP_OP_MISS

If the peer is a sibling, the ICP_OP_MISS reply is ignored. Otherwise, the peer may be "remembered" for future use in case no HIT replies are received later (section 5.3.9). Harvest and Squid remember the first parent to return an ICP_OP_MISS message. With Squid, the parents may be weighted so that the "first parent to miss" may not actually be the first reply received. We call this the FIRST_PARENT_MISS. Remember that sibling misses are entirely ignored; we only care about misses from parents. The parent miss RTTs can be weighted because sometimes the closest parent is not the one people want to use.
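As a recap of the query-side rules in sections 5.1.1 through 5.1.3, the following Python sketch shows one way a cache might decide whether a request is hierarchical, which peers to query, and how many replies to wait for. It is illustrative only and is not taken from Harvest or Squid: the `Request` and `Peer` classes and all of their field names are hypothetical, `local_domains` stands in for `local_domain' with suffix matching assumed (`local_ip' is not modeled), `domains` is a simplified allow-list version of `cache_host_domain', and multicast peers are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Request:                       # hypothetical request model
    method: str
    url: str
    host: str
    pragma_no_cache: bool = False    # request carried "Pragma: no-cache"


@dataclass
class Peer:                          # hypothetical peer model
    name: str
    is_parent: bool                  # False means the peer is a sibling
    no_query: bool = False           # like the `no-query' option
    domains: Tuple[str, ...] = ()    # empty: query for any domain (simplified cache_host_domain)
    up: bool = True                  # current up/down belief


def in_domains(host: str, domains: Sequence[str]) -> bool:
    """True if host equals, or is a subdomain of, any listed domain."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_hierarchical(req: Request, cache_hit: bool,
                    local_domains: Sequence[str], stoplist: Sequence[str]) -> bool:
    """Section 5.1.1: only 'hierarchical' requests generate ICP traffic."""
    return (not cache_hit
            and not in_domains(req.host, local_domains)
            and req.method == "GET"
            and not any(s in req.url for s in stoplist))


def peers_to_query(req: Request, peers: Sequence[Peer]) -> List[Peer]:
    """Section 5.1.2: drop peers excluded by domain rules, no-cache, or no-query."""
    selected = []
    for p in peers:
        if p.domains and not in_domains(req.host, p.domains):
            continue
        if not p.is_parent and req.pragma_no_cache:
            continue  # a sibling must not be asked to transit the request
        if p.no_query:
            continue
        selected.append(p)
    return selected


def expected_replies(queried: Sequence[Peer]) -> int:
    """Section 5.1.3: peers believed down are still queried but not waited for."""
    return sum(1 for p in queried if p.up)
```

Note that a peer marked down still appears in the list returned by `peers_to_query`; it is only excluded from the count the timeout logic of section 5.1.4 waits for.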
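The reply-selection order of section 5.2 can be sketched the same way. This is a hypothetical Python illustration, not Squid's code: the helper names and the `no_fetch` switch are invented, a URL is assumed to "fail to parse" when it has no scheme or host, freshness is reduced to a single absolute expiry time, and the ICP_OP_HIT_OBJ size check assumes a 20-byte ICP header plus a payload of NUL-terminated URL, 16-bit object size, and object data. A return value of None means the cache stays silent, as in section 5.2.2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit

ICP_HEADER_LEN = 20          # assumed fixed ICP header size
MAX_ICP_MSG = 16 * 1024      # 16 Kbytes, the maximum ICP message size
FRESH_MARGIN = 30.0          # entry must stay fresh this many more seconds


class Op(Enum):
    ERR = "ICP_OP_ERR"
    DENIED = "ICP_OP_DENIED"
    HIT = "ICP_OP_HIT"
    HIT_OBJ = "ICP_OP_HIT_OBJ"
    MISS_NOFETCH = "ICP_OP_MISS_NOFETCH"
    MISS = "ICP_OP_MISS"


@dataclass
class PeerStats:             # per-peer reply counters kept by the replying cache
    replies: int = 0
    denied: int = 0

    def mostly_denied(self) -> bool:
        # "more than 95% of more than 100 replies have been denied"
        return self.replies > 100 and self.denied > 0.95 * self.replies


@dataclass
class CacheEntry:
    expires: float           # absolute time the entry turns stale under local refresh rules
    body: bytes


def _send(stats: PeerStats, op: Op) -> Op:
    stats.replies += 1       # count every reply actually sent to this peer
    return op


def choose_reply(url: str, allowed: bool, entry: Optional[CacheEntry], now: float,
                 stats: PeerStats, want_hit_obj: bool = False,
                 no_fetch: bool = False) -> Optional[Op]:
    """Pick the reply opcode for one ICP_OP_QUERY, testing in section 5.2 order."""
    try:
        parts = urlsplit(url)
        parsed = bool(parts.scheme and parts.netloc)
    except ValueError:                                     # e.g. malformed IPv6 host
        parsed = False
    if not parsed:
        return _send(stats, Op.ERR)                        # 5.2.1
    if not allowed:                                        # 5.2.2
        if stats.mostly_denied():
            return None                                    # stay silent
        stats.denied += 1
        return _send(stats, Op.DENIED)
    if entry is not None and entry.expires >= now + FRESH_MARGIN:   # 5.2.3
        size = ICP_HEADER_LEN + len(url.encode()) + 1 + 2 + len(entry.body)
        if want_hit_obj and size <= MAX_ICP_MSG:           # 5.2.3.1
            return _send(stats, Op.HIT_OBJ)
        return _send(stats, Op.HIT)
    if no_fetch:                                           # 5.2.4
        return _send(stats, Op.MISS_NOFETCH)
    return _send(stats, Op.MISS)                           # 5.2.5
```

Here `want_hit_obj` plays the role of the ICP_FLAG_HIT_OBJ flag in the query; an entry that will go stale within 30 seconds falls through to one of the miss replies.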
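The reply-handling rules of section 5.3 amount to a small state machine per pending request. The sketch below is hypothetical: `Neighbor`, `Selection`, and `on_reply` are illustrative names, RTTs are plain numbers, and the parent weighting is assumed to be "RTT divided by weight, lowest wins", which is one plausible reading of the weighting described in section 5.3.6 rather than a documented formula. ICP_OP_SECHO is checked before the unknown-peer test because the echo comes from the origin server, which is not a configured peer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Neighbor:
    name: str
    is_parent: bool           # False means sibling; non-ICP (DECHO) peers must be parents
    weight: float = 1.0       # hypothetical: a larger weight makes a parent more attractive
    replies: int = 0
    denied: int = 0
    disabled: bool = False    # set when the denial rate suggests misconfiguration


@dataclass
class Selection:
    source: Optional[str] = None             # where retrieval started, once it has
    first_parent_miss: Optional[str] = None  # FIRST_PARENT_MISS candidate
    best_miss_score: float = float("inf")    # weighted RTT of that candidate


def on_reply(pending: Dict[str, Selection], neighbors: Dict[str, Neighbor],
             sender: str, url: str, op: str, rtt: float) -> None:
    """Apply one ICP reply to the pending request for `url` (section 5.3)."""
    sel = pending.get(url)
    if sel is None or sel.source is not None:
        return  # no such pending request, or the object is already being fetched
    if op == "ICP_OP_SECHO":
        sel.source = "origin"  # the origin answered before any HIT (5.3.4)
        return
    peer = neighbors.get(sender)
    if peer is None:
        return  # reply from an unknown peer
    peer.replies += 1
    if op == "ICP_OP_DENIED":  # 5.3.1
        peer.denied += 1
        if peer.replies > 100 and peer.denied > 0.95 * peer.replies:
            peer.disabled = True  # send it no further queries
    elif op in ("ICP_OP_HIT", "ICP_OP_HIT_OBJ"):
        # 5.3.2/5.3.3: HIT_OBJ data would complete the retrieval here; a damaged
        # HIT_OBJ is treated as a plain HIT. Either way this peer is the source.
        sel.source = sender
    elif op in ("ICP_OP_MISS", "ICP_OP_DECHO") and peer.is_parent:
        # 5.3.5/5.3.6: remember the parent with the lowest weighted RTT;
        # sibling misses fall through and are ignored.
        score = rtt / peer.weight
        if score < sel.best_miss_score:
            sel.first_parent_miss = sender
            sel.best_miss_score = score
```

If no HIT arrives before all expected replies are in or the timeout fires, `first_parent_miss` is the remembered parent that the source selection of section 5.3.9 would consult.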