📄 rfc2187.txt

📁 RFC 的详细文档！
💻 TXT
📖 第 1 页 / 共 4 页
字号:
Wessels & Claffy             Informational                      [Page 6]

RFC 2187                          ICP                     September 1997


   Note that the cache.isp.com cache need not explicitly specify the
   customer cache as a peer, nor is the type of relationship encoded
   within the ICP query itself.  The access control entries regulate the
   relationships between this cache and its neighbors.  For our example,
   the ISP would use:

       acl src Customer  proxy.customer.org
       http_access allow Customer
       icp_access  allow Customer

   This defines an access control entry named `Customer' which specifies
   a source IP address of the customer cache machine.  The customer
   cache would then be allowed to make any request to both the HTTP and
   ICP ports (including cache misses).  This configuration implies that
   the ISP cache is a parent of the customer.

   If the ISP wanted to enforce a sibling relationship, it would need to
   deny access to cache misses.  This would be done as follows:

       miss_access deny Customer

   Of course the ISP should also communicate this to the customer, so
   that the customer will change his configuration from parent to
   sibling.  Otherwise, if the customer requests an object not in the
   ISP cache, an error message is generated.

5.  Applying the Protocol

   The following sections describe the ICP implementation in the
   Harvest[3] (research version) and Squid Web cache[5] packages.  In
   terms of version numbers, this means version 1.4pl2 for Harvest and
   version 1.1.10 for Squid.

   The basic sequence of events in an ICP transaction is as follows:

   1.   Local cache receives an HTTP[1] request from a cache client.

   2.   The local cache sends ICP queries (section 5.1).

   3.   The peer cache(s) receive the queries and send ICP replies
        (section 5.2).

   4.   The local cache receives the ICP replies and decides where to
        forward the request (section 5.3).







Wessels & Claffy             Informational                      [Page 7]

RFC 2187                          ICP                     September 1997


5.1.  Sending ICP Queries

5.1.1.  Determine whether to use ICP at all

   Not every HTTP request requires an ICP query to be sent.  Obviously,
   cache hits will not need ICP because the request is satisfied
   immediately.  For origin servers very close to the cache, we do not
   want to use any neighbor caches.  In Squid and Harvest, the
   administrator specifies what constitutes a `local' server with the
   `local_domain' and `local_ip' configuration options.  The cache
   always contacts a local server directly, never querying a peer cache.

   There are other classes of requests that the cache (or the
   administrator) may prefer to forward directly to the origin server.
   In Squid and Harvest, one such class includes all non-GET request
   methods.  A Squid cache can also be configured to not use peers for
   URLs matching the `hierarchy_stoplist'.

   In order for an HTTP request to yield an ICP transaction, it must:

   o    not be a cache hit

   o    not be to a local server

   o    be a GET request, and

   o    not match the `hierarchy_stoplist' configuration.

   We call this a "hierarchical" request.  A "non-hierarchical" request
   is one that doesn't generate any ICP traffic.  To avoid processing
   requests that are likely to lower cache efficiency, one can configure
   the cache to not consult the hierarchy for URLs that contain certain
   strings (e.g. `cgi_bin').

5.1.2.  Determine which peers to query

   By default, a cache sends an ICP_OP_QUERY message to each peer,
   unless any one of the following are true:

   o    Restrictions prevent querying a peer for this request, based on
        the configuration directive `cache_host_domain', which specifies
        a set of DNS domains (from the URLs) for which the peer should
        or should not be queried.  In Squid, a more flexible directive
        ('cache_host_acl') supports restrictions on other parts of the
        request (method, port number, source, etc.).






Wessels & Claffy             Informational                      [Page 8]

RFC 2187                          ICP                     September 1997


   o    The peer is a sibling, and the HTTP request includes a "Pragma:
        no-cache" header.  This is because the sibling would be asked to
        transit the request, which is not allowed.

   o    The peer is configured to never be sent ICP queries (i.e. with
        the `no-query' option).

   If the determination yields only one queryable ICP peer, and the
   Squid configuration directive `single_parent_bypass' is set, then one
   can bypass waiting for the single ICP response and just send the HTTP
   request directly to the peer cache.

   The Squid configuration option `source_ping' configures a Squid cache
   to send a ping to the original source simultaneous with its ICP
   queries, in case the origin is closer than any of the caches.

5.1.3.  Calculate the expected number of ICP replies

   Harvest and Squid want to maximize the chance to get a HIT reply from
   one of the peers.  Therefore, the cache waits for all ICP replies to
   be received.  Normally, we expect to receive an ICP reply for each
   query sent, except:

   o    When the peer is believed to be down.  If the peer is down Squid
        and Harvest continue to send it ICP queries, but do not expect
        the peer to reply.  When an ICP reply is again received from the
        peer, its status will be changed to up.

        The determination of up/down status has varied a little bit as
        the Harvest and Squid software evolved.  Both Harvest and Squid
        mark a peer down when it fails to reply to 20 consecutive ICP
        queries.  Squid also marks a peer down when a TCP connection
        fails, and up again when a diagnostic TCP connection succeeds.

   o    When sending to a multicast address.  In this case we'll
        probably expect to receive more than one reply, and have no way
        to definitively determine how many to expect.  We discuss
        multicast issues in section 7 below.


5.1.4.  Install timeout event

   Because ICP uses UDP as underlying transport, ICP queries and replies
   may sometimes be dropped by the network.  The cache installs a
   timeout event in case not all of the expected replies arrive.  By
   default Squid and Harvest use a two-second timeout.  If object
   retrieval has not commenced when the timeout occurs, a source is
   selected as described in section 5.3.9 below.



Wessels & Claffy             Informational                      [Page 9]

RFC 2187                          ICP                     September 1997


5.2.  Receiving ICP Queries and Sending Replies

   When an ICP_OP_QUERY message is received, the cache examines it and
   decides which reply message is to be sent.  It will send one of the
   following reply opcodes, tested for use in the order listed:

5.2.1.  ICP_OP_ERR

   The URL is extracted from the payload and parsed.  If parsing fails,
   an ICP_OP_ERR message is returned.

5.2.2.  ICP_OP_DENIED

   The access controls are checked.  If the peer is not allowed to make
   this request, ICP_OP_DENIED is returned.  Squid counts the number of
   ICP_OP_DENIED messages sent to each peer.  If more than 95% of more
   than 100 replies have been denied, then no reply is sent at all.
   This prevents misconfigured caches from endlessly sending unnecessary
   ICP messages back and forth.

5.2.3.  ICP_OP_HIT

   If the cache reaches this point without already matching one of the
   previous  opcodes, it means the request is allowed and we must
   determine if it will be HIT or MISS, so we check if the URL exists in
   the local cache.  If so, and if the cached entry is fresh for at
   least the next 30 seconds, we can return an ICP_OP_HIT message.  The
   stale/fresh determination uses the local refresh (or TTL) rules.

   Note that a race condition exists for ICP_OP_HIT replies to sibling
   peers.  The ICP_OP_HIT means that a subsequent HTTP request for the
   named URL would result in a cache hit.  We assume that the HTTP
   request will come very quickly after the ICP_OP_HIT.  However, there
   is a slight chance that the object might be purged from this cache
   before the HTTP request is received.  If this happens, and the
   replying peer has applied Squid's `miss_access' configuration then
   the user will receive a very confusing access denied message.

5.2.3.1.  ICP_OP_HIT_OBJ

   Before returning the ICP_OP_HIT message, we see if we can send an
   ICP_OP_HIT_OBJ message instead.  We can use ICP_OP_HIT_OBJ if:

   o    The ICP_OP_QUERY message had the ICP_FLAG_HIT_OBJ flag set.







Wessels & Claffy             Informational                     [Page 10]

RFC 2187                          ICP                     September 1997


   o    The entire object (plus URL) will fit in an ICP message.  The
        maximum ICP message size is 16 Kbytes, but an application may
        choose to set a smaller maximum value for ICP_OP_HIT_OBJ
        replies.

   Normally ICP replies are sent immediately after the query is
   received, but the ICP_OP_HIT_OBJ message cannot be sent until the
   object data is available to copy into the reply message.  For Squid
   and Harvest this means the object must be "swapped in" from disk if
   it is not already in memory.  Therefore, on average, an
   ICP_OP_HIT_OBJ reply will have higher latency than ICP_OP_HIT.

5.2.4.  ICP_OP_MISS_NOFETCH

   At this point we have a cache miss.  ICP has two types of miss
   replies.  If the cache does not want the peer to request the object
   from it, it sends an ICP_OP_MISS_NOFETCH message.

5.2.5.  ICP_OP_MISS

   Finally, an ICP_OP_MISS reply is returned as the default.  If the
   replying cache is a parent of the querying cache, the ICP_OP_MISS
   indicates an invitation to fetch the URL through the replying cache.

5.3.  Receiving ICP Replies

   Some ICP replies will be ignored; specifically, when any of the
   following are true:

   o    The reply message originated from an unknown peer.

   o    The object named by the URL does not exist.

   o    The object is already being fetched.

5.3.1.  ICP_OP_DENIED

   If more than 95% of more than 100 replies from a peer cache have been
   ICP_OP_DENIED, then such a high denial rate most likely indicates a
   configuration error, either locally or at the peer.  For this reason,
   no further queries will be sent to the peer for the duration of the
   cache process.

5.3.2.  ICP_OP_HIT

   Object retrieval commences immediately from the replying peer.





Wessels & Claffy             Informational                     [Page 11]

RFC 2187                          ICP                     September 1997


5.3.3.  ICP_OP_HIT_OBJ

   The object data is extracted from the ICP message and the retrieval
   is complete.  If there is some problem with the ICP_OP_HIT_OBJ
   message (e.g. missing data) the reply will be treated like a standard
   ICP_OP_HIT.

5.3.4.  ICP_OP_SECHO

   Object retrieval commences immediately from the origin server because
   the ICP_OP_SECHO reply arrived prior to any ICP_OP_HIT's.  If an
   ICP_OP_HIT had arrived prior, this ICP_OP_SECHO reply would be
   ignored because the retrieval has already started.

5.3.5.  ICP_OP_DECHO

   An ICP_OP_DECHO reply is handled like an ICP_OP_MISS.  Non-ICP peers
   must always be configured as parents; a non-ICP sibling makes no
   sense.  One serious problem with the ICP_OP_DECHO feature is that
   since it bounces messages off the peer's UDP echo port, it does not
   indicate that the peer cache is actually running -- only that network
   connectivity exists between the pair.

5.3.6.  ICP_OP_MISS

   If the peer is a sibling, the ICP_OP_MISS reply is ignored.
   Otherwise, the peer may be "remembered" for future use in case no HIT
   replies are received later (section 5.3.9).

   Harvest and Squid remember the first parent to return an ICP_OP_MISS
   message.  With Squid, the parents may be weighted so that the "first
   parent to miss" may not actually be the first reply received.  We
   call this the FIRST_PARENT_MISS.  Remember that sibling misses are
   entirely ignored, we only care about misses from parents.  The parent
   miss RTT's can be weighted because sometimes the closest parent is
   not the one people want to use.

   Also, recent versions of Squid may remember the parent with the
   lowest RTT to the origin server, using the ICP_FLAG_SRC_RTT option.
   We call this the CLOSEST_PARENT_MISS.

5.3.7.  ICP_OP_MISS_NOFETCH

   This reply is essentially ignored.  A cache must not forward a
   request to a peer that returns ICP_OP_MISS_NOFETCH.






Wessels & Claffy             Informational                     [Page 12]
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -