   [Figure 1 diagram garbled in extraction. Recoverable labels: servers
   C, E, G, and H joined by CIP links, and referral links numbered 1,
   2, and 3 leading to a client.]

          Figure 1: Sample layout of the Index Service mesh

Allen & Mealling            Standards Track                    [Page 7]

RFC 2651                The CIP Architecture                August 1999

   All indices passed in a given mesh are assumed, as of this writing,
   to be of the same type (i.e. governed by the same CIP index object
   specification). It may be possible to create gateways between meshes
   carrying different index objects, but at this time that process is
   undefined and declared to be outside the scope of this
   specification.

   In the case where a CIP server receives an index of a type that it
   does not understand it _can_ pass that index forward untouched. In
   the case where a server implementation decides not to accept unknown
   indices it should return an appropriate error message to the server
   sending the index. This behavior is to allow mesh implementations to
   attempt heterogeneous meshes. As stated above heterogeneous meshes
   are considered to be ill defined and as such should be considered
   dangerous.

   Experience suggests that this index passing activity should take
   place among CIP servers as a parallel (and possibly lower-priority)
   job to their primary job of answering queries. Index objects travel
   among CIP servers by protocol exchanges explicitly defined in this
   document, not via the server's native protocol. This distinction is
   important, and bears repeating:

      Queries are answered (and referrals are sent) via the native data
      access protocol.

      Index objects are transferred via alternative means, as defined
      by this document.

   When two servers cooperate to move indexing information, the pair
   are said to be in a "polling relationship".
   The server that holds the data of interest, and generates the index,
   is called the "polled server". The other server, which is the one
   that collects the generated index, is the "polling server".

   In a polling relationship, the polled server is responsible for
   notifying the polling server when it has a new index that the
   polling server might be interested in. In response, the polling
   server may immediately pick up the index object, or it may schedule
   a job to pick up a copy of the new index at a more convenient time.
   However, a polling server is not required to wait for the polled
   server to notify it of changes. The polling server can request a new
   index at any time.

   Independent of the symmetric polling relationship, there's another
   way that servers can pass indices using CIP. In an "index pushing"
   relationship, a CIP server simply sends the index to a peer whenever
   necessary, and allows the receiver to handle the index object as it
   chooses. The receiving server may refuse it, may accept it and then
   silently discard it, may accept only portions of it (by accepting it
   as is, then filtering it), or may accept it without question.

   The index pushing relationship is intended for use by dumb leaf
   nodes which simply want to make their index available to the global
   mesh of servers, but have no interest in implementing the complete
   CIP transaction protocol. It lowers the barriers to entry for CIP
   leaf nodes. For more information on participating in a CIP mesh in
   this restricted manner, see the section below on "Protocol
   Conformance".

   CIP index passing operations take place across reliable transport
   mechanisms, including both TCP connections and Internet mail
   messages.
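
   As an illustration only, the polling relationship described above
   can be modelled in a few lines of Python. The class and method names
   (PolledServer, PollingServer, notify, poll, run_scheduled) are
   hypothetical and are not defined by CIP; the index is reduced to a
   version counter. The sketch shows the two permitted responses to a
   notification: immediate pickup, or deferral to a scheduled job.

```python
from dataclasses import dataclass, field


@dataclass
class PolledServer:
    """Holds the data of interest and generates the index."""
    name: str
    index_version: int = 0
    listeners: list = field(default_factory=list)

    def publish_new_index(self):
        # A new index exists: notify every polling server that registered.
        self.index_version += 1
        for poller in self.listeners:
            poller.notify(self)

    def fetch_index(self):
        # Stand-in for handing over a real index object.
        return (self.name, self.index_version)


@dataclass
class PollingServer:
    """Collects generated indices from polled servers."""
    fetch_immediately: bool = True
    pending: list = field(default_factory=list)
    inbound: dict = field(default_factory=dict)

    def notify(self, polled):
        # Either pick the index up now or defer to a scheduled job.
        if self.fetch_immediately:
            self.poll(polled)
        else:
            self.pending.append(polled)

    def poll(self, polled):
        # May be called at any time, with or without a prior notification.
        name, version = polled.fetch_index()
        self.inbound[name] = version

    def run_scheduled(self):
        # The "more convenient time": drain all deferred pickups.
        while self.pending:
            self.poll(self.pending.pop(0))
```

   In this model a polling server constructed with
   fetch_immediately=False records the notification and collects the
   index only when run_scheduled() is called, while poll() remains
   available for unsolicited requests.
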
   The precise transport mechanisms are described in the Transport
   document [CIP-Transport].

3.2.3 Index Object Synthesis

   From the preceding discussion, it should be clear that indexing
   servers read and write index objects as they pass them around the
   mesh. However, a CIP server need not simply pass the in-bound
   indices through as the out-bound ones. While it is always
   permissible to pass an index object through to other servers, a
   server may choose to aggregate two or more of them, thereby reducing
   redundancy in the index, at the cost of longer referral chains.

   A basic premise of index passing is that even while collapsing a
   body of data into an index by lossy compression methods, hints
   useful to routing queries will survive in the resulting index. Since
   the index is not a complete copy of the original dataset, it
   contains less information. Index objects can be passed along
   unchanged, but as more and more information collects in the
   resulting index object, redundancy will creep in again, and it may
   prove useful to apply the compression again, by aggregating two or
   more index objects into one.

   This kind of aggregation should be performed without compromising
   the ability to correctly route queries while avoiding excessive
   numbers of missed results. The acceptable likelihood of false
   negatives must be established on a per-application-domain basis, and
   is controlled by the granularity of the index and the aggregation
   rules defined for it by the particular specification.

   However, when CIP is used in a multi-protocol application domain,
   such as a Directory Service (with contenders including Whois++,
   LDAP, and Ph), things get significantly trickier. The fundamental
   problem is to avoid forcing a referral chain to pass through part of
   the mesh which does not support the protocol by which that client
   made the query.
   If this ever happens, the client loses access to any hits beyond
   that point in the referral chain, since it cannot resolve the
   referral in its native data access protocol. This is a failure of
   query routing, which should be avoided.

   In addition to multi-protocol considerations, server managers may
   choose not to allow index object aggregation for performance
   reasons. As referral chains lengthen, a client needs to perform more
   transactions to resolve a query. As the number of transactions
   increases, so do the user-perceived delays, the system loads, and
   the global bandwidth demands. In general, there's a tradeoff between
   aggressive aggregation (which leads to reductions in the indexing
   overhead) and aggressive referral chain optimization. This tradeoff,
   which is also sensitive to the particular application domain, needs
   to be explored more in actual operational situations.

   Conceptually, a CIP index server has several index objects on hand
   at any given time. If it holds data in addition to indexing
   information, the server has an index object formed from its own
   data, called the "local index". It may have one or more indices from
   remote servers which it has collected via the index passing
   mechanisms. These are called "in-bound indices".

   Implementor's Note: It may not be necessary to keep all of these
   structures intact and distinct in the local database. It is also not
   required to keep the out-bound index (or indices) built and ready to
   distribute at all times. The previous paragraph merely introduces a
   useful model for expressing the aggregation rules. Implementors are
   free to model index objects internally however they see fit.

   The following two rules control how a CIP server formulates its
   outgoing indices:

   1. An index server may pass any of the index objects in its local
      index and its in-bound indices through unchanged to polling
      servers.

   2. If and only if the following three conditions are true, an index
      server can aggregate two or more index objects into a single new
      index object, to be added to the set of out-bound indices.

      a. Each index object to be aggregated covers exactly the same set
         of protocols, as defined by the scheme component of the
         Base-URI's in each index object.

      b. The index server supports every one of the data access
         protocols represented by the Base-URI's in the index objects
         to be aggregated.

      c. The specification for the index object type specified by the
         type header of the index objects explicitly defines the
         aggregation operation.

   The resulting index object must have Base-URI's characteristic of
   the local server for each protocol it supports. The outgoing objects
   should have the DSI of the local server.

4. Navigating the mesh

   With the CIP infrastructure in place to manage index objects, the
   only problem remaining is how to successfully use the indexing
   information to do efficient searches. CIP facilitates query routing,
   which is essentially a client activity. A client connects to one
   server, which redirects the query to servers "closer to" the answer.
   This redirection message is called a referral.

4.1 The Referral

   The concept of a referral and the mechanism for deciding when they
   should be issued is described by CIP. However, the referral itself
   must be transferred to the client in the native protocol, so its
   syntax is not directly a CIP issue.

   The mechanism for deciding that a referral needs to be made and
   generating that referral resides in the CIP implementation in the
   server. The mechanism for sending the referral to the client resides
   in the server's native protocol implementation.

   A referral is made when a search against the index objects held by
   the server shows that there may be hits available in one of the
   datasets represented by those index objects.
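
   The referral decision can be sketched as follows. This is a minimal
   illustration under stated assumptions, not part of CIP: the names
   IndexObject, Referral, and referrals_for are hypothetical, an index
   object is reduced to a flat set of tokens, and "may be hits" is
   taken to mean that every query token appears in that set. Real
   matching rules come from the index object specification. The sketch
   emits one referral per dataset, keyed by DSI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexObject:
    dsi: str           # DSI of the dataset this index object describes
    base_uri: str      # Base-URI that travels with the index object
    tokens: frozenset  # assumed stand-in for the real index contents


@dataclass(frozen=True)
class Referral:
    dsi: str
    base_uri: str


def referrals_for(query_tokens, index_objects):
    """Return at most one referral per dataset that may hold hits."""
    wanted = frozenset(query_tokens)
    seen_dsis = set()
    referrals = []
    for index in index_objects:
        if not wanted <= index.tokens:
            continue  # this index object shows no possible hit
        if index.dsi in seen_dsis:
            continue  # the client is already being sent to this dataset
        seen_dsis.add(index.dsi)
        referrals.append(Referral(index.dsi, index.base_uri))
    return referrals
```

   The returned Referral values would then be handed to the server's
   native protocol implementation, which is responsible for encoding
   them for the client.
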
   If more than one index object indicates that a referral must be
   generated to a given dataset, the server should generate only one
   referral to the given dataset, as the client may not be able to
   detect duplicates.

   Though the format of the referral is dependent on the native
   protocol(s) of the CIP server, the baseline contents of the referral
   are constant across all protocols. At the least, a DSI and a URI
   must be returned. The DSI is the DSI associated with the dataset
   which caused the hit. This must be presented to the client so that
   it can avoid referral loops. The Base-URI parameter which travels
   along with index objects is used to provide the other required part
   of a referral.

   The additional information in the Base-URI may be necessary for the
   server receiving the referred query to correctly handle it. A good
   example of this is an LDAP server, which needs a base X.500
   distinguished name from which to search. When an LDAP server sends a
   centroid-format index object up to a CIP indexing server, it sends a
   Base-URI along with the name of the X.500 subtree for which the
   index was made. When a referral is made, the Base-URI is passed back
   to the client so that it can pass it to the original LDAP server.

   As usual, in addition to sending the DSI, a DSI-Description header
   can be optionally sent. Because a client may attempt to check with
   the user before chasing the referral, and because this string is the
   friendliest representation of the DSI that CIP has to offer, it
   should be included in referrals when available (i.e. when it was
   sent along with the index object).

4.2 Cross-protocol Mappings

   Each data access protocol which uses CIP will need a clearly defined
   set of rules to map queries in the native protocol to searches
   against an index object. These rules will vary according to the data
   domain.
   In principle, this could create a bit of a scaling difficulty; for N
   protocols and M data domains, there would be N x M mappings
   required. In practice, this should not be the case, since some
   access protocols will be wholly unsuited to some data domains.
   Consider, for example, an LDAP server trying to make a search in an
   index object composed from unorganized text-based pages. What would
   the results be? How would the client make sense of the results?
   However, as pre-existing protocols are connected to CIP, and as new
   ones are developed to work with CIP, this issue must be examined.

   In the case of Whois++ and the CENTROID index type, there is an
   extremely close mapping, since the two were designed together. When
   hooking LDAP to the CENTROID index type, it will be necessary to map
   the attribute names used in the LDAP system to attribute names which
   are already being used in the CENTROID mesh. It will also be
   necessary to tokenize the LDAP queries under the same rules as the
   CENTROID indexing policy, so that searches will take place
   correctly. These application- and protocol-specific actions must be
   specified in the index object specification, as discussed in the
   [CIP-MIME] document.

4.3 Moving through the mesh

   From a client's point of view, CIP simply pushes all the "hard work"
   onto its shoulders. After all, it is the client which needs to track
   down the real data. While this is true, it is very misleading.
   Because the client has control over the query routing process, the
   client has significant control over the size of the result set, the
   speed with which the query progresses, and the depth of the search.

   The simplest client implementation provides referrals to the user in
   a raw, ready-to-reuse form, without attempting to follow them. For
   instance, one Whois++ client, which interacts with the user via a
   Web-based form, simply makes referrals into HTML hypertext links.
   Encoded in the link via the HTML forms interface GET encoding rules
   is the data of the referral: the hostname, port, and query. If a
   user chooses to follow the referral link, he executes a new search
   on the new host.

   A more savvy client might present the referrals to the user and ask
   which should be followed. And, assuming appropriate limits were
   placed on search time and bandwidth usage, it might be reasonable to
   program a client to follow all referrals automatically.

   When following all referrals, a client must show a bit of
   intelligence. Remember that the mesh is defined as an interconnected
   graph of CIP servers. This graph may have cycles, which could cause
   an infinite loop of referrals, wasting the servers' time and the
   client's too.

   When faced with the job of tracking down all referrals, a client
   must use some form of a mesh traversal algorithm. Such an algorithm
   has been documented for use with Whois++ in RFC-1914. The same
   algorithm can be easily used with this version of CIP. In Whois++
   the equivalent of a DSI is called a handle. With this substitution,
   the Whois++ mesh traversal algorithm works unchanged with CIP.

   Finally, the mesh entry point (i.e. the first server queried) can
   have an impact on the success of the query. To avoid scaling issues,
   it is not acceptable to use a single "root" node, and force all
   clients to connect to it. Instead, clients should connect to a
   reasonably well connected (with respect to the CIP mesh, not the
   Internet infrastructure) local server. If no match can be made from
   this entry point, the client can expand the search by asking the
   original server who polls it. In general, those servers will have a
   better "vantage point" on the mesh, and will turn up answers that
   the initial search didn't. The mechanism for dynamically determining
   the