RFC 2258              Internet Nomenclator Project          January 1998


   The distributed catalog service is logically one network service, but
   it can be divided into pieces that are distributed and/or replicated.
   Query resolvers access this distributed, replicated service using the
   same techniques that work for multiple data repositories.

   A Nomenclator system naturally includes many query resolvers.
   Resolvers are independent, but renewable, query agents that can be as
   powerful as the resources available at the user site.  Caching
   decreases the dependence of the resolver on the distributed catalog
   service for frequently used meta-data, and on data repositories for
   frequently used data.  Caching thus increases the number of users
   that can be supported and improves the local availability of the
   query service.

2.2 Meta-Data Techniques

   The active catalog structures the information space into a collection
   of relations about people, hosts, organizations, services and other
   objects. It collects meta-data for each relation and structures it
   into "access functions" for locating and retrieving data.  Access
   functions respond to the question: "Where is data to answer this
   query?"  There are two types of responses corresponding to the two
   types of access functions.  The first type of response is: "Look over
   there." "Catalog functions" return this response; they constrain the
   query search by limiting the data repositories contacted to those
   having data relevant to the query. Catalog functions return a
   referral to data access functions that will answer the query or to
   additional catalog functions to contact for more detailed
   information.  The second response to "Where?" is: "Here it is!" "Data
   access functions" return this response; they understand how to obtain
   query answers from specific data repositories.  They return tuples
   that answer the query.  Nomenclator supplies access functions for
   common name services, such as the CCSO service, and organizations can
   write and supply access functions for data in their repositories.
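   The two kinds of answer to "Where is data to answer this query?" can
   be illustrated with a small Python sketch.  It is not Nomenclator's
   interface: the names Referral, data_access_us, catalog_by_country,
   resolve, and the repository contents are hypothetical.  A catalog
   function returns a referral ("Look over there."); a data access
   function returns tuples ("Here it is!").

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Query = Dict[str, str]  # conjunction of attribute = "value" tests
Row = Dict[str, str]    # one tuple of a relation such as People


@dataclass
class Referral:
    """'Look over there.': a template plus references to access functions."""
    template: Dict[str, str]
    functions: List[Callable] = field(default_factory=list)


# Hypothetical data repository holding a few People tuples.
REPO_US: List[Row] = [
    {"country": "US", "surname": "Ordille"},
    {"country": "US", "surname": "Elliott"},
]


def data_access_us(query: Query) -> List[Row]:
    """Data access function ('Here it is!'): returns the matching tuples."""
    return [row for row in REPO_US
            if all(row.get(attr) == value for attr, value in query.items())]


def catalog_by_country(query: Query) -> Referral:
    """Catalog function ('Look over there.'): narrows the search by country.

    Assumes it is only called for queries that constrain country.
    """
    index = {"US": [data_access_us]}  # hypothetical country -> functions
    country = query["country"]
    return Referral({"country": country}, index.get(country, []))


def resolve(query: Query, fn: Callable) -> List[Row]:
    """Follow referrals until data access functions return tuples."""
    result: Union[Referral, List[Row]] = fn(query)
    if isinstance(result, Referral):
        rows: List[Row] = []
        for next_fn in result.functions:
            rows.extend(resolve(query, next_fn))
        return rows
    return result


print(resolve({"country": "US", "surname": "Ordille"}, catalog_by_country))
```

   A referral whose function list is empty simply yields no tuples, so
   queries outside the indexed countries contact no repository at all.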

   Access functions are implemented as remote or local services.  Remote
   access functions are services that are available through a standard
   remote procedure call interface.  Local access functions are
   functions that are supplied with the query resolver.  Local access
   functions can be applied to a variety of indexing and data retrieval
   tasks by loading them with meta-data stored in the distributed catalog
   service.  Remote access functions are preferred over local ones when
   the resources of the query resolver are inadequate to support the
   access function.  The owners of data may also choose to supply remote
   access functions for privacy reasons if their access functions use
   proprietary information or algorithms.  Local functions are preferred
   whenever possible, because they are highly replicated in resolver
   caches.  They can reduce system and network load by bringing the
   resources of the active catalog directly to the users.
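   The preference between local and remote access functions can be read
   as a small decision rule.  The Python sketch below encodes it under
   stated assumptions: AccessFunctionInfo, its fields, and the memory
   estimate standing in for "resources of the query resolver" are all
   hypothetical, not part of Nomenclator.

```python
from dataclasses import dataclass


@dataclass
class AccessFunctionInfo:
    """Hypothetical description of one access function's deployment options."""
    has_local: bool        # an implementation is supplied with the resolver
    has_remote: bool       # the owner runs it as a remote (RPC) service
    proprietary: bool      # owner keeps its information/algorithms private
    memory_needed_mb: int  # assumed cost of running it at the resolver


def choose_implementation(info: AccessFunctionInfo, resolver_memory_mb: int) -> str:
    """Return "local", "remote", or "unavailable".

    Local is preferred whenever possible; remote is used when the local
    version is missing, is withheld for privacy, or needs more resources
    than the resolver has.
    """
    local_ok = (info.has_local and not info.proprietary
                and info.memory_needed_mb <= resolver_memory_mb)
    if local_ok:
        return "local"
    if info.has_remote:
        return "remote"
    return "unavailable"


print(choose_implementation(AccessFunctionInfo(True, True, False, 64), 256))
```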



Ordille                      Informational                      [Page 6]

   Remote access functions are simple to add to Nomenclator and local
   access functions are simple to apply to new data repositories,
   because the active catalog provides "referrals" that describe the
   conditions for using access functions.  For simplicity, this document
   describes referral techniques for exact matching of query strings.
   Extensions to these techniques in Nomenclator support matching query
   strings that contain wildcards or word-based matching of query
   strings in the style of the CCSO services.

   Each referral contains a template and a list of references to access
   functions.  The template is a conjunctive selection predicate that
   describes the scope of the access functions.  Conjunctive queries
   that are within the scope of the template can be answered with the
   referral.  When a template contains a wildcard value ("*") for an
   attribute, the attribute must be present in any queries that are
   processed by the referral.  The system applies the following rule:

     Query Coverage Rule:

     If the set of tuples satisfying the selection predicate in a query
     is covered by (is a subset of) the set of tuples satisfying the
     template, then the query can be answered by the access functions in
     the reference list of the referral.

   For example, the query below:

     select * from People where country = "US" and surname = "Ordille";


   is covered by the following templates in Lines (1) through (3), but
   not by the templates in Lines (4) and (5):


      (1) country = "US" and surname = "*"

      (2) country = "US" and surname = "Ordille"

      (3) country = "US"

      (4) organization = "*"

      (5) country = "US" and surname = "Elliott"
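   For exact-match conjunctive predicates, the Query Coverage Rule
   reduces to checking each conjunct of the template against the query.
   The Python sketch below assumes templates and queries are
   attribute-to-value maps and that query values never contain
   wildcards; the function name covers is hypothetical.  It reproduces
   the five cases above.

```python
from typing import Dict

WILDCARD = "*"


def covers(template: Dict[str, str], query: Dict[str, str]) -> bool:
    """Query Coverage Rule for exact-match conjunctive predicates.

    Assumes query values are literal strings (no wildcards). A template
    conjunct with a specific value must appear with that value in the query;
    a wildcard conjunct only requires the attribute to be present.
    """
    for attr, value in template.items():
        if attr not in query:
            return False
        if value != WILDCARD and query[attr] != value:
            return False
    return True


query = {"country": "US", "surname": "Ordille"}
templates = [
    {"country": "US", "surname": "*"},        # (1)
    {"country": "US", "surname": "Ordille"},  # (2)
    {"country": "US"},                        # (3)
    {"organization": "*"},                    # (4)
    {"country": "US", "surname": "Elliott"},  # (5)
]
for n, template in enumerate(templates, start=1):
    print(n, covers(template, query))
# prints 1 True, 2 True, 3 True, 4 False, 5 False (one per line)
```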

   Referrals form a generalization/specialization graph for a relation
   called a "referral graph."  Referral graphs are a conceptual tool
   that guides the integration of different catalog functions into our
   system and that supplies a basis for catalog function construction
   and query processing.  A "referral graph" is a partial ordering of
   the referrals for a relation.  It is constructed using the
   subset/superset relationship: "S is a subset of G."  A referral S is
   a subset of referral G if the set of queries covered by the template
   of S is a subset of the set of queries covered by the template of G.
   S is considered a more specific referral than G; G is considered a
   more general referral than S.  For example, the subset relationship
   exists between the pairs of referrals with the templates listed
   below:


      (1) country = "US" and surname = "Ordille"
          is a subset of
          country = "US"

      (2) country = "US" and surname = "Ordille"
          is a subset of
          country = "US" and surname = "*"

      (3) country = "US" and surname = "*"
          is a subset of
          country = "US"

      (4) country = "US"
          is a subset of
          "empty template"

   but it does not exist between the pairs of referrals with the
   following templates:

      (5) country = "US"
          is not a subset of
          department = "CS"

      (6) country = "US" and name = "Ordille"
          is not a subset of
          country = "US" and name = "Elliott"

   In Lines (1) and (2), the more general referral covers more queries,
   because it covers queries that list different values for surname.  In
   Line (3), the more general referral covers more queries, because it
   covers queries that do not constrain surname to a value.  In Line
   (4), the specific referral covers only those queries that constrain
   the country to "US" while the empty template covers all queries.
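   Under the same exact-match assumptions, "S is a subset of G" can be
   decided from the templates alone: every conjunct of G must be implied
   by S.  The Python sketch below (is_subset is a hypothetical name)
   checks the six pairs listed above.

```python
from typing import Dict

WILDCARD = "*"
Template = Dict[str, str]


def is_subset(s: Template, g: Template) -> bool:
    """True if every query covered by template s is also covered by g.

    Each conjunct of g must be implied by s: g's attribute must appear in s,
    and when g fixes a value, s must fix the same value (a wildcard in s
    would let other values through).
    """
    for attr, g_value in g.items():
        if attr not in s:
            return False  # s covers queries that never mention attr
        if g_value != WILDCARD and s[attr] != g_value:
            return False  # s admits values that g excludes
    return True


pairs = [
    ({"country": "US", "surname": "Ordille"}, {"country": "US"}),                  # (1)
    ({"country": "US", "surname": "Ordille"}, {"country": "US", "surname": "*"}),  # (2)
    ({"country": "US", "surname": "*"}, {"country": "US"}),                        # (3)
    ({"country": "US"}, {}),                                                       # (4)
    ({"country": "US"}, {"department": "CS"}),                                     # (5)
    ({"country": "US", "name": "Ordille"}, {"country": "US", "name": "Elliott"}),  # (6)
]
for n, (s, g) in enumerate(pairs, start=1):
    print(n, is_subset(s, g))
```

   Pairs (1) through (4) print True and pairs (5) and (6) print False,
   matching the lists above.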

   During query processing, wildcards in a template are replaced with
   the value of the corresponding attribute in the query.  For any query
   covered by two referrals S and G such that S is a subset of G, the
   set of tuples satisfying the template in S is covered by the set of
   tuples satisfying the template in G.  S is used to process the query,
   because it provides the more constrained (and faster) search space.
   The referral S has a more constrained logical search space than G,
   because the set of tuples in the scope of S is no larger, and often
   smaller, than the set in the scope of G. Moreover, S has a more
   constrained physical search space than G, because the data
   repositories that must be contacted for answers to S must also be
   contacted for answers to G, but additional data repositories may need
   to be contacted to answer G.
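   The wildcard substitution step and the resulting narrower logical
   search space can be shown on a toy relation.  In the Python sketch
   below, the PEOPLE tuples and the helper names instantiate and select
   are hypothetical; S and G are the templates from pair (3) above.

```python
from typing import Dict, List

WILDCARD = "*"
Row = Dict[str, str]


def instantiate(template: Dict[str, str], query: Dict[str, str]) -> Dict[str, str]:
    """Replace each wildcard with the query's value for that attribute."""
    return {a: (query[a] if v == WILDCARD else v) for a, v in template.items()}


def select(rows: List[Row], predicate: Dict[str, str]) -> List[Row]:
    """Tuples satisfying an exact-match conjunctive predicate."""
    return [r for r in rows if all(r.get(a) == v for a, v in predicate.items())]


# Hypothetical People relation, for illustration only.
PEOPLE: List[Row] = [
    {"country": "US", "surname": "Ordille"},
    {"country": "US", "surname": "Elliott"},
    {"country": "CA", "surname": "Ordille"},
]

query = {"country": "US", "surname": "Ordille"}
S = {"country": "US", "surname": WILDCARD}  # more specific referral
G = {"country": "US"}                       # more general referral

s_scope = select(PEOPLE, instantiate(S, query))
g_scope = select(PEOPLE, instantiate(G, query))
assert all(row in g_scope for row in s_scope)  # S's scope is covered by G's
print(len(s_scope), len(g_scope))  # 1 2
```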

   In constraining a query, a catalog function always produces a
   referral that is more specific than the referral containing the
   catalog function.  Wildcards ("*") in a template indicate which
   attribute values are used by the associated catalog function to
   generate a more specific referral.  In other words, catalog functions
   always follow the rule:

      Catalog Function Constrained Search Rule:

      Given a referral R with a template t and a catalog function cf,
      and a query q covered by t, the result of using cf to process q,
      cf(q), is a referral R' with template t' such that q is covered
      by t' and R' is more specific than R.
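   A catalog function that obeys the Constrained Search Rule can be
   sketched as follows.  Everything named here is hypothetical:
   SURNAME_INDEX stands in for the catalog function's index, repository
   names stand in for references to data access functions, and "more
   specific" is taken to mean a strict subset in the sense defined
   above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

WILDCARD = "*"
Template = Dict[str, str]


@dataclass
class Referral:
    template: Template
    repositories: List[str] = field(default_factory=list)  # stand-ins for data access functions


def covers(t: Template, q: Dict[str, str]) -> bool:
    """Query Coverage Rule for exact-match templates with wildcards."""
    return all(a in q and (v == WILDCARD or q[a] == v) for a, v in t.items())


def is_subset(s: Template, g: Template) -> bool:
    """Every query covered by s is also covered by g."""
    return all(a in s and (v == WILDCARD or s[a] == v) for a, v in g.items())


# Hypothetical index owned by the catalog function: surname -> repositories.
SURNAME_INDEX = {"Ordille": ["repo-a"], "Elliott": ["repo-b"]}


def surname_catalog_function(q: Dict[str, str]) -> Referral:
    """Catalog function attached to the referral {country="US", surname="*"}.

    The wildcard on surname marks the attribute used to build a more
    specific referral.
    """
    return Referral({"country": "US", "surname": q["surname"]},
                    SURNAME_INDEX.get(q["surname"], []))


def follows_rule(r: Referral, r2: Referral, q: Dict[str, str]) -> bool:
    """q is covered by t', and R' is strictly more specific than R."""
    return (covers(r2.template, q)
            and is_subset(r2.template, r.template)
            and not is_subset(r.template, r2.template))


R = Referral({"country": "US", "surname": WILDCARD}, ["repo-a", "repo-b"])
q = {"country": "US", "surname": "Ordille"}
R2 = surname_catalog_function(q)
print(R2.repositories, follows_rule(R, R2, q))  # ['repo-a'] True
```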

   Catalog functions make it possible to import a portion of the indices
   for the information space into the query resolver.  Since they
   generate referrals, the resolver can cache the most useful referrals
   for a relation and call the catalog function as needed to generate
   new referrals.

   The resolver query processing algorithm obtains an initial set of
   referrals from the distributed catalog service.  It then navigates
   the referral graph, calling catalog functions as necessary to obtain
   additional referrals that narrow the search space.  Sometimes two
   referrals that cover the query are related as general and specific.
   The resolver eliminates unnecessary access
   function processing by using only the most specific referral along
   each path of the referral graph.
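   Keeping only the most specific referral on each path amounts to
   discarding any covering referral that has a strictly more specific
   covering referral.  The Python sketch below assumes all listed
   templates cover a query such as country = "US" and surname =
   "Ordille" and department = "CS"; most_specific is a hypothetical name.

```python
from typing import Dict, List

WILDCARD = "*"
Template = Dict[str, str]


def is_subset(s: Template, g: Template) -> bool:
    """True if every query covered by template s is also covered by g."""
    return all(a in s and (v == WILDCARD or s[a] == v) for a, v in g.items())


def most_specific(covering: List[Template]) -> List[Template]:
    """Drop any referral template that has a strictly more specific one.

    Templates on independent paths (neither a subset of the other) are all
    kept; their results are combined later by intersection.
    """
    keep = []
    for i, g in enumerate(covering):
        dominated = any(
            is_subset(s, g) and not is_subset(g, s)
            for j, s in enumerate(covering) if j != i
        )
        if not dominated:
            keep.append(g)
    return keep


covering = [
    {"country": "US"},                  # general
    {"country": "US", "surname": "*"},  # more specific, same path
    {"department": "CS"},               # independent path
]
print(most_specific(covering))
```

   Only {"country": "US"} is dropped, because {"country": "US",
   "surname": "*"} lies below it on the same path.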

   The search space for the query is initially set to all the data
   repositories in the relation.  As the resolver obtains referrals to
   sets of relevant data repositories (and their associated data access
   functions) it forms the intersection of the referrals to constrain
   the search space further.  The intersection of the referrals includes
   only those data repositories listed in all the referrals.
   Intersection combines independent paths through the referral graph to
   derive benefit from indices on different attributes.
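   Constraining the search space by intersection is a plain set
   operation.  In the Python sketch below the repository names and the
   two index paths are hypothetical; constrain_search_space starts from
   every repository in the relation and keeps only those named by all
   referrals.

```python
from typing import Iterable, Set


def constrain_search_space(all_repositories: Iterable[str],
                           referral_repositories: Iterable[Iterable[str]]) -> Set[str]:
    """Intersect the repository sets named by the referrals.

    Starts from every repository in the relation; each referral can only
    shrink the set, so a repository survives only if all referrals list it.
    """
    space = set(all_repositories)
    for repositories in referral_repositories:
        space &= set(repositories)
    return space


ALL = ["repo-a", "repo-b", "repo-c", "repo-d"]  # hypothetical repositories
by_surname = ["repo-a", "repo-b"]              # from a surname index path
by_department = ["repo-b", "repo-c"]           # from a department index path
print(constrain_search_space(ALL, [by_surname, by_department]))  # {'repo-b'}
```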

2.3 Meta-Data and Data Caching

   A Nomenclator query resolver caches the meta-data that result from
   calling catalog functions.  It also caches the responses for queries.
   If the predicate of a new query is covered by the predicate of a
   previous query, Nomenclator calculates the response for the new query
   from the cached response of the old query.  Nomenclator timestamps
   its cache entries to measure the currentness of query responses and
   to support selective cache refresh.  The timestamps are used to
   calculate a t-bound on query responses [5][1].  A t-bound is the time
   after which changes may have occurred to the data that are not
   reflected in the query response. It is the time of the oldest cache
   entry used to calculate the response.  Nomenclator returns a t-bound
   with each query response.  Users can request more current data by
   asking for responses that are more recent than this t-bound. Making
   such a request flushes older items from the cache if more recent
   items are available.  Query resolvers calculate a minimum t-bound
   that is some refresh interval earlier than the current time.
   Resolvers keep themselves current by replacing items in the cache
   that are earlier than the minimum t-bound.
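   The response cache and its t-bound can be sketched in Python under
   simplifying assumptions: predicates are exact-match conjunctions, a
   response is computed from a single covering entry (so its t-bound is
   that entry's timestamp), and times are plain numbers supplied by the
   caller.  ResponseCache, CacheEntry, and covered_by are hypothetical
   names, not Nomenclator's interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Predicate = Dict[str, str]  # exact-match conjunction: attribute -> value
Row = Dict[str, str]


@dataclass
class CacheEntry:
    predicate: Predicate  # predicate of an earlier query
    rows: List[Row]       # its cached response
    timestamp: float      # when that response was obtained


def covered_by(new: Predicate, old: Predicate) -> bool:
    """New query's tuples are a subset of the old query's tuples."""
    return all(new.get(attr) == value for attr, value in old.items())


class ResponseCache:
    def __init__(self, refresh_interval: float):
        self.refresh_interval = refresh_interval
        self.entries: List[CacheEntry] = []

    def add(self, predicate: Predicate, rows: List[Row], timestamp: float) -> None:
        self.entries.append(CacheEntry(dict(predicate), list(rows), timestamp))

    def lookup(self, query: Predicate, now: float,
               requested_t_bound: Optional[float] = None
               ) -> Optional[Tuple[List[Row], float]]:
        """Return (rows, t_bound) computed from a covering entry, or None."""
        # Minimum t-bound: some refresh interval earlier than the current time.
        floor = now - self.refresh_interval
        if requested_t_bound is not None:
            floor = max(floor, requested_t_bound)  # user wants fresher data
        # Flush entries older than the floor; the caller must refetch them.
        self.entries = [e for e in self.entries if e.timestamp >= floor]
        for entry in self.entries:
            if covered_by(query, entry.predicate):
                rows = [r for r in entry.rows
                        if all(r.get(a) == v for a, v in query.items())]
                # Only one entry is used here, so its time is the t-bound.
                return rows, entry.timestamp
        return None


cache = ResponseCache(refresh_interval=3600.0)
cache.add({"country": "US"},
          [{"country": "US", "surname": "Ordille"},
           {"country": "US", "surname": "Elliott"}],
          timestamp=1000.0)
print(cache.lookup({"country": "US", "surname": "Ordille"}, now=2000.0))
# ([{'country': 'US', 'surname': 'Ordille'}], 1000.0)
```

   This sketch flushes stale entries unconditionally and leaves refetching
   to the caller, whereas the text describes flushing older items only
   when more recent items are available.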

2.4 Scale and Performance

   Three performance studies of active catalog and meta-data caching
   techniques are available [5].  The first study shows that the active
   catalog and meta-data caching can constrain the search effectively in
   a real environment, the X.500 name space.  The second study examined
   the performance of an active catalog and meta-data caching for single
   users on a local area network.  The experiments showed that the
   techniques to eliminate data repositories from the search space can
   dramatically improve response time.  Response times improve, because
   latency is reduced.  The reduction of latency in communications and
   processing is critical to large-scale descriptive query optimization.
   The experiments also showed that an active catalog is the most
   significant contributor to better response time in a system with low
   load, and that meta-data caching functions to reduce the load on the
   system.  The third study used an analytical model to evaluate the
   performance and scaling of these techniques for a large Internet
   environment.  It showed that meta-data caching plays an essential
   role in scaling the distributed catalog service to millions of users.
   It also showed that constraining the search space with an active
   catalog contributes significantly to scaling data repositories to
   millions of users.  Replication and data caching also contribute to
   the scale of the system in a large Internet environment.