rfc2969.txt

   When it was then time to test the resulting software with standard
   commercial client and server software, a few more surprises came to
   light (primarily in terms of the worldview that software expected and
   its occasional implementation shortcuts).  Again, more detail is
   provided in the Appendix, but highlights included client software
   that could only handle a very small subset of a protocol's defined
   status message lexicon (e.g., 2 system messages supported), and
   client software that automatically appended additional terms to a
   query specified by the user (e.g., adding "or email=<what the user
   typed in to the query>").

2.4 Some observations

2.4.1 Participation of the WDSPs

   One of the things that came to light was that the nature of the index
   object generated by the WDSPs has an important impact on performance
   -- both in terms of integrating the index object into the Referral
   Index, and in terms of efficiency of handling queries.  A proposal
   might be either to define more clearly how the WDSPs should generate
   the CIP index object (currently left to their discretion), or to
   alert individual WDSPs when their index objects are considered
   substandard.

   On another front, when chaining referrals to WDSP servers, some
   servers perform more efficiently than others, affecting the overall
   response time of the DAG system.  From a service point of view, it
   should also be possible to notify WDSPs that are consistently slow
   (responding more slowly than some selected response time) that their
   service is considered substandard.
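
   As an illustration only, the idea of flagging consistently slow
   WDSPs could be sketched as follows.  The function name, the sample
   data, and the use of the median as the measure of "consistently" are
   all assumptions made for this sketch; the TISDAG specification does
   not define such a procedure.

```python
from statistics import median


def flag_slow_wdsps(response_times, threshold_seconds):
    """Return the names of WDSPs whose median response time exceeds the threshold.

    response_times maps a WDSP name to a list of observed response
    times in seconds (hypothetical measurement data).
    """
    slow = []
    for name, samples in response_times.items():
        # The median ignores isolated outliers, so a single slow
        # answer does not mark a server as consistently slow.
        if samples and median(samples) > threshold_seconds:
            slow.append(name)
    return sorted(slow)


observed = {
    "wdsp-a": [0.4, 0.5, 0.6],
    "wdsp-b": [2.5, 3.0, 0.3],  # slow more often than not
    "wdsp-c": [0.2, 4.0, 0.3],  # a single outlier only
}
slow_servers = flag_slow_wdsps(observed, threshold_seconds=2.0)
```

   With this sample data only "wdsp-b" is flagged, since one slow
   answer from "wdsp-c" does not move its median above the threshold.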







Eklof & Daigle               Informational                      [Page 7]

RFC 2969             Wide Area Directory Deployment         October 2000


2.4.2 Index Objects and Referral Index size

   As described in more detail in [complex], there are many factors that
   can influence the growth factor of index objects (as more data is
   indexed).  That work dealt specifically with tokenized data for
   Whois++ centroids, and is not immediately generalizable to all forms
   of the Tagged Index Object.  However, the particular structure of the
   TIO used for the TISDAG project is similar enough in structure to a
   centroid that the same "order of magnitude" and growth
   characteristics are applicable.

   Factors that affect the size of the data ("number of entries"):

       .  Number of generated tokens
          The number of tokens generated from the directory data depends
          on what is tokenized. If data is tokenized on names and
          addresses (i.e. not unique data like phone numbers) a rough
          estimation is that the number_of_tokens = 0.2 *
          number_of_data_records. The growth is linear in the span from
          a few thousand to at least 1.2 million records. The growth
          should then level off since the sets of names and addresses
          are finite, but the current tests have not shown a break
          point.

          If data is tokenized on something that is unique, e.g. phone
          numbers, then a rough estimation is that the number_of_tokens
          = number_of_data_records. Note that it is possible to tokenize
          in different ways, for example by dividing phone numbers into
          parts. This would result in fewer tokens.

       .  Number of directories
          Since the tokens are generated individually for each
          directory, the data size depends on the number of directories.
          10 directories with 100,000 records each will generate the
          same number of tokens as one directory with 1,000,000 records.
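
   The rough estimates above can be turned into a small calculation.
   This is a sketch under the stated assumptions only: the function
   name is invented here, the ratios 0.2 and 1.0 are the rough figures
   quoted above, and the linear model is only reported to hold from a
   few thousand to about 1.2 million records.

```python
def estimate_tokens(records_per_directory, unique_data=False):
    """Estimate the total number of tokens across a set of directories.

    Tokens are generated per directory, so the estimate is the sum of
    the per-directory estimates.  The ratios are the rough figures
    quoted in the text, not measured constants.
    """
    ratio = 1.0 if unique_data else 0.2
    return sum(round(ratio * n) for n in records_per_directory)


# Ten directories of 100,000 records versus one of 1,000,000 records.
ten_small = estimate_tokens([100_000] * 10)
one_large = estimate_tokens([1_000_000])

# Unique data such as phone numbers: about one token per record.
phone_numbers = estimate_tokens([1_000_000], unique_data=True)
```

   Under the linear model both name-and-address cases give the same
   total, which matches the observation about ten directories and one.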

2.4.3 Index Object and Query Performance

   Factors that affect the performance ("queries/second"):

       .  Type of query (exact, substring, etc.)
          A 'substring' query is slower than an 'exact' query due to:
          1) Somewhat slower look-up in the internal DAG database than
             for an exact query.
          2) Mostly, a larger amount of data being fetched from the
             internal DAG database due to more hits, which generates
             more index processing.
          3) Substring queries being sent to the directory servers,
             which also results in more hits and more data fetched. The
             directory servers may also be more or less effective in
             handling substring queries.

       .  Number of search attributes
          A query with one or few attributes will most of the time
          result in many hits, which results in a lot of data, both
          internally in DAG and from the directory servers. On the other
          hand, a query with many attributes will result in a somewhat
          slower look-up in the internal DAG database.

       .  Number of directories
          A larger number of directories may result in many referrals,
          but it depends on the query. A simple query will generate a
          lot of referrals, which means a lot of data from the
          directories has to be fetched. It will also result in a
          somewhat slower look-up in the internal DAG database.

       .  Number of chained referrals
          Queries that are not chained are faster, since the result data
          does not have to be sent through the DAG system. Chained
          queries to several directories can be processed in parallel in
          the SAPs, but all data has to be processed in the CAP before
          being sent to the client.

       .  Response time in the directory servers
          The response time of the directory servers is of course
          critical. The total response time for DAG is never faster than
          the slowest involved directory server.

       .  Number of tokens (size of Tagged Index Objects)
          The number of tokens has little impact on the look-up time in
          the internal DAG database.
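
   The observation that the total response time is bounded by the
   slowest directory server can be expressed as a simple lower-bound
   model.  This is a sketch only: the function name and the single
   "cap_processing" term are assumptions, and a real deployment would
   have further costs (Referral Index look-up, network transfer) that
   are not modelled here.

```python
def chained_response_time(server_times, cap_processing=0.0):
    """Lower-bound estimate of the response time for a chained query.

    server_times holds the response time in seconds of each directory
    server involved.  The servers are queried in parallel, so the wait
    is set by the slowest one; the CAP then processes all of the
    results before anything is sent to the client.
    """
    if not server_times:
        return cap_processing
    return max(server_times) + cap_processing


# Three directory servers queried in parallel, plus CAP processing.
total = chained_response_time([0.25, 1.0, 0.5], cap_processing=0.5)
```

   In this model, speeding up any server other than the slowest one
   does not change the total.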

2.5 Some evolutions

   To date, the TISDAG project has been "alive" for just over two years.
   During that time, there have been a number of evolutions -- in terms
   of technologies and ideas outside the project (e.g., user and service
   provider expectations, deployment of related software, etc) as well
   as goals and understanding within the scope of the project.

   Chief among these last is the fact that the project set out to
   primarily fulfill the role of a national referral service, and
   gradually evolved towards becoming more of a transparent protocol
   proxy service, fulfilling client queries as completely as possible,
   within the client protocol's semantics.  This evolution was probably
   provoked by a number of factors -- existing client & server software
   has a narrower range of accepted (expected) behaviour than the
   protocol specs may describe; once the technology was there for some
   proxying, going all the way seemed to be within reach; etc.

   From the point of view of providing a national whitepages service,
   this is a very positive evolution.  However, it did place some
   strains on the original system architecture, for which some
   adjustments have been proposed (more detail below).  What is less
   clear is the impact this evolution will have on the flexibility of
   the system architecture -- in terms of addressing other applications,
   different protocols (and protocol paradigms), etc.  That is, the
   original intention of the system was to very simply fulfill an
   unsophisticated role -- "find things that sort of match the input
   query and let the client itself determine if the match is close
   enough".  As the requirements become more sophisticated, the system
   loses simplicity, and perhaps becomes more brittle.
   (Some proposals for avoiding this are outlined in [DAG++], which
   attempts to return to the underlying principles and propose steps
   forward at that level).

   In terms of impact within the TISDAG project, this evolution led to
   the following technical adjustments:

       .  The latest version of the technical specification makes a
          distinction (in the internal protocol grammar) between queries
          directed at the Referral Index, and those passed to SAPs to
          fulfill a query.  This distinction keeps the query-routing
          queries simple, but allows more sophistication in expressing a
          query designed to fulfill the client's original semantic
          expression.

       .  The additional constraints in the SAP query language are still
          not enough to allow the internal protocol to express very
          sophisticated queries.  Originally intended only for query-
          routing queries, the DAG/IP expects all queries to be token-
          based (whereas LDAP queries are phrase-oriented).  This means
          that SAPs have to do a good deal of "post-pruning" of WDSP
          result sets to match the DAG/IP query sent by a CAP for query
          fulfillment.  And, CAPs must in turn do more post-pruning to
          match the DAG/IP results (from the SAPs) to the original query
          semantics.
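
   The difference between token-based and phrase-oriented matching,
   and the resulting need for post-pruning, can be illustrated with a
   simplified sketch.  The tokenizer, the matching rules, and all
   function names are assumptions made for illustration; they are not
   the DAG/IP grammar or LDAP filter semantics.

```python
def tokens(text):
    """Split a value into a set of lowercase tokens (simplified tokenizer)."""
    return set(text.lower().split())


def token_match(query, value):
    """Token-based match: every query token occurs somewhere in the value."""
    return tokens(query) <= tokens(value)


def phrase_match(query, value):
    """Phrase-oriented match: the query words occur consecutively, in order."""
    q = query.lower().split()
    v = value.lower().split()
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


def post_prune(query, candidates):
    """Keep only the candidates that also satisfy the phrase semantics."""
    return [c for c in candidates if phrase_match(query, c)]


names = ["Anna Maria Svensson", "Maria Anna Berg", "Anna Lind"]

# The token-based query is broad: word order is not considered.
broad = [n for n in names if token_match("anna maria", n)]

# Post-pruning restores the original, stricter query semantics.
pruned = post_prune("anna maria", broad)
```

   Here the token-based step returns two names, and post-pruning
   discards "Maria Anna Berg" because the words appear in the wrong
   order for the phrase.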

   The real strength of the TISDAG project was that it separated the
   technical framework needed to support the service from the
   configuration required in order to support a particular application
   or service -- query & schema mapping, configuration for protocols,
   etc.  Future improvements should focus on evolving that framework,
   maintaining the separation from the specific applications, services,
   and protocols that may use it.

3.0 Related Projects

   The TISDAG project is not alone in attempting to solve the problems
   of providing coordinated access to resources managed by multiple,
   disparate services.

3.1 The Norwegian Directory of Directories (NDD)

   Described in [NDD], the Norwegian Directory of Directories project
   also aims to provide necessary infrastructure for a national
   directory service.  It assumes LDAP (v2 or v3) accessibility of WDSP
   information (provided by the WDSP itself, or through other
   arrangements), and aims to resolve some of the trickier issues
   associated with hooking together already-operational LDAP servers
   into a coherent network:  uniform distinguished naming scheme, and
   content-based referrals.  It also addresses some of the pragmatic
   realities of being compatible with different versions of LDAP clients
   -- e.g., v2, which does not support referrals, and v3, which does.

   At the heart of the system is the "Referral Index and Organizational
   information" (RIO) server, which provides a searchable catalogue of
   Norwegian organizations. This facilitates the location of whitepages
   servers for individual organizations (assuming the query includes
   information about which organization or organizations are of
   interest).

   This work can be seen as being complementary to the TISDAG work, in
   that it provides a more focused service for integrating LDAP
   directory servers.  However, there is still some requirement that one
   knows the organization to which a person belongs before doing a
   search for their e-mail address. This may be reasonable for seeking
   mail addresses associated with a person's work organization, but is
   less often successful when it comes to finding a personal e-mail
   address -- in an age where ISPs abound, a priori knowledge of a
   user's ISP identification is unlikely.

3.2 DESIRE Directory Services

   The EC funded project DESIRE II (http://www.desire.org) is developing
   a distributed European indexing system for information on Research
   and Education. The Directory Services work undertaken by DANTE and
   SURFnet proposes an architecture applied to a server mesh structure
   to create a wide-area directory service infrastructure.

   This service is intended to support both whitepages information with
   LDAP servers at WDSPs and Web-search meshes at various places using
   Whois++ for information about resources and routing of queries to
   other index-based services.

   Like the TISDAG project, the DESIRE directory services project aims
   to act as a focal point for queries, allowing client software to
   access appropriate resources from a wide range of disparate services.

   There are architectural differences between the approach used in the
   TISDAG project and the DESIRE directory service project, but many of
   the driving needs are the same, and the approach of using content-
   based indexing and referrals was also selected.

4.0 Some Directions for TISDAG Next Steps

   The fun thing with technology is that there are always more tweaks
   and changes that can be made.  However, a service should evolve in
   response to specific customer needs, and there are several ways in
   which the TISDAG service itself could advance. Some of them are
   outlined below, in terms of possibilities perceived at this time,
   rather than specific recommendations for underlying technology
   changes that would be necessary to fulfill them.  A related topic,
   networking DAG servers (meshes), is discussed in [DAG-Mesh].

4.1 Security support

   There is a need for security considerations when making use of a
   wide-scale directory system in application areas other than the
   public white-pages application of the TISDAG project.  Such issues
   arise whether the directory service is distributed across the
   Internet or functions completely within an internal, closed network.

4.2 WDSPs attributes and schemas

   Today the DAG system makes use of 2 information schemas -- the
   DAGPERSON schema for information about specific people, and the
   DAGORGROLE schema for organizational roles. The technical
   specification includes a definition of the schema, as well as an
   understood mapping to (and from) some standard schemas used in the
   supported protocols.  Nevertheless, to include new WDSPs, which may
   not have all attributes in the schemas, or may use different schemas
   and query attributes, it should be possible to create and use new
   customized or standardized schemas and to perform schema mapping
   where necessary. It might also be possible to constrain queries to
   desired query attributes, templates, or object classes.

   In practice, this means that different WDSPs may choose to use
   different subparts of one defined schema, or even implement local
   customizations.
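
   A minimal sketch of schema mapping and query constraint is given
   below.  The DAG-side attribute names ("name", "email",
   "organization") are placeholders invented for this example and are
   not the DAGPERSON definition; the mapping table and function names
   are likewise hypothetical.

```python
# Hypothetical mapping from a WDSP's LDAP-style attributes to
# placeholder DAG attribute names (not the DAGPERSON definition).
LDAP_TO_DAG = {"cn": "name", "mail": "email", "o": "organization"}


def map_record(record, mapping):
    """Translate known attributes and drop those with no mapping.

    A WDSP that uses only a subpart of the schema, or that carries
    local customizations, simply yields fewer mapped attributes.
    """
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def constrain_query(query, allowed):
    """Keep only the query terms whose attribute is in the allowed set."""
    return {k: v for k, v in query.items() if k in allowed}


entry = {"cn": "Anna Svensson", "mail": "anna@example.se", "roomNumber": "B12"}
mapped = map_record(entry, LDAP_TO_DAG)

limited = constrain_query({"name": "Anna", "shoeSize": "38"}, {"name", "email"})
```

   The local attribute "roomNumber" has no mapping and is dropped, and
   the query is reduced to the attributes the service chooses to allow.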

5.0 Some conclusions

   Although fewer people now hold out the hope of a unified global
   directory service, based on standardized protocols, it is interesting
   to see more projects providing infrastructure that permits unified
   access to what is otherwise an unforgivingly diverse and dislocated
   set of information servers.  What cannot be dictated (in standardized
   protocols and schemas) may yet be accommodated through service
   infrastructure.  The right approach seems to be to build better and
   better frameworks for supporting such diversified services, without
   making the framework architecture dependent on specific technologies.

6.0 Security Considerations

   To date, the TISDAG project has focused on serving only publicly-
   sharable information.  As noted in Section 4.1, any future work will
   have to provide additional facilities for providing authentication,
   authorization, encryption, and otherwise handling sensitive data in
   an open environment.

7.0 Acknowledgements

   This document outlines the perspectives and opinions of the authors,
   based on experience as well as many fruitful and enlightening
   discussions with others:  Roland Hedberg, Torbjorn Granat, Patrik
   Granholm, Rikard Wessblad and Sandro Mazzucato.

   The work described in this document was carried out as part of an
   on-going project of Ericsson.  For further information regarding that
   project, contact:
