Network Working Group C. Weider
Request for Comments: 1913 Bunyip
Category: Standards Track J. Fullton
CNIDR
S. Spero
EIT
February 1996
Architecture of the Whois++ Index Service
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
The authors describe an architecture for indexing in distributed
databases, and apply this to the WHOIS++ protocol.
1. Purpose:
The WHOIS++ directory service [Deutsch, et al, 1995] is intended to
provide a simple, extensible directory service predicated on a
template-based information model and a flexible query language. This
document describes a general architecture designed for indexing
distributed databases, and then applies that architecture to link
together many of these WHOIS++ servers into a distributed, searchable
wide area directory service.
2. Scope:
This document details a distributed, easily maintained architecture
for providing a unified index to a large number of distributed
WHOIS++ servers. This architecture can be used with systems other
than WHOIS++ to provide a distributed directory service which is also
searchable.
3. Motivation and Introduction:
It seems clear that with the vast amount of directory information
potentially available on the Internet, it is simply not feasible to
build a centralized directory to serve all this information. If we
are to distribute the directory service, the easiest (although not
Weider, et al Standards Track [Page 1]
RFC 1913 Architecture of the Whois++ Index Service February 1996
necessarily the best) way of building the directory service is to
build a hierarchy of directory information collection agents. In this
architecture, a directory query is delivered to a certain agent in
the tree, and then handed up or down, as appropriate, so that the
query is delivered to the agent which holds the information which
fills the query. This approach has been tried before, most notably
in some implementations of the X.500 standard. However, there are a
number of major flaws with the approach as it has been taken. This
new Index Service is designed to fix these flaws.
3.1. The search problem
One of the primary assumptions made by recent implementations of
distributed directory services is that every entry resides in some
location in a hierarchical name space. While this arrangement is
ideal for reading the entry once one knows its location, it is not as
good when one is searching for the location in the namespace of those
entries which meet some set of criteria. If the only criteria we know
about a desired entry are items which do not appear in the namespace,
we are forced to do a global query. Whenever we issue a global query
(at the root of the namespace), or a query at the top of a given
subtree in the namespace, that query is replicated to "all" subtrees
of the starting point. The replication of the query to all subtrees
is not necessarily a problem; queries are cheap. However, every
server to which the query has been replicated must process that
query, even if it has no entries which match the specified criteria.
This part of the global query processing is quite expensive. A poorly
designed namespace or a thin namespace can cause the vast majority of
queries to be replicated globally, but a very broad namespace can
cause its own navigation problems. Because of these problems, search
has been turned off at high levels of the X.500 namespace.
3.2. The location problem
With global search turned off, one must know in advance how the name
space is laid out so that one can guide a query to a proper location.
Also, the layout of the namespace then becomes critical to a user's
ability to find the desired information. Thus there are endless
battles about how to lay out the name space to best serve a given set
of users, and enormous headaches whenever it becomes apparent that
the current namespace is unsuited to the current usages and must be
changed (as recently happened in X.500). Also, assuming one does
impose multiple hierarchies on the entries through use of the
namespace, the mechanisms to maintain these multiple hierarchies in
X.500 do not exist yet, and it is possible to move entries out from
under their pointers. Also, there is as yet no agreement on how the
X.500 namespace should look even for the White Pages types of
information that is currently installed in the X.500 pilot project.
3.3. The Yellow Pages problem
Current implementations of this hierarchical architecture have also
been unsuited to solving the Yellow Pages problem; that is, the
problem of easily and flexibly building special-purpose directories
(say of molecular biologists) and of automatically maintaining these
directories once they have been built. In particular, the attributes
appropriate to the new directory must be built into the namespace
because that is the only way to segregate related entries into a
place where they can be found without a global search. Also, there is
a classification problem; how does one adequately specify the proper
categories so that people other than the creator of the directory can
find the correct subtree? Additionally, there is the problem of
actually finding the data to put into the subtree; if one must
traverse the hierarchy to find the data, we have to look globally for
the proper entries.
3.4. Solutions
The problems examined in this section can be addressed by a
combination of two new techniques: directory meshes and forward
knowledge.
4. Directory meshes and forward knowledge
We'll hold off for a moment on describing the actual architecture
used in our solution to these problems and concentrate on a high
level description of what solutions are provided by our conceptual
approach. To begin with, although every entry in WHOIS++ does indeed
have a unique identifier (resides in a specific location in the
namespace) the navigational algorithms to reach a specific entry do
not necessarily depend on the identifier the entry has been assigned.
The Index Service gets around the namespace and hierarchy problems by
creating a directory mesh on top of the entries. Each layer of the
mesh has a set of 'forward knowledge' which indicates the contents of
the various servers at the next lower layer of the mesh. Thus when a
query is received by a server in a given layer of the mesh, it can
prune the search tree and hand the query off to only those lower
level servers which have indicated that they might be able to answer
it. Thus search becomes feasible at all levels of the mesh. In the
current version of this architecture, we have chosen a certain set of
information to hand up the mesh as forward knowledge. This may or may
not be exactly the set of information required to construct a truly
searchable directory, but the protocol itself doesn't restrict the
types of information which can be handed around.
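The pruning effect described above can be sketched as follows. This is a minimal illustration and not part of the protocol: the Server class, its summary() method, the server names, and the use of a flat set of words as forward knowledge are all assumptions of the sketch. It compares a query replicated to every subtree with one handed only to servers whose forward knowledge mentions the search word.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    # Hypothetical model of one server in the mesh (not defined by the RFC).
    name: str
    words: set = field(default_factory=set)       # words in locally held entries
    children: list = field(default_factory=list)  # servers one layer down

    def summary(self):
        """Forward knowledge handed up the mesh: every word reachable from here.

        Recomputed on each call for brevity; a real server would store the
        forward knowledge it has received.
        """
        result = set(self.words)
        for child in self.children:
            result |= child.summary()
        return result

def search_flood(server, word, visited):
    """Replicate the query to every subtree, as in a global search."""
    visited.append(server.name)
    hits = [server.name] if word in server.words else []
    for child in server.children:
        hits += search_flood(child, word, visited)
    return hits

def search_pruned(server, word, visited):
    """Hand the query only to children whose forward knowledge has the word."""
    visited.append(server.name)
    hits = [server.name] if word in server.words else []
    for child in server.children:
        if word in child.summary():
            hits += search_pruned(child, word, visited)
    return hits

base1 = Server("base1", {"smith", "john"})
base2 = Server("base2", {"foobar", "mike"})
base3 = Server("base3", {"jones"})
index1 = Server("index1", children=[base1, base2])
index2 = Server("index2", children=[base3])
top = Server("top", children=[index1, index2])

flooded, pruned_path = [], []
flood_hits = search_flood(top, "foobar", flooded)
pruned_hits = search_pruned(top, "foobar", pruned_path)
print(len(flooded), len(pruned_path))  # prints: 6 3
```

Both searches find the same entry holder, but the pruned search involves only the servers on the path to it; the others never process the query.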
In addition, the protocols designed to maintain the forward knowledge
will also work perfectly well to provide replication of servers for
redundancy and robustness. In this case, the forward knowledge handed
around by the protocols is the entire database of entries held by the
replicated server.
Another benefit provided by the mesh of index servers is that since
the entry identification scheme has been decoupled from the
navigation service, multiple hierarchies can be built and easily
maintained on top of the existing data. Also, the user does not need
to know in advance where in the mesh the entry is contained.
Also, the Yellow Pages problem now becomes tractable, as the index
servers can pick and choose between information proffered by a given
server; because we have an architecture that allows for automatic
polling of data, special purpose directories become easy to construct
and to maintain.
5. Components of the Index Service:
5.1. WHOIS++ servers
The whois++ service is described in [Deutsch, et al, 1995]. As that
service specifies only the query language, the information model, and
the server responses, whois++ services can be provided by a wide
variety of databases and directory services. However, to participate
in the Index Service, that underlying database must also be able to
generate a 'centroid', or some other type of forward knowledge, for
the data it serves.
5.2. Centroids as forward knowledge
The centroid of a server consists of a list of the templates and
attributes used by that server, and a word list for each attribute.
The word list for a given attribute contains one occurrence of every
word which appears at least once in that attribute in some record in
that server's data, and nothing else.
A word is any token delimited by blank spaces, newlines, or the '@'
character, in the value of an attribute.
For example, if a whois++ server contains exactly three records, as
follows:
Record 1                        Record 2
Template: User                  Template: User
First Name: John                First Name: Joe
Last Name: Smith                Last Name: Smith
Favourite Drink: Labatt Beer    Favourite Drink: Molson Beer

Record 3
Template: Domain
Domain Name: foo.edu
Contact Name: Mike Foobar
the centroid for this server would be
Template:         User
First Name:       Joe
                  John
Last Name:        Smith
Favourite Drink:  Beer
                  Labatt
                  Molson

Template:         Domain
Domain Name:      foo.edu
Contact Name:     Mike
                  Foobar
It is this information which is handed up the tree to provide forward
knowledge. As we mention above, this may not turn out to be the
ideal solution for forward knowledge, and we suspect that there may
be a number of different sets of forward knowledge used in the Index
Service. However, the directory architecture is in a very real sense
independent of what types of forward knowledge are handed around, and
it is entirely possible to build a unified directory which uses many
types of forward knowledge.
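As an illustration of the centroid definition in this section, the following minimal sketch computes a centroid for the three example records. The dictionary representation of records, the function names tokenize and build_centroid, and the sorted output order are assumptions of the sketch; the definition itself only requires that each word appear once per attribute.

```python
import re
from collections import defaultdict

# Word delimiters as defined in section 5.2: blank spaces, newlines, or '@'.
_DELIMITERS = re.compile(r"[ \n@]+")

def tokenize(value):
    """Split an attribute value into words, dropping empty tokens."""
    return [word for word in _DELIMITERS.split(value) if word]

def build_centroid(records):
    """Return {template: {attribute: sorted list of unique words}}.

    Each record is a dict with a "Template" key plus attribute/value pairs
    (an assumed in-memory form, not a wire format).
    """
    collected = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            collected[template][attribute].update(tokenize(value))
    return {
        template: {attr: sorted(words) for attr, words in attrs.items()}
        for template, attrs in collected.items()
    }

records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Domain Name": "foo.edu",
     "Contact Name": "Mike Foobar"},
]

centroid = build_centroid(records)
for template, attributes in centroid.items():
    print("Template:", template)
    for attribute, words in attributes.items():
        print("  %s: %s" % (attribute, ", ".join(words)))
```

Note that "Smith" and "Beer" each appear once even though two records contain them, and that "foo.edu" remains a single word because '.' is not a delimiter.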
5.3. Index servers and Index server Architecture
A whois++ index server collects and collates the centroids (or other
forward knowledge) of either a number of whois++ servers or of a
number of other index servers. An index server must be able to
generate a centroid for the information it contains. In addition, an
index server can index any other server it wishes, which allows one
base level server (or index server) to participate in many
hierarchies in the directory mesh.
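One way to picture this is the sketch below, in which an index server stores the centroids it has collected and generates its own centroid as their union. Taking the union is an assumption of the sketch, as are the IndexServer class, the merge_centroids helper, and the server names; the text above only requires that an index server be able to generate a centroid for what it holds.

```python
def merge_centroids(centroids):
    """Union centroids of the form {template: {attribute: set_of_words}}."""
    merged = {}
    for centroid in centroids:
        for template, attributes in centroid.items():
            for attribute, words in attributes.items():
                # New sets are created here, so the inputs are not modified.
                merged.setdefault(template, {}).setdefault(attribute, set()).update(words)
    return merged

class IndexServer:
    """Hypothetical index server holding centroids of the servers it indexes."""

    def __init__(self, name):
        self.name = name
        self.collected = {}  # indexed server name -> centroid received from it

    def index(self, server_name, centroid):
        self.collected[server_name] = centroid

    def centroid(self):
        """Centroid handed further up the mesh (assumed: union of all held)."""
        return merge_centroids(self.collected.values())

campus = {"User": {"Last Name": {"Smith"}, "First Name": {"John", "Joe"}}}
lab = {"User": {"Last Name": {"Jones"}, "First Name": {"Ann"}}}

regional = IndexServer("regional")
regional.index("campus", campus)
regional.index("lab", lab)

# The same base level server is also indexed by a special-purpose index server.
biology = IndexServer("biology")
biology.index("lab", lab)

print(sorted(regional.centroid()["User"]["Last Name"]))  # prints: ['Jones', 'Smith']
```

Here the "lab" server participates in two hierarchies at once without any change to its own entries.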
5.3.1. Queries to index servers
An index server will take a query in standard whois++ format, search
its collections of centroids and other forward information, determine
which servers hold records which may fill that query, and then
notify the user's client of the next servers to which it should submit
the query (referral in the X.500 model). An index server can also
contain primary data of its own, and thus act as both an index server
and a base level server. In this case, the index server's response to
a query may be a mix of records and referral pointers.
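The mixed response can be sketched as below. The sketch does not use real whois++ query syntax: a query is reduced to a single template, attribute, and word, matching is exact and case-sensitive, and the handle_query function and server names are hypothetical.

```python
import re

def handle_query(template, attribute, word, local_records, collected_centroids):
    """Return (records, referrals) for a simplified single-word query.

    local_records: primary data held by this index server (list of dicts).
    collected_centroids: {server name: {template: {attribute: set_of_words}}}.
    """
    # Records held locally that match the query are returned directly.
    records = [
        record for record in local_records
        if record.get("Template") == template
        and word in re.split(r"[ \n@]+", record.get(attribute, ""))
    ]
    # Servers whose centroid contains the word may hold matching records,
    # so the client is referred to them.
    referrals = sorted(
        server
        for server, centroid in collected_centroids.items()
        if word in centroid.get(template, {}).get(attribute, set())
    )
    return records, referrals

local_records = [
    {"Template": "User", "First Name": "Ann", "Last Name": "Smith"},
]
collected_centroids = {
    "serverX": {"User": {"Last Name": {"Smith"}, "First Name": {"John", "Joe"}}},
    "serverY": {"Domain": {"Domain Name": {"foo.edu"}}},
    "serverZ": {"User": {"Last Name": {"Jones"}}},
}

records, referrals = handle_query(
    "User", "Last Name", "Smith", local_records, collected_centroids)
print(records)
print(referrals)
```

A referral means only that the named server might hold a match; the centroid shows that the word occurs somewhere in that attribute, not which record contains it.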
5.3.2. Index server distribution model and centroid propagation
The diagram on the next page illustrates how a mesh of index servers
might be created for a set of whois++ servers. Although it looks like
a hierarchy, the protocols allow (for example) server A to be indexed
by both server D and by server H.
whois++               index                   index
servers               servers                 servers
                      for                     for
                      whois++                 lower-level
                      servers                 index servers