Network Working Group                                          T. Hardie
Request for Comments: 2655                                       Equinix
Category: Experimental                                         M. Bowman
                                                                Transarc
                                                                D. Hardy
                                                                Netscape
                                                             M. Schwartz
                                                           Affinia, Inc.
                                                              D. Wessels
                                                                   NLANR
                                                             August 1999
CIP Index Object Format for SOIF Objects
Status of this Memo
This memo defines an Experimental Protocol for the Internet
community. It does not specify an Internet standard of any kind.
Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
1. Abstract
The Common Indexing Protocol (CIP) allows servers to form a referral
mesh for query handling by defining a mechanism by which cooperating
servers exchange hints about the searchable indices they maintain.
The structure and transport of CIP are described in (Ref. 1), as are
general rules for the definition of index object types. This
document describes SOIF, the Summary Object Interchange Format, as an
index object type in the context of the CIP framework. SOIF is a
machine-readable syntax for transmitting structured summary objects,
currently used primarily in the context of the World Wide Web.
Query referral has often been dismissed as an ineffective strategy
for handling searches of Web resources, and Web resources certainly
present challenges not present in structured directory services like
Rwhois. In situations where a keyword-based free text search is
desired, query referral is not likely to be effective because the
query will probably be routed to every server participating in the
referral mesh. Where a search can be limited by reference to a
specific resource attribute, however, query referral is an effective
tool. SOIF can be used to create such a known-attribute query mesh
because it provides a method for associating attributes with net-
addressable resources.
Hardie, et al. Experimental [Page 1]
RFC 2655 CIP Index Object Format for SOIF Objects August 1999
1.1 History
SOIF was first defined by the Harvest project [Ref 2.] in January
1994. SOIF was derived from a combination of the Internet Anonymous
FTP Archives IETF Working Group (IAFA) templates [Ref 3.] and the
BibTeX bibliography format [Ref 4.]. The combination was originally
noted for its advantages of providing a convenient and intuitive way
for delimiting objects within a stream, and setting apart the URL for
easy object access or invocation, while still preserving
compatibility with IAFA templates.
Mic Bowman, Darren Hardy, Mike Schwartz, and Duane Wessels each
contributed to the creation of the SOIF format as part of the Harvest
Project; later work took place as part of the FIND working group.
2. Name
The index object described below will have the MIME type of
application/index.obj.HARVEST-SOIF-1.
3. Payload Format
Each summary object has 3 fundamental components: a template type, a
URL, and zero or more ATTRIBUTE-VALUE pairs. Because the VALUEs in
the ATTRIBUTE-VALUE pairs may contain arbitrary data (cf. Section
3.5), SOIF objects should be encoded in Base64 unless the template
type unambiguously establishes that the VALUEs do not contain binary
data.
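A minimal sketch of the Base64 step described above, assuming
Python's standard base64 module. The helper names and the sample
object (including its "Thumbnail" ATTRIBUTE and example.com URL) are
illustrative and not part of this specification.

```python
import base64


def encode_payload(soif_bytes: bytes) -> bytes:
    """Base64-encode a serialized SOIF object (or stream of objects)."""
    return base64.b64encode(soif_bytes)


def decode_payload(payload: bytes) -> bytes:
    """Recover the original SOIF octets from a Base64 payload."""
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(payload, validate=True)


# Hypothetical object whose VALUE holds four octets that are not text.
raw = b"@FILE { http://example.com/logo\nThumbnail{4}:\t\x00\xff\x10\x80\n}\n"
wire = encode_payload(raw)
```

The encoded form contains only ASCII characters, so binary VALUEs
survive transports that are not 8-bit clean.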
3.1 Template Type
The Template type is used to identify the set of ATTRIBUTEs contained
within a particular SOIF object. SOIF does not define the template
types themselves; it only provides a way to associate the summary
object with a predefined template type name. Template types may be
registered or unregistered. Unregistered template types provide an
indication of available ATTRIBUTE-VALUE pairs, but these may vary
both according to the original resource and the method by which the
summary object was generated. Registered template types must refer
to a formally specified description of all mandatory and optional
ATTRIBUTE-VALUE pairs available for that type. See [10] for a
description of the process of registering template types with the
IANA.
Historically, the template types used by SOIF were derived from IAFA
template types (Ref. 3). SOIF objects generated by the Harvest system
have a "FILE" template type; in current practice this is the most
common template type. The "FILE" template type is a generic template
type meant to handle a large variety of web-based resources. No
formal specification of it is available, though a list of ATTRIBUTE-
VALUE pairs common to the "FILE" template type is found in Appendix
A. "DOCUMENT" and "OBJECT" are other generic template-types.
The use of unregistered template types obviously presents some
problems to the correct operation of query referral. Two efforts
have been mounted to allow peer-to-peer agreement on the association
of template types with specific attribute sets: Netscape's RDM (Ref.
6) and the STARTS project (Ref. 7). Initially, CIP meshes based on
systems which use unregistered template types may need to use
these or similar methods to associate template types with specific
attribute sets.
Mesh operators are strongly encouraged, however, to migrate to
registered template types as soon as is practical. Registered
template types allow CIP meshes to derive the definitions of
attributes, which enables multiple-language interfaces to the base
attributes. In addition, registered template types allow CIP meshes
and other users of SOIF to establish the permitted data types and
encodings of the VALUEs associated with each ATTRIBUTE. This makes
deriving the appropriate matching semantics for a particular VALUE
much more straightforward and eliminates the limitations of the
default octet-by-octet matching (cf. Section 4.).
3.2 URL
Uniform Resource Locators (URLs) (Ref 5.) are used by SOIF as object
IDENTIFIERs. SOIF associates its summary objects with net-
addressable resources by using the URL by which the resource was
addressed as the initial field of the object body. See section 3.4
for the formal grammar associated with SOIF objects.
This association allows the same resource to have multiple summary
objects, differentiated only by the URL by which the resource was
accessed. This possibility does not, however, impact the usability
of the URL as an object IDENTIFIER. Furthermore, since it can be
argued that the net address is a salient part of the metadata, there
may be compensating benefits to using the URL as an object
IDENTIFIER.
As noted in Appendix A, the Harvest project used several additional
identity attributes ("Gatherer-Name", "Gatherer-Host", "Gatherer-
Port" and "Gatherer-Version") to further identify the provenance of a
particular object. Within the context of CIP, it may be useful to
identify the base sources of particular index objects; see Appendix B
for one example of how a SOIF-based CIP hint could use the base
source URL.
3.3 ATTRIBUTE-VALUE pairs.
Each summary object has zero or more ATTRIBUTE-VALUE pairs, which
contain metadata about the net-addressable resource referenced by the
URL. Pairs are composed of an ATTRIBUTE IDENTIFIER, the length of
the VALUE, a delimiter, and the VALUE. It should be stressed that
ATTRIBUTE-VALUE pairs are not CR/LF terminated, but parsed according
to the grammar set out in section 3.4. In the examples in Section 3.6
and in many other representations of SOIF objects, ATTRIBUTE-VALUE
pairs are represented on individual lines to enhance readability.
VALUEs may contain CR/LF, however, and implementors must be careful
to parse the full VALUE. Implementors of SOIF parsers MUST ignore
<CR>,<LF>,<TAB>,<SPACE>, or other whitespace found between the VALUE
of an ATTRIBUTE-VALUE pair and the ATTRIBUTE-IDENTIFIER of the
subsequent pair.
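The length-driven parsing described above can be sketched as follows.
This is an illustration only: the function name read_pair is
hypothetical, and the sketch assumes at least one digit in the
VALUE-SIZE. The VALUE is taken by octet count, never by searching for
a line ending, and whitespace before the next ATTRIBUTE IDENTIFIER is
skipped.

```python
import re

# IDENTIFIER "{" VALUE-SIZE "}" ":" <TAB>, following the grammar in section 3.4.
_PAIR_HEADER = re.compile(rb"([A-Za-z0-9_-]+)\{(\d+)\}:\t")
_WHITESPACE = b" \t\r\n"


def read_pair(data: bytes, pos: int = 0):
    """Read one ATTRIBUTE-VALUE pair at or after `pos`.

    Returns (identifier, value, next_pos).
    """
    # Ignore whitespace left between the previous VALUE and this pair.
    while pos < len(data) and data[pos] in _WHITESPACE:
        pos += 1
    header = _PAIR_HEADER.match(data, pos)
    if header is None:
        raise ValueError(f"no ATTRIBUTE header at offset {pos}")
    start = header.end()
    end = start + int(header.group(2))
    if end > len(data):
        raise ValueError("VALUE is shorter than its declared VALUE-SIZE")
    # The VALUE is exactly VALUE-SIZE octets and may itself contain CR/LF.
    return header.group(1).decode("ascii"), data[start:end], end


# The "Abstract" VALUE spans two lines; its size (12) covers the CR/LF.
data = b"Title{5}:\tHello\r\nAbstract{12}:\tline1\r\nline2\r\nType{4}:\tHTML"
```

Calling read_pair repeatedly, feeding each returned next_pos back in,
walks the three pairs in order without being confused by the CR/LF
inside the "Abstract" VALUE.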
The SOIF syntax does not explicitly allow for a single ATTRIBUTE to
have multiple VALUEs. To handle multiple VALUEs for the same
ATTRIBUTE, SOIF uses an ATTRIBUTE naming convention: a hyphen and a
positive integer are appended to the ATTRIBUTE name to create a
distinct ATTRIBUTE IDENTIFIER for each VALUE of that ATTRIBUTE. For
example, the ATTRIBUTE IDENTIFIERs "Author-1", "Author-2", and
"Author-3" can be used to represent three VALUEs associated with the
ATTRIBUTE "Author" where a specific resource has three authors. See
section 4 for the implications of this strategy on matching
semantics.
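As an illustration of this naming convention, the sketch below
collects suffixed IDENTIFIERs back into a list of VALUEs per
ATTRIBUTE. The function name group_values and the choice to order
VALUEs by their integer suffix are assumptions made for this example;
this document does not assign meaning to the order.

```python
import re
from collections import defaultdict

# A trailing "-<positive integer>" marks one VALUE of a multi-valued ATTRIBUTE.
_SUFFIX = re.compile(r"^(?P<name>.+)-(?P<index>[1-9][0-9]*)$")


def group_values(pairs):
    """Group (identifier, value) pairs into {attribute: [values, ...]}."""
    indexed = defaultdict(list)
    for order, (identifier, value) in enumerate(pairs):
        match = _SUFFIX.match(identifier)
        if match:
            key, index = match.group("name"), int(match.group("index"))
        else:
            key, index = identifier, 0
        indexed[key].append((index, order, value))
    # Sort on (suffix, input position) only, so VALUEs are never compared.
    return {
        name: [value for _, _, value in sorted(items, key=lambda t: t[:2])]
        for name, items in indexed.items()
    }


pairs = [
    ("Author-2", b"Hardy"),
    ("Title", b"SOIF"),
    ("Author-1", b"Bowman"),
    ("Author-3", b"Schwartz"),
]
grouped = group_values(pairs)
```

An IDENTIFIER such as "Gatherer-Name" is left untouched, because its
hyphen is not followed by a positive integer.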
3.4 SOIF Grammar
The SOIF syntax is defined by the following grammar:
SOIF            ::=  OBJECT SOIF |
                     OBJECT
OBJECT          ::=  @ TEMPLATE-TYPE { URL ATTRIBUTE-LIST }
TEMPLATE-TYPE   ::=  IDENTIFIER
ATTRIBUTE-LIST  ::=  ATTRIBUTE ATTRIBUTE-LIST |
                     ATTRIBUTE |
                     NULL
ATTRIBUTE       ::=  IDENTIFIER {VALUE-SIZE} DELIMITER VALUE
URL             ::=  RFC1738-URL-Syntax | "-"
IDENTIFIER      ::=  ALPHA-NUMERIC-STRING
VALUE           ::=  ARBITRARY-DATA
VALUE-SIZE      ::=  NUMERIC-STRING
DELIMITER       ::=  ":<TAB>"
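One way to read a stream according to this grammar is sketched below.
It is not a reference implementation. The name parse_soif is
hypothetical, and the sketch assumes that whitespace separates the
URL from whatever follows it, which the grammar itself does not
state.

```python
import re

_OBJECT_HEADER = re.compile(rb"@([A-Za-z0-9_-]+)\s*\{\s*(\S+)")
_ATTR_HEADER = re.compile(rb"([A-Za-z0-9_-]+)\{(\d+)\}:\t")
_SPACE = re.compile(rb"\s*")


def parse_soif(data: bytes):
    """Parse a SOIF stream into a list of (template_type, url, attributes)."""
    objects = []
    pos = _SPACE.match(data, 0).end()
    while pos < len(data):
        head = _OBJECT_HEADER.match(data, pos)
        if head is None:
            raise ValueError(f"expected '@TEMPLATE-TYPE {{ URL' at offset {pos}")
        template = head.group(1).decode("ascii")
        url = head.group(2).decode("ascii")
        attributes = []
        pos = _SPACE.match(data, head.end()).end()
        # ATTRIBUTE-LIST: zero or more pairs, closed by "}".
        while not data.startswith(b"}", pos):
            attr = _ATTR_HEADER.match(data, pos)
            if attr is None:
                raise ValueError(f"expected ATTRIBUTE or '}}' at offset {pos}")
            end = attr.end() + int(attr.group(2))
            if end > len(data):
                raise ValueError("VALUE-SIZE runs past the end of the input")
            attributes.append((attr.group(1).decode("ascii"), data[attr.end():end]))
            pos = _SPACE.match(data, end).end()
        objects.append((template, url, attributes))
        pos = _SPACE.match(data, pos + 1).end()  # step over the closing "}"
    return objects


# Two hypothetical objects; the second has no URL and no ATTRIBUTEs.
sample = (
    b"@FILE { http://www.example.com/index.html\n"
    b"Title{12}:\tExample page\n"
    b"Type{4}:\tHTML\n"
    b"}\n"
    b"@DOCUMENT { -\n"
    b"}\n"
)
parsed = parse_soif(sample)
```

Each object comes back as a tuple whose third element is the ordered
list of (IDENTIFIER, VALUE) pairs. VALUEs are kept as raw octets
because their interpretation depends on the template type.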
3.5 Grammar Description
URL
a Uniform Resource Locator encoded in the syntax defined by RFC
1738 (Ref. 5). If the summary object has no URL associated with it,
then a Latin-1 hyphen (octal \055) is used instead.
IDENTIFIER
an ASCII character string that only contains alphanumeric
characters and hyphens or underscores. IDENTIFIERs should avoid
including hyphens followed by positive integers except when
constructing multiple-VALUE ATTRIBUTE IDENTIFIERs.
VALUE
a buffer of VALUE-SIZE octets containing the VALUE. The VALUE may
contain data in arbitrary formats or encodings, which recipients
recognize based on Template-Type.
VALUE-SIZE
a non-negative integer encoded as an ASCII character string. The
integer indicates how many octets the VALUE occupies after the
DELIMITER.
DELIMITER
a two octet delimiter which is a Latin-1 colon (:) and a tab (\t),
(octal \072\011).
{ }
the Latin-1 curly braces (octal \173 and \175) are used to wrap
the VALUE-SIZE (no spaces) as well as the URL and ATTRIBUTE-LIST
combination.
@TEMPLATE-TYPE
the Latin-1 @ (octal \100) and TEMPLATE-TYPE (no space between
them) is used to mark the beginning of the SOIF object.
NUMERIC-STRING
Zero or more ASCII numerals.
ALPHA-NUMERIC-STRING
Zero or more ASCII letters or numerals, plus hyphens or
underscore. [a-z,A-Z,0-9,- and _].
ARBITRARY-DATA
Octets of data in arbitrary formats or encodings.
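The rules above can also be applied when writing objects. The
following sketch serializes one object; the name format_object, the
newline after each VALUE, and the space around the opening brace are
layout choices made for readability in this example, not requirements
of the grammar.

```python
import re

# IDENTIFIER: ASCII letters, numerals, hyphens and underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_-]+")


def format_object(template_type: str, url, attributes) -> bytes:
    """Serialize one SOIF object.

    `url` may be None, in which case a hyphen is written instead.
    `attributes` is a list of (identifier, value_bytes) pairs.
    """
    for name in [template_type] + [ident for ident, _ in attributes]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid IDENTIFIER: {name!r}")
    out = [b"@", template_type.encode("ascii"), b" { ",
           (url or "-").encode("ascii"), b"\n"]
    for ident, value in attributes:
        # VALUE-SIZE counts octets, so measure the bytes, not the characters.
        out += [ident.encode("ascii"), b"{%d}" % len(value), b":\t", value, b"\n"]
    out.append(b"}\n")
    return b"".join(out)


# "Café" is four characters but five octets when encoded as UTF-8.
obj = format_object("FILE", None, [("Title", "Caf\u00e9".encode("utf-8"))])
```

The VALUE-SIZE written for "Title" is 5, which is the number of
octets a reader must consume after the DELIMITER.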
4. Matching Semantics
As was discussed in Section 1, query referral of SOIF objects will be
most effective when a query identifies a particular ATTRIBUTE or set
of ATTRIBUTEs as the target of the query match. A query-identified
ATTRIBUTE should be considered to match a SOIF ATTRIBUTE when a
case-insensitive character-by-character comparison matches that portion
of the ATTRIBUTE IDENTIFIER prior to any hyphen-integer suffix. For
example, a query which asks for a match on the ATTRIBUTE "author"
should match the IDENTIFIERs "author", "Author", "AUTHOR", and
"Author-1". [10] discourages the registration of template types
containing ATTRIBUTEs which have previously been registered with
substantially different definitions. This will help eliminate mis-
referral, but a CIP mesh may nonetheless need to maintain a thesaurus
matching ATTRIBUTEs from particular template-types to those of other,
especially unregistered, template-types.
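The ATTRIBUTE comparison described above might be sketched as
follows. The function names are hypothetical, and the sketch does not
cover the thesaurus lookup between template types.

```python
import re

# A hyphen followed by a positive integer at the end of the IDENTIFIER.
_HYPHEN_INTEGER = re.compile(r"-[1-9][0-9]*$")


def base_attribute(identifier: str) -> str:
    """Return the IDENTIFIER with any hyphen-integer suffix removed."""
    return _HYPHEN_INTEGER.sub("", identifier)


def attribute_matches(query_attribute: str, identifier: str) -> bool:
    """Case-insensitive comparison against the portion before the suffix."""
    return query_attribute.lower() == base_attribute(identifier).lower()


candidates = ["author", "Author", "AUTHOR", "Author-1", "Authority", "Author-Name"]
hits = [c for c in candidates if attribute_matches("author", c)]
```

With these inputs, the first four IDENTIFIERs match and "Authority"
and "Author-Name" do not, since the comparison covers the whole base
name rather than a prefix.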
The matching semantics appropriate for a particular VALUE are derived
from its data type and encoding. For VALUEs associated with
ATTRIBUTEs which are part of a registered template type, the data
type and encoding are readily available. For VALUEs associated with
ATTRIBUTES associated with unregistered template-types, an octet-by-
octet comparison is the default. In cases where previous experience
has demonstrated that a particular ATTRIBUTE contains string data, a
case-insensitive substring match may be used. For example, in a
query against the "AUTHOR" ATTRIBUTE of the generic "DOCUMENT"
template type, the query VALUE "Garcia" should match the SOIF VALUEs
"Garcia", "GARCIA", and "Jose Garcia y Montes".
Over time, there may well emerge an understanding of which attributes
tend to produce correct query referrals within a mesh. As such
understandings emerge, mesh maintainers may wish to define a
particular SOIF TEMPLATE-TYPE which restricts included ATTRIBUTES to