Network Working Group                                           T. Hardie
Request for Comments: 2655                                        Equinix
Category: Experimental                                          M. Bowman
                                                                 Transarc
                                                                 D. Hardy
                                                                 Netscape
                                                              M. Schwartz
                                                            Affinia, Inc.
                                                               D. Wessels
                                                                    NLANR
                                                              August 1999

                CIP Index Object Format for SOIF Objects

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  It does not specify an Internet standard of any kind.
   Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

1.  Abstract

   The Common Indexing Protocol (CIP) allows servers to form a referral
   mesh for query handling by defining a mechanism by which cooperating
   servers exchange hints about the searchable indices they maintain.
   The structure and transport of CIP are described in (Ref. 1), as are
   general rules for the definition of index object types.  This
   document describes SOIF, the Summary Object Interchange Format, as an
   index object type in the context of the CIP framework.  SOIF is a
   machine-readable syntax for transmitting structured summary objects,
   currently used primarily in the context of the World Wide Web.

   Query referral has often been dismissed as an ineffective strategy
   for handling searches of Web resources, and Web resources certainly
   present challenges not present in structured directory services like
   Rwhois.  In situations where a keyword-based free text search is
   desired, query referral is not likely to be effective because the
   query will probably be routed to every server participating in the
   referral mesh.  Where a search can be limited by reference to a
   specific resource attribute, however, query referral is an effective
   tool.  SOIF can be used to create such a known-attribute query mesh
   because it provides a method for associating attributes with net-
   addressable resources.



Hardie, et al.                Experimental                      [Page 1]

RFC 2655        CIP Index Object Format for SOIF Objects     August 1999


1.1 History

   SOIF was first defined by the Harvest project [Ref 2.] in January
   1994.  SOIF was derived from a combination of the Internet Anonymous
   FTP Archives IETF Working Group (IAFA) templates [Ref 3.] and the
   BibTeX bibliography format [Ref 4.].  The combination was originally
   noted for its advantages of providing a convenient and intuitive way
   for delimiting objects within a stream, and setting apart the URL for
   easy object access or invocation, while still preserving
   compatibility with IAFA templates.

   Mic Bowman, Darren Hardy, Mike Schwartz, and Duane Wessels each
   contributed to the creation of the SOIF format as part of the Harvest
   Project; later work took place as part of the FIND working group.

2.  Name

   The index object described below will have the MIME type of
   application/index.obj.HARVEST-SOIF-1.

3.  Payload Format

   Each summary object has 3 fundamental components: a template type, a
   URL, and zero or more ATTRIBUTE-VALUE pairs.  Because the VALUEs in
   the ATTRIBUTE-VALUE pairs may contain arbitrary data (cf. Section
   3.5), SOIF objects should be encoded in Base64 unless the template
   type unambiguously establishes that the VALUEs do not contain binary
   data.
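
   As a rough illustration of the encoding rule above, the following
   Python sketch wraps a serialized SOIF object in Base64 unless the
   caller asserts that the template type rules out binary VALUEs.  The
   function name encode_payload and its values_are_text flag are
   hypothetical.  Deciding whether a template type is text-only is left
   to the caller.

```python
import base64


def encode_payload(soif_bytes: bytes, values_are_text: bool) -> bytes:
    """Prepare a serialized SOIF object for use as an index object payload.

    Hypothetical helper: values_are_text stands in for the case where the
    template type unambiguously establishes that no VALUE holds binary data.
    """
    if values_are_text:
        return soif_bytes
    # Default: Base64-encode, since VALUEs may contain arbitrary octets.
    return base64.b64encode(soif_bytes)


obj = b"@FILE { http://example.com/\nTitle{5}:\tHello\n}\n"
payload = encode_payload(obj, values_are_text=False)
assert base64.b64decode(payload) == obj  # round trip restores the object
```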

3.1  Template Type

   The Template type is used to identify the set of ATTRIBUTEs contained
   within a particular SOIF object.  SOIF does not define the template
   types themselves; it only provides a way to associate the summary
   object with a predefined template type name.  Template types may be
   registered or unregistered.  Unregistered template types provide an
   indication of available ATTRIBUTE-VALUE pairs, but these may vary
   both according to the original resource and the method by which the
   summary object was generated.  Registered template types must refer
   to a formally specified description of all mandatory and optional
   ATTRIBUTE-VALUE pairs available for that type.  See [10] for a
   description of the process of registering template types with the
   IANA.

   Historically, the template types used by SOIF were derived from IAFA
   template types (Ref. 3). SOIF objects generated by the Harvest system
   have a "FILE" template type; in current practice this is the most
   common template type.  The "FILE" template type is a generic template
   type meant to handle a large variety of web-based resources.  No
   formal specification of it is available, though a list of ATTRIBUTE-
   VALUE pairs common to the "FILE" template type is found in Appendix
   A.  "DOCUMENT" and "OBJECT" are other generic template-types.

   The use of unregistered template types obviously presents some
   problems to the correct operation of query referral.  Two efforts
   have been mounted to allow peer-to-peer agreement on the association
   of template types with specific attribute sets: Netscape's RDM (Ref.
   6) and the STARTS project (Ref. 7).  Initially, CIP meshes based on
   systems which use unregistered template types may need to use
   these or similar methods to associate template types with specific
   attribute sets.

   Mesh operators are strongly encouraged, however, to migrate to
   registered template types as soon as is practical.  Registered
   template types allow CIP meshes to derive the definitions of
   attributes, which enables multiple-language interfaces to the base
   attributes.  In addition, registered template types allow CIP meshes
   and other users of SOIF to establish the permitted data types and
   encodings of the VALUEs associated with each ATTRIBUTE.  This makes
   deriving the appropriate matching semantics for a particular VALUE
   much more straightforward and eliminates the limitations of the
   default octet-by-octet matching (cf. Section 4.).

3.2  URL

   Uniform Resource Locators (URLs) (Ref 5.) are used by SOIF as object
   IDENTIFIERs.  SOIF associates its summary objects with net-
   addressable resources by using the URL by which the resource was
   addressed as the initial field of the object body.  See section 3.4
   for the formal grammar associated with SOIF objects.

   This association allows the same resource to have multiple summary
   objects, differentiated only by the URL by which the resource was
   accessed.  This possibility does not, however, impact the usability
   of the URL as an object IDENTIFIER. Furthermore, since it can be
   argued that the net address is a salient part of the metadata, there
   may be compensating benefits to using the URL as an object
   IDENTIFIER.

   As noted in Appendix A, the Harvest project used several additional
   identity attributes ("Gatherer-Name", "Gatherer-Host", "Gatherer-
   Port" and "Gatherer-Version") to further identify the provenance of a
   particular object.  Within the context of CIP, it may be useful to
   identify the base sources of particular index objects; see Appendix B
   for one example of how a SOIF-based CIP hint could use the base
   source URL.

3.3  ATTRIBUTE-VALUE Pairs

   Each summary object has zero or more ATTRIBUTE-VALUE pairs, which
   contain metadata about the net-addressable resource referenced by the
   URL.  Pairs are composed of an ATTRIBUTE IDENTIFIER, the length of
   the VALUE, a delimiter, and the VALUE.  It should be stressed that
   ATTRIBUTE-VALUE pairs are not CR/LF terminated, but are parsed
   according to the grammar set out in section 3.4.  In the examples in
   Section 3.6 and in many other representations of SOIF objects,
   ATTRIBUTE-VALUE pairs are represented on individual lines to enhance
   readability.  VALUEs may contain CR/LF, however, and implementors
   must be careful to parse the full VALUE.  Implementors of SOIF
   parsers MUST ignore <CR>, <LF>, <TAB>, <SPACE>, or other whitespace
   found between the VALUE of an ATTRIBUTE-VALUE pair and the ATTRIBUTE
   IDENTIFIER of the subsequent pair.
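
   The length-driven parsing described above might be sketched in
   Python as follows.  This is a sketch, not a reference parser: the
   helper name read_pair is hypothetical, and it operates on raw bytes
   so that the VALUE-SIZE is counted in octets.  Because the VALUE is
   taken by length, an embedded CR/LF does not terminate it, and any
   whitespace before the next ATTRIBUTE IDENTIFIER is skipped.

```python
import re

# IDENTIFIER{VALUE-SIZE}:<TAB>, matched on bytes so sizes count octets.
PAIR_HEAD = re.compile(rb"([A-Za-z0-9_-]+)\{([0-9]+)\}:\t")


def read_pair(buf: bytes, pos: int):
    """Read one ATTRIBUTE-VALUE pair at or after offset pos.

    Returns (identifier, value, next_pos).  Hypothetical helper.
    """
    # Skip whitespace that may separate the previous VALUE from this pair.
    while pos < len(buf) and buf[pos:pos + 1].isspace():
        pos += 1
    m = PAIR_HEAD.match(buf, pos)
    if m is None:
        raise ValueError(f"no ATTRIBUTE-VALUE pair at offset {pos}")
    start = m.end()
    end = start + int(m.group(2))
    if end > len(buf):
        raise ValueError("VALUE runs past the end of the buffer")
    # Take exactly VALUE-SIZE octets; CR/LF inside the VALUE are kept.
    return m.group(1).decode("ascii"), buf[start:end], end
```

   Reading "Title{9}:<TAB>Two<LF>lines" this way yields the full
   nine-octet VALUE, line feed included, rather than stopping at the
   line break.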

   The SOIF syntax does not explicitly allow for a single ATTRIBUTE to
   have multiple VALUEs.  To handle multiple VALUEs for the same
   ATTRIBUTE, SOIF uses an ATTRIBUTE naming convention: a hyphen and a
   positive integer are appended to the ATTRIBUTE name to create a
   distinct ATTRIBUTE IDENTIFIER for each VALUE of that ATTRIBUTE.  For
   example, the ATTRIBUTE IDENTIFIERs "Author-1", "Author-2", and
   "Author-3" can be used to represent three VALUEs associated with the
   ATTRIBUTE "Author" where a specific resource has three authors.  See
   section 4 for the implications of this strategy on matching
   semantics.
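
   A consumer that wants the VALUEs of a multiple-VALUE ATTRIBUTE back
   as a list could fold the suffixed IDENTIFIERs together, roughly as
   below.  group_values is a hypothetical Python helper that assumes
   pairs have already been parsed into (IDENTIFIER, VALUE) tuples.
   Treating an unsuffixed IDENTIFIER as index 0 is an assumption of
   this sketch, not something the naming convention specifies.

```python
import re
from collections import defaultdict

# ATTRIBUTE name followed by "-<positive integer>", e.g. "Author-2".
SUFFIX = re.compile(r"^(.+)-([1-9][0-9]*)$")


def group_values(pairs):
    """Fold "Name-N" IDENTIFIERs into "Name" with VALUEs ordered by N.

    Hypothetical helper; IDENTIFIER case is preserved here.
    """
    grouped = defaultdict(list)
    for ident, value in pairs:
        m = SUFFIX.match(ident)
        name, index = (m.group(1), int(m.group(2))) if m else (ident, 0)
        grouped[name].append((index, value))
    # Order each ATTRIBUTE's VALUEs by their numeric suffix.
    return {name: [v for _, v in sorted(items, key=lambda t: t[0])]
            for name, items in grouped.items()}
```

   Hyphenated names without a trailing integer, such as
   "Gatherer-Port", are left intact.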

3.4  SOIF Grammar

   The SOIF syntax is defined by the following grammar:

      SOIF            ::=  OBJECT SOIF |
                           OBJECT
      OBJECT          ::=  @ TEMPLATE-TYPE { URL ATTRIBUTE-LIST }
      TEMPLATE-TYPE   ::=  IDENTIFIER
      ATTRIBUTE-LIST  ::=  ATTRIBUTE ATTRIBUTE-LIST |
                           ATTRIBUTE |
                           NULL
      ATTRIBUTE       ::=  IDENTIFIER {VALUE-SIZE} DELIMITER VALUE
      URL             ::=  RFC1738-URL-Syntax | "-"
      IDENTIFIER      ::=  ALPHA-NUMERIC-STRING
      VALUE           ::=  ARBITRARY-DATA
      VALUE-SIZE      ::=  NUMERIC-STRING
      DELIMITER       ::=  ":<TAB>"
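
   To make the grammar concrete, here is a minimal Python sketch of a
   writer for a single OBJECT.  The name write_soif is hypothetical.
   It assumes str VALUEs should be encoded as UTF-8 (SOIF itself leaves
   encodings to the template type), writes "-" when no URL is given,
   puts a space around the opening brace as Harvest output commonly
   does, and places each ATTRIBUTE-VALUE pair on its own line purely
   for readability.

```python
def write_soif(template_type: str, url, attributes) -> bytes:
    """Serialize one SOIF OBJECT (hypothetical helper).

    attributes: list of (identifier, value) with str or bytes values.
    url=None is written as "-", the placeholder for objects with no URL.
    """
    out = bytearray()
    out += b"@" + template_type.encode("ascii") + b" { "
    out += (url if url is not None else "-").encode("ascii") + b"\n"
    for ident, value in attributes:
        # Assumption: str VALUEs are encoded as UTF-8.
        data = value.encode("utf-8") if isinstance(value, str) else value
        # VALUE-SIZE counts octets of the encoded VALUE, not characters.
        out += ident.encode("ascii") + b"{" + str(len(data)).encode("ascii") + b"}:\t"
        out += data + b"\n"  # the newline is cosmetic whitespace
    out += b"}\n"
    return bytes(out)
```

   For example, write_soif("FILE", "http://example.com/",
   [("Title", "Hello")]) produces the three lines
   "@FILE { http://example.com/", "Title{5}:<TAB>Hello", and "}".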

3.5  Grammar Description

   URL
      a Uniform Resource Locator encoded in the syntax defined by RFC
      1738 (Ref. 5).  If the summary object has no URL associated with
      it, then a Latin-1 hyphen (octal \055) is used instead.

   IDENTIFIER
      an ASCII character string that only contains alphanumeric
      characters and hyphens or underscores.  IDENTIFIERs should avoid
      including hyphens followed by positive integers except when
      constructing multiple-VALUE ATTRIBUTE IDENTIFIERs.

   VALUE
      a buffer of VALUE-SIZE octets containing the VALUE.  The VALUE may
      contain data in arbitrary formats or encodings, which recipients
      recognize based on Template-Type.

   VALUE-SIZE
      a non-negative integer encoded as an ASCII character string.  The
      integer indicates how many octets the VALUE occupies after the
      DELIMITER.

   DELIMITER
      a two-octet delimiter consisting of a Latin-1 colon (:) followed
      by a tab (\t) (octal \072\011).

   { }  the Latin-1 curly braces (octal \173 and \175) are used to wrap
      the VALUE-SIZE (no spaces) as well as the URL and ATTRIBUTE-LIST
      combination.

   @TEMPLATE-TYPE
      the Latin-1 @ (octal \100) and TEMPLATE-TYPE (no space between
      them) are used to mark the beginning of the SOIF object.

   NUMERIC-STRING
      Zero or more ASCII numerals.

   ALPHA-NUMERIC-STRING
      Zero or more ASCII letters or numerals, plus hyphens or
      underscores: [a-z, A-Z, 0-9, -, and _].

   ARBITRARY-DATA
      Octets of data in arbitrary formats or encodings.
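
   Putting the definitions above together, a stream of OBJECTs can be
   parsed roughly as follows.  This Python sketch is illustrative only:
   the names parse_object and parse_soif are hypothetical, it assumes
   the URL contains no whitespace and is followed by whitespace, and it
   returns VALUEs as raw octets, leaving their interpretation to the
   template type.

```python
import re

# "@TEMPLATE-TYPE" (no space after "@"), then "{", then the URL or "-".
HEAD = re.compile(rb"\s*@([A-Za-z0-9_-]+)\s*\{\s*(\S+)")
# IDENTIFIER{VALUE-SIZE}:<TAB>, optionally preceded by whitespace.
PAIR = re.compile(rb"\s*([A-Za-z0-9_-]+)\{([0-9]+)\}:\t")
CLOSE = re.compile(rb"\s*\}")


def parse_object(buf: bytes, pos: int = 0):
    """Parse one OBJECT; return (template_type, url, pairs, next_pos).

    url is None when the object carries the "-" placeholder.
    """
    m = HEAD.match(buf, pos)
    if m is None:
        raise ValueError(f"expected '@TEMPLATE-TYPE {{ URL' at offset {pos}")
    template_type = m.group(1).decode("ascii")
    url = m.group(2).decode("ascii")
    pos = m.end()
    pairs = []
    while True:
        close = CLOSE.match(buf, pos)
        if close:
            return template_type, (None if url == "-" else url), pairs, close.end()
        p = PAIR.match(buf, pos)
        if p is None:
            raise ValueError(f"malformed ATTRIBUTE at offset {pos}")
        start, size = p.end(), int(p.group(2))
        if start + size > len(buf):
            raise ValueError("VALUE-SIZE exceeds the remaining data")
        # The VALUE is exactly VALUE-SIZE octets after the DELIMITER.
        pairs.append((p.group(1).decode("ascii"), buf[start:start + size]))
        pos = start + size


def parse_soif(buf: bytes):
    """Parse one or more OBJECTs (SOIF ::= OBJECT SOIF | OBJECT)."""
    objects, pos = [], 0
    while buf[pos:].strip():
        template_type, url, pairs, pos = parse_object(buf, pos)
        objects.append((template_type, url, pairs))
    return objects
```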

4.  Matching Semantics

   As was discussed in Section 1, query referral of SOIF objects will be
   most effective when a query identifies a particular ATTRIBUTE or set
   of ATTRIBUTEs as the target of the query match.  A query-identified
   ATTRIBUTE should be considered to match a SOIF ATTRIBUTE when a
   case-insensitive character-by-character comparison matches that portion
   of the ATTRIBUTE IDENTIFIER prior to any hyphen-integer suffix.  For
   example, a query which asks for a match on the ATTRIBUTE "author"
   should match the IDENTIFIERs "author", "Author", "AUTHOR", and
   "Author-1".  [10] discourages the registration of template types
   containing ATTRIBUTEs which have previously been registered with
   substantially different definitions.  This will help eliminate mis-
   referral, but a CIP mesh may nonetheless need to maintain a thesaurus
   matching ATTRIBUTEs from particular template-types to those of other,
   especially unregistered, template-types.
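
   The ATTRIBUTE matching rule can be expressed as a small Python
   predicate.  attribute_matches is a hypothetical name.  The sketch
   assumes IDENTIFIERs are ASCII and that only a trailing hyphen
   followed by a positive integer counts as a multiple-VALUE suffix.

```python
import re

# A trailing "-<positive integer>" multiple-VALUE suffix.
_SUFFIX = re.compile(r"-[1-9][0-9]*$")


def attribute_matches(query_attr: str, identifier: str) -> bool:
    """Case-insensitive comparison of a query ATTRIBUTE with the part of
    a SOIF IDENTIFIER before any hyphen-integer suffix (hypothetical)."""
    base = _SUFFIX.sub("", identifier)
    return query_attr.casefold() == base.casefold()
```

   With this predicate, "author" matches "author", "Author", "AUTHOR",
   and "Author-1", but not "Authority" or "Co-Author".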

   The matching semantics appropriate for a particular VALUE are derived
   from its data type and encoding.  For VALUEs associated with
   ATTRIBUTEs which are part of a registered template type, the data
   type and encoding are readily available.  For VALUEs of ATTRIBUTEs
   belonging to unregistered template types, an octet-by-octet
   comparison is the default.  In cases where previous experience
   has demonstrated that a particular ATTRIBUTE contains string data, a
   case-insensitive substring match may be used.  For example, in a
   query against the "AUTHOR" ATTRIBUTE of the generic "DOCUMENT"
   template type, the query VALUE "Garcia" should match the SOIF VALUEs
   "Garcia", "GARCIA", and "Jose Garcia y Montes".
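
   A minimal Python sketch of these two VALUE-matching modes follows.
   The string_data flag is hypothetical, standing in for whatever
   knowledge a mesh has that an ATTRIBUTE holds string data, and
   decoding VALUEs as UTF-8 is an assumption; a registered template
   type would supply the actual data type and encoding.

```python
def value_matches(query_value: bytes, soif_value: bytes, string_data: bool) -> bool:
    """Match a query VALUE against a SOIF VALUE (hypothetical helper).

    Default: octet-by-octet equality.  When the ATTRIBUTE is known to hold
    string data, use a case-insensitive substring match instead.
    """
    if not string_data:
        return query_value == soif_value
    # Assumption: string VALUEs are UTF-8; undecodable octets are replaced.
    q = query_value.decode("utf-8", errors="replace").casefold()
    v = soif_value.decode("utf-8", errors="replace").casefold()
    return q in v
```

   With string_data set, the query VALUE "Garcia" matches all three
   VALUEs in the example above; under the default octet-by-octet
   comparison, only the exact VALUE "Garcia" matches.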

   Over time, there may well emerge an understanding of which attributes
   tend to produce correct query referrals within a mesh.  As such
   understandings emerge, mesh maintainers may wish to define a
   particular SOIF TEMPLATE-TYPE which restricts included ATTRIBUTES to
