
<HTML><HEAD><TITLE>"Drop-in" publishing with the World Wide Web</TITLE><META name=author content="Davis &amp; Lagoze"></HEAD><BODY>
<H1>"Drop-in" publishing with the World Wide Web</h1>
<h2>Jim Davis and Carl Lagoze<br>Xerox Inc. and Cornell University<br></H2>
<h4>Abstract</h4>
<blockquote>The goal of drop-in publishing is to simplify digital publishing over the Internet. We would like digital publishing of non-commercial matter (e.g. technical reports, course notes, brochures) to be as easy as sending email is now, but with the virtues of archival storage and easy searching that we associate with electronic libraries. We propose a protocol, Dienst, to allow communication between clients and document servers by encoding object-oriented messages within URLs. A preliminary version of this protocol now runs at eight sites, and we describe some of its features. Next we present tools for automating the maintenance of document collections. Finally, we discuss the problems we've had with the Web as it stands, hoping to motivate changes that would improve performance of digital library systems such as ours.</blockquote>
<h2>A library with no limits...</h2>
<blockquote>"However one may sing the praises of those who by their virtue either defend or increase the glory of their country, their actions only affect worldly prosperity, and within narrow limits....[but] Aldus is building up a library which has no other limits than the world itself."</blockquote>
Desiderius Erasmus wrote these words in praise of his friend Aldus, a book publisher of the 16th century. More than 400 years later, digital publishing may finally enable us to fulfill this vision, providing universal access to all the world's information. What's in the way?
<p>The existing technologies (WWW, Gopher, and even anonymous FTP) make reproduction and transmission fairly fast and cheap, but do little or nothing to help writers write or readers find or read documents.
In our view, the problem is that they provide too little structure to the document collection. All of them present basically the same abstraction, namely a hierarchy of files, but do nothing to help the user locate a file within a hierarchy. Every site is different. Some group reports by year, others by project name; but even if every site on the Internet organized its hierarchy identically, it would not be enough, because every site also has its own conventions for naming files, indicating data formats, and making searchable indices. A writer who wishes to contribute has basically the same problem - it's easy to copy a file into an anonymous FTP area, but hard to make sure that it's indexed properly. A considerate writer might want to provide the same document in several formats, to increase the chances of accessibility, but this is a nuisance. We claim what's needed is a new, higher level protocol that hides the underlying details, and the underlying tools to simplify library management.
<p>This paper presents our first steps towards the universal library. We describe a protocol for universal access and the server that implements it. (For those familiar with our server - in this paper we describe not the currently running protocol, but rather the one we have submitted as an Internet Draft <a HREF="#DIENSTPROT">[DIENSTPROT]</a>, which corrects a number of design flaws in the working version. We regret any confusion this causes.) We present a number of tools that integrate with our server to make publishing a document on-line relatively easy. We also discuss the steps we took to bring a large, existing collection online from paper.
Finally, since our protocol is based on the World Wide Web, we also describe some of the problems we've observed in using it, in the hope that others at this conference will have solutions we can adopt.
<p>Our focus on non-commercial publishing requires explanation. We realize that some content providers will not place their intellectual property on the net until clear definitions of legal rights and mechanisms for payment and protection are in place. We have nothing to contribute in these areas. Nevertheless there are a number of providers, such as universities or corporate internal groups, for whom these issues are less pressing, and we believe that we can thus make some useful contribution without working on the additional issues raised by economics.
<p><h2>Dienst provides a uniform protocol for document access</h2>
Dienst is a protocol for search, retrieval and display of documents. Dienst models the digital library as a flat set of documents, each of which has a unique name, can be in many formats (e.g., TIFF, GIF, PostScript) and consists of a set of named parts.
<p>Dienst supports a message-passing interface to this document model. Messages may be addressed to every document server, to a particular server, to one document, or to a particular part of a document. A message is encoded into the "path" portion of a URL, and contains the name of the message, the recipient, and the arguments, if any. A message may be sent to any convenient Dienst server (the nearest, for example), which will execute it locally or forward it as appropriate. Dienst appears to be a single virtual document collection, and hides the details of the server distribution. (Note that the actual implementation does not use an object-oriented language; we use message passing only as a convenient conceptual model.)
<p>Each document in Dienst has a unique identifier which names it in a location-independent manner.
This identifier, called a <b>DocID</b>, serves exactly the same role as a URN, and when URNs are fully specified we will adopt them. A DocID has three components: a <b>naming convention</b>, a <b>publisher</b> and a <b>number</b>. To ensure that each DocID is unique, each component (or rather, the institution that issues each component) guarantees that the next component is unique - thus each naming convention controls a namespace of publishers, and each publisher issues a set of numbers.
<p>For each publisher, there must be at least one server to handle messages for the documents issued by that publisher. In our view, the minimum commitment a publisher must make to issue a document is to store and deliver the document to the network. When a Dienst server receives a message for a document it locates the closest server for the document's publisher and forwards the message to it.
<p>Dienst messages address four types of digital library services: <b>user interface</b> services, which present library information in a format designed for human readability; <b>repository</b> services, which store the document and support retrieval of all or part; <b>index</b> services, which provide search; and <b>miscellaneous</b> services, which provide general information about a server.
<p>Of these four services, only the first is used directly by a human. The others are used by programs, in particular other Dienst servers, but also by other digital library or publishing systems. For example, the Stanford Information Filtering Tool (<A HREF="#SIFT">[SIFT]</A>) obtains bibliographic records through the index interface, and we are currently designing a gateway to the WATERS (<A HREF="#WATERS">[WATERS]</A>) system.
We encourage other developers of digital library systems to provide both user interfaces and application interfaces to their systems.
<p>All services except the last are optional at a given site. This allows maximal flexibility in the way that particular server implementations interoperate. For example, one server may exist solely as a user interface gateway, providing transparent access for users to a particular domain of indexes and repositories. We see this flexible interoperability as key to the development of a digital library infrastructure where the "collection" will span multiple sites and continents.
<h3>Repository servers store documents in multiple formats</h3>
A key difference between Dienst and other current digital library systems is its ability to represent documents in multiple formats. Most current digital libraries present documents in exactly one form, PostScript. Although PostScript is almost always available for newly produced documents, there are problems with relying on it to the exclusion of all other formats. First, most older works are only available on paper, making scanned page images the only practical means of bringing the material online. (We describe our experiences in doing that below.) Second, looking forward we can expect to see other document representations become popular. (Surely at a World Wide Web conference we can claim that HTML will be used.) A third reason is that for some applications, other formats are just better. For example, if one wishes to do full text indexing on a document collection, the plain text is more useful than the PostScript file, and if one wishes to display just a single page, a collection of page images may be better than searching through PostScript. Therefore, Dienst's conceptual data model allows each document to be stored in one or more formats.
<p>The Dienst protocol includes a message that asks a document for the list of formats in which it is available.
We specify formats with MIME (<A HREF="#MIME">[MIME]</A>) Content-types. Dienst does not support the notion of explicit conversion between document formats (as does System 33 <A HREF="#Putz">[Putz]</A>). A repository willing and able to provide a document in a given format should simply list that format, even if it is only obtained through a conversion service.
<p>Diversity is the rule on the Internet, and each site supporting Dienst is likely to store its documents in a different way. The Dienst protocol hides all detail of the underlying storage organization -- this is in sharp contrast to FTP, Gopher, and "bare" HTTP, where the underlying hierarchy is visible. Each Dienst repository includes a function which maps from a DocID and format to the actual storage pathname on that server. This hides the details of both file system structure and file typing or naming conventions from outside users. Thus one may request, say, the second page of the TIFF version of a document from a server without needing to know where and how it is stored.
<h3>Index servers support search</h3>
An index server accepts queries (in some query language) and searches for document records that satisfy the query. In our model, an index server is totally distinct from a repository. Repository data is likely to be huge, but index servers store only meta-data, which is quite modest in size. The choice of a query language is crucial to the power of an index server. As we did not wish to make this choice, the Dienst protocol is designed with one initial query language, and provision for extension to support others.
<p>Every query language is based on an underlying model for the meta-data it queries.
The initial query language in Dienst assumes a minimal data model, where documents have an author, title, and abstract in addition to the publisher and number. A query may refer to any of these fields; if it refers to more than one then the terms are connected with an implicit "and". Thus one might query for all documents by author "Wilson" at publisher "Stanford".
<p>A search request returns a document of type <code>text/x-dienst-response</code>, consisting of records containing meta-information on all the matching documents. This meta-information follows the encoding proposed for Uniform Resource Characteristics (URC) <a HREF="#URC">[URC]</a>. The URC draft proposes fields such as title, author, Content-type and URL, all of which are obviously applicable; we have added a number of experimental attributes.
<h2>A prototype implementation runs at eight sites</h2>
An initial version of Dienst and a prototype implementation were developed as part of the Computer Science Technical Report (CSTR)
