http:^^www.ai.mit.edu^projects^haystack^

From "This data set contains WWW-pages collect" · EDU^PROJECTS^HAYSTACK^ · 207 lines
Date: Mon, 25 Nov 1996 23:59:43 GMT
Server: Apache/1.2-dev
Connection: close
Content-Type: text/html
Expires: Mon, 25 Nov 1996 23:59:43 GMT
Last-Modified: Fri, 30 Aug 1996 19:35:47 GMT
ETag: "c99de-21f5-32274293"
Content-Length: 8693
Accept-Ranges: bytes

<html><head><title>Haystack Home Page</title></head>
<body text="#000000" bgcolor="#ffffff" link="#cc0000" alink="#000080" vlink="#cc0000">
<center><h5>The internal Haystack page has moved.  (<!WA0><a href="mailto:haystack@ai.mit.edu">Send mail</a> if you need to find it.)<br>
<!WA1><img src="http://www.ai.mit.edu/icons/basic/construction3.gif">  This public page is under construction.</h5></center>
<!WA2><IMG align="top" SRC="http://www.ai.mit.edu/gifs/line.colorbar.gif" ALT="-----"><br>
<h3><center><!WA3><IMG align="top" SRC="http://www.ai.mit.edu/projects/haystack/haystack.jpg"><p><p>Welcome to the Haystack.<br>
Pull up some straw and make yourself at home...</center></h3>
<p><!WA4><IMG align="top" SRC="http://www.ai.mit.edu/gifs/line.colorbar.gif" ALT="-----"><p>
A great deal of research in information retrieval has been detached from the users who could eventually benefit from such research. On the one hand, traditional IR research systems have been cut off by an inconvenient interface or limited to an unchanging text collection with a fixed set of evaluation queries that become increasingly out of date as time passes. On the other hand, the recent surge in web search tools has resulted in many deployed IR systems, with more convenient but often limited interfaces and generally fixed (or at least non-modifiable) corpora. A few systems, such as Harvest or Content Routing, have attempted to address the gaps between these two extremes, focusing on the construction of a more flexible substrate which allows users and communities to build their own repositories or queries.
<p>
The Haystack project is aimed at the individual customization end of these more realistic ``living'' information retrieval systems.
We are interested in building on customizable substrates, such as those provided by Harvest or Content Routing, to create a community of individual but interacting ``haystacks'': personal information repositories which archive not only base content but also user-specific meta-information, enabling them to adapt to the particular needs of their users. We believe that such a system will let us address several questions:
<menu>
  <li> How can individuals use an information retrieval system to organize their own personal collection of information?
  <li> How might an information retrieval system learn from its users and evolve over time into a more effective system?
  <li> As individuals build up their own collections and information retrieval systems, how can they search for information that might be located in others' collections, especially when such information is organized by information retrieval systems that may differ greatly from their own?
</menu>
<p>
Our first step towards this goal has been to design a simple and convenient user interface to and annotation format for an information retrieval system. Our current annotations emphasize user-independent text meta-information, but the format for and structure of these annotations are intended to encompass hand-generated and automatic user-specific annotations.
The annotations themselves are first-class documents in our system, so that, for example, search information can be reified and treated as an indexable object.
<p>
In our implementation, we have chosen to detach the information retrieval ``engine'' from the user interface and annotation system, specifying only that the engine should accept a natural language query and return documents that ``match'' under whatever criteria it uses. We have begun by using the ``MG'' information retrieval system, but are concurrently investigating other ``back ends'' including Content Routing, Harvest, and an in-house image-based IR system.
<p>
On top of this arbitrary engine, we are implementing several interfaces for retrieval as well as annotation editing. The first is a web-proxy based interface, which allows users to connect to the information retrieval engine via their favorite web browser. To maximize ease of use, we are also developing shell- and emacs-based tools for talking to the collection. Haystack is intended to archive any objects from which text can be extracted; we are initially implementing (or appropriating) ``textifiers'' for ascii, postscript, html, and scanned documents, but have an architecture that is easily extensible to other document types.
<p>
It is our intent that the simple standalone version of Haystack will be easy to integrate into everyday use. Since we project that even a minimal system will be of use to people aiming to organize their mail, file system, and favorite web pages, we expect to attract a moderate-size community of users at MIT. Once the system is in use, we will be able to leverage the annotation facilities to explore several questions.
<p>
The first such question is how an information retrieval system will actually be used in practice. By gathering usage data (with permission) we hope to learn about the kinds of queries people typically use. Are they usually boolean in nature? Single words?
Do they tend to be over-precise and find no documents, or do they overgeneralize and get swamped with useless results? How do they react to what comes back? What refinement strategies do they use? Each haystack will provide a user-specific set of answers to these questions.
<p>
A second question is how a system might learn from interaction with its user. Consider the scenario in which a user types an initial query <I>Q</I>, then undertakes several stages of refinement to home in on the document <I>D</I> he wants. For the future, the system should learn that when the user types a query like <I>Q</I>, document <I>D</I> is likely to be relevant even if it does not appear to be a good match. The annotation system allows for both user and system-level support for this learning process. The system might annotate a document with terms that do not appear in it but that the user types when he expects to find that document. The user might also add keywords or mnemonic phrases to a document in the expectation of future searches for it. Ultimately, the system may be able to make user-specific generalizations based on automatically or manually entered ``optimization'' annotations.
<p>
Given that individuals are organizing the information they care about, it is natural to ask how one user can benefit from the work of other users. Consider that the typical way to search for a paper book is to ask one's office-neighbor for it. Analogously, we would like to let individuals search for information in other people's haystacks.
Both to limit the costs of a search and to improve the filtering of what is returned, it is important for the system to learn over time which other individuals are most likely to have information that a given user finds relevant---these haystack ``neighbors'' are the systems that should be queried first and whose results should be most trusted.
<p>
Another opportunity that this linking of haystacks creates is in connecting individuals to other people who can address their information need. The information I have stored in my haystack is likely a good indicator of my knowledge and interests. A question that matches a lot of material in my haystack is likely to be a question I can usefully answer. The haystack system can therefore serve as an ``information brokerage'' connecting questioners to experts.
<p>
Sharing haystacks also raises the issue of generalizing from individuals' customization of their own haystacks to larger (pooled) data-sets. This provides another opportunity to test the adaptability of query strategies and a test of the generalization of the underlying learning algorithms.
<p>
The common thread among the above ideas is user-specific customization of information, repositories, and retrieval processes. These are issues that are possible to explore only in the hybrid world provided by the newest generation of information access tools.
By developing the Haystack system, we will attract the community of users who will provide the necessary testbed for exploring these questions about evolving, interacting customized information systems.
<p>
<!WA5><IMG align="top" SRC="http://www.ai.mit.edu/gifs/line.colorbar.gif" ALT="-----"><br>
<blockquote><h3>Hayfolk:</h3>
<strong><i><menu>
  <li> <!WA6><a href="http://theory.lcs.mit.edu/~karger">David Karger</a>
  <li> <!WA7><a href="http://www.ai.mit.edu/people/las">Lynn Andrea Stein</a>
  <li> <!WA8><a href="http://zone.mit.edu/">Eytan Adar</a>
  <li> Mark Asdoorian
  <li> Dwaine Clarke
  <li> Lili Liu
  <li> Eric Prebys
  <li> Chuck Van Buren
</menu></i></strong></blockquote>
<!WA9><IMG align="top" SRC="http://www.ai.mit.edu/gifs/line.colorbar.gif" ALT="-----"><p>
Comments to the <!WA10><a href="mailto:haymaster@ai.mit.edu">HayMaster</a>
</html>