
is to use provider-created interfaces. Providers could create an interface for the MetaCrawler to access their service which, in addition to returning relevant hits, also returns the appropriate advertisement. Another solution involves the MetaCrawler accepting advertisements, and forming a profit-sharing relationship with the service providers. We are currently investigating these and other methods of mutually beneficial co-existence with service providers.
<P><A NAME=conclusions></A><H2><A NAME=SECTION00060000000000000000> Conclusions</A></H2>
<P>In this paper we have presented the MetaCrawler, a meta-service for Web searching with additional features designed to return more references of higher quality than standard search services. We demonstrated that users follow references reported by a variety of different search services, confirming that a single service is not sufficient (Table <A HREF="#followedtable">2</A>). Further, due to the expressive power of the MetaCrawler's interface, the MetaCrawler was able to automatically determine that up to 75% of the hits returned can be discarded. Finally, the performance benchmarks and usage logs also show that the features provided by the MetaCrawler are both reasonably fast and actually used in practice.
<P>The MetaCrawler provides a "Consumer Reports" of sorts for Web searchers. The individual service data extracted from the MetaCrawler's logs is compelling evidence concerning the quality of each service. By comparing services using the same query text and recording what links users follow, we are able to evaluate the services from a user's point of view. As far as we know, we are the first to quantitatively compare the search services used by MetaCrawler on a large sample of authentic user queries.
<P>While it is possible that some MetaCrawler features could be integrated into the search services, others are intrinsic to meta-services.
By definition, only a meta-service can provide the coverage gained by using multiple services. Also, as argued earlier, client-side meta-services can offer user and site customizations, and absorb the load caused by post-processing of search results. Finally, there are some features that do not belong under control of search services for purely pragmatic reasons. For example, as more commercial search services become available, tools will emerge that select which services to use on the basis of cost. An impartial meta-service such as the MetaCrawler avoids the conflict of interest that would arise if such a tool were offered by one of the commercial services.
<P>New Web services are constantly being created. As the number and variety of services grow, it is natural to group existing services under one umbrella. The MetaCrawler goes further than merely organizing services by creating an integrated <em>meta-service</em> that moves the interface (and the associated computational load) closer to the user. We believe that this trend of moving up the information "food chain" will continue. The MetaCrawler is one of the first popular meta-services, but many more will follow.
<P><H2><A NAME=SECTION00070000000000000000> Acknowledgments</A></H2>
<P>The research presented in this paper could not have been accomplished without the help of many individuals. We would like to thank Mary Kaye Rodgers for editing assistance and for putting up with late nights. Ruth Etzioni and Ellen Spertus provided comments on an earlier draft. Dan Weld, Rich Segal, Keith Golden, George Forman, and Donald Chinn were very vocal and active in testing the early prototypes of the MetaCrawler, and Craig Horman and Nancy Johnson Burr were extremely helpful and patient in dealing with it when it ran amok. Lara Lewis was very helpful in finding references upon demand.
The Internet Softbot group provided early insight into desirable features of the MetaCrawler, and Brian Bershad and Hank Levy contributed ideas relating to the impact the MetaCrawler could have on the Web. Ken Waln aided in early development with his form patches to the WWW C library, and Lou Montulli helped in later development by unlocking the secrets of <tt>nph</tt>-scripts and Netscape caching. MetaCrawler development was supported by gifts from US West and Rockwell International Palo Alto Research. Etzioni's Softbot research is supported by Office of Naval Research grant 92-J-1946 and by National Science Foundation grant IRI-9357772.
<P><P><A NAME=SECTIONREF><H2>References</H2></A><P><DL COMPACT>
<DT><A NAME=wwwinfoseek><STRONG>1</STRONG></A><DD>InfoSeek Corporation. InfoSeek Home Page. <BR> URL: <A NAME=tex2html17 HREF="http://www.infoseek.com"><tt>http://www.infoseek.com</tt></A>.<P>
<DT><A NAME=wwwallinone><STRONG>2</STRONG></A><DD>William Cross. All-In-One Internet Search Page. <BR> URL: <A NAME=tex2html18 HREF="http://www.albany.net/~wcross/all1srch.html"><tt>http://www.albany.net/&#126;wcross/all1srch.html</tt></A>.<P>
<DT><A NAME=wwwsavvy><STRONG>3</STRONG></A><DD>Daniel Dreilinger. Savvy Search Home Page. <BR> URL: <A NAME=tex2html19 HREF="http://www.cs.colostate.edu/~dreiling/smartform.html"><tt>http://www.cs.colostate.edu/&#126;dreiling/smartform.html</tt></A>.<P>
<DT><A NAME=dreilingersavvy><STRONG>4</STRONG></A><DD>Daniel Dreilinger. Integrating Heterogeneous WWW Search Engines. <BR> URL: <A NAME=tex2html20 HREF="ftp://132.239.54.5/savvy/report.ps.gz"><tt>ftp://132.239.54.5/savvy/report.ps.gz</tt></A>, May 1995.<P>
<DT><A NAME=wwwgalaxy><STRONG>5</STRONG></A><DD>EINet. Galaxy Home Page.
<BR> URL: <A NAME=tex2html21 HREF="http://galaxy.einet.net/galaxy.html"><tt>http://galaxy.einet.net/galaxy.html</tt></A>.<P>
<DT><A NAME=harvesttr><STRONG>6</STRONG></A><DD>C. Mic Bowman et al. Harvest: A Scalable, Customizable Discovery and Access System. Technical Report CU-CS-732-94, Department of Computer Science, University of Colorado, Boulder, Colorado, March 1995. <BR> URL: <A NAME=tex2html22 HREF="http://harvest.cs.colorado.edu/harvest/papers.html"><tt>http://harvest.cs.colorado.edu/harvest/papers.html</tt></A>.<P>
<DT><A NAME=wwwharvesthomepage><STRONG>7</STRONG></A><DD>Michael Schwartz et al. WWW Home Pages Harvest Broker. <BR> URL: <A NAME=tex2html23 HREF="http://town.hall.org/Harvest/brokers/www-home-pages/"><tt>http://town.hall.org/Harvest/brokers/www-home-pages/</tt></A>.<P>
<DT><A NAME=etzioniuicacm><STRONG>8</STRONG></A><DD>O. Etzioni and D. Weld. A softbot-based interface to the internet. <em>CACM</em>, 37(7):72-76, July 1994. <BR> URL: <A NAME=tex2html24 HREF="http://www.cs.washington.edu/research/softbots"><tt>http://www.cs.washington.edu/research/softbots</tt></A>.<P>
<DT><A NAME=wwwyahoo><STRONG>9</STRONG></A><DD>David Filo and Jerry Yang. Yahoo Home Page. <BR> URL: <A NAME=tex2html25 HREF="http://www.yahoo.com"><tt>http://www.yahoo.com</tt></A>.<P>
<DT><A NAME=javawhitepaper><STRONG>10</STRONG></A><DD>James Gosling and Henry McGilton. The Java Language Environment: A White Paper. <BR> URL: <A NAME=tex2html26 HREF="http://java.sun.com/whitePaper/javawhitepaper_1.html"><tt>http://java.sun.com/whitePaper/javawhitepaper_1.html</tt></A>.<P>
<DT><A NAME=wwwinfoMarket><STRONG>11</STRONG></A><DD>IBM, Inc. infoMarket Search Home Page.
<BR> URL: <A NAME=tex2html27 HREF="http://www.infomkt.ibm.com"><tt>http://www.infomkt.ibm.com</tt></A>.<P>
<DT><A NAME=wwwqbic><STRONG>12</STRONG></A><DD>IBM, Inc. Query By Image Content Home Page. <BR> URL: <A NAME=tex2html28 HREF="http://wwwqbic.almaden.ibm.com/~qbic/qbic.html"><tt>http://wwwqbic.almaden.ibm.com/&#126;qbic/qbic.html</tt></A>.<P>
<DT><A NAME=kosterrobots><STRONG>13</STRONG></A><DD>Martijn Koster. Robots in the Web: threat or treat? <em>ConneXions</em>, 9(4), April 1995.<P>
<DT><A NAME=wwwlexisnexis><STRONG>14</STRONG></A><DD>LEXIS-NEXIS. LEXIS-NEXIS Communication Center. <BR> URL: <A NAME=tex2html29 HREF="http://www.lexis-nexis.com"><tt>http://www.lexis-nexis.com</tt></A>.<P>
<DT><A NAME=wwwlycos><STRONG>15</STRONG></A><DD>Michael Mauldin. Lycos Home Page. <BR> URL: <A NAME=tex2html30 HREF="http://lycos.cs.cmu.edu"><tt>http://lycos.cs.cmu.edu</tt></A>.<P>
<DT><A NAME=lycossignidrv><STRONG>16</STRONG></A><DD>Michael L. Mauldin and John R. R. Leavitt. Web Agent Related Research at the Center for Machine Translation. In <em>Proceedings of SIGNIDR V</em>, McLean, Virginia, August 1994.<P>
<DT><A NAME=wwwhomr><STRONG>17</STRONG></A><DD>Max Metral. Helpful Online Music Recommendation Service. <BR> URL: <A NAME=tex2html31 HREF="http://rg.media.mit.edu/ringo/ringo.html"><tt>http://rg.media.mit.edu/ringo/ringo.html</tt></A>.<P>
<DT><A NAME=wwwcusi><STRONG>18</STRONG></A><DD>Nexor. CUSI (Configurable Universal Search Interface). <BR> URL: <A NAME=tex2html32 HREF="http://pubweb.nexor.co.uk/public/cusi/cusi.html"><tt>http://pubweb.nexor.co.uk/public/cusi/cusi.html</tt></A>.<P>
<DT><A NAME=wwww3enginelist><STRONG>19</STRONG></A><DD>University of Geneva. W3 Search Engines.
<BR> URL: <A NAME=tex2html33 HREF="http://cuiwww.unige.ch/meta-index.html"><tt>http://cuiwww.unige.ch/meta-index.html</tt></A>.<P>
<DT><A NAME=wwwopentext><STRONG>20</STRONG></A><DD>Open Text, Inc. Open Text Web Index Home Page. <BR> URL: <A NAME=tex2html34 HREF="http://www.opentext.com:8080/omw/f-omw.html"><tt>http://www.opentext.com:8080/omw/f-omw.html</tt></A>.<P>
<DT><A NAME=wwwpls><STRONG>21</STRONG></A><DD>Personal Library Software, Inc. Personal Library Software Home Page. <BR> URL: <A NAME=tex2html35 HREF="http://www.pls.com"><tt>http://www.pls.com</tt></A>.<P>
<DT><A NAME=wwwwebcrawler><STRONG>22</STRONG></A><DD>Brian Pinkerton. WebCrawler Home Page. <BR> URL: <A NAME=tex2html36 HREF="http://webcrawler.com"><tt>http://webcrawler.com</tt></A>.<P>
<DT><A NAME=webcrawlerwww2><STRONG>23</STRONG></A><DD>Brian Pinkerton. Finding What People Want: Experiences with the WebCrawler. In <em>Proceedings of the Second World Wide Web Conference '94: Mosaic and the Web</em>, Chicago IL USA, October 1994.<P>
<DT><A NAME=wwwvirtourist><STRONG>24</STRONG></A><DD>Brandon Plewe. The Virtual Tourist Home Page. <BR> URL: <A NAME=tex2html37 HREF="http://wings.buffalo.edu/world"><tt>http://wings.buffalo.edu/world</tt></A>.<P>
<DT><A NAME=genmagicguidetocodewarrior><STRONG>25</STRONG></A><DD>Daniel Sears. Guide to CodeWarrior Magic/MPW. Development Release 1. <BR> URL: <A NAME=tex2html38 HREF="http://www.genmagic.com/MagicCapDocs/CodeWarriorMagic/introduction.html"><tt>http://www.genmagic.com/MagicCapDocs/CodeWarriorMagic/introduction.html</tt></A>, May 1995.<P>
<DT><A NAME=wwwdejanews><STRONG>26</STRONG></A><DD>DejaNews Research Service. DejaNews Home Page.
<BR> URL: <A NAME=tex2html39 HREF="http://www.dejanews.com"><tt>http://www.dejanews.com</tt></A>.<P>
<DT><A NAME=wwwsunmultithreaded><STRONG>27</STRONG></A><DD>Sun Microsystems, Inc. Multithreaded Query Page. <BR> URL: <A NAME=tex2html40 HREF="http://www.sun.com/cgi-bin/show?search/mtquery/index.body"><tt>http://www.sun.com/cgi-bin/show?search/mtquery/index.body</tt></A>.<P>
<DT><A NAME=wwwmoviedatabase><STRONG>28</STRONG></A><DD>The Internet Movie Database Team. The Internet Movie Database. <BR> URL: <A NAME=tex2html41 HREF="http://www.msstate.edu"><tt>http://www.msstate.edu</tt></A>.<P>
<DT><A NAME=wwwmckinley><STRONG>29</STRONG></A><DD>The McKinley Group, Inc. Magellan: McKinley's Internet Directory. <BR> URL: <A NAME=tex2html42 HREF="http://www.mckinley.com"><tt>http://www.mckinley.com</tt></A>.<P>
<DT><A NAME=wwwverity><STRONG>30</STRONG></A><DD>Verity, Inc. Verity Home Page. <BR> URL: <A NAME=tex2html43 HREF="http://www.verity.com"><tt>http://www.verity.com</tt></A>.
</DL>
<P><H2><A NAME=SECTION00080000000000000000> About the Authors</A></H2>
<P>Erik Selberg, <em><A NAME=tex2html13 HREF="mailto:selberg@cs.washington.edu">selberg@cs.washington.edu</A></em>, <A NAME=tex2html15 HREF="http://www.cs.washington.edu/homes/selberg">http://www.cs.washington.edu/homes/selberg</A><BR> Department of Computer Science and Engineering <BR> Box 352350 <BR> University of Washington <BR> Seattle, WA 98195
<P>Erik Selberg is pursuing his Ph.D. in computer science at the University of Washington. His primary research area involves World Wide Web search, although he also has interests regarding system performance and security as well as multi-agent coordination and planning. In April, 1995 he created the MetaCrawler, a parallel Web search meta-service.
He graduated from Carnegie Mellon University in 1993 with a double major in computer science and logic, and received the first Allen Newell Award for Excellence in Undergraduate Research.
<P>Oren Etzioni, <em><A NAME=tex2html14 HREF="mailto:etzioni@cs.washington.edu">etzioni@cs.washington.edu</A></em>, <A NAME=tex2html16 HREF="http://www.cs.washington.edu/homes/etzioni">http://www.cs.washington.edu/homes/etzioni</A><BR> Department of Computer Science and Engineering <BR> Box 352350 <BR> University of Washington <BR> Seattle, WA 98195
<P>Oren Etzioni received his bachelor's degree in computer science from Harvard University in June 1986, and his Ph.D. from Carnegie Mellon University in January 1991. He joined the University of Washington as assistant professor of computer science and engineering in February 1991. In the fall of 1991, he launched the Internet Softbots project. In 1993, Etzioni received an NSF Young Investigator Award. In 1995, Etzioni was chosen as one of 5 finalists in the Discover Awards for Technological Innovation in Computer Software for his work on Internet Softbots.
<P>His research interests include: software agents, machine learning, and human-computer interaction.
<P><H2><A NAME=SECTION000100000000000000000> About this document ... </A></H2>
<P> <STRONG>Multi-Service Search and Comparison Using the MetaCrawler</STRONG>
<P>This document was generated using the <A HREF="http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html"><STRONG>LaTeX</STRONG>2<tt>HTML</tt></A> translator Version 95.1 (Fri Jan 20 1995) Copyright &#169; 1993, 1994, <A HREF="http://cbl.leeds.ac.uk/nikos/personal.html">Nikos Drakos</A>, Computer Based Learning Unit, University of Leeds.
<P> The command line arguments were: <BR><STRONG>latex2html</STRONG> <tt>-split 0 www4-final.tex</tt>.
<P>The translation was initiated by Erik Selberg on Mon Oct  9 17:24:12 PDT 1995<BR> <HR><P><ADDRESS><I>Erik Selberg <BR>Mon Oct  9 17:24:12 PDT 1995</I></ADDRESS></BODY>
