<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <title>Larbin : Parcourir le web, telle est ma passion</title>
</head>
<body bgcolor="#FFFFFF">

<table border=0 width="100%">
<tr>
<td align="center"><font color="#FF0000"><h1>Larbin</h1></font><h1>Multi-purpose web crawler</h1></td>
<td align="right" width="5%"><a href="index.html"><img SRC="l-fr.jpg" ALT="version française"></a></td>
</tr>
</table>

<h2>Introduction</h2>

Larbin is a web crawler (also called a (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine. With a fast enough network, Larbin should be able to fetch more than 100 million pages on a standard PC.

<p>Larbin is (just) a web crawler, NOT an indexer.

<p>Larbin was initially developed for the XYLEME project in the VERSO team at INRIA. Its goal was to fetch XML pages on the web to fill the database of an XML-oriented search engine. Thanks to these origins, Larbin is very general-purpose (and easy to customize).

<p><a href="use-eng.html">How to use Larbin</a><br>
<a href="custom-eng.html">How to customize Larbin</a>

<h2>Availability (<a href="download.html">Download</a>)</h2>

Larbin is freely available on the web, under the GPL. Comments are welcome! Please mail me if you use Larbin; I will be very happy to know it.<br>
However, this program is not suited for personal use and can easily be misused (wget or ht://dig are often more appropriate).

<p>Whatever you do with Larbin, remember that I am in no way responsible for any damage you might cause.

<h2>Current state</h2>

The current version of Larbin can fetch 5,000,000 pages a day on a standard PC (Pentium II 300, 128 MB of SDRAM and a 10 Mbit Ethernet card).<br>
Larbin works under Linux and uses standard libraries, plus <a href="http://www.chiark.greenend.org.uk/~ian/adns/">adns</a>.
The program is multithreaded, but for efficiency it prefers using select over a large number of threads.<br>
The advantage of Larbin over wget or ht://dig is that it is much faster (because it opens many connections at a time) and very general-purpose (in particular, very easy to customize).

<h2>To do</h2>

I have many improvements in mind, but if you need something specific, mail me (<a href="mailto:sebastien.ailleret@inria.fr">sebastien.ailleret@inria.fr</a>). Here are the things I want to do:
<ul>
<li>Allow the program to run on multiple hosts.
<li>Solaris compatibility.
<li>Efficiency (less memory, fewer DNS calls).
</ul>
Here is what you can do with it:
<ul>
<li>A crawler for a standard search engine.
<li>A crawler for a specialized search engine (XML, images, mp3...).
<li>Statistics on the web (about servers or page contents).
</ul>

<hr>
<table border=0 width="100%">
<tr>
<td><a href="mailto:sebastien.ailleret@inria.fr">sebastien.ailleret@inria.fr</a></td>
<td align="right"><a href="http://pauillac.inria.fr/~ailleret/index-eng.html">Home Page</a></td>
</tr>
</table>
</body>
</html>
