
📄 notes

Harvest is a robot that downloads HTML web pages.
This file describes how to use the (collector) interface to the Gatherer,
and how to run the gatherd daemon.

Last-Modified: 1995/03/24 16:55:54


                Configuration File for Access Control
                -------------------------------------

Gatherd also reads a simple configuration file that provides access
control.  The configuration file has two directives, "Allow" and "Deny".  A
line that starts with "Allow" is followed by any number of domain or host
names that are allowed to connect to gatherd.  If the word "all" is used,
then all hosts are allowed to connect to gatherd (the default).  "Deny"
is the opposite of "Allow".  For example, this configuration

        Allow   cs.colorado.edu ftp.wustl.edu
        Deny    all

only allows hosts from cs.colorado.edu and the host ftp.wustl.edu to
connect to gatherd.

You may also add the 'Gzip' tag to gatherd.cf to give the full path to gzip:

        Gzip /usr/local/bin/gzip

Or simply add the directory that contains gzip to your PATH before
running gatherd.


                  The Collector/Gatherer Protocol
                  -------------------------------

First, you must have a description of a Gatherer.
The description is, for now, a SOIF template that looks like this:

    @GATHERER { http://rd.cs.colorado.edu/~hardy/www-home-pages-gatherer.soif
    Gatherer-Host{22}:	powell.cs.colorado.edu
    Gatherer-Name{39}:	Global - Selected Text - WWW Home Pages
    Gatherer-Port{4}:	1171
    Gatherer-Version{3}:	0.1
    Last-Modification-Time{9}:	772263506
    Refresh-Rate{6}:	604800
    Time-to-Live{7}:	2419200
    Update-Time{9}:	772263510
    }

The protocol supports the following commands:

        HELLO <hostname>         - Friendly Greeting
        HELP                     - This message
        SEND-OBJECT <oid>        - Send an Object Description
        SEND-UPDATE <timestamp>  - Send all Object Descriptions that
                                   have been changed/created since timestamp
        SET compression          - Enable GNU zip compressed transfers
        QUIT                     - Close session

For example, to retrieve all of the gatherer's files the session would
look like (simple implementation is included as gather.c):

        HELLO client.host.name
        SEND-UPDATE 0
        QUIT

To retrieve all of the templates that have been changed/created in the
last week, the session would look like (where t == time(NULL) - (1 * WEEK),
and WEEK = (60 * 60 * 24 * 7)):

        HELLO client.host.name
        SEND-UPDATE t
        QUIT

The protocol has an NNTP flavor; you can 'telnet localhost 1171' and play
around interactively a la NNTP or SMTP.  A welcome message is sent by the
server initially.  Then, after each command that the client submits, an
error/success message is sent by the server.  The first 3 characters of
these messages contain one of the following error codes.
If the client doesn't send a command in 5 minutes or so, the connection
times out.

Protocol Error Codes:

        000 Successful Greeting sent by server
        001 Unknown command
        002 Unimplemented command
        003 Access Denied
        100 Successful HELLO command
        101 Invalid usage of HELLO command
        102 DNS name & given name don't match in HELLO command
        200 Successful HELP command
        300 Successful SEND-OBJECT command
        400 Successful SEND-UPDATE command
        401 Invalid usage of SEND-UPDATE command
        499 End of SEND-UPDATE output
        999 Goodbye


The WELCOME message

The welcome message contains information about the protocol version.  The
welcome message that the server sends is in the following format:

        000 - HELLO <version> <server host> - are you <client host>?

like this:

        000 - HELLO 0.1 powell.cs.colorado.edu - are you burton.cs.colorado.edu?


The SEND-UPDATE command

The output of the SEND-UPDATE command looks like this:

        400 - Sending all Object Descriptions since 0
        @DELETE { }
        @REFRESH { }
        @UPDATE {
        @DOCUMENT { /* template for object 1 */ }
        @DOCUMENT { /* template for object 2 */ }
        @DOCUMENT { /* template for object 3 */ }
        ...
        @DOCUMENT { /* template for object n */ }
        }
        499 - Sent n Object Descriptions

Currently, only the @UPDATE section is implemented.

Some simple compression support has been added.  The 'SET' command can be
used to put the server in compression mode.  When the server receives the
command 'SET COMPRESSION', the output of all SEND-UPDATE requests is
compressed using GNU zip (gzip).  However, when all of the data for the
SEND-UPDATE answer is sent, the socket is closed.  This is so that the
client doesn't need to look for an end-of-transmission message (normally
the 499 response).  So when the SET COMPRESSION command has been issued,
the output of the SEND-UPDATE command looks like this:

        400 - Sending all Object Descriptions since 0
        ...GNU zip'ed data here...
        ...socket closed at end-of-transmission...

The server can use faster index files to serve the templates.
When given the original database, mkindex will build an index for gatherd
to use to serve the templates faster.  It will also support a cache for
sending all templates in compressed mode; use the mkcompressed command to
build this cache.

Version 0.3.x of the interface packetizes the gzip data so that it doesn't
close the connection at the end-of-transmission.  This code hasn't been
integrated yet.  Also, both the @DELETE and @REFRESH sections have not been
implemented yet.

-Darren Hardy, July 1994
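
Example: a minimal sketch, in Python, of the client side of the session
described in these notes.  It is not gather.c and not part of the
distribution.  The function names (parse_reply, parse_welcome,
update_session, one_week_ago) are hypothetical and chosen only for
illustration.  The sketch builds and parses text lines only; it does not
open a socket, because the notes do not say how lines are terminated on
the wire.

```python
import re
import time

# WEEK as defined in the notes: 60 * 60 * 24 * 7 seconds.
WEEK = 60 * 60 * 24 * 7


def parse_reply(line):
    """Split a server reply into (code, text).

    Per the notes, the first 3 characters of each reply carry the code.
    The code is kept as a string so that "000" keeps its leading zeros.
    """
    code = line[:3]
    text = line[3:].strip().lstrip("-").strip()
    return code, text


def parse_welcome(line):
    """Extract (version, server host, client host) from the welcome line.

    Expects the format shown in the notes:
        000 - HELLO <version> <server host> - are you <client host>?
    Returns None when the line does not match that format.
    """
    match = re.fullmatch(
        r"000 - HELLO (\S+) (\S+) - are you (\S+)\?", line.strip()
    )
    return match.groups() if match else None


def one_week_ago(now=None):
    """Return t == time(NULL) - (1 * WEEK) as an integer timestamp."""
    if now is None:
        now = int(time.time())
    return now - WEEK


def update_session(client_host, since=0):
    """Return the command lines for a SEND-UPDATE session.

    since=0 asks for every Object Description, as in the first example
    session in the notes.
    """
    return [
        "HELLO " + client_host,
        "SEND-UPDATE " + str(since),
        "QUIT",
    ]
```

Calling update_session("client.host.name", one_week_ago()) yields the
three command lines of the "last week" session shown earlier, and
parse_reply can be applied to each reply line to check for a code such as
400 or 401 before reading the templates.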
