

Harvest is a robot that downloads HTML web pages.
The following routines are provided:

void init_url()
	Initializes the URL library.

URL *url_open(char *url);
	Parses the given URL and returns a URL data structure
	if the access method of the URL is supported by liburl
	and (optionally) if the hostname of the URL is valid.
	Otherwise, returns NULL.  Note that the data for the
	URL is not retrieved.

int url_read(char *buf, int bufsiz, int offset, URL *up);
	Reads at most bufsiz bytes from the URL into buf.
	url_read() begins reading the data offset bytes from
	the beginning of the URL.  If the URL hasn't been
	retrieved yet, then url_read() will call url_retrieve()
	to access the URL's data.  Returns the number of bytes
	read into buf, or a negative value on error.

int url_retrieve(URL *up);
	Retrieves the URL and saves the data into a local,
	temporary file.  Returns non-zero on error or if the
	URL could not be accessed; otherwise returns zero.
	If you define USE_MD5 in url.h, then url_retrieve() will
	compute an MD5 value once the data has been retrieved.

void url_close(URL *up);
	Cleans up a URL structure and removes the temporary
	file containing the URL's data (if present).

void finish_url()
	Cleans up the URL library.

A typical usage is as follows:

	/* Usage: test URL  --- prints the URL to stdout */
	#include <stdio.h>
	#include <stdlib.h>
	#include "url.h"

	int main(int argc, char **argv)
	{
		URL *up;
		int off = 0, n = 0;
		char buf[1024];

		/* Require exactly one URL argument. */
		if (argc != 2) {
			fprintf(stderr, "Usage: %s URL\n", argv[0]);
			exit(1);
		}
		init_url();
		if ((up = url_open(argv[1])) == NULL) {
			fprintf(stderr, "URL is invalid: %s\n", argv[1]);
			exit(1);
		}
		/* Read until url_read() reports end of data (0) or an error (<0). */
		while ((n = url_read(buf, 1024, off, up)) > 0) {
			off += n;
			fwrite(buf, 1, n, stdout);
		}
		url_close(up);
		finish_url();
		return 0;
	}

To add support for new types:

	- add the type to url.h as URL_typename (if needed)
	- modify url_parse() in url.c to parse the host/port/etc. from the URL
	- modify url_open() & url_retrieve() to support the new type
	- write a type_get(URL *up) function that retrieves the URL's data
	  and places it into up->filename.

The URL spec is available via anonymous ftp as:

	ftp://ftp.isi.edu/internet-drafts/draft-ietf-uri-url-06.txt

This directory also contains an FTP connection cache that caches the control
connection to an FTP server, and allows many files to be transferred without
reopening connections to the FTP server.  If a control connection hasn't
been used for some length of time, then it's closed.  It was written by
David Merkel and Mark Peterson of the University of Colorado, Boulder.

 -Darren Hardy, July 1994
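
The idle-timeout behavior of the FTP connection cache described above can be
illustrated with a small sketch.  This is not the code shipped in this
directory: every name below (cached_conn, cache_init, cache_store,
cache_lookup, cache_expire, CACHE_SLOTS, IDLE_LIMIT) is hypothetical, and the
300-second limit is an assumed value, since the notes only say "some length of
time".  The current time is passed in as a parameter so that the logic can be
exercised without waiting on a real clock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 8    /* hypothetical capacity */
#define IDLE_LIMIT  300  /* hypothetical idle timeout, in seconds */

/* One cached control connection (illustrative layout, not liburl's). */
struct cached_conn {
	char host[256];
	int fd;            /* control-connection descriptor; -1 = empty slot */
	time_t last_used;  /* time of the most recent store or lookup */
};

static struct cached_conn cache[CACHE_SLOTS];

/* Mark every slot as empty. */
void cache_init(void)
{
	int i;

	for (i = 0; i < CACHE_SLOTS; i++)
		cache[i].fd = -1;
}

/* Remember an open control connection.  Returns 0, or -1 if the cache
 * is full or the host name does not fit in a slot. */
int cache_store(const char *host, int fd, time_t now)
{
	int i;

	assert(host != NULL);
	if (strlen(host) >= sizeof(cache[0].host))
		return -1;
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].fd == -1) {
			strcpy(cache[i].host, host);
			cache[i].fd = fd;
			cache[i].last_used = now;
			return 0;
		}
	}
	return -1;
}

/* Return the cached descriptor for host and refresh its idle timer,
 * or return -1 if no connection to host is cached. */
int cache_lookup(const char *host, time_t now)
{
	int i;

	assert(host != NULL);
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].fd != -1 && strcmp(cache[i].host, host) == 0) {
			cache[i].last_used = now;
			return cache[i].fd;
		}
	}
	return -1;
}

/* Drop every connection idle for more than IDLE_LIMIT seconds, calling
 * close_fn on its descriptor if close_fn is not NULL.  Returns the
 * number of connections dropped. */
int cache_expire(time_t now, void (*close_fn)(int fd))
{
	int i, dropped = 0;

	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].fd != -1 &&
		    difftime(now, cache[i].last_used) > IDLE_LIMIT) {
			if (close_fn != NULL)
				close_fn(cache[i].fd);
			cache[i].fd = -1;
			dropped++;
		}
	}
	return dropped;
}
```

A caller would try cache_lookup() before opening a new control connection,
call cache_store() after a successful login, and run cache_expire()
periodically with a function that sends QUIT and closes the descriptor.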
