
📄 news.c

📁 Harvest is a robot that downloads HTML web pages
💻 C
static char rcsid[] = "news.c,v 1.13 1996/01/05 20:28:27 duane Exp";
/*
 *  news.c - Retrieves USENET articles
 *
 *  Copyright (c) 1994, 1995.  All rights reserved.
 *
 *    The Harvest software was developed by the Internet Research Task
 *    Force Research Group on Resource Discovery (IRTF-RD):
 *
 *          Mic Bowman of Transarc Corporation.
 *          Peter Danzig of the University of Southern California.
 *          Darren R. Hardy of the University of Colorado at Boulder.
 *          Udi Manber of the University of Arizona.
 *          Michael F. Schwartz of the University of Colorado at Boulder.
 *          Duane Wessels of the University of Colorado at Boulder.
 *
 *    This copyright notice applies to software in the Harvest
 *    ``src/'' directory only.  Users should consult the individual
 *    copyright notices in the ``components/'' subdirectories for
 *    copyright information about other software bundled with the
 *    Harvest source code distribution.
 *
 *  TERMS OF USE
 *
 *    The Harvest software may be used and re-distributed without
 *    charge, provided that the software origin and research team are
 *    cited in any use of the system.  Most commonly this is
 *    accomplished by including a link to the Harvest Home Page
 *    (http://harvest.cs.colorado.edu/) from the query page of any
 *    Broker you deploy, as well as in the query result pages.  These
 *    links are generated automatically by the standard Broker
 *    software distribution.
 *
 *    The Harvest software is provided ``as is'', without express or
 *    implied warranty, and with no support nor obligation to assist
 *    in its use, correction, modification or enhancement.  We assume
 *    no liability with respect to the infringement of copyrights,
 *    trade secrets, or any patents, and are not responsible for
 *    consequential damages.  Proper use of the Harvest software is
 *    entirely the responsibility of the user.
 *
 *  DERIVATIVE WORKS
 *
 *    Users may make derivative works from the Harvest software, subject
 *    to the following constraints:
 *
 *      - You must include the above copyright notice and these
 *        accompanying paragraphs in all forms of derivative works,
 *        and any documentation and other materials related to such
 *        distribution and use acknowledge that the software was
 *        developed at the above institutions.
 *
 *      - You must notify IRTF-RD regarding your distribution of
 *        the derivative work.
 *
 *      - You must clearly notify users that your are distributing
 *        a modified version and not the original Harvest software.
 *
 *      - Any derivative product is also subject to these copyright
 *        and use restrictions.
 *
 *    Note that the Harvest software is NOT in the public domain.  We
 *    retain copyright, as specified above.
 *
 *  HISTORY OF FREE SOFTWARE STATUS
 *
 *    Originally we required sites to license the software in cases
 *    where they were going to build commercial products/services
 *    around Harvest.  In June 1995 we changed this policy.  We now
 *    allow people to use the core Harvest software (the code found in
 *    the Harvest ``src/'' directory) for free.  We made this change
 *    in the interest of encouraging the widest possible deployment of
 *    the technology.  The Harvest software is really a reference
 *    implementation of a set of protocols and formats, some of which
 *    we intend to standardize.  We encourage commercial
 *    re-implementations of code complying to this set of standards.
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "util.h"
#include "url.h"

#if HOLD_NNTP
int nntp_sock = -1;
#define NNTP_PORT 119
#endif

static int open_nntp_sock _PARAMS((void));

/*
 *  news_get() - retrieves the URL and prints into the file up->fp.
 *  Returns non-zero on error; 0 on success.
 */
int news_get(up)
     URL *up;
{
	static char cmd[BUFSIZ];
	long nbytes;
	char *buf = NULL;
	int rc;

#if HOLD_NNTP
	if (nntp_sock == -1)
		nntp_sock = open_nntp_sock();
	if (nntp_sock != (-1)) {
#ifdef FIONREAD
		/* drain the socket of any leftover data */
		buf = (char *) xmalloc(BUFSIZ);
		do {
			nbytes = 0;
			ioctl(nntp_sock, FIONREAD, (caddr_t) & nbytes);
			read(nntp_sock, buf, nbytes > BUFSIZ ? BUFSIZ : nbytes);
#ifdef HAVE_USLEEP
			usleep(500000);
#endif /* HAVE_USLEEP */
		} while (nbytes > 0);
		xfree(buf);
#endif /* FIONREAD */
		sprintf(cmd, "newsget.pl -fd %d \"%s\" \"%s\"",
		    nntp_sock,
		    up->filename,
		    up->url);
	} else
#endif /* HOLD_NNTP */
		sprintf(cmd, "newsget.pl \"%s\" \"%s\"",
		    up->filename,
		    up->url);
	rc = run_cmd(cmd);
#if HOLD_NNTP
	if (rc < 0) {
		/* exited due to signal */
		close(nntp_sock);
		nntp_sock = -1;
	}
#endif /* HOLD_NNTP */
	return rc;
}

static int open_nntp_sock()
{
	char *nntpserver;
	struct sockaddr_in sa;
	struct hostent *h;
	int len;
	char c;
	int sock = -1;

	nntpserver = getenv("NNTPSERVER");
	if (nntpserver == NULL)
		nntpserver = "news";
	if ((h = gethostbyname(nntpserver)) == 0) {
		errorlog("Unknown host: %s\n", nntpserver);
		exit(1);
	}
	Log("Opening NNTP connection to %s\n", nntpserver);
	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0) {
		log_errno("socket");
		exit(1);
	}
	sa.sin_family = AF_INET;
	sa.sin_port = htons(NNTP_PORT);
	memcpy((char *) &sa.sin_addr, *(h->h_addr_list), h->h_length);
	/*sa.sin_addr  = inet_addr (inet_ntoa (*(h->h_addr_list))); */
	len = sizeof(sa);
	if (connect(sock, (struct sockaddr *) &sa, len) < 0) {
		log_errno(nntpserver);
		exit(1);
	}
	do
		read(sock, &c, 1);
	while (c != '\n');
	return sock;
}
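
The loop at the end of open_nntp_sock() discards the server's greeting line one byte at a time, but it never checks the return value of read(). If the server closes the connection before sending a newline, read() returns 0, c is never updated, and the loop may spin forever. The following is a minimal sketch of a more defensive variant. The helper name skip_line() is hypothetical and not part of Harvest; it only illustrates checking for end-of-file, errors, and EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 *  skip_line() - hypothetical helper, NOT part of the Harvest source.
 *  Reads and discards bytes from fd up to and including the first '\n'.
 *  Returns 0 once a newline has been consumed; -1 on EOF or read error.
 */
int skip_line(int fd)
{
	char c;
	ssize_t n;

	assert(fd >= 0);
	for (;;) {
		n = read(fd, &c, 1);
		if (n == 1) {
			if (c == '\n')
				return 0;	/* end of the line reached */
			continue;		/* ordinary byte: keep discarding */
		}
		if (n < 0 && errno == EINTR)
			continue;		/* interrupted by a signal: retry */
		return -1;			/* EOF (n == 0) or a real error */
	}
}
```

Under these assumptions, a caller such as open_nntp_sock() could treat a -1 return as a failed connection, for example by closing the socket and reporting the error, instead of waiting for a newline that never arrives.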
