
📄 gatherd.c

📁 Harvest is a robot that downloads HTML web pages
💻 C
📖 Page 1 of 2
static char rcsid[] = "gatherd.c,v 1.67 1996/01/17 10:34:56 duane Exp";
/*
 *  gatherd.c - Server for Gatherer-Collector interface.
 *
 *  Usage:  gatherd    [-db | -index | -log | -zip | -cf file] [-d dir] port
 *          in.gatherd [-db | -index | -log | -zip | -cf file] [-d dir]
 *
 *  The -dir flag is short-hand for specifying the -db, -index, and -log
 *  flags.  It prepends the given directory with the default names for
 *  -db (PRODUCTION.gdbm), -index (INDEX.gdbm), and -log (gatherd.log)
 *
 *  The -zip file is a gzip'ed file that contains all of the templates.
 *  gatherd will send this file upon a SEND-UPDATE 0 command with
 *  compression enabled.  This saves gzip cycles on the server.
 *
 *  DEBUG: none
 *  AUTHOR: Harvest derived
 *
 *  Harvest Indexer http://www.tardis.ed.ac.uk/harvest/
 *  ---------------------------------------------------
 *
 *  The Harvest Indexer is a continued development of code developed by
 *  the Harvest Project. Development is carried out by numerous individuals
 *  in the Internet community, and is not officially connected with the
 *  original Harvest Project or its funding sources.
 *
 *  Please mail harvest@tardis.ed.ac.uk if you are interested in participating
 *  in the development effort.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
/*  ----------------------------------------------------------------------
 *  Copyright (c) 1994, 1995.  All rights reserved.
 *
 *    The Harvest software was developed by the Internet Research Task
 *    Force Research Group on Resource Discovery (IRTF-RD):
 *
 *          Mic Bowman of Transarc Corporation.
 *          Peter Danzig of the University of Southern California.
 *          Darren R. Hardy of the University of Colorado at Boulder.
 *          Udi Manber of the University of Arizona.
 *          Michael F. Schwartz of the University of Colorado at Boulder.
 *          Duane Wessels of the University of Colorado at Boulder.
 *
 *    This copyright notice applies to software in the Harvest
 *    ``src/'' directory only.  Users should consult the individual
 *    copyright notices in the ``components/'' subdirectories for
 *    copyright information about other software bundled with the
 *    Harvest source code distribution.
 *
 *  TERMS OF USE
 *
 *    The Harvest software may be used and re-distributed without
 *    charge, provided that the software origin and research team are
 *    cited in any use of the system.  Most commonly this is
 *    accomplished by including a link to the Harvest Home Page
 *    (http://harvest.cs.colorado.edu/) from the query page of any
 *    Broker you deploy, as well as in the query result pages.  These
 *    links are generated automatically by the standard Broker
 *    software distribution.
 *
 *    The Harvest software is provided ``as is'', without express or
 *    implied warranty, and with no support nor obligation to assist
 *    in its use, correction, modification or enhancement.  We assume
 *    no liability with respect to the infringement of copyrights,
 *    trade secrets, or any patents, and are not responsible for
 *    consequential damages.  Proper use of the Harvest software is
 *    entirely the responsibility of the user.
 *
 *  DERIVATIVE WORKS
 *
 *    Users may make derivative works from the Harvest software, subject
 *    to the following constraints:
 *
 *      - You must include the above copyright notice and these
 *        accompanying paragraphs in all forms of derivative works,
 *        and any documentation and other materials related to such
 *        distribution and use acknowledge that the software was
 *        developed at the above institutions.
 *
 *      - You must notify IRTF-RD regarding your distribution of
 *        the derivative work.
 *
 *      - You must clearly notify users that your are distributing
 *        a modified version and not the original Harvest software.
 *
 *      - Any derivative product is also subject to these copyright
 *        and use restrictions.
 *
 *    Note that the Harvest software is NOT in the public domain.  We
 *    retain copyright, as specified above.
 *
 *  HISTORY OF FREE SOFTWARE STATUS
 *
 *    Originally we required sites to license the software in cases
 *    where they were going to build commercial products/services
 *    around Harvest.  In June 1995 we changed this policy.  We now
 *    allow people to use the core Harvest software (the code found in
 *    the Harvest ``src/'' directory) for free.  We made this change
 *    in the interest of encouraging the widest possible deployment of
 *    the technology.  The Harvest software is really a reference
 *    implementation of a set of protocols and formats, some of which
 *    we intend to standardize.  We encourage commercial
 *    re-implementations of code complying to this set of standards.
 *
 */
#define USE_TIMEOUT
#include <stdio.h>
#include <stdlib.h>		/* exit() */
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>		/* waitpid(), WNOHANG */
#include <sys/socket.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <netinet/in.h>
#ifdef USE_TIMEOUT
#ifdef _AIX
#include <sys/select.h>
#endif
#endif
#include "util.h"

/* Global Variables */
char *dbfile = "PRODUCTION.gdbm";
char *logfile = "gatherd.log";
char *cffile = "gatherd.cf";
char *pidfile = "gatherd.pid";
char *indexfile = "INDEX.gdbm";
char *allzipped = "All-Templates.gz";
char *infofile = "INFO.soif";
char *cmd_gzip = "gzip";
char *topdir = NULL;
int allow_all = 0;
int deny_all = 0;
char *allow_hosts[BUFSIZ];
char *deny_hosts[BUFSIZ];
int master_pid = 0;

/* External Functions */
extern int serve_client();

/* Local Functions */
static void disconnect();
static void master_slave_mode();
static void usage();
static void load_configuration();
static void write_pid();

static void sigreap()
{
	while (waitpid(-1, NULL, WNOHANG) > 0);		/* catch any zombies  */
#ifdef _HARVEST_SYSV_
	(void) signal(SIGCHLD, sigreap);
#endif
}

static void sigcleanup(sig, code, scp, addr)
     int sig, code;
     struct sigcontext *scp;
     char *addr;
{
	if (getpid() == master_pid)
		unlink(pidfile);
	Log("exiting due to signal %d\n", sig);
	exit(1);
}

static void usage()
{
	fprintf(stderr, "\
Usage:  gatherd    [-db | -index | -log | -zip | -cf file] [-d dir] port\n\
        in.gatherd [-db | -index | -log | -zip | -cf file] [-d dir]\n");
	exit(1);
}

int main(argc, argv)
     int argc;
     char *argv[];
{
	char *pgm = strdup(argv[0]);
	static char buf[BUFSIZ];
	int pid;
	FILE *logfp = NULL;

	signal(SIGCHLD, sigreap);

	/* Process the command line arguments */
	for (argc--, argv++; argc > 0 && **argv == '-'; argc--, argv++) {
		if (!strcmp(*argv, "-db")) {
			if (--argc < 1)
				usage();
			dbfile = strdup(*++argv);
		} else if (!strcmp(*argv, "-index")) {
			if (--argc < 1)
				usage();
			indexfile = strdup(*++argv);
		} else if (!strcmp(*argv, "-log")) {
			if (--argc < 1)
				usage();
			logfile = strdup(*++argv);
		} else if (!strcmp(*argv, "-cf")) {
			if (--argc < 1)
				usage();
			cffile = strdup(*++argv);
		} else if (!strcmp(*argv, "-d") ||
		    !strcmp(*argv, "-dir")) {	/* -dir is old */
			if (--argc < 1)
				usage();
			topdir = strdup(*++argv);
			/* Set gatherd's CWD to / if an absolute datadir pathname was given.   */
			/* Otherwise, a relative pathname was given and just leave CWD where   */
			/* gatherd was invoked from.  Don't chdir after the fork.              */
			if (*topdir == '/')
				if (chdir("/") < 0) {
					perror("chdir: /:");
					exit(1);
				}
			sprintf(buf, "%s/PRODUCTION.gdbm", topdir);
			dbfile = strdup(buf);
			sprintf(buf, "%s/INDEX.gdbm", topdir);
			indexfile = strdup(buf);
			sprintf(buf, "%s/gatherd.log", topdir);
			logfile = strdup(buf);
			sprintf(buf, "%s/gatherd.pid", topdir);
			pidfile = strdup(buf);
			sprintf(buf, "%s/gatherd.cf", topdir);
			cffile = strdup(buf);
			sprintf(buf, "%s/INFO.soif", topdir);
			infofile = strdup(buf);
			sprintf(buf, "%s/All-Templates.gz", topdir);
			allzipped = strdup(buf);
		}
	}

	/*
	 * Before we go any further, make sure another gatherd isn't
	 * already running.
	 */
	if ((pid = read_pid()) > -1) {
		if (kill(pid, 0) > -1) {	/* gatherd already running */
/*
 * Let's try it first without printing these messages.  If another gatherd
 * is running, just die silently (the way it used to work).
 *
 * If we print these messages, some users might take it as a real error
 * and then get stressed out about killing the old gatherd and having
 * to start a new one.
 *
 * Note, gatherd traps almost every signal so it can remove its gatherd.pid
 * before it exits.  Therefore, we can be relatively sure that if
 * a gatherd.pid exists and is killable, then it's real.
 *
 fprintf (stderr, "A 'gatherd' process (pid %d) is already running for this gatherer\n", pid);
 fprintf (stderr, "Please remove %s\nif this is incorrect.\n", pidfile);
 *
 */
			exit(0);
		}
	}
	if ((logfp = fopen(logfile, "a+")) != NULL) {
		setbuf(logfp, NULL);
		init_log(logfp, logfp);
	} else {
		setbuf(stderr, NULL);
		init_log(stderr, stderr);
	}
	if (access(indexfile, R_OK)) {
		fatal("Cannot read: %s\n", indexfile);
	}
	if (access(infofile, R_OK)) {
		Log("WARNING: Statistics file not readable: %s\n", infofile);
		infofile = NULL;
	}
	if (access(allzipped, R_OK)) {
		Log("WARNING: Cache file not readable: %s\n", allzipped);
		allzipped = NULL;
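
The `-d` branch of the listing builds each data-file path with `sprintf` into a `BUFSIZ`-byte buffer, so a very long directory argument could overflow `buf`. The following is a minimal sketch of a bounded alternative, not code from gatherd.c. The helper name `join_path` is hypothetical and introduced only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of gatherd.c): build "dir/name" in a
 * bounded buffer.  Returns a malloc'ed string the caller must free, or
 * NULL if the result would not fit in BUFSIZ bytes or malloc fails.
 */
char *join_path(const char *dir, const char *name)
{
	char buf[BUFSIZ];
	char *out;
	int n = snprintf(buf, sizeof(buf), "%s/%s", dir, name);

	if (n < 0 || (size_t) n >= sizeof(buf))
		return NULL;	/* would be truncated: refuse instead of overflowing */
	out = malloc((size_t) n + 1);
	if (out != NULL)
		memcpy(out, buf, (size_t) n + 1);	/* copy including the '\0' */
	return out;
}
```

With such a helper, an assignment like `indexfile = join_path(topdir, "INDEX.gdbm");` could replace a `sprintf`/`strdup` pair, provided the caller treats a NULL result as a usage error.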

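The startup check in the listing calls `read_pid()`, which is defined elsewhere, and then probes the process with `kill(pid, 0)`. The sketch below shows one way such a check can work. The helper names `read_pid_file` and `process_exists` are assumptions made for illustration and are not the Harvest implementation. Unlike the listing's `kill(pid, 0) > -1` test, this sketch also counts `EPERM` as "process exists", since that error means the pid is live but owned by another user.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read a pid from the file at `path`.
 * Returns -1 if the file is missing or holds no positive integer.
 */
long read_pid_file(const char *path)
{
	FILE *fp = fopen(path, "r");
	long pid = -1;

	if (fp == NULL)
		return -1;
	if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
		pid = -1;
	fclose(fp);
	return pid;
}

/*
 * Returns 1 if a process with this pid exists, 0 otherwise.
 * Signal 0 performs the existence and permission checks only;
 * no signal is actually delivered.
 */
int process_exists(long pid)
{
	if (pid <= 0)
		return 0;
	if (kill((pid_t) pid, 0) == 0)
		return 1;
	return errno == EPERM;	/* exists, but we may not signal it */
}
```

A caller could exit quietly when `process_exists(read_pid_file(pidfile))` is true, which mirrors the silent `exit(0)` in the listing.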