static char rcsid[] = "$Id: registry.c,v 2.2 2000/01/21 17:37:33 sxw Exp $";

/*
 *  registry.c -- Registry Module, Keeps track of all objects in the Broker.
 *
 *  DEBUG:  section  70, level 1, 5, 9	Broker registry
 *          section  73, level 1, 5, 9	Broker registry hash tables
 *  AUTHOR: Harvest derived (William G. Camargo, Darren Hardy)
 *
 *  Harvest Indexer http://harvest.sourceforge.net/
 *  -----------------------------------------------
 *
 *  The Harvest Indexer is a continued development of code developed by
 *  the Harvest Project. Development is carried out by numerous individuals
 *  in the Internet community, and is not officially connected with the
 *  original Harvest Project or its funding sources.
 *
 *  Please mail lee@arco.de if you are interested in participating
 *  in the development effort.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*  ----------------------------------------------------------------------
 *  Copyright (c) 1994, 1995.  All rights reserved.
 *
 *    The Harvest software was developed by the Internet Research Task
 *    Force Research Group on Resource Discovery (IRTF-RD):
 *
 *          Mic Bowman of Transarc Corporation.
 *          Peter Danzig of the University of Southern California.
 *          Darren R. Hardy of the University of Colorado at Boulder.
 *          Udi Manber of the University of Arizona.
 *          Michael F. Schwartz of the University of Colorado at Boulder.
 *          Duane Wessels of the University of Colorado at Boulder.
 *
 *    This copyright notice applies to software in the Harvest
 *    ``src/'' directory only.  Users should consult the individual
 *    copyright notices in the ``components/'' subdirectories for
 *    copyright information about other software bundled with the
 *    Harvest source code distribution.
 *
 *  TERMS OF USE
 *
 *    The Harvest software may be used and re-distributed without
 *    charge, provided that the software origin and research team are
 *    cited in any use of the system.  Most commonly this is
 *    accomplished by including a link to the Harvest Home Page
 *    (http://harvest.cs.colorado.edu/) from the query page of any
 *    Broker you deploy, as well as in the query result pages.  These
 *    links are generated automatically by the standard Broker
 *    software distribution.
 *
 *    The Harvest software is provided ``as is'', without express or
 *    implied warranty, and with no support nor obligation to assist
 *    in its use, correction, modification or enhancement.  We assume
 *    no liability with respect to the infringement of copyrights,
 *    trade secrets, or any patents, and are not responsible for
 *    consequential damages.  Proper use of the Harvest software is
 *    entirely the responsibility of the user.
 *
 *  DERIVATIVE WORKS
 *
 *    Users may make derivative works from the Harvest software, subject
 *    to the following constraints:
 *
 *      - You must include the above copyright notice and these
 *        accompanying paragraphs in all forms of derivative works,
 *        and any documentation and other materials related to such
 *        distribution and use acknowledge that the software was
 *        developed at the above institutions.
 *
 *      - You must notify IRTF-RD regarding your distribution of
 *        the derivative work.
 *
 *      - You must clearly notify users that your are distributing
 *        a modified version and not the original Harvest software.
 *
 *      - Any derivative product is also subject to these copyright
 *        and use restrictions.
 *
 *    Note that the Harvest software is NOT in the public domain.  We
 *    retain copyright, as specified above.
 *
 *  HISTORY OF FREE SOFTWARE STATUS
 *
 *    Originally we required sites to license the software in cases
 *    where they were going to build commercial products/services
 *    around Harvest.  In June 1995 we changed this policy.  We now
 *    allow people to use the core Harvest software (the code found in
 *    the Harvest ``src/'' directory) for free.  We made this change
 *    in the interest of encouraging the widest possible deployment of
 *    the technology.  The Harvest software is really a reference
 *    implementation of a set of protocols and formats, some of which
 *    we intend to standardize.  We encourage commercial
 *    re-implementations of code complying to this set of standards.
 *
 */
#include "broker.h"
#include "log.h"

/* Global variables */
REGISTRY_HEADER *RegHdr = NULL;
reg_t *Registry = NULL;

extern char *DIRpath;
extern int reg_limit;
extern int do_fast_start;
extern int qsock;

/*
 *  Hash search structure for the Registry.  We use a fixed size hash table
 *  and chaining to quickly search the Registry for a FD without needing
 *  to page in the entire registry.
 *
 *  We also build a hash table for URLs to cut down the search
 *  time needed to locate Registry objects during collections.
 */
#if 0
/*
 *  Here are some good prime number choices.  It's important not to
 *  choose a prime number that is too close to exact powers of 2.
 */
#undef  HASH_SIZE 103		/* prime number < 128 */
#undef  HASH_SIZE 229		/* prime number < 256 */
#undef  HASH_SIZE 467		/* prime number < 512 */
#undef  HASH_SIZE 977		/* prime number < 1024 */
#undef  HASH_SIZE 1979		/* prime number < 2048 */
#undef  HASH_SIZE 4019		/* prime number < 4096 */
#undef  HASH_SIZE 6037		/* prime number < 6144 */
#undef  HASH_SIZE 7951		/* prime number < 8192 */
#undef  HASH_SIZE 12149		/* prime number < 12288 */
#undef  HASH_SIZE 16231		/* prime number < 16384 */
#undef  HASH_SIZE 33493		/* prime number < 32768 */
#undef  HASH_SIZE 65357		/* prime number < 65536 */
#endif

#define HASH_SIZE 6037		/* prime number < 6144 */

#undef  uhash
#define uhash(x)	((x) % HASH_SIZE)	/* for unsigned */
#undef  hash
#define hash(x)		(((x) < 0 ? -(x) : (x)) % HASH_SIZE)

typedef struct HASH_LINK {
	reg_t *item;
	struct HASH_LINK *next;
} hash_link;

static hash_link *fdhtable[HASH_SIZE];	/* Hash table based on FDs */
static hash_link *urlhtable[HASH_SIZE];	/* Hash table based on URLs */
static hash_link *md5htable[HASH_SIZE];	/* Hash table based on MD5s */

/* Local functions */
static reg_t *RG_hash_search_byfd();
static hash_link *RG_hash_url_bucket();
static hash_link *RG_hash_md5_bucket();
static void RG_hash_init();
static void RG_hash_build();
static void RG_hash_insert();
static void RG_hash_delete();
static void RG_hash_destroy();
static void RG_hash_print();
static int hash_md5();
static int hash_url();
static void RG_print_reg_t();

/***********************************************************************
 			Initialization and Tear Down Routines
 ***********************************************************************/

/* -----------------------------------------------------------------
   RG_Init() -- initialize the registry; build from files if necessary.
   ----------------------------------------------------------------- */
int RG_Init()
{
	int status = SUCCESS;

	/* Set the Registry file */
	if (init_registry_file() == ERROR)
		return ERROR;

	RG_hash_init();		/* reset the hash table */
	RG_gid_init();		/* reset the Gatherer ID mgmt */
	Registry = NULL;	/* start of the linked list */

	if (RegHdr != NULL)
		xfree(RegHdr);

	/* Must read the header first */
	if ((RegHdr = read_header()) == NULL)
		return ERROR;

	/* The Registry is empty, probably because the file does not exist yet */
	if (RegHdr == (REGISTRY_HEADER *) REGISTRY_EOF) {
		RegHdr = (REGISTRY_HEADER *) xmalloc(sizeof(REGISTRY_HEADER));
		RegHdr->magic = REGISTRY_MAGIC;
		RegHdr->version = REGISTRY_VERSION;
		RegHdr->nrecords = 0;
		RegHdr->nrecords_deleted = 0;
		RegHdr->nrecords_valid = 0;
		status = RG_Sync_Registry();
	} else {
		/* We have a good registry file so read it into memory */
		status = RG_Build_Registry();
	}
	return (status);
}

int RG_Sync_Registry()
{
	Log("Syncing Registry file.\n");
	if (write_header(RegHdr) == ERROR) {
		errorlog("Could not rewrite the Registry file Header.\n");
		return ERROR;
	}
	return SUCCESS;
}

/* -----------------------------------------------------------------
   RG_Build_Registry() -- build registry from disk.
   Assume that Registry == NULL on entry.
   ----------------------------------------------------------------- */
int RG_Build_Registry()
{
	reg_t *tmp;
	int ncount = 0, vcount = 0, dcount = 0, stale = 0, status;

	Log("Building the in-memory Registry from disk %s...\n",
	    do_fast_start ? "(in fast mode)" : "");

	/* Must issue read_header first to reset the Registry file ptr */
	if (read_header() == NULL)
		return ERROR;

	tmp = (reg_t *) xmalloc(sizeof(reg_t));
	while (1) {
		/* Grab the next Registry entry from disk */
		memset(tmp, '\0', sizeof(reg_t));
		status = get_record(tmp);

		/* No more Registry entries */
		if (status == REGISTRY_EOF) {
			RG_Free_Entry(tmp);	/* free memory */
			break;
		}
		/* Something went wrong, so stop processing */
		if (status == ERROR) {
			RG_Free_Entry(tmp);	/* free memory */
			return ERROR;
		}
		ncount++;	/* number of objs processed */

		/* just skip deleted objects, reuse tmp buffer */
		/* deleted objects are determined by the header */
		if (status == ENTRY_DELETED) {
			dcount++;
			continue;
		}
		/*
		 *  See if the object really exists in the storage
		 *  manager.  If it doesn't, then delete the object
		 *  from the registry file.  Remember to mark, then
		 *  restore the current place in the Registry when
		 *  deleting the bad record.
		 *  Otherwise, add it to the in-memory Registry by
		 *  placing it at the front of the Registry.
		 */
		Debug(70,9,("Read Registry record: %d\n", tmp->FD));
		if (!do_fast_start && (SM_Exist_Obj(tmp->FD) == FALSE)) {
			Log("WARNING: Missing object (FD %d), deleting from Registry file.\n", tmp->FD);
			set_registry_mark();
			if (remove_record(tmp) == ERROR)
				errorlog("Cannot delete OBJ%d\n", tmp->FD);
			else
				dcount++;
			RG_Free_Entry(tmp);	/* free memory */
			restore_registry_mark();
		} else {
			tmp->next = Registry;
			tmp->prev = NULL;
			if (Registry)
				Registry->prev = tmp;
			Registry = tmp;
			vcount++;
		}
		if ((ncount & 0x1F) == 0) {	/* check on pending connections */
			(void) select_loop(0, 0, 0);
		}
		tmp = (reg_t *) xmalloc(sizeof(reg_t));
	}
	RG_hash_build();	/* build the hash table for searching */

	/* do some sanity checks */
	if (RegHdr->nrecords != ncount) {
		Log("WARNING: Stale Registry header: record cnt mismatch: %d != %d\n",
		    RegHdr->nrecords, ncount);
		RegHdr->nrecords = ncount;
		stale = 1;
	}
	if (RegHdr->nrecords_deleted != dcount) {
		Log("WARNING: Stale Registry header: delete cnt mismatch: %d != %d\n",
		    RegHdr->nrecords_deleted, dcount);
		RegHdr->nrecords_deleted = dcount;
		stale = 1;
	}
	if (RegHdr->nrecords_valid != vcount) {
		Log("WARNING: Stale Registry header: valid cnt mismatch: %d != %d\n",
		    RegHdr->nrecords_valid, vcount);
		RegHdr->nrecords_valid = vcount;
		stale = 1;
	}
	if (stale == 1) {
		(void) RG_Sync_Registry();
	}
	return SUCCESS;
}

/* -----------------------------------------------------------------
   RG_Registry_Shutdown() -- save header and close registry file.
   Does not free the registry; use RG_Free_Registry() if you want to.
   ----------------------------------------------------------------- */
void RG_Registry_Shutdown()
{
	(void) RG_Sync_Registry();
	finish_registry_file();
}

/***********************************************************************
 		Registry maintenance: adding, deleting, compressing
 ***********************************************************************/

/* -----------------------------------------------------------------
 *   RG_Register() -- add an OID to the registry.
   ----------------------------------------------------------------- */
int RG_Register(new_item)
reg_t *new_item;
{
	reg_t *add = new_item;

	Debug(70,1,("RG_Register: Adding object: %s\n", new_item->url));

	/* Add to the Registry file */
	if (append_new_record(add) == ERROR)
		return ERROR;

	/* add new entry by prepending it to the Registry */
	add->prev = NULL;
	add->next = Registry;
	if (Registry != NULL) {
		Registry->prev = add;
	}
	Registry = add;

	if (debug_ok(70,9))
		RG_print_reg_t(add);

	RG_hash_insert(add);	/* add to hash table */
	RegHdr->nrecords++;	/* update Registry header */
	RegHdr->nrecords_valid++;
	return SUCCESS;
}

/* -----------------------------------------------------------------
 *   RG_Unregister() -- unregister OID.
   ----------------------------------------------------------------- */
int RG_Unregister(tmp)
reg_t *tmp;
{
	Debug(70,1,("RG_Unregister: Deleting object: %s\n", tmp->url));
	if (debug_ok(70,9))
		RG_print_reg_t(tmp);

	RG_hash_delete(tmp);		/* remove hash entry */
