📄 broker.h

📁 harvest: a robot that downloads HTML web pages
📖 Page 1 of 2
/*
 *  broker.h -- Global definitions and data type for the broker.
 *
 *  DEBUG: none
 *  AUTHOR: Harvest derived
 *
 *  $Id: broker.h,v 2.4 2000/01/21 17:37:33 sxw Exp $
 *
 *  Harvest Indexer http://harvest.sourceforge.net/
 *  -----------------------------------------------
 *
 *  The Harvest Indexer is a continued development of code developed by
 *  the Harvest Project. Development is carried out by numerous individuals
 *  in the Internet community, and is not officially connected with the
 *  original Harvest Project or its funding sources.
 *
 *  Please mail lee@arco.de if you are interested in participating
 *  in the development effort.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
/*  ----------------------------------------------------------------------
 *  Copyright (c) 1994, 1995.  All rights reserved.
 *
 *    The Harvest software was developed by the Internet Research Task
 *    Force Research Group on Resource Discovery (IRTF-RD):
 *
 *          Mic Bowman of Transarc Corporation.
 *          Peter Danzig of the University of Southern California.
 *          Darren R. Hardy of the University of Colorado at Boulder.
 *          Udi Manber of the University of Arizona.
 *          Michael F. Schwartz of the University of Colorado at Boulder.
 *          Duane Wessels of the University of Colorado at Boulder.
 *
 *    This copyright notice applies to software in the Harvest
 *    ``src/'' directory only.  Users should consult the individual
 *    copyright notices in the ``components/'' subdirectories for
 *    copyright information about other software bundled with the
 *    Harvest source code distribution.
 *
 *  TERMS OF USE
 *
 *    The Harvest software may be used and re-distributed without
 *    charge, provided that the software origin and research team are
 *    cited in any use of the system.  Most commonly this is
 *    accomplished by including a link to the Harvest Home Page
 *    (http://harvest.cs.colorado.edu/) from the query page of any
 *    Broker you deploy, as well as in the query result pages.  These
 *    links are generated automatically by the standard Broker
 *    software distribution.
 *
 *    The Harvest software is provided ``as is'', without express or
 *    implied warranty, and with no support nor obligation to assist
 *    in its use, correction, modification or enhancement.  We assume
 *    no liability with respect to the infringement of copyrights,
 *    trade secrets, or any patents, and are not responsible for
 *    consequential damages.  Proper use of the Harvest software is
 *    entirely the responsibility of the user.
 *
 *  DERIVATIVE WORKS
 *
 *    Users may make derivative works from the Harvest software, subject
 *    to the following constraints:
 *
 *      - You must include the above copyright notice and these
 *        accompanying paragraphs in all forms of derivative works,
 *        and any documentation and other materials related to such
 *        distribution and use acknowledge that the software was
 *        developed at the above institutions.
 *
 *      - You must notify IRTF-RD regarding your distribution of
 *        the derivative work.
 *
 *      - You must clearly notify users that your are distributing
 *        a modified version and not the original Harvest software.
 *
 *      - Any derivative product is also subject to these copyright
 *        and use restrictions.
 *
 *    Note that the Harvest software is NOT in the public domain.  We
 *    retain copyright, as specified above.
 *
 *  HISTORY OF FREE SOFTWARE STATUS
 *
 *    Originally we required sites to license the software in cases
 *    where they were going to build commercial products/services
 *    around Harvest.  In June 1995 we changed this policy.  We now
 *    allow people to use the core Harvest software (the code found in
 *    the Harvest ``src/'' directory) for free.  We made this change
 *    in the interest of encouraging the widest possible deployment of
 *    the technology.  The Harvest software is really a reference
 *    implementation of a set of protocols and formats, some of which
 *    we intend to standardize.  We encourage commercial
 *    re-implementations of code complying to this set of standards.
 *
 */
#ifndef _BROKER_H_
#define _BROKER_H_

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <malloc.h>
#include <errno.h>
#include <time.h>
#include <ctype.h>
#include <memory.h>
#include <netdb.h>
#include <signal.h>
#include <limits.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <sys/time.h>
#ifdef __STRICT_ANSI__
#include <stdarg.h>
#else
#include <varargs.h>
#endif
#include <sys/socket.h>
#ifdef _AIX
#include <sys/select.h>
#endif
#include <netinet/in.h>

#include "config.h"
#include "util.h"
#include "template.h"

/*
 *  MAX_EVENTS - max # of clients the broker will hold in its queue
 */
#define MAX_EVENTS      15

/*
 *  READ_QUERY_TIMEOUT - A hack to make sure we get the whole query.
 *  Sometimes TCP breaks up long query packets, so we wait for a
 *  timeout to occur before thinking we have the whole query.  This
 *  value is microseconds (500000 is 0.5 seconds).  This value
 *  can be overridden with "Read-Query-Timeout" in the broker.conf file.
 */
#define READ_QUERY_TIMEOUT 500000

/*
 *  TRUNCATE_DESCRIPTIONS - should the broker only save one line of description
 */
#ifndef TRUNCATE_DESCRIPTIONS
#define TRUNCATE_DESCRIPTIONS
#endif

/*
 *  FORK_ON_BULK - If defined, then the Broker writes the results of
 *  a bulk query to a temporary file, then forks a process to send
 *  the results over the network.
 */
#ifndef FORK_ON_BULK
#define FORK_ON_BULK
#endif

/*
 *  QM_RET_EMBED_ATT - define for the broker to return embedded
 *  attributes requested with the #attribute directive.
 *  See broker/query_man.c.
 */
#ifndef QM_RET_EMBED_ATT
#define QM_RET_EMBED_ATT
#endif

/* ============ Return Values =========== */
/* we want to always redefine these values */
#undef ERROR
#undef FAIL
#undef SUCCESS
#undef TRUE
#undef FALSE
#define ERROR   -1
#define FAIL    0
#define SUCCESS 1
#define TRUE    1
#define FALSE   0

/*
 *  The Registry needs a 32-bit number for storage on the file system.
 *  Define the num32 type as a 32-bit number as per the architecture.
 *  For example, on a DEC Alpha int's are 4 bytes and long's are 8 bytes.
 */
#if SIZEOF_LONG == 4
typedef long num32;
#elif SIZEOF_INT == 4
typedef int num32;
#else
typedef long num32;             /* assume that long's are 32bit */
#endif
#define NUM32LEN sizeof(num32)

/* ============ Parser Modes =========== */
#define NO_MODE         0
#define UPD_MODE        1
#define DEL_MODE        2
#define REF_MODE        3
#define LIST_MODE       4

/* ============ Collector Modes  =========== */
#define UPD             0
#define DEL             1
#define REF             2

/* ============ Debug =========== */
#ifdef DEBUG
#undef DEBUG
#define DEBUG 99                /* Set to 0 to remove all messages */
#endif

/* Debugging/tracing levels */
#define DEBUG0          (DEBUG > 0)
#define DEBUG1          (DEBUG > 1)
#define DEBUG2          (DEBUG > 2)
#define DEBUG3          (DEBUG > 3)
#define DEBUG9          (DEBUG > 9)

/* ============ Indexer Definitions  =========== */
#define FULLI           "Full"
#define INCRI           "Incremental"
#define PEROBJI         "Per-Object"
#define I_FULL          123
#define I_INCR          132
#define I_PER_OBJ       231

/* ============ Default Definitions =========== */
#define COLLECT_RATE    86400   /* 24 hrs in sec */
#define CLEAN_RATE      43200   /* 12 hrs in sec */
#define REFRESH_RATE    2419200 /* 1 month in sec */
#define Q_PORT          8501    /* default port */

/* ============ Configuration Tags =========== */
#define S_BRKDIR        "Broker-Directory"
#define S_BRKHP         "Broker-Home-Page"
#define S_BRKOFFLINE    "Collection-Only"
#define S_CLEANR        "Clean-Rate"
#define S_COLLR         "Collection-Rate"
#define S_DESC          "Description-Tag"
#define S_RFR           "Refresh-Rate"
#define S_PORT          "Port"
#define S_GATHER        "Gather"
#define S_LOGK          "Log-Key"
#define S_TERSELOG      "Terse-Logging"
#define S_FASTSTART     "Fast-Start"
#define S_FCOLL         "Collection-Time"
#define S_INDTP         "Index-Type"
#define S_INDEXER       "Indexer-Type"
#define S_DEDLIM        "Dead-Entry-Limit"
#define S_MAXEV         "Event-Queue-Limit"
#define S_WEBS          "Web-Server"
#define S_WEBPATH       "Web-Path"
#define S_COLCONF       "Collection-Configure-File"
#define S_PASSWD        "Admin-Password"
#define S_APROC         "Admin-Process"
#define S_RDQTO         "Read-Query-Timeout"

/* ============ Misc. Definitions =========== */
#define MAX_URL         512
#define MAX_FN_SIZE     4096
#define UFULL_U         0
#define UPARTIAL_U      1
#define CFULL_U         2
#define CPARTIAL_U      3
#define BAFULL_U        4
#define BAPARTIAL_U     5
#define BQFULL_U        6
#define BQPARTIAL_U     7
#define LIST_RG         1
#define MAX_DEAD        6000
#define MAX_QUERY       2048

#define SWRITE(_S, _B, _L) \
        if (write(_S, _B, _L) == -1) { \
                log_errno("socket write"); \
                return ERROR; \
        }

/* ============ Module Names =========== */
#define STMGR           "Storage Manager: "
#define REGIS           "Registry: "
#define COLLECT         "Collector: "
#define SCANNER         "Scanner: "
#define PARSER          "Parser: "
#define LOGG            "Statistics Log:"

/* ============ Error definitions =========== */
#define FD_ERR          "Unable to get FD"
#define OPEN_ERR        "Unable to open file"
#define UNLINK_ERR      "Unable to unlink file"
#define CLOSE_ERR       "Unable to close file"
#define WRITE_ERR       "Unable to write file"
#define READ_ERR        "Unable to read file"
#define OBJECT_ERR      "Object not found"
#define ENTRY_ERR       "Insufficient data supplied"
#define TIME_ERR        "Unable to get current time"
#define PARSE_ERR       "Parse Error"

/* ============= Field Name definitions =========== */
#define UPDATE_A        "update-time"
#define MD5             "md5"
#define TTL             "time-to-live"
#define LMT_A           "last-modification-time"
#define REFRESH_A       "refresh-rate"
#define GATH_HOST       "gatherer-host"
#define GATH_NAME       "gatherer-name"
#define GATH_VER        "gatherer-version"
#define NESTED_FN       "nested-filename"
#define OBJ_DESC        "description"

#define TTL_S           12      /* cache of attribute name lengths */
#define LMT_A_S         22
#define REFRESH_A_S     12
#define GATH_HOST_S     13
#define GATH_NAME_S     13
#define GATH_VER_S      16
#define NESTED_FN_S     15
#define OBJ_DESC_S      11
#define UPDATE_A_S      11
#define MD5_S           3
#define MD5LEN          32

/* ============ Global Data Type ======== */
typedef num32 fd_t;

/* Gatherer Identifier */
#define MAX_GATHERER_ID         8192
typedef struct _broker_gatherer_id {
        int GID;        /* Identifying number for gid cache */
        char *gn;       /* Gatherer-Name */
        num32 gns;      /* Gatherer-Name size */
        char *gh;       /* Gatherer-Host */
        num32 ghs;      /* Gatherer-Host size */
        char *gv;       /* Gatherer-Version */
        num32 gvs;      /* Gatherer-Version size */
} GathererID;

/* An in-memory Registry entry */
typedef struct REG_T {
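
The header caches the lengths of the attribute names (TTL_S, LMT_A_S, MD5_S and so on), presumably so that callers can compare names without calling strlen() each time. The following is a minimal sketch of how such cached lengths might be used. The helpers attr_is() and check_cached_lengths() are hypothetical and are not part of the Harvest sources; only the macro values are copied from broker.h.

```c
#include <assert.h>
#include <string.h>

/* Values copied from broker.h */
#define MD5     "md5"
#define TTL     "time-to-live"
#define LMT_A   "last-modification-time"
#define TTL_S   12
#define LMT_A_S 22
#define MD5_S   3

/*
 * Hypothetical helper (an assumption, not Harvest code): returns 1 when
 * the `len` bytes at `name` are exactly the attribute `attr`, whose
 * length `attr_len` comes from the matching cached *_S constant.
 * The length test runs first, so memcmp is skipped on a size mismatch.
 */
int attr_is(const char *name, size_t len, const char *attr, size_t attr_len)
{
    return len == attr_len && memcmp(name, attr, attr_len) == 0;
}

/*
 * Hypothetical sanity check: aborts if a cached length has drifted
 * from the real length of its string.
 */
void check_cached_lengths(void)
{
    assert(strlen(TTL) == TTL_S);
    assert(strlen(LMT_A) == LMT_A_S);
    assert(strlen(MD5) == MD5_S);
}
```

A caller that has already split an attribute line into a name and its length could write attr_is(name, len, TTL, TTL_S). Running check_cached_lengths() once at startup would catch a constant that was not updated after a name change.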
