
📄 url.h

📁 Harvest is a robot that downloads HTML web pages
/*
 *  url.h - URL Processing (parsing & retrieval)
 *
 *  $Id: url.h,v 2.1 1997/03/21 19:21:01 sxw Exp $
 *
 *  AUTHOR: Harvest derived
 *
 *  Harvest Indexer http://harvest.sourceforge.net/
 *  -----------------------------------------------
 *
 *  The Harvest Indexer is a continued development of code developed by
 *  the Harvest Project. Development is carried out by numerous individuals
 *  in the Internet community, and is not officially connected with the
 *  original Harvest Project or its funding sources.
 *
 *  Please mail lee@arco.de if you are interested in participating
 *  in the development effort.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
/*
 *  ----------------------------------------------------------------------
 *  Copyright (c) 1994, 1995.  All rights reserved.
 *
 *    The Harvest software was developed by the Internet Research Task
 *    Force Research Group on Resource Discovery (IRTF-RD):
 *
 *          Mic Bowman of Transarc Corporation.
 *          Peter Danzig of the University of Southern California.
 *          Darren R. Hardy of the University of Colorado at Boulder.
 *          Udi Manber of the University of Arizona.
 *          Michael F. Schwartz of the University of Colorado at Boulder.
 *          Duane Wessels of the University of Colorado at Boulder.
 *
 *    This copyright notice applies to software in the Harvest
 *    ``src/'' directory only.  Users should consult the individual
 *    copyright notices in the ``components/'' subdirectories for
 *    copyright information about other software bundled with the
 *    Harvest source code distribution.
 *
 *  TERMS OF USE
 *
 *    The Harvest software may be used and re-distributed without
 *    charge, provided that the software origin and research team are
 *    cited in any use of the system.  Most commonly this is
 *    accomplished by including a link to the Harvest Home Page
 *    (http://harvest.cs.colorado.edu/) from the query page of any
 *    Broker you deploy, as well as in the query result pages.  These
 *    links are generated automatically by the standard Broker
 *    software distribution.
 *
 *    The Harvest software is provided ``as is'', without express or
 *    implied warranty, and with no support nor obligation to assist
 *    in its use, correction, modification or enhancement.  We assume
 *    no liability with respect to the infringement of copyrights,
 *    trade secrets, or any patents, and are not responsible for
 *    consequential damages.  Proper use of the Harvest software is
 *    entirely the responsibility of the user.
 *
 *  DERIVATIVE WORKS
 *
 *    Users may make derivative works from the Harvest software, subject
 *    to the following constraints:
 *
 *      - You must include the above copyright notice and these
 *        accompanying paragraphs in all forms of derivative works,
 *        and any documentation and other materials related to such
 *        distribution and use acknowledge that the software was
 *        developed at the above institutions.
 *
 *      - You must notify IRTF-RD regarding your distribution of
 *        the derivative work.
 *
 *      - You must clearly notify users that your are distributing
 *        a modified version and not the original Harvest software.
 *
 *      - Any derivative product is also subject to these copyright
 *        and use restrictions.
 *
 *    Note that the Harvest software is NOT in the public domain.  We
 *    retain copyright, as specified above.
 *
 *  HISTORY OF FREE SOFTWARE STATUS
 *
 *    Originally we required sites to license the software in cases
 *    where they were going to build commercial products/services
 *    around Harvest.  In June 1995 we changed this policy.  We now
 *    allow people to use the core Harvest software (the code found in
 *    the Harvest ``src/'' directory) for free.  We made this change
 *    in the interest of encouraging the widest possible deployment of
 *    the technology.  The Harvest software is really a reference
 *    implementation of a set of protocols and formats, some of which
 *    we intend to standardize.  We encourage commercial
 *    re-implementations of code complying to this set of standards.
 *
 */
#ifndef _URL_H_
#define _URL_H_

#include "config.h"
#include <stdio.h>	/* FILE, used by struct url and urldb_writesoif() */
#include <time.h>

/*
 *  The supported URLs look like:
 *
 *      file://host/pathname
 *      gopher://host[:port][/TypeDigitGopherRequest]
 *      http://host[:port][/[pathname][#name][?search]]
 *      ftp://[user[:password]@]host[:port][/pathname]
 *
 *  where host is either a fully qualified hostname, an IP number, or
 *  a relative hostname.
 *
 *  For http, any '#name' or '?search' directives are ignored.
 *  For ftp, any user, password, or port directives are unsupported.
 */
struct url {
	char *url;		/* Complete, normalized URL */
	char *redir_from_url;	/* the original URL when REDIRECT'ed */
	char *raw_pathname;	/* pathname portion of the URL, w/ escapes */
	char *pathname;		/* pathname portion of the URL, w/o escapes */
	char *host;		/* fully qualified hostname */
	int type;		/* file, ftp, http, gopher, etc. */
	int port;		/* TCP/IP port */

/* Information for FTP/HTTP processing */
	char *user;		/* Login name for ftp */
	char *password;		/* password for ftp */

/* Information for Gopher processing */
	int gophertype;		/* Numeric type for gopher request */

/* Information for HTTP processing */
	char *http_version;	/* HTTP/1.0 Version */
	int   http_status_code;	/* HTTP/1.0 Status Code */
	char *http_reason_line;	/* HTTP/1.0 Reason Line */
	char *http_mime_hdr;	/* HTTP/1.0 MIME Response Header */

/* Information for local copy processing */
	char *filename;		/* local filename */
	char *shsafe_filename;	/* filename suitable within "'s of sh(1) */
	FILE *fp;		/* ptr to local filename */
	time_t lmt;		/* Last-Modification-Time */
	int flags;		/* bitfield */
#ifdef USE_MD5
	char *md5;		/* MD5 value of URL */
#endif
#ifdef HTTP_AUTHENTICATION
	char *auth_type;
	char *auth_realm;
	char *auth_str;
#endif
};
typedef struct url URL;

#define URL_FLAG_NONE		0
#define URL_FLAG_PASS_USERINFO	(1<<0)
#define URL_FLAG_LOCAL_MAPPED	(1<<1)
#define URL_FLAG_NEED_UNLINK	(1<<2)

/* Arguments are parenthesized so that compound expressions expand safely */
#define URL_FLAG_SET(flag, bit) ((flag) |= (bit))
#define URL_FLAG_CLR(flag, bit) ((flag) &= ~(bit))

enum url_types {		/* Constants for URL types */
	URL_UNKNOWN,
	URL_FILE,		/* NOTE: the array default_URL_port[]	*/
	URL_FTP,		/* in src/common/url/url.c depends on	*/
	URL_GOPHER,		/* the order of this list.  Change one,	*/
	URL_HTTP,		/* change the other.			*/
	URL_NEWS,
	URL_NOP,
	URL_TELNET,
	URL_WAIS,
	URL_X,
	URL_MAILTO
};

#ifndef _PARAMS
#if defined(__STDC__) || defined(__cplusplus) || defined(__STRICT_ANSI__)
#define _PARAMS(ARGS) ARGS
#else /* Traditional C */
#define _PARAMS(ARGS) ()
#endif /* __STDC__ */
#endif /* _PARAMS */

URL *url_open _PARAMS((char *));
int url_read _PARAMS((char *, int, int, URL *));
int url_retrieve _PARAMS((URL *));
int url_confirm _PARAMS((URL *));
void url_close _PARAMS((URL *));
void init_url _PARAMS(());
void finish_url _PARAMS(());
void url_purge _PARAMS(());
URL *dup_url _PARAMS((URL *));
void print_url _PARAMS((URL *));

int http_get _PARAMS((URL *));
int ftp_get _PARAMS((URL *));
int ftp_get_auth _PARAMS((URL *));
int gopher_get _PARAMS((URL *));
int news_get _PARAMS((URL *));

char *rfc1738_escape _PARAMS((char *));
void rfc1738_unescape _PARAMS((char *));
char *url_parse_relative _PARAMS((char *, char*));

#ifdef USE_LOCAL_CACHE
void init_cache _PARAMS(());
char *lookup_cache _PARAMS((char *));
time_t lmt_cache _PARAMS((char *));
void add_cache _PARAMS((char *, char *, time_t));
void finish_cache _PARAMS(());
void expire_cache _PARAMS(());
#endif

#ifdef USE_CCACHE
void url_initCache _PARAMS((int, long));
void url_shutdowncache _PARAMS(());
#endif

extern struct _url_table {
        char *scheme;
        int port;
        int (*get_func) ();
} url_table[];

/* from db.c */
void urldb_init _PARAMS((char *));
char *urldb_getmd5 _PARAMS((char *));
void urldb_writesoif _PARAMS((char *, FILE *));
char *urldb_getrefs _PARAMS((char *));
int urldb_getlmt _PARAMS((char *));

#endif /* _URL_H_ */
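
The `flags` member of `struct url` is a bitfield driven by the `URL_FLAG_*` macros. The following is a minimal, standalone sketch of how those macros could be exercised; it does not use the Harvest sources. The flag values are restated from the header with their arguments fully parenthesized, while `URL_FLAG_ISSET`, `struct url_flags_demo`, and `url_flags_demo_run()` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Flag bits restated from url.h. */
#define URL_FLAG_NONE           0
#define URL_FLAG_PASS_USERINFO  (1<<0)
#define URL_FLAG_LOCAL_MAPPED   (1<<1)
#define URL_FLAG_NEED_UNLINK    (1<<2)

/* Set/clear macros, with arguments parenthesized for safe expansion. */
#define URL_FLAG_SET(flag, bit) ((flag) |= (bit))
#define URL_FLAG_CLR(flag, bit) ((flag) &= ~(bit))

/* Hypothetical helper, NOT part of url.h: test whether a bit is set. */
#define URL_FLAG_ISSET(flag, bit) (((flag) & (bit)) != 0)

/* Hypothetical stand-in for struct url, reduced to its flags member. */
struct url_flags_demo {
    int flags;
};

/* Sets two flags, clears one, and returns the remaining bitfield. */
int url_flags_demo_run(void)
{
    struct url_flags_demo u = { URL_FLAG_NONE };

    URL_FLAG_SET(u.flags, URL_FLAG_LOCAL_MAPPED);
    URL_FLAG_SET(u.flags, URL_FLAG_NEED_UNLINK);
    assert(URL_FLAG_ISSET(u.flags, URL_FLAG_NEED_UNLINK));

    URL_FLAG_CLR(u.flags, URL_FLAG_NEED_UNLINK);
    assert(!URL_FLAG_ISSET(u.flags, URL_FLAG_NEED_UNLINK));

    return u.flags;	/* only URL_FLAG_LOCAL_MAPPED remains set */
}
```

Calling `url_flags_demo_run()` from a `main()` of your own returns a value with only `URL_FLAG_LOCAL_MAPPED` set. The parenthesized macro arguments matter when a caller passes a compound expression such as `URL_FLAG_LOCAL_MAPPED | URL_FLAG_NEED_UNLINK` to `URL_FLAG_CLR`.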
