📄 CreateGatherer

📁 Harvest is a robot that downloads HTML web pages
Page 1 of 2
#!/usr/bin/perl -w
#
#  CreateGatherer - Generates a new Gatherer in multi-gatherer model
#
#  Usage: CreateGatherer
#
#       For example, CreateGatherer
#
#  $Id: CreateGatherer,v 1.1 2002/03/03 12:45:55 sxw Exp $
#
$VERSION= '1.1';

###########################################################################
#
# Script Added by Patrick Cao Huu Thien
#       <patrick.cao-huu-thien@inrp.fr>
#       2002/04/12
#
# FILES
#
# 1 - The file $ENV{'HARVEST_HOME'}/.CreateGatherer is read to set
# different variables
#
#       $vals{'Gatherer-Group'}
#       $vals{'Gatherer-Host'}
#       $vals{'Gatherer-Port'}
#
# 2 - All files are created from skeleton in directory
# $ENV{'HARVEST_HOME'}/gatherers/skeleton
#
###########################################################################
#
#  Harvest Indexer http://harvest.sourceforge.net/
#  -----------------------------------------------
#
#  The Harvest Indexer is a continued development of code developed by
#  the Harvest Project. Development is carried out by numerous individuals
#  in the Internet community, and is not officially connected with the
#  original Harvest Project or its funding sources.
#
#  Please mail lee@arco.de if you are interested in participating
#  in the development effort.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################
#
#  Copyright (c) 1994, 1995.  All rights reserved.
#
#    The Harvest software was developed by the Internet Research Task
#    Force Research Group on Resource Discovery (IRTF-RD):
#
#          Mic Bowman of Transarc Corporation.
#          Peter Danzig of the University of Southern California.
#          Darren R. Hardy of the University of Colorado at Boulder.
#          Udi Manber of the University of Arizona.
#          Michael F. Schwartz of the University of Colorado at Boulder.
#          Duane Wessels of the University of Colorado at Boulder.
#
#    This copyright notice applies to software in the Harvest
#    ``src/'' directory only.  Users should consult the individual
#    copyright notices in the ``components/'' subdirectories for
#    copyright information about other software bundled with the
#    Harvest source code distribution.
#
#  TERMS OF USE
#
#    The Harvest software may be used and re-distributed without
#    charge, provided that the software origin and research team are
#    cited in any use of the system.  Most commonly this is
#    accomplished by including a link to the Harvest Home Page
#    (http://harvest.cs.colorado.edu/) from the query page of any
#    Broker you deploy, as well as in the query result pages.  These
#    links are generated automatically by the standard Broker
#    software distribution.
#
#    The Harvest software is provided ``as is'', without express or
#    implied warranty, and with no support nor obligation to assist
#    in its use, correction, modification or enhancement.  We assume
#    no liability with respect to the infringement of copyrights,
#    trade secrets, or any patents, and are not responsible for
#    consequential damages.  Proper use of the Harvest software is
#    entirely the responsibility of the user.
#
#  DERIVATIVE WORKS
#
#    Users may make derivative works from the Harvest software, subject
#    to the following constraints:
#
#      - You must include the above copyright notice and these
#        accompanying paragraphs in all forms of derivative works,
#        and any documentation and other materials related to such
#        distribution and use acknowledge that the software was
#        developed at the above institutions.
#
#      - You must notify IRTF-RD regarding your distribution of
#        the derivative work.
#
#      - You must clearly notify users that your are distributing
#        a modified version and not the original Harvest software.
#
#      - Any derivative product is also subject to these copyright
#        and use restrictions.
#
#    Note that the Harvest software is NOT in the public domain.  We
#    retain copyright, as specified above.
#
#  HISTORY OF FREE SOFTWARE STATUS
#
#    Originally we required sites to license the software in cases
#    where they were going to build commercial products/services
#    around Harvest.  In June 1995 we changed this policy.  We now
#    allow people to use the core Harvest software (the code found in
#    the Harvest ``src/'' directory) for free.  We made this change
#    in the interest of encouraging the widest possible deployment of
#    the technology.  The Harvest software is really a reference
#    implementation of a set of protocols and formats, some of which
#    we intend to standardize.  We encourage commercial
#    re-implementations of code complying to this set of standards.
#
###########################################################################
# You can modify the init_vars function to change the default answers
###########################################################################
#
# this part prevents root from executing this script
#
#if($ENV{'USER'} eq 'root') {
#       print <<EOT;
#    You execute this script under user ROOT !!!!!
#
#    You MUST run it with the user that install Harvest
#
#EOT
#exit 1;
#}

# Default HARVEST_HOME and put the Harvest directories first in PATH.
$ENV{'HARVEST_HOME'} = "/usr/local/harvest" if (!defined($ENV{'HARVEST_HOME'}));
$ENV{'PATH'} = "$ENV{'HARVEST_HOME'}/bin" . ":" .
               "$ENV{'HARVEST_HOME'}/lib/gatherer". ":" .
               "$ENV{'HARVEST_HOME'}/lib" . ":" .
               "$ENV{'PATH'}";

# Locate whoami; the last executable candidate found wins.
$whoami_cmd = "whoami";
$whoami_cmd = "/bin/whoami" if (-x "/bin/whoami");
$whoami_cmd = "/usr/bin/whoami" if (-x "/usr/bin/whoami");
$whoami_cmd = "/usr/ucb/whoami" if (-x "/usr/ucb/whoami");
chomp($this_person = `$whoami_cmd`);

# Locate hostname the same way, then resolve the canonical host name.
$hostname_cmd = "hostname";
$hostname_cmd = "/usr/bin/hostname" if (-x "/usr/bin/hostname");
$hostname_cmd = "/usr/ucb/hostname" if (-x "/usr/ucb/hostname");
$hostname_cmd = "/bin/hostname" if (-x "/bin/hostname");
chomp($this_host = `$hostname_cmd`);
($this_host_full, @x) = gethostbyname($this_host);
undef @x;

&hr;
print <<EOT;
   Welcome to the CreateGatherer
   This script will create a new directory with files to have a new
   gatherer.
   I will ask you for some information.
EOT

# Ask until the user confirms the collected values.
while(1) {
        &init_vars();
        &get_user_input();
        &print_info();
        $vals{"YorN"} = "yes";
        &get_ans("Is this information correct?", "YorN");
        last if ($vals{"YorN"} =~ /^y/io);
}
&build_gatherer;
&save_data;
exit 0;

############################################################
## Functions

# Prompt for a value, showing the current value of $vals{$tag} as the
# default. A blank answer keeps the default.
sub get_ans {
        my($prompt, $tag) = @_;
        my $default_value = "";
        $default_value = $vals{$tag} if (defined($vals{$tag}));
        print "$prompt", " [$default_value]: ";
        my $in = <STDIN>;
        chomp($in);
        $vals{$tag} = $in if ($in !~ /^\s*$/io);
}

# Collect email, host, group, name, home directory and port.
sub get_user_input {
        ### email
        &get_ans("Enter an email",
                "Contact-Email");
        ### Host
        &get_ans("Enter the Host of Gatherer",
                "Gatherer-Host");
        ### group
        &get_ans("Enter the Group of this Gatherer or return for no group",
                "Gatherer-Group");
        if($vals{'Gatherer-Group'}){
                $vals{"Gatherer-Directory"} .= "/$vals{'Gatherer-Group'}";
                $vals{'mkdir-group'} = $vals{"Gatherer-Directory"}
                        unless(-d "$vals{'Gatherer-Directory'}");
        }
        ### name
        while(1) {
                &get_ans("Enter a one-word name for your Gatherer",
                        "Gatherer-Name");
                last if ($vals{"Gatherer-Name"} !~ /^no_default/io);
        }
        $vals{"Gatherer-Directory"} .= '/'.$vals{"Gatherer-Name"};
        ### directory
        &get_ans("Enter the home directory for your Gatherer",
                "Gatherer-Directory");
        $vals{'mkdir'} = $vals{"Gatherer-Directory"}
                unless(-d $vals{"Gatherer-Directory"});
        while(-d $vals{"Gatherer-Directory"}){
                $vals{"YorN"} = 'no';
                &get_ans("The Home directory ".$vals{"Gatherer-Directory"}." already exists, would you like to keep it ?",
                        "YorN");
                if($vals{YorN} =~ /y/io){
                        $vals{"mkdir"} = '';
                        last;
                }
                &get_ans("Enter the home directory for your Gatherer",
                        "Gatherer-Directory");
        }
        &get_ans("Enter the port for your Gatherer",
                "Gatherer-Port");
}

# Load "key value" pairs from $HARVEST_HOME/.CreateGatherer if it exists,
# then fill in defaults for anything still unset.
sub init_vars {
        &hr;
        print <<EOT;
   Initialize the variables ....
EOT
        my $fileconf = "$ENV{'HARVEST_HOME'}/.CreateGatherer";
        if(-f $fileconf){
                print <<EOT;
   From configuration file ....
EOT
                open(IN,"< $fileconf") || die("can't open file $fileconf : $!");
                while($_ = <IN>){
                        next if /^$/o;          # skip empty lines
                        next if /^\s*#/;        # skip comment lines
                        if(($key,$value) = /^\s*([\w\-]+)\s+([\w\-\@\.]+)\s*$/i){
                                $vals{"$key"} = $value;
                                printf("   %21s : %s\n",$key,$value)
                        }
                }
                print "\n";
                close(IN);
        }
        $vals{"Gatherer-Host"} = $this_host_full
                unless $vals{"Gatherer-Host"};
        $vals{"Gatherer-Group"} = ""
                unless $vals{"Gatherer-Group"};
        $vals{"Gatherer-Name"} = "No_Default"
                unless $vals{"Gatherer-Name"};
        $vals{"Gatherer-Port"} = "8601"
                unless $vals{"Gatherer-Port"};
        $vals{"Gatherer-Directory"} = "$ENV{'HARVEST_HOME'}/gatherers"
                unless $vals{"Gatherer-Directory"};
        $vals{"Contact-Email"} = "$this_person\@$this_host_full"
                unless $vals{"Contact-Email"};
        $vals{"Skeleton-Directory"} = "$ENV{'HARVEST_HOME'}/gatherers/skeleton"
                unless $vals{"Skeleton-Directory"};
}

sub add_url_point {
        my($file) = @_;
        my $result = '';
        &hr;
        print <<EOT;
	This section allows you to add a RootNode to the configuration file
	$file.
	See documentation for more details
EOT
        $vals{'Collection-URL'} = "No_Default";
        # URL
        while(1) {
                &get_ans("Enter the Base URL from which to collect",
                        "Collection-URL");
                last if ($vals{"Collection-URL"} !~ /^no_default/io);
        }
        $result .= $vals{'Collection-URL'};
