newsget.pl

Harvest is a robot that downloads HTML web pages.
Language: Perl

: # *-*-perl-*-*
    eval 'exec perl -S $0 "$@"'
    if $running_under_some_shell;
#
#  newsget.pl,v 1.14 1996/01/22 21:29:23 duane Exp
#
#########################################################################
#
#  Copyright (c) 1994, 1995.  All rights reserved.
#
#    The Harvest software was developed by the Internet Research Task
#    Force Research Group on Resource Discovery (IRTF-RD):
#
#          Mic Bowman of Transarc Corporation.
#          Peter Danzig of the University of Southern California.
#          Darren R. Hardy of the University of Colorado at Boulder.
#          Udi Manber of the University of Arizona.
#          Michael F. Schwartz of the University of Colorado at Boulder.
#          Duane Wessels of the University of Colorado at Boulder.
#
#    This copyright notice applies to software in the Harvest
#    ``src/'' directory only.  Users should consult the individual
#    copyright notices in the ``components/'' subdirectories for
#    copyright information about other software bundled with the
#    Harvest source code distribution.
#
#  TERMS OF USE
#
#    The Harvest software may be used and re-distributed without
#    charge, provided that the software origin and research team are
#    cited in any use of the system.  Most commonly this is
#    accomplished by including a link to the Harvest Home Page
#    (http://harvest.cs.colorado.edu/) from the query page of any
#    Broker you deploy, as well as in the query result pages.  These
#    links are generated automatically by the standard Broker
#    software distribution.
#
#    The Harvest software is provided ``as is'', without express or
#    implied warranty, and with no support nor obligation to assist
#    in its use, correction, modification or enhancement.  We assume
#    no liability with respect to the infringement of copyrights,
#    trade secrets, or any patents, and are not responsible for
#    consequential damages.  Proper use of the Harvest software is
#    entirely the responsibility of the user.
#
#  DERIVATIVE WORKS
#
#    Users may make derivative works from the Harvest software, subject
#    to the following constraints:
#
#      - You must include the above copyright notice and these
#        accompanying paragraphs in all forms of derivative works,
#        and any documentation and other materials related to such
#        distribution and use acknowledge that the software was
#        developed at the above institutions.
#
#      - You must notify IRTF-RD regarding your distribution of
#        the derivative work.
#
#      - You must clearly notify users that your are distributing
#        a modified version and not the original Harvest software.
#
#      - Any derivative product is also subject to these copyright
#        and use restrictions.
#
#    Note that the Harvest software is NOT in the public domain.  We
#    retain copyright, as specified above.
#
#  HISTORY OF FREE SOFTWARE STATUS
#
#    Originally we required sites to license the software in cases
#    where they were going to build commercial products/services
#    around Harvest.  In June 1995 we changed this policy.  We now
#    allow people to use the core Harvest software (the code found in
#    the Harvest ``src/'' directory) for free.  We made this change
#    in the interest of encouraging the widest possible deployment of
#    the technology.  The Harvest software is really a reference
#    implementation of a set of protocols and formats, some of which
#    we intend to standardize.  We encourage commercial
#    re-implementations of code complying to this set of standards.
#
$ENV{'HARVEST_HOME'} = "/usr/local/harvest" if (!defined($ENV{'HARVEST_HOME'}));
unshift(@INC, "$ENV{'HARVEST_HOME'}/lib");	# use local files

@F = split('/', $0); $prog = pop @F; undef @F;

use Socket;			# AF_INET and SOCK_STREAM constants

$debug	= 0;
$XFER_TIMEOUT = $ENV{'HARVEST_XFER_TIMEOUT'} || 120;	# timeout of 2 minutes between reads
alarm (300);		# limit process to 5 minutes until start reading

$nntp_sock = -1;
if ($ARGV[0] eq "-fd") {
	shift;
	$nntp_sock = shift;
}

die "usage: $0 localfile news:groupname\n  or   $0 localfile news:msgid\n"
  if ($#ARGV != 1);

$F   = shift;
$URL = shift;

$host='news';
$host=$ENV{'NNTPSERVER'} if ($ENV{'NNTPSERVER'} ne '');
$port=119;

open (F, ">$F")	|| &mydie("$F: $!\n");

if ( $nntp_sock == -1 ) {
	local ($sockaddr) = 'S n a4 x8';
	local ($name, $aliases, $proto) = getprotobyname('tcp');
	local ($connected) = 0;

	# Lookup addresses for remote hostname
	#
	local($w,$x,$y,$z,@thataddrs) = gethostbyname($host);
	&mydie("Unknown Host: $host\n")
		unless (@thataddrs);

	# bind local socket to INADDR_ANY
	#
	local($thissock) = pack($sockaddr, &AF_INET, 0, "\0\0\0\0");
	&mydie("socket: $!\n") unless
		socket (NNTP, &AF_INET, &SOCK_STREAM, $proto);
	&mydie("bind: $!\n") unless
		bind (NNTP, $thissock);

	# Try all addresses
	#
	foreach $thataddr (@thataddrs) {
		local ($that) = pack($sockaddr, &AF_INET, $port, $thataddr);
		if (connect (NNTP, $that)) {
			$connected = 1;
			last;
		}
	}
	&mydie("$host:$port: $!\n")
		unless ($connected);
	$NNTPR = NNTP;
	$NNTPW = NNTP;
	print STDERR "Connected to $host:$port\n" if ($debug);
} else {
	&mydie("nntp_sock ($nntp_sock): $!\n") unless
		open (NNTPR, "<&$nntp_sock");
	&mydie("nntp_sock ($nntp_sock): $!\n") unless
		open (NNTPW, ">&$nntp_sock");
	$NNTPR = NNTPR;
	$NNTPW = NNTPW;
	print STDERR "Connected to fd $nntp_sock\n" if ($debug);
}
select($NNTPW); $| = 1; select(F);

if ( $nntp_sock == -1 ) {
	$reply = &read_reply($NNTPR);
	($code, @stuff) = split (/[ \t\n]+/, $reply);
	# 200 = posting allowed, 201 = posting prohibited; both are valid greetings
	&mydie("Bad welcome: $reply")
		if ($code != 200 && $code != 201);
}

($access, $path) = split (/:/, $URL, 2);	# limit 2 keeps any ':' inside the path
&mydie("Not a news URL")
	if ($access ne "news");

if ($path =~ /\@/) {
	&do_msgid ($path);
} else {
	&do_group ($path);
}

close F;
if ( $nntp_sock == -1 ) {
	&send_cmd("QUIT");
	close NNTP;
}
exit(0);

# ----- SUBROUTINES --------------------------------------------------

sub do_group {
	local ($group) = shift @_;

	&send_cmd("MODE READER");
	$reply = &read_reply($NNTPR);

	&send_cmd("GROUP $group");
	$reply = &read_reply($NNTPR);
	# reply format: 211 count first last groupname
	($code, $nmsgs, $min, $max, $realgroup) = split (/\s+/, $reply);
	&mydie($reply)
		if ($code != 211);
	&mydie("No messages")
		if ($max == 0);

	&send_cmd("XOVER $min-$max");
	$reply = &read_reply($NNTPR);
	($code, @stuff) = split (/\s+/, $reply);
	&mydie($reply)
		if ($code != 224);

	while (<$NNTPR>) {
		alarm ($XFER_TIMEOUT);	# timeout until next read
		s/\r//g;		# remove CR
		last if (/^\.$/);	# lone "." ends the multi-line reply
		s/^\.//;		# undo dot-stuffing ("..x" is sent for ".x")
		print F $_;
	}
}

sub do_msgid {
	local ($msgid) = shift @_;

	&send_cmd("ARTICLE <$msgid>");
	$reply = &read_reply($NNTPR);
	# reply format: 220 article-number <message-id>
	($code, @stuff) = split (/\s+/, $reply);
	&mydie($reply)
		if ($code != 220);

	while (<$NNTPR>) {
		alarm ($XFER_TIMEOUT);	# timeout until next read
		s/\r//g;		# remove CR
		last if (/^\.$/);	# lone "." ends the multi-line reply
		s/^\.//;		# undo dot-stuffing ("..x" is sent for ".x")
		print F $_;
	}
}

sub mydie {
	$_ = shift;
	print STDERR "$prog: $URL: $_\n";
	exit(1);
}

sub send_cmd {
	local ($cmd) = @_;
	print STDERR "--> $cmd\n" if ($debug);
	print $NNTPW "$cmd\r\n";
}

sub read_reply {
	$reply = <$NNTPR>;
	$reply =~ s/[\r\n]//g;
	print STDERR "<-- $reply\n" if ($debug);
	return $reply;
}
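The two parsing steps that newsget.pl performs on the wire, reading a dot-terminated multi-line reply (the bodies that follow the 224 XOVER and 220 ARTICLE status lines) and splitting the 211 reply to GROUP, can be sketched on their own. This is a minimal sketch in modern Perl, assuming standard NNTP framing (CRLF line endings, a lone "." terminator line, and dot-stuffed lines as described in RFC 977/3977). The helper names `read_dot_terminated` and `parse_group_reply` and the sample data are hypothetical and are not part of Harvest.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Harvest): read one NNTP multi-line
# response body from $fh and return its lines without line endings.
sub read_dot_terminated {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;   # drop CRLF (or a bare LF)
        last if $line eq '.';   # a lone "." terminates the block
        $line =~ s/^\.//;       # undo dot-stuffing: "..x" was sent for ".x"
        push @lines, $line;
    }
    return @lines;
}

# Hypothetical helper (not from Harvest): split a GROUP reply of the form
# "211 count first last groupname" into named fields.
sub parse_group_reply {
    my ($reply) = @_;
    my ($code, $count, $min, $max, $group) = split /\s+/, $reply;
    return { code => $code, count => $count, min => $min,
             max => $max, group => $group };
}

# An in-memory filehandle stands in for the NNTP socket (sample data).
my $wire = "1\tSubject\r\n..hidden\r\nplain\r\n.\r\n";
open(my $fh, '<', \$wire) or die "open: $!";
my @body = read_dot_terminated($fh);
print "$_\n" for @body;

my $g = parse_group_reply("211 3 1 3 misc.test");
print "$g->{group}: articles $g->{min}-$g->{max}\n";
```

With a live connection, the reader would be called on the socket handle immediately after a 224 or 220 status line, and the parsed `min`/`max` values are what the script feeds into its `XOVER min-max` command.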
