
📄 soif-mem-efficient.pl

📁 Harvest is a robot that downloads HTML web pages
#-*-perl-*-
#
#  soif.pl - Processing for the SOIF format.
#
#  Darren Hardy, hardy@cs.colorado.edu, January 1995
#
#  soif-mem-efficient.pl,v 1.2 1995/09/05 20:18:15 hardy Exp
#
#######################################################################
#
#  Memory efficient modifications by D. Wessels  6/7/95.
#
#  This version runs about 20% slower, but parses the SOIF
#  one attribute at a time.  It doesn't use assoc arrays which
#  can use up large amounts of VM.
#
#  For backwards compatibility, &soif'parse() is implemented using
#  the two new routines &soif'get_object() and &soif'get_av().
#
#######################################################################
#  Usage:
#
#    require 'soif-mem-efficient.pl';
#
#    # BACKWARDS COMPATIBLE (not efficient):
#    $soif'input = 'WHATEVER'; 	# defaults to STDIN
#    ($ttype, $url, %SOIF) = &soif'parse();
#
#    # MEMORY EFFICIENT:
#    $soif'input = 'WHATEVER'; 	# defaults to STDIN
#    ($ttype, $url) = &soif'get_object();
#    while (($att,$val) = &soif'get_av()) {
#        ...
#    }
#
#######################################################################
#
#  Copyright (c) 1994, 1995.  All rights reserved.
#
#    The Harvest software was developed by the Internet Research Task
#    Force Research Group on Resource Discovery (IRTF-RD):
#
#          Mic Bowman of Transarc Corporation.
#          Peter Danzig of the University of Southern California.
#          Darren R. Hardy of the University of Colorado at Boulder.
#          Udi Manber of the University of Arizona.
#          Michael F. Schwartz of the University of Colorado at Boulder.
#          Duane Wessels of the University of Colorado at Boulder.
#
#    This copyright notice applies to software in the Harvest
#    ``src/'' directory only.  Users should consult the individual
#    copyright notices in the ``components/'' subdirectories for
#    copyright information about other software bundled with the
#    Harvest source code distribution.
#
#  TERMS OF USE
#
#    The Harvest software may be used and re-distributed without
#    charge, provided that the software origin and research team are
#    cited in any use of the system.  Most commonly this is
#    accomplished by including a link to the Harvest Home Page
#    (http://harvest.cs.colorado.edu/) from the query page of any
#    Broker you deploy, as well as in the query result pages.  These
#    links are generated automatically by the standard Broker
#    software distribution.
#
#    The Harvest software is provided ``as is'', without express or
#    implied warranty, and with no support nor obligation to assist
#    in its use, correction, modification or enhancement.  We assume
#    no liability with respect to the infringement of copyrights,
#    trade secrets, or any patents, and are not responsible for
#    consequential damages.  Proper use of the Harvest software is
#    entirely the responsibility of the user.
#
#  DERIVATIVE WORKS
#
#    Users may make derivative works from the Harvest software, subject
#    to the following constraints:
#
#      - You must include the above copyright notice and these
#        accompanying paragraphs in all forms of derivative works,
#        and any documentation and other materials related to such
#        distribution and use acknowledge that the software was
#        developed at the above institutions.
#
#      - You must notify IRTF-RD regarding your distribution of
#        the derivative work.
#
#      - You must clearly notify users that your are distributing
#        a modified version and not the original Harvest software.
#
#      - Any derivative product is also subject to these copyright
#        and use restrictions.
#
#    Note that the Harvest software is NOT in the public domain.  We
#    retain copyright, as specified above.
#
#  HISTORY OF FREE SOFTWARE STATUS
#
#    Originally we required sites to license the software in cases
#    where they were going to build commercial products/services
#    around Harvest.  In June 1995 we changed this policy.  We now
#    allow people to use the core Harvest software (the code found in
#    the Harvest ``src/'' directory) for free.  We made this change
#    in the interest of encouraging the widest possible deployment of
#    the technology.  The Harvest software is really a reference
#    implementation of a set of protocols and formats, some of which
#    we intend to standardize.  We encourage commercial
#    re-implementations of code complying to this set of standards.
#
#
package soif;

$soif'debug = 0;
$soif'input = 'STDIN';
$soif'output = 'STDOUT';
$soif'sort_on_output = 0;

#
#  soif'get_av - reads the next attribute-value pair from $soif'input.
#		Returns ($attr, $value), or () at the end of the object.
#
sub soif'get_av {
	local ($attr, $vsize, $value, $end_value, $nleft, $x);

	while (<$soif'input>) {
		# Attribute line: name{size}:<TAB>value
		# (literal braces are escaped so newer Perls accept the pattern)
		if (/^\s*([^{]+)\{(\d+)\}:\t(.*\n)/o) {
			$attr = $1;
			$vsize = $2;
			$value = $3;
			if (length($value) < $vsize) {
				# Multi-line value: read the rest plus the trailing newline
				$nleft = $vsize - length($value) + 1;
				$end_value = "";
				$x = read($soif'input, $end_value, $nleft);
				die "Cannot read $nleft bytes: $!"
					if ($x != $nleft);
				$value .= $end_value;
				undef $end_value;
			}
			chop ($value) if ($value =~ /\n$/);
			undef $end_value;
			return ($attr,$value);
		}
		return () if (/^}/o);
	}
	return ();	# end of input without a closing brace
}

#
#  soif'get_object - skips ahead to the next object header.
#		Returns ($template_type, $url), or () at end of input.
#
sub soif'get_object {
	while (<$soif'input>) {
		print "READING input line: $_\n" if ($soif'debug);
		last if (/^\@\S+\s*\{\s*\S+\s*$/o);
	}
	if (/^\@(\S+)\s*\{\s*(\S+)\s*$/o) {
		return ($1, $2);	# $template_type = $1, $url = $2
	}
	return ();
}

#
#  soif'parse - $soif'input is the file descriptor from which to read SOIF.
#  	        Returns an associative array containing the SOIF,
#		the template type, and the URL.
#
sub soif'parse {
	print "Inside soif'parse.\n" if ($soif'debug);

	return () if (eof($soif'input));       # DW

	local($template_type, $url);
	local ($attr, $value);
	local(%SOIF);
	undef %SOIF;

	unless (($template_type, $url) = &soif'get_object) {
		return ('UNKNOWN', 'UNKNOWN', %SOIF);
	}
	while (($attr,$value) = &soif'get_av) {
		$SOIF{$attr} = $value;
	}
	return ($template_type, $url, %SOIF);
}

#
#  soif'print - $soif'output is the file descriptor to write SOIF.
#
sub soif'print {
	print "Inside soif'print.\n" if ($soif'debug);
	local($template_type, $url, %SOIF) = @_;

	# Write SOIF header, body, and trailer
	print $soif'output "\@$template_type { $url\n";
	if ($soif'sort_on_output) {
		foreach $k (sort keys %SOIF) {
			&soif'print_av($k, $SOIF{$k});
		}
	} else {
		while (($k, $v) = each %SOIF) {
			&soif'print_av($k, $v);
		}
	}
	print $soif'output "}\n";
}

sub soif'print_av {
	local($k, $v) = @_;
	my $len = length($v);
	return if ($len < 1);	# empty values are not written
	print $soif'output "$k\{$len\}:\t$v\n";
}

# for backwards compatibility
#sub soif'print_item {
#	local($k, $v) = @_;
#	&soif'print_av($k, $v);
#}

1;
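
The record layout that the routines above read can be illustrated with a small standalone script. This is only a sketch: it does not load soif-mem-efficient.pl, and the sample record (template type, URL, attribute names and values) is invented for illustration. It re-implements the same two steps, header matching and byte-counted value reading, with lexical variables and an in-memory filehandle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample record; every name and value here is made up.
# The number in braces is the byte length of the value, which is why
# a value may span several lines ("line1\nline2" is 11 bytes).
my $record = join '',
    "\@FILE { http://example.com/index.html\n",
    "title{5}:\tHello\n",
    "body{11}:\tline1\nline2\n",
    "}\n";

open(my $in, '<', \$record) or die "Cannot open in-memory record: $!";

# Step 1: find the object header, "@TEMPLATE-TYPE { URL".
my ($ttype, $url);
while (my $line = <$in>) {
    if ($line =~ /^\@(\S+)\s*\{\s*(\S+)\s*$/) {
        ($ttype, $url) = ($1, $2);
        last;
    }
}
die "No SOIF object header found" unless defined $ttype;

# Step 2: read attribute-value pairs until the closing brace.
my @pairs;
while (my $line = <$in>) {
    last if $line =~ /^\}/;
    next unless $line =~ /^\s*([^{]+)\{(\d+)\}:\t(.*\n)/;
    my ($attr, $vsize, $value) = ($1, $2, $3);
    if (length($value) < $vsize) {
        # The value continues on later lines: read the missing bytes
        # plus the newline that terminates the value.
        my $nleft = $vsize - length($value) + 1;
        my $got = read($in, my $rest, $nleft);
        die "Cannot read $nleft bytes"
            unless defined $got && $got == $nleft;
        $value .= $rest;
    }
    chop $value if $value =~ /\n$/;   # drop the terminating newline
    push @pairs, [$attr, $value];
}
close($in);

print "$ttype $url\n";
printf "%s => %d bytes\n", $_->[0], length($_->[1]) for @pairs;
```

Keeping the pairs in an array rather than a hash preserves their input order, which is closer in spirit to the one-attribute-at-a-time loop than to the associative array returned by &soif'parse().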
