
📄 pls_search

📁 Harvest is a robot that downloads HTML web pages
Page 1 of 2
: # *-*-perl-*-*
        eval 'exec perl -S $0 "$@"'
        if $running_under_some_shell;

###############################################################################
#
#   pls_search - Based on iquery.pl/iopcode.pl, but produces clean hitlist
#
#   Modified by Darren Hardy, University of Colorado, July 1995
#
#   Usage: pls_search dbname dbgroup plsroot [options] query
#
#   pls_search,v 1.1 1995/07/26 18:22:36 hardy Exp
#
#   Personal Library Software 2/8/95
#   This work is protected by US Copyright Law and contains proprietary and
#   confidential trade secrets.  PLS (c) Copyright 1995 by PLS,
#   All Rights Reserved
#
#
#   ***  iquery.pl ***
#
#   Function:   Processes query and gets hitlist via d*query.pl
#
#   Created by: W. Terry Hardgrave
#
#   Modified:   95/03/08    Naval Deshbandhu
#                   Deleted code for function calcWaitTime
#                   similar functionality implemented via
#                   getConfigVal in config.pl file.
#               3/16/95     Herman Vandermolen
#                   Don't pass password around any longer.
#
#   Notes:
#
###############################################################################

$debug = 0;

&usage() if ($#ARGV < 3);

sub usage {
	print "Usage: pls_search dbname dbgroup plsroot [options] query\n";
	exit 1;
}

$dbname = shift(@ARGV);
$dbgroup = shift(@ARGV);
$plsroot = shift(@ARGV);
die "$plsroot: $!\n" if (! -d $plsroot);
$query = shift(@ARGV);

do "$plsroot/cgi-bin/global.pl";
eval '&set_global_vars';

#$dbgroup = "Harvest";
#$dbname = "ChurchTest";
#$query = "colorado";

# fake it like we're iopcode.pl
$smode = 0;
$dbstring = "${dbname}:";
$account = "_free_user_";
$platmode = "unix";
$waitval = 600;

&query();
exit 0;

###############################################################################
#
# Name:         query
# Description:  Launch the query script and process results.
# Created:      ?
# Changed:      2/15/95     Saad Mufti
#                   Launch the dtquery script instead of the drquery script,
#                   since the drquery script does an unnecessary (at this point)
#                   socket communication.
#               2/16/95     Bryan Slavin
#                   Adds default port of 80 if one was not provided in the
#                   Database.tab
#               2/22/95     Herman Vandermolen
#                   Add error checking, in particular: look for [Error] and
#                   [Empty] tokens.
#               2/23/95     Herman Vandermolen
#                   Add parameter to call to dtquery.pl to handle concept
#                   searches.
#               2/27/95     Herman Vandermolen
#                   Don't erase lines starting with '[' anymore.
#               3/6/95      Herman Vandermolen
#                   Improve calculation of price per KC.
#                   Improve calculation of number of search results.
#               3/4/95      Saad Mufti
#                   Don't do any special processing for "remote"
#                   documents. These are now handled at a lower layer.
#               3/10/95     Herman Vandermolen
#                   Add version # argument.
#               3/17/95     Naval Deshbandhu
#                   Modified code that scans the output received from dtquery.pl
#               3/22/95     Herman Vandermolen
#                   Fold TimeOut error message into Fail= format.
#               3/23/95     Herman Vandermolen
#                   Add icon to error messages to make them stand out.
#               3/29/95     Herman Vandermolen
#                   Make display of price column configurable. Set
#                   $bPriceColumn to 1 if you want a price column in
#                   the hit list, otherwise, leave it at 0.
#               3/30/95     Herman Vandermolen
#                   Remove leading whitespace from hitlines.
#               3/30/95     Saad Mufti
#                   Set name of price file here instead of in global.pl,
#                   since we can't rely on the dbgroup variable being set
#                   before we call global.pl
#               3/30/95     Saad Mufti
#                   Set name of bill file here instead of in global.pl,
#                   since we can't rely on the dbgroup variable being set
#                   before we call global.pl
#               4/4/95      Herman Vandermolen
#                   Check for empty hitline after URL processing. This is
#                   getting really ugly.
#               4/19/95     Bryan Slavin
#                   Optimize check for error lines. Combined two loops
#                   into one.
#               4/21/95     Bryan Slavin
#                   Squeeze all tabs into spaces for correct formatting of
#                   headlines in hitlist.
#
###############################################################################
sub query
{
    local ($_, @lines, $rel, $hl, $title, $querydb, $exec);
    local($remoteDocUrl) = "";  # the URL for the remote document if a hit
                                # corresponds to a remote document

#    $title  = "PLWeb Query Results";
#    print "<title>", $title, "</title>\n";

    #
    #   Set Global (e.g. Heading) Variables
    #
    $bill_dbgroup = $dbgroup;                           # Used in global.pl !!!
    $billfile  = "../DBGROUPS/$dbgroup/billfile.txt";
    $pricetab  = "../DBGROUPS/$dbgroup/dbprice.tab";

    #
    # Shortcut obvious errors.  Taken from PLServer 1.2 2/11/95 LHC.
    #
    if ($dbstring eq "") {
        print "No Database Has Been Selected\n\n";
        exit;
    }
    elsif ("$query" eq "") {
        print "No Query Has Been Entered\n\n";
        exit;
    }

#debug print "<HR> Original query is: $query <HR> \n";

    $querydb = "$perlpath $cgipath/dtquery.pl";

    #
    #  The following variables need to be derived
    #
    $list_dbal = $dbstring;
    $list_dbid = $dbstring;
    $node      = $web_node;

    #
    # Invoke "remote" query handler (hands off query to local or remote site).
    #
    $exec = "$querydb $version_id $account $waitval $dbgroup $dbgroup $smode $list_dbal $list_dbid \"$query\"";
    print STDERR "Running $exec\n" if ($debug);

    #debug print "iquery.pl: EXEC: $exec \n";
    #debug exit(0);

    @qlines = `$exec`;

    # take off first two lines which are HTML header info we're not interested in
    shift @qlines;
    shift @qlines;

    #
    #   Check for errors. Format of errors is:
    #   \026Fail=dbname=>reason
    #
    $numresults = 0;                    # Number of normal output lines
    ($failure, $dbname, $reason) = ();
    for $errline (@qlines)  {
        if ($errline =~ /^\s*\026/)   {
            ($failure, $dbname, $reason) = split(/=/, $errline);
            $reason = substr($reason, 1);
            if ($errline =~ /\026Fail/) {
                $dbname =~ s/:$//;
                $dbname =~ s/:/ & /;
                print "Error encountered searching \"$dbname\". $reason.\n";
            }
        }
        else {
            $numresults++;
        }
    }

    #
    # Handle 0, 1, or more hits.
    #
    if ($numresults == 0) {
        print "pls_query: Search produced zero hits.\n";
        exit(0);
    }

    #
    #   Set $bPriceColumn to 1 if you want a price column
    #
    $bPriceColumn = 0;

    #
    #   MAXIMUM TITLE LENGTH :  Set to the maximum length you want the title to be
    #
    $maxTitleLen = 256;

    #
    # create query string for HTTP transfer
    #
    eval '$tquery = &transferQuery($query)';
    eval '$tdbname = &transferQuery($dbstring)';

    #
    # Initialize billing
    #
    $numHitLines = 0;
    $numCharTrans = 0;

    #
    #   HV 950227: change irdbt.pl to drdbt.pl.
    #
    do "drdbt.pl";
    @lines = `cat $pricetab`;
    eval '&rdbtab';

    #
    # Turn each raw results-info line into glistening output.
    #
    $index = 0;
    foreach $_ (@qlines) {
        if (/^\s*\026/) {
            next;
        }

        ($raw, $norm, $xnode, $xdbgroup, $dbalias, $dbid, $docid, $chCount,
         $price, $ret_query, $hl) = split(/\t/, $_, 11);

        ($xnode, $port) = split(/:/, $xnode);
        $port = 80 unless $port;
        $xnode = join(':',$xnode,$port);

        #
        # Set whether raw or normalized relevance is displayed.  MAY CHANGE
        # FOR DISTRIBUTED SCORE MERGING. LHC 2/12/95
        #
        $relevance = $raw;

        #
        # Sanitize the hit information.
        #
        eval '&prepHitInfo';

        $remoteDocUrl = "";

        # the scheme for remote documents works by having a surrogate file that
        # contains the following special markup that contains the URL of the
        # remote document. We recognize this markup and extract the URL for the
        # remote document from it (this URL was specified at indexing time
        # because the document was indexed using pladdsur.pl)
        if ($hl =~ /\<PLS\: URL\>\<A HREF=\"([^\"]*)\"\>/) {
            $remoteDocUrl = $1;
        }

        #
        # Weed out html from hitline
        #
        $html = 0;
        if ($hl =~ s/<[^<]*>//g){
            $html = 1;
        }
        if ($hl =~ /^\s*$/) {
            $hl = "** UNTITLED **";
        }

        #
        #   Remove leading whitespace in the hitline. Squeeze tabs to spaces
        #
        $hl =~ s/^\s+(.*)/$1/;
        $hl =~ s/\t/ /g;

        # NOTE: SKIPPING PRICING FOR REMOTE DOCUMENTS!!
        if (! $remoteDocUrl) {
            #
            #  resolve price
            #
            if (($price eq "NONE") || ($price eq "")) {
                $price = "";
                $size = $chCount;
                eval '&gen_price';
            }
        }

        #
        #   Need to format price to ensure number of decimals
        #
        if ($price < 0) {
            $price = 0.0;
