
📄 classic.cf

Harvest is a robot that downloads HTML web pages.
# $Id: classic.cf,v 2.5 2002/07/30 19:32:40 sxw Exp $
#
# Definitions for '{nph-}search.cgi' configuration.
#
# A hash (#) in the first column denotes a comment and is not processed
# even inside a definition.  Hashes not in the first column are left as-is.
#
# Variable substitution occurs on these definitions.  If you want
# a dollar-sign ($) to occur in the output, escape it with backslash (\).
# Other metacharacters (quotes, asterisks, etc) probably don't need to be
# escaped.  Printf-like special characters are allowed: \n \t \r etc.
#
# The ending newline is chopped off from each definition.  So
#
#   <FooBar>
#   abcxyz
#   </FooBar>
#
# becomes "abcxyz".  You may have blank lines in between the beginning
# and ending tags, or use the newline character \n.
#

#
# CGI defaults.
#
# In this section you can give default values for any attribute that would
# normally be passed in from the CGI query form.  The following attributes
# are currently available:
#
# brokerqueryconfig : Name of a broker specific configuration file
# lifetime          : Maximum lifetime of a query (see also <Lifetime> section)
# caseflag          : Whether the query is case-sensitive ["on" or "off"]
# wordflag          : Match on word boundaries ["on" or "off"]
# opaqueflag        : Show matched lines ["on" or "off"]
# descflag          : Return object description ["on" or "off"]
# noregexflag       : Don't do regular expressions ["on" or "off"]
# maxresultflag     : Maximum number of results to return
# maxobjflag        : Maximum number of objects to return
# maxlineflag       : Maximum number of matched lines
# weightflag        : Show weight of hit
# csumflag          : Show link to indexing data
# errorflag         : Number of spelling errors (therefore it's not really a flag!)
# broker            : Name of broker
# host              : Name of broker host
# attribute         : Space-separated list of attributes to return
# sort              : Sort options ["by-rank"]
# hp_url            : URL of search page

# These defaults mimic the values on the provided query.html form.
<Default>
caseflag      : off
wordflag      : off
opaqueflag    : off
descflag      : off
noregexpflag  : off
weightflag    : off
csumflag      : off
errorflag     : 0
maxobjflag    : 500
maxlineflag   : 30
maxresultflag : 3000
perpageflag   : 0
sort          : by-rank
attribute     :
</Default>

# 'GLOBAL' VARIABLES: are defined globally within the {nph-}search.cgi program,
#                     but not necessarily set at all times.
#
# $query        the user query string
# $html_query   the user query string, special HTML characters escaped
# $bquery       the query string sent to the broker
# $host         the broker hostname
# $port         the broker port
# $hp_url       the URL of the broker query (home) page
# $maxresult    the maximum number of matched lines the broker will return
# $nobjects     a running count of the number of objects returned
# $nopaquelines a running count of the number of opaque (matched) lines
#
# Definitions here can be output in other definitions.  Each definition
# here is placed into the %CFG associative array.  For example,
# to print out the Timeout defined below, write $CFG{'Timeout'}.
#

# The URL to the Harvest Project home page.
#
<HarvestUrl>
http://harvest.sourceforge.net/
</HarvestUrl>

# The amount of time to wait for the broker results.
# Arithmetic can be used here also, e.g. 5*60
#
# We use a conservative 5 minutes here.
#
<Timeout>
300
</Timeout>

# InitFunction can be some Perl code which gets eval'd before the
# query is sent.  Use this for any special hackings.
#
<InitFunction>
$cs_urlX = $csumflag ? 'Y' : undef;
</InitFunction>

# The MIME content type of the query results.  Should be "text/html"
# if we are going to return HTML tags.
#
<ContentType>
text/html
</ContentType>

##############################################################################
# RESULT SECTION
#
# The output of '{nph-}search.cgi' consists of the following tags:
#
#    ResultHeader
#    CreateNavBars
#    ResultSetBegin
#    ( Errors from 'broker' )
#    foreach object {
#        PrintObject
#    }
#    ResultSetEnd
#    EndBrokerResults
#    ( TruncateWarning )
#    ResultTrailer
#
##############################################################################

# First output for the HTML containing the results.  Should probably include
# <TITLE> tag and the user query.
#
<ResultHeader>
<HEAD>
<TITLE>Search Results for: $html_query</TITLE>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</HEAD>
<BODY>
<H2>Search Results for: $html_query</H2>
<HR>
</ResultHeader>

# Final HTML section for the output.  Should probably contain a pointer
# to the broker home-page ($hp_url) and a pointer to the Harvest
# home page.
#
<ResultTrailer>
<P>
<?$hp_url>
<P>
Return to the <a HREF="$hp_url">Broker Home Page</a><BR>
</?$hp_url>
<ADDRESS>
<DIV ALIGN="right">
<A HREF="$CFG{'HarvestUrl'}">Harvest
Search System</A>
</DIV>
</ADDRESS>
<BR>
</ResultTrailer>

# Obtain simple navigation bar
#
<CreateNavBars>
if ($totalPages > 0)
{
  $pageList = "";
  for ($i = 1; $i < $page; $i++)
  { $pageList .= "<a href=\"".&create_link($i)."\">$i</a>\n"; }
  $pageList .= "<b>$page</b>\n";
  for ($i = $page+1; $i <= $totalPages; $i++)
  { $pageList .= "<a href=\"".&create_link($i)."\">$i</a>\n"; }
  $navigationBar = "Page: $pageList<BR>\n<HR>" if ($totalPages > 1);
}
</CreateNavBars>

# Output just before beginning the loop over objects returned by the
# broker.
#
<ResultSetBegin>
$totnumber objects found. Showing page $page of $totalPages<BR>
<HR>
<PRE>\n
</ResultSetBegin>

# Output just after ending the loop over objects returned by the broker
#
<ResultSetEnd>
</PRE>\n
<HR>
<center>$navigationBar</center>
<P>
</ResultSetEnd>

# EndBrokerResults is printed when the broker results end normally.
#
<EndBrokerResults>
\n<STRONG>$msg</STRONG><BR>\n
</EndBrokerResults>

# FailBrokerResults is printed when the broker results end in error
#
<FailBrokerResults>
\n<STRONG>$msg</STRONG><BR>\n
</FailBrokerResults>

# PER-OBJECT DEFINITIONS
#
# NOTES: In order for us to just use "$opaque" and have it print all the
#        matched lines properly, we must be in <PRE> mode.
#
#        Rather than just putting $cs_url in a HREF, we break it up and
#        stick the components around the 'DisplayObject' CGI program
#        which formats the SOIF into nice-looking HTML.
#
# VARIABLES:
#
#    $url        Object url: http://www.cia.gov:3333/Spies/KGB/secret.html
#    $A          URL Access: http
#    $H          URL Host  : www.cia.gov:3333
#    $P          URL Path  : /Spies/KGB/secret.html
#    $D          URL Dir   : /Spies/KGB/
#    $F          URL File  : secret.html
#    $cs_url     URL to the SOIF object in the broker database
#    $cs_[ahp]   elements of $cs_url as above with $url
#    $desc       Description attribute of the matched object
#    $opaque     A matched line (or all matched lines in obj-at-a-time mode)
#    $usermsg    A user message
#    $attributes Requested attributes

# This definition is much like the 'standard' output.
#
<PrintObject>
$objectnum <STRONG>$A URL:</STRONG> <A HREF="$url">$F</A>
<?$H>    <STRONG>host</STRONG>: $H
</?$H><?$P>    <STRONG>path</STRONG>: $P
</?$P>$description\
$attributes\
$objectWeight\
$opaque
<?$cs_urlX>    <B>indexing data:</B> \
<A HREF="$cs_a://$cs_h/Harvest/cgi-bin/displaySOIF.cgi?object=$cs_p&query=$html_query">formatted</A> - \
<A HREF="$cs_a://$cs_h/Harvest/cgi-bin/displaySOIF.cgi?object=$cs_p&style=plain">plain</A>
</?$cs_urlX>
</PrintObject>

# An alternate way of displaying an object.
#
#<PrintObject>
#Object #: $objectnum
#filename: <A HREF="$url">$F</A>
#    Host: $H
#    Path: $D
#Indexing: <A HREF="$cs_a://$cs_h/Harvest/cgi-bin/DisplayObject.cgi?object=$cs_p">here</A>
#<STRONG>Description: $desc</STRONG>
#$opaque
#
#</PrintObject>

# This definition is eval'd for each opaque (matched) line returned by
# the broker.  It is intended to be used to remove SOIF attributes
# and the 'Matched line' string from the output.  See the perl(1) manual
# page or book for information on building Perl regular expressions.
#
#  Add this to have URLs in matched lines be links
#
#    s/([a-z]*:\/\/\S*)/<A HREF="\1">\1<\/A>/;
#
<MatchedLineSub>
s/\s{2,}/ /g;
s/(^Matched line:)/<STRONG>$1<\/STRONG>/g;
s/^.*/    $&/;
</MatchedLineSub>

# PerObjectFunction is eval'd before every object is printed out.
#
<PerObjectFunction>
$description = '';
$description = "    <STRONG>Description: $desc</STRONG>\n" if ($desc ne '');
$objectWeight = "";
if ($maxWeight > 0 && $weightflag)
{
  $objectWeight = "    <strong>weight:</strong> ".int($weight * 100 / $maxWeight)."% (".(int($weight*100)/100)." Pts.)</font>\n";
}
</PerObjectFunction>

# PerAttributeFunction is eval'd before the Attributes are printed out.
#
<PerAttributeFunction>
# Remove all empty lines
$val =~ s/^(\s*)//mg;
# Now prepare the object's type and size to display at the description.
if ($att eq "last-modification-time" || $att eq "update-time" ) {
   local(@date,@months,$month,$minute,$year);
   @date = localtime($val);
   @months = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec');
   $month = $months[$date[4]];
   $minute = $date[1];
   $minute = "0$minute" if $minute < 10;
   $second = $date[0];
   $second = "0$second" if $second < 10;
   $year = $date[5]+1900;
   $val = "$date[3]-$month-$year $date[2]:$minute:$second";
} else {                                       ## other attributes
   # insert spaces to format multiple lines
   $val =~ s/\n(.)/\n     $1/mg;
   # limit to 5 (=4+1) lines...
   $val =~ s/^((.*\n){4})(.*)\n(.*\n)*/$1$3.../;
}
</PerAttributeFunction>

# Format Requested Attributes.  Before this is eval'd, $att and $val
# should be set.
#
<FormatAttribute>
    <STRONG>$att:</STRONG> $val\n
</FormatAttribute>

# ======================================================================
# ERROR, STATUS and WARNING functions
# ======================================================================

# The message printed to the browser and logged to the HTTP server log
# when the process is killed.  If the Timeout time is reached, the
# process dies from SIGALRM.  The short name of the offending signal
# is placed in $sig.
#
<sigdie>
Killed by SIG$sig...\n
</sigdie>

# How to format the object number.  Use a printf format specification
# to left/right justify, or whatever.  Do not include quotes around
# the format string.
#
<ObjectNumPrintf>
%2d.
</ObjectNumPrintf>

# A warning message printed only when the broker might have truncated
# the result set.  Only printed if the number of matched lines equals
# the 'maxresultflag' or the number of returned objects equals the
# 'maxobjflag' value of the query.html form.
#
<TruncateWarning>
<P>
<STRONG>WARNING: The search results were truncated at $nopaquelines
matched lines and $nreturned found objects.</STRONG>
<P>\n
</TruncateWarning>

#<ObjTruncateWarning>
#<P><STRONG>WARNING: The result set may have been truncated at $maxfiles
#objects returned.</STRONG>
#</ObjTruncateWarning>

# A warning message printed only when the broker returned 0 results.
#
<EmptySetWarning>
<P>
Your query either <em>did not match</em> any information in this
Broker, or you <em>may</em> have specified a query that is not supported by
this Broker's search subsystem.  Please refer to the
<A HREF="/Harvest/brokers/queryhelp.html">help on formulating queries</A>
for further assistance.
<P>\n
</EmptySetWarning>

# Error Message returned if there is no query string sent.
#
<NoQuery>
No query information to decode.\n
</NoQuery>

# Message returned if the broker sends back a
#      111 - Broker is too heavily loaded
# reply.
#
<BrokerLoad>
</pre>
<P>
Sorry, the search broker at <STRONG>$host, port $port</STRONG> is currently too
heavily loaded to process your request.
<P>
Please try again later.
<P>
</BrokerLoad>

# Error message if broker is not available.
#
<BrokerDown>
Sorry, the Broker at <STRONG>$host, port $port</STRONG> is unavailable.
<P>
Please try again later.
<P>\n
</BrokerDown>

# Message returned if the broker sends back a
#      111 - PARSE ERROR
# reply.
#
<ParseError>
</pre>
<P>
Sorry, your query does not have the proper syntax:
<PRE>
    $html_query
</PRE>
<P>
Please refer to the
<A HREF="/Harvest/brokers/queryhelp.html">help in formulating queries</A>
for details on the proper syntax.
<P>
Common syntax mistakes include:
<UL>
<LI><STRONG>Not using quotes around phrases or regular expressions</STRONG>.
<BR>
For example, the <em>phrase</em>:
<CODE>resource discovery</CODE> should be
<CODE>"resource discovery"</CODE>.
The <em>regular expression</em>:
<CODE>res.* disc.*</CODE> should be
<CODE>"res.* disc.*"</CODE>.
Note that a <em>single dot</em> (``<CODE>.</CODE>'')
in a keyword needs quotes (e.g. <CODE>"3.0"</CODE>).
<LI><STRONG>Missing punctuation</STRONG>.
<BR>
Structured queries need a colon and optional parentheses.  For example,
<CODE>Type:PostScript</CODE> or, <CODE>(Type : PostScript)</CODE>
</UL>\n
</ParseError>
