seen in previous chapters, Perl is a perfect language for text manipulation and searching.



It processes files very efficiently, which, combined with its powerful regular
expression capability, makes it an excellent fit for this type of work.



<H4 ALIGN="CENTER"><A NAME="Heading31"></A><FONT COLOR="#000077">Introduction</FONT></H4>



<P>This example shows you how to add a search engine to your own Web site.
The front end is a simple form with a text field, a Submit button, and a Reset button.
The back end recurses through your Web site's directories, scanning the HTML files
for the specified string. The resulting page contains either
a message that no items were found or a list of navigable links
to the pages that match the search criteria.



<H4 ALIGN="CENTER"><A NAME="Heading32"></A><FONT COLOR="#000077">Defining the Search



Scope</FONT></H4>



<P>The form for this example is a simple one. Listing 7.8 contains the code,
which uses <TT>CGI::Form</TT>.



<H3 ALIGN="CENTER"><A NAME="Heading33"></A><FONT COLOR="#000077">Listing 7.8. Subroutine



to return a search form.</FONT></H3>



<PRE><FONT COLOR="#0066FF">sub searchForm {
   my($q)=@_;
   # Emit the HTTP header and the top of the page.
   print $q-&gt;header;
   print $q-&gt;start_html(&quot;Search My Site&quot;);
   print &quot;&lt;H1&gt;Search My Site&lt;/H1&gt;\n&lt;HR&gt;\n&quot;;
   print &quot;&lt;P&gt;Please enter one or more words to search for&quot;;
   print &quot; and click 'Search'&lt;BR&gt;\n&quot;;
   # The form: one text field, a Submit button, and a Reset button.
   print $q-&gt;start_multipart_form();
   print $q-&gt;textfield(-name=&gt;'SearchString',-maxlength=&gt;100,-size=&gt;40);
   print &quot;&lt;BR&gt;&lt;BR&gt;&lt;BR&gt;&quot;;
   print $q-&gt;submit(-name=&gt;'Action',-value=&gt;'Search');
   print &quot; &quot;;
   print $q-&gt;reset();
   print $q-&gt;endform();
   print $q-&gt;end_html();
}
</FONT></PRE>



<P>This form appears in your browser as shown in Figure 7.5. <BR>



<BR>



<A HREF="08wpp05.jpg" tppabs="http://210.32.137.15/ebook/Web%20Programming%20with%20Perl%205/08wpp05.jpg"><TT><B>Figure 7.5.</B></TT></A><TT> </TT>The search



form as it appears in your browser. <BR>



<BR>



When the user clicks Search, the real work begins. In this example, you will search



the entire site, but depending on the size of your site, you might want to limit



the search scope by adding another field to your form. This can be accomplished by



using a pull-down menu or a group of radio buttons.
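<P>One way to sketch the scope idea, assuming the extra form field submits a short
scope name rather than a directory path: the script looks the name up in a table of
allowed directories. The scope names, the subdirectories, and the <TT>scopeToDir()</TT>
helper below are hypothetical and are not part of the listings in this chapter.</P>

```perl
use strict;
use warnings;

# Hypothetical scope names (the values a pull-down menu would submit)
# mapped to directories under the document root used in this chapter.
my $serverRoot = "/user/bdeng/Web/docs";
my %scopeDirs = (
    all      => $serverRoot,
    products => "$serverRoot/products",   # assumed subdirectory
    support  => "$serverRoot/support",    # assumed subdirectory
);

# Return the directory to search for a submitted scope value.
# Unknown or missing values fall back to the whole site, so the
# script never searches a path taken directly from the form.
sub scopeToDir {
    my ($scope) = @_;
    return $scopeDirs{all} unless defined $scope && exists $scopeDirs{$scope};
    return $scopeDirs{$scope};
}

print scopeToDir("support"), "\n";
print scopeToDir("../../etc"), "\n";
```

<P>Looking the value up in a fixed table, instead of appending it to the document
root, keeps a hand-edited form value from steering the search outside the site.</P>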



<H4 ALIGN="CENTER"><A NAME="Heading34"></A><FONT COLOR="#000077">The Power of Perl



in Text File Processing</FONT></H4>



<P>Now that you have the front end, it's time to write the search engine itself.



Use the <TT>File::Find</TT> library, available in the Perl distribution. This library



does all of the directory scanning for you, leaving you to simply implement the scanning



algorithm. This scanning algorithm searches each file for every word, keeping a running
count of the lines that match. When it comes time to display the search results, you can list the pages
in descending order of hit count, which puts the page the user is most likely
looking for right at the top. This concept should not be entirely new to you if you



have visited one of the popular search sites on the Web.</P>



<P>Assuming you have extracted the list of words to search for, you'll write
a function that scans one file for each word in the list. <TT>File::Find</TT> calls
this function with no arguments, so the word list lives in the package variable
<TT>@words</TT>, and the file to scan arrives in <TT>$File::Find::name</TT>, as shown
in Listing 7.9.



<H3 ALIGN="CENTER"><A NAME="Heading35"></A><FONT COLOR="#000077">Listing 7.9. Subroutine



to search for a list of words.</FONT></H3>



<PRE><FONT COLOR="#0066FF">sub wanted {
   # Skip Unix-type hidden files/directories (any path component
   # that starts with a dot).
   return if $File::Find::name=~/\/\./;
   # Only look at HTML files.
   if ($File::Find::name=~/\.html$/) {
      if (!open(IN, &quot;&lt; $File::Find::name&quot;)) {
         # This error message will appear in your error_log file.
         warn &quot;Cannot open file: $File::Find::name...$!\n&quot;;
         return;
      }
      my(@lines)=&lt;IN&gt;;
      close(IN);
      my($count)=0;
      foreach (@words) {
         # Make the search case-insensitive, and quote the word with
         # \Q...\E so that any regular expression metacharacters the
         # user typed are matched literally.
         my($word)=&quot;(?i)\Q$_\E&quot;;
         # grep() in scalar context returns the number of matching lines.
         $count+=grep(/$word/,@lines);
      }
      if ($count&gt;0) {
         # Add this page to the list of found items.
         push(@foundList,$File::Find::name);
         # Store the hit count in an associative array
         # with the page as the key.
         $hitCounts{$File::Find::name}=$count;
      }
   }
}
</FONT></PRE>
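<P>The counting step can be tried on its own, without <TT>File::Find</TT> or a Web
server. The following is a minimal sketch, not the chapter's code: the
<TT>countHits()</TT> helper and the sample lines are made up for illustration, and the
sketch escapes each word with <TT>quotemeta()</TT> so that punctuation in the input is
matched literally.</P>

```perl
use strict;
use warnings;

# Count the lines that contain any of the given words, summed over
# all words, ignoring case. quotemeta() escapes regex metacharacters
# so that user input is matched literally.
sub countHits {
    my ($words, $lines) = @_;    # two array references
    my $count = 0;
    foreach my $word (@$words) {
        my $pattern = quotemeta($word);
        # grep in scalar context returns the number of matching lines.
        $count += grep { /$pattern/i } @$lines;
    }
    return $count;
}

my @lines = (
    "<H1>Perl and CGI</H1>\n",
    "<P>perl is handy for text.\n",
    "<P>Nothing to see here (yet).\n",
);
print countHits(["perl", "cgi"], \@lines), "\n";   # prints 3
print countHits(["(yet"], \@lines), "\n";          # prints 1
```

<P>Note that a line containing the same word twice still counts once, because
<TT>grep</TT> counts matching lines rather than individual occurrences.</P>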














<H3 ALIGN="CENTER">



<HR WIDTH="82%">



<FONT COLOR="#0066FF"><BR>



</FONT><FONT COLOR="#000077">NOTE:</FONT></H3>











<BLOCKQUOTE>



	<P>If you are running on a UNIX system where the <TT>egrep</TT> command is available,



	you should consider replacing the majority of this Perl code with a call to <TT>egrep</TT>,



	as follows:</P>



	<PRE><FONT COLOR="#0066FF">@hitList=`egrep -ci '(word1|word2|word3)' $File::Find::name`;</FONT></PRE>







</BLOCKQUOTE>


















<BLOCKQUOTE>



	<P>This would be more efficient in terms of memory requirements and processor use.<BR>



	



<HR>











</BLOCKQUOTE>







<P><TT>File::Find</TT> contains a function called <TT>finddepth()</TT>, which takes
at least two arguments: a filter function and one or more directory names to recurse.
The filter function you are using is the one above called <TT>wanted()</TT>. <TT>finddepth()</TT> calls
<TT>wanted()</TT> for each file and directory that it comes across, visiting a
directory's contents before the directory itself. The filename is contained in
the variable <TT>$_</TT>. The directory being visited is contained in the variable <TT>$File::Find::dir</TT>.
You have used the variable <TT>$File::Find::name</TT>, which is the combination of
the other two variables, with a path separator in between. By using the functionality
provided by <TT>File::Find</TT>, all you need to do is add your search filter;
the recursion through the directory tree is handled for you.</P>
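<P>To see the three variables side by side, here is a small sketch that builds a
throwaway directory tree and records what <TT>finddepth()</TT> hands to the filter
function. The temporary tree, the file names, and the <TT>report()</TT> function are
assumptions made for the demonstration.</P>

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree: one file at the top, one in a subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
foreach my $path ("$root/index.html", "$root/sub/page.html") {
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    print $fh "<P>sample\n";
    close($fh);
}

my @seen;
sub report {
    # $_ is the bare name, $File::Find::dir the directory being visited,
    # and $File::Find::name the two joined with a slash.
    return unless /\.html$/;
    push @seen, [$_, $File::Find::dir, $File::Find::name];
}

finddepth(\&report, $root);

foreach my $entry (sort { $a->[2] cmp $b->[2] } @seen) {
    my ($name, $dir, $full) = @$entry;
    print "name=$name dir=$dir full=$full\n";
}
```

<P>The directory names in the output vary from run to run because the tree is
created in a temporary location.</P>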



<P>The code used to initiate the search looks like this:</P>



<PRE><FONT COLOR="#0066FF"># Split on runs of whitespace so that extra spaces do not produce empty words.
@words=split(' ',$q-&gt;param('SearchString'));
if (@words&gt;0) {
   finddepth(\&amp;wanted,&quot;/user/bdeng/Web/docs&quot;);
}
</FONT></PRE>



<P>It's a good idea to check that the <TT>@words</TT> array contains
at least one value. There is no need to make <TT>finddepth()</TT> do all that work if you
have nothing to search for. When the array is empty, you might emit some HTML that
politely reminds the user to specify something to search for.
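<P>A minimal sketch of that check follows. It assumes the raw search string has already
been read from the form; the <TT>parseWords()</TT> helper and the reminder text are
hypothetical.</P>

```perl
use strict;
use warnings;

# Split a raw search string into words. split(' ', ...) splits on runs
# of whitespace and skips leading whitespace, so it never yields empty words.
sub parseWords {
    my ($raw) = @_;
    return () unless defined $raw;
    return split(' ', $raw);
}

foreach my $input ("perl   cgi", "   ", undef) {
    my @words = parseWords($input);
    if (@words > 0) {
        print "Searching for: @words\n";
    } else {
        # In the CGI script, this is where the reminder HTML would go.
        print "<P>Please enter at least one word to search for.\n";
    }
}
```

<P>An empty or all-blank field yields an empty list, so the directory scan is skipped
entirely in that case.</P>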



<H4 ALIGN="CENTER"><A NAME="Heading37"></A><FONT COLOR="#000077">Displaying the Results</FONT></H4>



<P>All you need to do now is display the results in a meaningful format. What you're



aiming for is an ordered list of likely candidates for what the user is trying to



find. You have an array of pages and an associative array of hit counts. What you



need first is a sort routine to rearrange the array in the correct order. The following



sort routine should work just fine:</P>



<PRE><FONT COLOR="#0066FF">@foundList = sort sortByHitCount @foundList;

sub sortByHitCount {
   return $hitCounts{$b} - $hitCounts{$a};
}
</FONT></PRE>



<P>The first line in this code is the call to <TT>sort()</TT>, using the subroutine
<TT>sortByHitCount()</TT>. The <TT>$a</TT> and <TT>$b</TT> variables are package
global variables that <TT>sort()</TT> uses to tell the sorting routine which items
to compare. The items that you're comparing in this case are filenames that are keys
into the <TT>%hitCounts</TT> associative array. Returning a negative value indicates
that <TT>$a</TT> sorts before <TT>$b</TT>, and returning a positive value indicates that
<TT>$a</TT> sorts after <TT>$b</TT>. Returning <TT>0</TT> indicates that the
two values are equal. What you are actually comparing in <TT>sortByHitCount()</TT>
is the hit count of each page. Because the routine subtracts the count for <TT>$a</TT>
from the count for <TT>$b</TT>, pages with more hits sort first.
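<P>The ordering can be checked with a few made-up hit counts. In this sketch the file
names and counts are hypothetical, and the comparison uses the numeric
<TT>&lt;=&gt;</TT> operator, which gives the same order as subtracting the two
counts.</P>

```perl
use strict;
use warnings;

# Hypothetical hit counts keyed by file name.
my %hitCounts = (
    "/docs/a.html" => 2,
    "/docs/b.html" => 7,
    "/docs/c.html" => 4,
);
my @foundList = keys %hitCounts;

# Descending numeric order: $b's count is compared against $a's,
# so the page with the most hits comes first.
sub sortByHitCount {
    return $hitCounts{$b} <=> $hitCounts{$a};
}

@foundList = sort sortByHitCount @foundList;
print "$_ ($hitCounts{$_})\n" foreach @foundList;
```

<P>This prints <TT>b.html</TT> with 7 hits first, then <TT>c.html</TT> with 4, then
<TT>a.html</TT> with 2.</P>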














<H3 ALIGN="CENTER">



<HR WIDTH="83%">



<BR>



<FONT COLOR="#000077">NOTE:</FONT></H3>











<BLOCKQUOTE>



	<P>Remember that in the previous example, the <TT>%hitCounts</TT> associative array
	must be within the scope of the <TT>sortByHitCount</TT> function. It would be a very
	difficult problem to debug if you decided to move <TT>sortByHitCount</TT> into
	a different package scope one day.<BR>



	



<HR>











</BLOCKQUOTE>







<P>Now you have a sorted list of files that need converting to URLs. To do this,



you simply chop off the first n characters, where n is the length of the <TT>$serverRoot</TT>



variable. This can be done with the following line:</P>



<PRE><FONT COLOR="#0066FF">$url=substr($file,length($serverRoot));



</FONT></PRE>
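<P>As a small sketch of that conversion, the following wraps the <TT>substr()</TT> call in
a helper that first confirms the file really lies under the server root. The
<TT>fileToUrl()</TT> name and the sample path are assumptions for illustration.</P>

```perl
use strict;
use warnings;

my $serverRoot = "/user/bdeng/Web/docs";

# Convert a file path under $serverRoot to a site-relative URL.
# Returns undef for paths outside the root instead of guessing.
sub fileToUrl {
    my ($file) = @_;
    # index() returns 0 only when $file begins with "$serverRoot/".
    return undef unless index($file, "$serverRoot/") == 0;
    return substr($file, length($serverRoot));
}

print fileToUrl("/user/bdeng/Web/docs/perl/intro.html"), "\n";   # prints /perl/intro.html
```

<P>The prefix check costs one extra line and avoids producing a mangled URL if a path
from some other directory ever ends up in the list.</P>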



<P>You can now format the string as a link by adding the <TT>&lt;A&gt;</TT> tag around



the <TT>$url</TT>. The final main code appears in Listing 7.10.



<H3 ALIGN="CENTER"><A NAME="Heading39"></A><FONT COLOR="#000077">Listing 7.10. A



simple CGI searching program.</FONT></H3>



<PRE><FONT COLOR="#0066FF">#!/public/bin/perl5
use CGI::Form;
use File::Find;

# Variables for storing the search criteria/results.
@words;
@foundList;
%hitCounts;

$q = new CGI::Form;
$serverRoot=&quot;/user/bdeng/Web/docs&quot;;

# A GET request shows the form; anything else is a submitted search.
if ($q-&gt;cgi-&gt;var('REQUEST_METHOD') eq 'GET') {
   &amp;searchForm($q);
} else {
   # Split on runs of whitespace so that extra spaces do not produce empty words.
   @words=split(' ',$q-&gt;param('SearchString'));
   print $q-&gt;header;
   print $q-&gt;start_html(&quot;Search Results&quot;);
   print &quot;&lt;H1&gt;Search Results&lt;/H1&gt;\n&lt;HR&gt;\n&quot;;
   if (@words&gt;0) {
      finddepth(\&amp;wanted,$serverRoot);
      @foundList = sort sortByHitCount @foundList;
      if (@foundList&gt;0) {
         foreach $file (@foundList) {