&lt;INPUT TYPE=image SRC=&quot;/Images/WP/retwp.gif&quot; NAME=goback
BORDER=0&gt;<BR>
&lt;/FORM&gt;<BR>
&lt;/BODY&gt;<BR>
&lt;/HTML&gt;<BR>
EOM<BR>
&nbsp;&nbsp;&nbsp;&nbsp;} else {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print &lt;&lt;EOM;
<BR>
&lt;HTML&gt;<BR>
&lt;HEAD&gt;<BR>
&lt;TITLE&gt;Incorrect email address&lt;/TITLE&gt;<BR>
&lt;!-- (c) Esoterica 1996, amcf@esoterica.pt --&gt;<BR>
&lt;/HEAD&gt;<BR>
&lt;BODY BACKGROUND=&quot;$pathBackground&quot;&gt;<BR>
&lt;H1 ALIGN=center&gt;Incorrect email address&lt;/H1&gt;<BR>
&lt;P&gt;<BR>
&lt;FORM ACTION=&quot;$url&quot; METHOD=post&gt;<BR>
The email you entered is incorrect. Please try again.<BR>
&lt;P&gt;<BR>
&lt;INPUT TYPE=image SRC=&quot;/Images/WP/retwp.gif&quot; NAME=goback
BORDER=0&gt;<BR>
&lt;/FORM&gt;<BR>
&lt;/BODY&gt;<BR>
&lt;/HTML&gt;<BR>
EOM<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
}<BR>
<BR>
##### Search on the email address list with the key given #####
<BR>
sub Search {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;$search_key = $input{'key'};<BR>
&nbsp;&nbsp;&nbsp;&nbsp;if ($search_key eq '') {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@final_list =
(&quot;The key must contain at least one character!&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;} else {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$search_key =~
tr/A-Z/a-z/;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#
Convert to lower case<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@key = split(&quot;
&quot;,$search_key);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@initial_list
= `$cat $email_list | $tr 'A-Z' 'a-z'`;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@final_list =
();<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;foreach $i (0
.. $#initial_list) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if
(index($initial_list[$i],$key[0])&gt;=0 &amp;&amp; index($initial_list[$i],$key[1])&gt;=0)
{<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$initial_list[$i]
=~ s/&lt;/&amp;lt;/g;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$initial_list[$i]
=~ s/&gt;/&amp;gt;/g;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$initial_list[$i]
=~ s/\n/&lt;BR&gt;\n/g;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push(@final_list,$initial_list[$i]);
<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}
<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;if ($#final_list == -1) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@final_list =
(&quot;There isn't any email address corresponding to the key
you gave!&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;print &lt;&lt;EOM;<BR>
&lt;HTML&gt;<BR>
&lt;HEAD&gt;<BR>
&lt;TITLE&gt;Results of the White Pages database search&lt;/TITLE&gt;
<BR>
&lt;!-- (c) Esoterica 1996, amcf@esoterica.pt --&gt;<BR>
&lt;/HEAD&gt;<BR>
&lt;BODY BACKGROUND=&quot;$pathBackground&quot;&gt;<BR>
&lt;H1 ALIGN=center&gt;Results of the White Pages database search&lt;/H1&gt;
<BR>
&lt;P&gt;<BR>
&lt;FORM ACTION=&quot;$url&quot; METHOD=post&gt;<BR>
&lt;B&gt;Search for:&lt;/B&gt; $search_key<BR>
&lt;P&gt;<BR>
&lt;B&gt;Results:&lt;/B&gt;<BR>
&lt;HR&gt;<BR>
@final_list<BR>
&lt;HR&gt;<BR>
&lt;INPUT TYPE=image SRC=&quot;/Images/WP/retwp.gif&quot; NAME=goback
BORDER=0&gt;<BR>
&lt;/FORM&gt;<BR>
&lt;/BODY&gt;<BR>
&lt;/HTML&gt;<BR>
EOM<BR>
}<BR>
<BR>
##### Shows help page #####<BR>
sub Help {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;print &lt;&lt;EOM;<BR>
&lt;HTML&gt;<BR>
&lt;HEAD&gt;<BR>
&lt;TITLE&gt;White Pages - Help&lt;/TITLE&gt;<BR>
&lt;!-- (c) Esoterica 1996, amcf@esoterica.pt --&gt;<BR>
&lt;/HEAD&gt;<BR>
&lt;BODY BACKGROUND=&quot;$pathBackground&quot;&gt;<BR>
&lt;H1 ALIGN=center&gt;White Pages&lt;/H1&gt;<BR>
&lt;H2 ALIGN=center&gt;&lt;I&gt;Help&lt;/I&gt;&lt;/H2&gt;<BR>
&lt;P&gt;<BR>
&lt;FORM ACTION=&quot;$url&quot; METHOD=post&gt;<BR>
&lt;UL&gt;<BR>
&lt;LI&gt;&lt;B&gt;What is an electronic White Pages centre?&lt;/B&gt;&lt;BR&gt;
<BR>
It's a list of electronic mail addresses in the Internet.<BR>
&lt;P&gt;<BR>
&lt;LI&gt;&lt;B&gt;How does search work?&lt;/B&gt;&lt;BR&gt;<BR>
The list of email addresses contains the real name of people on
the<BR>
Internet, along with their email address. You can enter up to
two<BR>
words for the program to search on the list and to retrieve documents
<BR>
that contain both words.<BR>
&lt;/UL&gt;<BR>
&lt;P&gt;<BR>
&lt;INPUT TYPE=image SRC=&quot;/Images/WP/retwp.gif&quot; NAME=goback
BORDER=0&gt;<BR>
&lt;/FORM&gt;<BR>
&lt;/BODY&gt;<BR>
&lt;/HTML&gt;<BR>
EOM<BR>
}</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
The script also lets users add e-mail addresses to the database.
The e-mail address database is actually a plain text file
containing e-mail addresses, one per line. Other search
engines use more complex databases.
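<P>
Because the database is just a text file with one entry per line,
adding an address amounts to appending a line. The following is
a minimal shell sketch of that idea, not the code the White Pages
script uses: the function name <TT><FONT FACE="Courier">add_address</FONT></TT>
and the <TT><FONT FACE="Courier">EMAIL_LIST</FONT></TT> variable
are assumptions made for illustration, and the duplicate check
is an addition of this sketch.

```shell
#!/bin/sh
# Sketch only: append one address to a one-per-line list, skipping duplicates.
# EMAIL_LIST and add_address are illustrative names, not taken from the book's script.
EMAIL_LIST="${EMAIL_LIST:-email.list}"

add_address() {
    # $1 is the address to store; create the list if it does not exist yet
    touch "$EMAIL_LIST"
    # -F: fixed string, -x: match the whole line, -q: print nothing
    if grep -F -x -q -- "$1" "$EMAIL_LIST"; then
        return 1            # already present, nothing is written
    fi
    printf '%s\n' "$1" >> "$EMAIL_LIST"
}
```

Calling <TT><FONT FACE="Courier">add_address "amcf@esoterica.pt"</FONT></TT>
twice leaves a single line in the file; the second call returns
a nonzero status.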
<P>
Every search engine (and the White Pages database is a simple one)
needs not only a search form but also some way to gather information.
In the White Pages database, this is done with a form for adding
e-mail addresses and also by scanning newsgroups for new
addresses. Many people post to newsgroups, and each post carries
the address of the sender in its From: line. A program that walks
through all posts and extracts the From: line can therefore
build a sizable e-mail address list quickly. To do this,
you need access to a news server, or a way to copy posts to
your own server with a news reader. The White
Pages database main program is a Perl script, but we have also developed
a small shell script that gathers addresses from newsgroups. It
is presented later in this chapter (see Listing 11.2) and assumes
that you have access to a news server spool stored on a local disk
(for performance reasons, the script scans only the soc.culture.*
hierarchy).
<P>
The main Perl script is divided into two parts: the e-mail addition
function and the search function. The first request to the script
uses the <TT><FONT FACE="Courier">GET</FONT></TT> method,
and the initial form is displayed. See Figure 11.2 for the White
Pages main form. Other queries (e-mail addition or help request)
use the <TT><FONT FACE="Courier">POST</FONT></TT> method.
<P>
<A HREF="f11-2.gif" ><B>Figure 11.2:</B> <I>The initial White Pages screen</I>.</A>
<P>
A user can enter one or two search keys (any further keys are
currently ignored), and the search returns the entries that contain
every key entered. Uppercase letters in the search key are converted
to lowercase, and the list itself is lowercased as it is read, so
that the comparison against the list of e-mail addresses is
case insensitive:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$search_key =~ tr/A-Z/a-z/; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
# Convert to lower case</FONT></TT>
</BLOCKQUOTE>
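<P>
The same matching rule can be tried from a shell prompt. The sketch
below is a hypothetical command-line equivalent, not part of the
White Pages script: <TT><FONT FACE="Courier">wp_search</FONT></TT>
is an illustrative name, and the list file is passed as the first
argument. Like the Perl code, it lowercases both the keys and the
list and then keeps the lines that contain every key as a substring.

```shell
#!/bin/sh
# Sketch only: case-insensitive substring search with one or two keys.
# wp_search and its argument order are illustrative, not the book's code.
wp_search() {
    list=$1
    # Lowercase the keys, as the Perl script does with tr/A-Z/a-z/
    key1=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    key2=$(printf '%s' "${3:-}" | tr 'A-Z' 'a-z')
    if [ -n "$key2" ]; then
        # Two keys: a line must contain both of them
        tr 'A-Z' 'a-z' < "$list" | grep -F -e "$key1" | grep -F -e "$key2"
    else
        # One key: a single filter is enough
        tr 'A-Z' 'a-z' < "$list" | grep -F -e "$key1"
    fi
}
```

For example, <TT><FONT FACE="Courier">wp_search email.list Astley</FONT></TT>
prints, in lowercase, every line of the list that contains "astley".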
<P>
A result page is shown in Figure 11.3.
<P>
<A HREF="f11-3.gif" ><B>Figure 11.3:</B><I> The results from a White Pages search on &quot;astley.&quot;</I></A>
<P>
The e-mail addition form lets users enter their own e-mail addresses
and add them to the list. Before adding an e-mail address to
the database, the application checks that the address has a plausible
form (that is, it contains an <TT><FONT FACE="Courier">@</FONT></TT>
symbol that is not the first character).
<BLOCKQUOTE>
<TT><FONT FACE="Courier">if ( index($input{'email'},'@') &gt;=
1 ) {</FONT></TT>
</BLOCKQUOTE>
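<P>
To make the effect of this test concrete, here is a shell sketch
of the same rule. The function name <TT><FONT FACE="Courier">looks_like_email</FONT></TT>
is an assumption made for illustration. The rule is deliberately
loose: it only requires that the first character is not
<TT><FONT FACE="Courier">@</FONT></TT> and that an
<TT><FONT FACE="Courier">@</FONT></TT> appears somewhere after it,
so a string such as <TT><FONT FACE="Courier">x@</FONT></TT> still passes.

```shell
#!/bin/sh
# Sketch only: mirrors index($email,'@') >= 1 from the Perl script.
# looks_like_email is an illustrative name.
looks_like_email() {
    case $1 in
        # first character is not @, and an @ appears later in the string
        [!@]*@*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

A stricter check (for example, requiring a dot in the domain part)
could be layered on top of this if needed.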
<P>
Listing 11.2 shows the newsgroups e-mail address gatherer.
<HR>
<BLOCKQUOTE>
<B>Listing 11.2. The newsgroups e-mail address gatherer (shell
script) using the soc.culture.* hierarchy on a news server spool
directory.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/bin/sh<BR>
#<BR>
# amcf@esoterica.pt, 1996<BR>
#<BR>
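# Collect the From: header line of every article in the spool<BR>
for f in `find /usr/spool/news/soc/culture -depth -type f`<BR>
do<BR>
&nbsp;&nbsp;&nbsp;grep &quot;^From:&quot; &quot;$f&quot; 2&gt; /dev/null &gt;&gt;
from.list<BR>
done<BR>
# Keep everything after the From: label, then sort and drop duplicates<BR>
cut -f2- -d: from.list &gt;&gt; email.list<BR>
sort -b email.list | uniq &gt; email.list.tmp<BR>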
mv email.list.tmp email.list<BR>
rm from.list</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
The email.list file should be kept in a directory on your Web
server that the search script can read.
<H2><A NAME="FutureImprovements"><FONT SIZE=5 COLOR=#FF0000>Future
Improvements</FONT></A></H2>
<P>
The White Pages database could be improved in several ways:
<UL>
<LI><FONT COLOR=#000000>Addition of a description phrase to each
e-mail address, along with other information, such as workplace,
country (which can usually be guessed from the top-level domain),
and so on.</FONT>
<LI><FONT COLOR=#000000>Improvement of the e-mail addition form
in order to let people submit a photo (indicated by a URL) to
put next to their e-mail address.</FONT>
<LI><FONT COLOR=#000000>Automatic e-mail sent to each user added
to the database to announce the addition, which would also make it
possible to detect bad e-mail addresses (if the mail is returned).</FONT>
<LI><FONT COLOR=#000000>Better search form, letting users enter
not only search keys but also Boolean operators, for example,
as in &quot;astley AND NOT bill.&quot;</FONT>
</UL>
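<P>
As a hint of what the last improvement involves, a query of the
form "astley AND NOT bill" maps naturally onto two filters over
the list. This is only a sketch under stated assumptions:
<TT><FONT FACE="Courier">wp_and_not</FONT></TT> is an illustrative
name, the list is a one-per-line text file, and no attempt is made
to parse a query string typed by the user.

```shell
#!/bin/sh
# Sketch only: "WANT AND NOT EXCLUDE" over a one-per-line list.
# wp_and_not is an illustrative name; no query parser is attempted.
wp_and_not() {
    list=$1
    want=$2
    exclude=$3
    # -i: ignore case, -F: fixed strings, -v: keep the lines that do NOT match
    grep -i -F -e "$want" "$list" | grep -i -v -F -e "$exclude"
}
```

For example, <TT><FONT FACE="Courier">wp_and_not email.list astley bill</FONT></TT>
prints the lines that mention "astley" but not "bill".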
<P>
Feel free to use the existing White Pages Perl script code and
improve it to fit your needs.
<P>
For a general information retrieval and organization system, you can
check out Harvest (<TT><FONT FACE="Courier"><A HREF="http://harvest.cs.colorado.edu/">http://harvest.cs.colorado.edu/</A></FONT></TT>),
a valuable tool that can help you build a database of references
to information on your server or on other servers, and that can
also be used as a cache between client applications and servers
(a Web browser and a Web server, for example).
<P>
Search engines on the Web have existed for some years and are
now indispensable tools for information retrieval. One could not
imagine searching the Web or the Internet by hand for a specific
topic at a time when terabytes of data flow around
the world. As the Web grows, search engines must also grow in
both raw power and search/selection capabilities. More powerful
servers can (and will) be used, but we also expect improvements
in the quality of search algorithms, along with improved search
forms (for example, forms that accept natural-language queries).
<H2><A NAME="Summary"><FONT SIZE=5 COLOR=#FF0000>Summary</FONT></A>
</H2>
<P>
This chapter surveyed the major search engines on the World Wide
Web, along with their respective search and presentation techniques.
As you have seen, most of the work accomplished by these engines
is done with the help of CGI scripts.
<P>
As an example of a simple search engine, we developed the White
Pages database. It maintains a list of e-mail
addresses in which you can search for a person by entering a
search key (the person's name, or part of it) in the
White Pages main form.
<P>
<HR WIDTH="100%"></P>


</BODY>
</HTML>
