<HTML>

<HEAD>
   <TITLE>Chapter 11 -- Searching and CGI </TITLE>
   <META>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT COLOR=#FF0000>Chapter 11</FONT></H1>
<H1><B><FONT SIZE=5 COLOR=#FF0000>Searching and CGI </FONT></B>
</H1>
<P>
<HR WIDTH="100%"></P>
<P>
<H3 ALIGN=CENTER><FONT COLOR="#000000"><FONT SIZE=+2>CONTENTS<A NAME="CONTENTS"></A>
</FONT></FONT></H3>


<UL>
<LI><A HREF="#SearchingInformationontheWeb" >Searching Information on the Web</A>
<LI><A HREF="#MostImportantSearchEngines" >Most Important Search Engines</A>
<LI><A HREF="#GatheringInformationontheInternet" >Gathering Information on the Internet</A>
<LI><A HREF="#SearchingInterfacesfortheFinalUser" >Searching Interfaces for the Final User</A>
<LI><A HREF="#CGIWorkintheBackground" >CGI Work in the Background</A>
<LI><A HREF="#DevelopingaSimpleCGIforaWhitePage" >Developing a Simple CGI for a White Pages Database</A>
<LI><A HREF="#FutureImprovements" >Future Improvements</A>
<LI><A HREF="#Summary" >Summary</A>
</UL>
<HR>
<P>
This chapter covers the major search engines on the World Wide Web,
different search techniques, and the use of CGI applications in
a search engine. We will not create a complex search engine, but
we hope to give you some ideas about the use and importance of
CGI applications in search engines, illustrated by a simple White
Pages application presented at the end of the chapter. A <I>White
Pages</I> database is a list of e-mail addresses. This one was
developed using the CGI specifications and the Perl language and
has a simple Web interface that lets users submit queries.
<H2><A NAME="SearchingInformationontheWeb"><FONT SIZE=5 COLOR=#FF0000>Searching
Information on the Web</FONT></A></H2>
<P>
Exploring the World Wide Web can be an enjoyable task, but it can
also become frustrating if your search doesn't reward you with
anything of value after several hours. The Web was designed to
provide easy access to all types of information and, like the
Internet as a whole, it is a vast information platform. Since its
creation in 1990, the Web has been growing so quickly that it has
become nearly impossible to use effectively without specialized
tools. These tools have developed over time and are generally
referred to as <I>search engines</I>, which help users organize
and retrieve information.
<H2><A NAME="MostImportantSearchEngines"><FONT SIZE=5 COLOR=#FF0000>Most
Important Search Engines</FONT></A></H2>
<P>
Web search engines appeared a few months after the creation of
the Web itself and were developed to meet the need for information
organization and fast retrieval.
<P>
Back in October 1993, when there were about 200 known Web servers,
it was possible for a human to have a general idea of what one
could find on the Web. But within months, the number of known Web
servers had grown to about 1,500 (as of June 1994), and finding
information without any help was starting to become difficult.
Search engines appeared as a natural evolution of the World Wide
Web and rapidly became some of the most visited sites on the
Internet. This is not surprising, because finding information
through hierarchical organization or keyword searching was
incredibly faster than simple Web surfing, a task that could last
for hours and show no practical results. Today, there are tens of
thousands of Web servers, and the need for an organized system of
information retrieval is greater than ever.
<P>
Lycos, Yahoo!, Excite, Infoseek, and Altavista (see the list of
URLs that follows) probably aren't new to you, because all these
search engines have become quite well known and widely used. Each
search engine has its own qualities, and it is difficult to name
one as the best overall engine, because they differ in the way
they gather information and the way they let you search the
corresponding database. Yahoo!, for example, is a database to
which one submits a URL for later verification by a human or a
program. Altavista, on the other hand, uses a special program
usually known as a <I>robot</I> (its nickname is Scooter) to
gather information automatically from the Web and other Internet
resources. These two strategies result in different databases.
<P>
The URLs of the search engines mentioned are
<UL>
<LI><FONT COLOR=#000000>Lycos</FONT>: <TT><FONT FACE="Courier"><A HREF="http://www.lycos.com/">http://www.lycos.com/</A></FONT></TT>
<LI><FONT COLOR=#000000>Yahoo</FONT>!: <TT><FONT FACE="Courier"><A HREF="http://www.yahoo.com/">http://www.yahoo.com/</A></FONT></TT>
<LI><FONT COLOR=#000000>Altavista</FONT>: <TT><FONT FACE="Courier"><A HREF="http://altavista.digital.com/">http://altavista.digital.com/</A></FONT></TT>
<LI><FONT COLOR=#000000>Infoseek</FONT>: <TT><FONT FACE="Courier"><A HREF="http://www.infoseek.com/">http://www.infoseek.com/</A></FONT></TT>
<LI><FONT COLOR=#000000>Excite</FONT>: <TT><FONT FACE="Courier"><A HREF="http://www.excite.com/">http://www.excite.com/</A></FONT></TT>
</UL>
<H2><A NAME="GatheringInformationontheInternet"><FONT SIZE=5 COLOR=#FF0000>Gathering
Information on the Internet</FONT></A></H2>
<P>
As I have mentioned previously, there are various possible
strategies for gathering information and constructing a Uniform
Resource Locator (URL) database about documents on the Web and
other Internet resources, such as Usenet. &quot;Passive&quot;
sites just wait for you to enter your own URLs or scan special
Usenet newsgroups for URLs. &quot;Active&quot; sites go looking
for information themselves, using programs known as <I>robots</I>
or <I>spiders</I>. A robot is a program that automatically
traverses the Web, retrieves documents, and follows the links in
those documents to continue its search through the Web. By doing
this recursively, robots can index most of the Web (although it
may take some days or weeks of continuous work), as the sketch
that follows illustrates.
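<P>
The basic traversal can be sketched in a few lines of Perl. The
following fragment is not part of this chapter's code; it assumes
that the LWP::Simple module is available, and the starting URL and
depth limit are arbitrary values chosen only for illustration. It
fetches a page, extracts the HREF links it finds, and visits each
of them in turn:
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/local/bin/perl<BR>
# Minimal robot sketch: fetch a page, follow its links, recurse.<BR>
use LWP::Simple;<BR>
<BR>
%visited = ();                      # URLs already seen<BR>
<BR>
sub crawl {<BR>
    my ($url, $depth) = @_;<BR>
    return if $depth &lt;= 0 || $visited{$url}++;<BR>
    my $page = get($url);           # retrieve the document<BR>
    return unless defined $page;<BR>
    print "Indexed: $url\n";<BR>
    # crude link extraction from HREF attributes<BR>
    while ($page =~ /href\s*=\s*"(http[^"]+)"/gi) {<BR>
        crawl($1, $depth - 1);<BR>
    }<BR>
}<BR>
<BR>
crawl("http://www.lycos.com/", 2);  # starting URL and depth are examples</FONT></TT>
</BLOCKQUOTE>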
<P>
After retrieving a page, a robot generally passes information
to another program responsible for creating an index database
in which every word is related to the pages in which it appears.
Searching and indexing words on a page may be accomplished by
using one of the following techniques:
<UL>
<LI><FONT COLOR=#000000>Search only on titles and/or headings
and/or comments</FONT>
<LI><FONT COLOR=#000000>Search the whole document</FONT>
</UL>
<P>
In the first case, only the titles, headings, or comments within
a page are actually referenced in the database. This can save
valuable time, space, and computational resources but results in
a much poorer index, because even the best page title can give
only &quot;hints&quot; about the page contents. The most powerful
search engines use the second technique and index all the text
within a page.
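<P>
The index database itself is essentially an inverted index: a table
that maps each word to the set of pages containing it. The following
Perl fragment is only an illustrative sketch, not code from this
chapter; it assumes the robot has already placed the retrieved
document in the variable $page, and it builds such a table in memory:
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier"># Build a simple inverted index: word =&gt; set of URLs containing it.<BR>
%index = ();<BR>
<BR>
sub index_page {<BR>
    my ($url, $text) = @_;<BR>
    $text =~ s/&lt;[^&gt;]*&gt;//g;           # strip HTML tags<BR>
    foreach $word (split(/\W+/, lc($text))) {<BR>
        $index{$word}{$url} = 1 if $word ne "";<BR>
    }<BR>
}<BR>
<BR>
index_page("http://www.lycos.com/", $page);<BR>
print join("\n", keys %{ $index{"search"} }), "\n";   # pages containing "search"</FONT></TT>
</BLOCKQUOTE>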
<P>
After building an index of documents on the Web, one must
periodically check whether the indexed URLs are still valid. This
is done with another program or robot that takes its input from
the URL database and checks the existing references for invalid
or moved links.
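<P>
A link checker can be as simple as the following hypothetical Perl
script. It reads a file of URLs, one per line (the file name
url-database.txt is just an example), and uses the
<TT><FONT FACE="Courier">head()</FONT></TT> function of LWP::Simple
to test whether each document still responds:
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/local/bin/perl<BR>
# Report URLs from the database that no longer respond.<BR>
use LWP::Simple;<BR>
<BR>
open(URLS, "url-database.txt") || die "cannot open URL database: $!\n";<BR>
while ($url = &lt;URLS&gt;) {<BR>
    chomp($url);<BR>
    next if $url eq "";<BR>
    print "BROKEN: $url\n" unless head($url);   # head() is false on failure<BR>
}<BR>
close(URLS);</FONT></TT>
</BLOCKQUOTE>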
<P>
Gathering information about documents available on the Internet
is one side of a search engine. The final aim is to make this
information available to users in such a way that retrieval of
relevant documents is as easy and complete as possible.
<H2><A NAME="SearchingInterfacesfortheFinalUser"><FONT SIZE=5 COLOR=#FF0000>Searching
Interfaces for the Final User</FONT></A></H2>
<P>
The search interfaces are implemented as Web pages that allow a
user to define what he or she wants to search for. These pages
are HTML forms in which the main field accepts the words or
phrases to search for, and optional secondary fields control the
way the search itself is done or presented. The form contents are
passed to a program on the server side as soon as you press the
Submit button. These server-side programs are usually implemented
by using the CGI specifications. They receive the user's input,
such as the search words, case-sensitivity choice, maximum number
of documents to retrieve, and so on, perform some actions in the
background, and send the user an HTML page containing references
to the documents found. A CGI application can handle the user's
input and the resulting output while passing the actual searching
to another program, such as a gateway or a query program for a
database. If the index database is not very big, it can be
implemented by using plain files, with a single CGI application
handling the user's input and output as well as the information
search itself, as in the sketch that follows.
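<P>
To make this concrete, here is a minimal sketch of such a CGI
search script in Perl. It is not one of the search engines' real
scripts: it assumes a plain-text index file (the name index.txt is
invented for this example) in which each line holds a keyword
followed by a URL, and it expects to be called with the
<TT><FONT FACE="Courier">GET</FONT></TT> method, so that the form
contents arrive in the QUERY_STRING environment variable:
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/local/bin/perl<BR>
# Minimal CGI search over a plain-text index ("keyword url" per line).<BR>
<BR>
# decode the form fields passed in QUERY_STRING (GET method)<BR>
%form = ();<BR>
foreach $pair (split(/&amp;/, $ENV{'QUERY_STRING'})) {<BR>
    ($name, $value) = split(/=/, $pair);<BR>
    $value =~ tr/+/ /;<BR>
    $value =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;<BR>
    $form{$name} = $value;<BR>
}<BR>
$query = lc($form{'query'});<BR>
<BR>
print "Content-type: text/html\n\n";<BR>
print "&lt;html&gt;&lt;head&gt;&lt;title&gt;Search results&lt;/title&gt;&lt;/head&gt;&lt;body&gt;\n";<BR>
print "&lt;h1&gt;Results for $query&lt;/h1&gt;\n&lt;ul&gt;\n";<BR>
<BR>
open(INDEX, "index.txt");           # hypothetical index file<BR>
while (&lt;INDEX&gt;) {<BR>
    ($word, $url) = split;<BR>
    print "&lt;li&gt;&lt;a href=$url&gt;$url&lt;/a&gt;\n" if $word eq $query;<BR>
}<BR>
close(INDEX);<BR>
<BR>
print "&lt;/ul&gt;&lt;/body&gt;&lt;/html&gt;\n";</FONT></TT>
</BLOCKQUOTE>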
<P>
So, forms are the user's door to all the information available
on a search engine. As there are lots of search engines, there
are also lots of search pages. Fortunately, they are all similar
and easy to use.
<P>
Being able to use a search engine on the World Wide Web is useful
but requires you to connect to different search engines (if you
plan to use more than one) and submit a query to each one. Wouldn't
it be nice to have your own customized form from which you could
submit queries to every major search engine? You could even develop
this idea further and try to submit queries to several search
engines at the same time, but then you would have to develop a
special script to do the submission and collect the results.
<P>
For you to develop your own search form for your favorite search
engines, it is necessary to look at the original form and see
what the CGI search program is expecting to get as input, which
you do by viewing the HTML source of each search form and looking
for the <TT><FONT FACE="Courier">&lt;FORM...&gt; &lt;/FORM&gt;</FONT></TT>
tags. Also, on different engines, a search script can be implemented
by using different call methods (<TT><FONT FACE="Courier">GET</FONT></TT>
or <TT><FONT FACE="Courier">POST</FONT></TT>). Because a search
query will not alter a database, the <TT><FONT FACE="Courier">GET</FONT></TT>
method is generally used to submit the form, although some sites
prefer to use <TT><FONT FACE="Courier">POST</FONT></TT>.
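<P>
With the <TT><FONT FACE="Courier">GET</FONT></TT> method, the form
fields are simply appended to the script's URL as a query string.
For the Lycos search form shown later in this chapter, a submission
would produce a request along the lines of the following (the
search word perl is just an example):
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">http://www.lycos.com/cgi-bin/pursuit?query=perl&amp;ab=the_catalog</FONT></TT>
</BLOCKQUOTE>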
<P>
In any event, I recommend you read the copyright statements or
use policies of each search engine before copying any HTML or
invoking any CGI application from other servers. In general, you
are allowed to use the CGI applications from custom forms (forms
that respect the interface of the CGI application, naturally),
but you should always check to make sure.
<P>
A global search form is only a collection of different search
forms available on each search engine. As an example, we will
create a custom form for searches on Yahoo! and Lycos:
<UL>
<LI><FONT COLOR=#000000>First, look at the source of </FONT><TT><FONT FACE="Courier"><A HREF="http://www.lycos.com/">http://www.lycos.com/</A></FONT></TT>
and copy the source between <TT><FONT FACE="Courier">&lt;FORM
...&gt;</FONT></TT> and <TT><FONT FACE="Courier">&lt;/FORM&gt;</FONT></TT>
tags, removing unwanted images, links, text, or other unimportant
tags:
</UL>
<P>
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">...<BR>
&lt;form action=&quot;/cgi-bin/pursuit&quot; method=GET&gt;<BR>
&lt;b&gt;Find:&lt;/b&gt; &lt;input name=&quot;query&quot;&gt;&lt;input
type=submit value=&quot;Go Get It&quot;&gt;<BR>
&lt;br&gt;<BR>
&lt;input type=radio name=ab checked value=the_catalog&gt;lycos
catalog<BR>
&lt;input type=radio name=ab value=a2z&gt;a2z directory<BR>
&lt;input type=radio name=ab value=point&gt; point reviews<BR>
&lt;/form&gt;<BR>
...</FONT></TT>
</BLOCKQUOTE>
<UL>
<LI><FONT COLOR=#000000>Do the same </FONT>thing for Yahoo! (<TT><FONT FACE="Courier"><A HREF="http://www.yahoo.com/">http://www.yahoo.com/</A></FONT></TT>)
or other search engines you like:
</UL>
<P>
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">...<BR>
&lt;form action=&quot;http://search.yahoo.com/bin/search&quot;&gt;
<BR>
&lt;input size=25 name=p&gt; &lt;input type=submit value=Search&gt;
<BR>
&lt;/form&gt;<BR>
...</FONT></TT>
</BLOCKQUOTE>
<UL>
<LI><FONT COLOR=#000000>Finally, combine both forms on a single
HTML page and display it in your browser to see whether it works
(it should, if you proceeded as described). See Figure 11.1 for
the final page. You can then customize your page (center tables,
fields, and so on) and make it more appealing by integrating some
of the graphics used on the remote search engine (most engines
permit reuse of their graphics on a search form, but you should
check this first, too). The HTML source </FONT>code for our
global search form follows:
</UL>
<P>
<A HREF="f11-1.gif" ><B>Figure 11.1: </B><I>The Custom search form.</I></A>
<BR>
<P>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">&lt;html&gt;<BR>
&lt;head&gt;<BR>
&lt;title&gt;My search form&lt;/title&gt;<BR>
&lt;/head&gt;<BR>
&lt;body&gt;<BR>
&lt;h1 align=center&gt;My search form&lt;/h1&gt;<BR>
&lt;p&gt;<BR>
&lt;h2 align=center&gt;Lycos&lt;/h2&gt;<BR>
&lt;form action=&quot;http://www.lycos.com/cgi-bin/pursuit&quot;
method=GET&gt;<BR>
&lt;b&gt;Find:&lt;/b&gt; &lt;input name=&quot;query&quot;&gt;&lt;input
type=submit value=&quot;Go Get It&quot;&gt;<BR>
&lt;br&gt;<BR>
&lt;input type=radio name=ab checked value=the_catalog&gt;lycos
catalog<BR>
&lt;input type=radio name=ab value=a2z&gt;a2z directory<BR>
&lt;input type=radio name=ab value=point&gt; point reviews<BR>
&lt;/form&gt;<BR>
&lt;p&gt;<BR>
&lt;h2 align=center&gt;Yahoo&lt;/h2&gt;<BR>
&lt;form action=&quot;http://search.yahoo.com/bin/search&quot;&gt;
<BR>
&lt;input size=25 name=p&gt; &lt;input type=submit value=Search&gt;
<BR>
&lt;/form&gt;<BR>
&lt;/body&gt;<BR>
&lt;/html&gt;</FONT></TT>
</BLOCKQUOTE>
<P>
This form can now sit on your server so that users don't need
to connect to each search engine's main form in order to perform
information searches on the Internet.
