<HTML><HEAD> <TITLE>Shuimu Tsinghua BBS: Digest Area</TITLE></HEAD><BODY><CENTER><H1>Shuimu Tsinghua BBS: Digest Area</H1></CENTER>From: reden (鱼 ~ 君子律己以利人), Board: Linux <BR>Subject: Searching a Web Site with Linux <BR>Posted from: Shuimu Tsinghua BBS (Mon Oct 5 00:18:52 1998) WWW-POST <BR> <BR>"Linux Gazette...making Linux just a little more fun!"
<BR>
<BR>
<BR>
<BR> Searching a Web Site with Linux
<BR>
<BR> By Branden Williams
<BR>
<BR>
<BR>
<BR>As your website grows in size, so will the number of people who visit your <BR>site. Most of these people are just like you and me
<BR>in the sense that they want to go to your site, click a button, and get <BR>exactly the information they were looking for. To serve
<BR>these kinds of users a bit better, the Internet community responded with the <BR>"Site Search": a way to search a single website for
<BR>the information you are looking for. As a system administrator, I have been <BR>asked to provide search engines for people to use on
<BR>their websites so that their clients can get to their information as fast as <BR>possible.
<BR>
<BR>The catch with most search engines (Internet-wide ones included) is that they <BR>index and search entire sites. Suppose, for instance, you are
<BR>looking for used cars. You decide to look for an early-90s model Nissan <BR>truck. You get on the web and go to AltaVista. If you do
<BR>a search for "used Nissan truck", you will most likely come up with a few <BR>pages that have listings of cars. The pain comes
<BR>when you follow one of those links and see a 400K HTML file with text listings of <BR>used trucks. You have to either go line by line until you
<BR>find your choice or, like most people, find it on the page using your <BR>browser's find command.
<BR>
<BR>Now wouldn't it be nice if you could just search for your used truck and get <BR>the results you are looking for in one fell swoop?
<BR>
<BR>A recent search CGI that I designed for a company called Resource Spectrum <BR>(<A HREF="http://www.spectrumm.com/">http://www.spectrumm.com/</A>) is what precipitated
<BR>DocSearch. Resource Spectrum needed a solution to a problem much like the one in my truck analogy. <BR>They are a placement agency for highly skilled jobs
<BR>that needed an alternative to posting their job listings to newsgroups. <BR>What was proposed was a searchable Internet listing of
<BR>the jobs on their new website.
<BR>
<BR>When the job listing came to us, it was a Word document that had been <BR>exported to HTML. I searched (no pun intended)
<BR>long and hard for something that I could use, but nothing turned up. All of the <BR>search engines I found only searched sites, not single
<BR>documents.
<BR>
<BR>This is where the idea for DocSearch came from.
<BR>
<BR>I needed a simple, clean way to search that single HTML document so users <BR>could get the information they needed quickly and
<BR>easily.
<BR>
<BR>I got out the old Perl reference and spent a few afternoons working out a <BR>solution to this problem. After a few updates, you see
<BR>in front of you DocSearch 1.0.4. You can grab the latest version at <BR><A HREF="ftp://ftp.inetinc.net/pub/docsearch/docsearch.tar.gz">ftp://ftp.inetinc.net/pub/docsearch/docsearch.tar.gz</A>.
<BR>
<BR>Let's go through the code to see how this works. Before we <BR>get into it, though, you need to make sure
<BR>you have the CGI library (cgi-lib.pl) installed. If you do not, you can <BR>download it from <A HREF="http://www.bio.cam.ac.uk/cgi-lib/">http://www.bio.cam.ac.uk/cgi-lib/</A>. This is
<BR>simply a Perl library that contains several useful functions for CGIs. Place <BR>it in your cgi-bin directory and make it world readable
<BR>and executable (chmod a+rx cgi-lib.pl).
<BR>
<BR>Now you can start to configure DocSearch. First off, there are a few <BR>constants that need to be set. They describe the
<BR>characteristics of the document you are searching. For instance:
<BR>
<BR># The Document you want to search.
<BR>$doc = "/path/to/my/list.html";
<BR>
<BR>Set this to the absolute path of the document you are searching.
<BR>
<BR># Document Title. The text to go inside the
<BR># &lt;title&gt;&lt;/title&gt; HTML tags.
<BR>$htmltitle = "Nifty Search Results";
<BR>
<BR>Set this to what you want the results page title to be.
<BR>
<BR># Optional Back link. If you don't want one, make the string null.
<BR># i.e. $backlink = "";
<BR>$backlink = "<A HREF="http://www.inetinc.net/some.html">http://www.inetinc.net/some.html</A>";
<BR>
<BR>If you want to provide a "Go Back" link, enter the URL of the page that the <BR>link should point to.
<BR>
<BR># Record delimiter. The text which separates the records.
<BR>$recdelim = "&amp;nbsp;";
<BR>
<BR>This part is one of the most important aspects of the search. The document <BR>you are searching must have something in between
<BR>the "records" to delimit them. In plain English, you will need to <BR>place an HTML comment or some other marker in between each
<BR>possible result of the search. In my example, MS Word put the &amp;nbsp; entity in <BR>between all of the records by default, so I just used
<BR>that as the delimiter.
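<BR>
<BR>As a minimal sketch of how a record delimiter can drive the search (not <BR>DocSearch's actual code), the Perl below splits a document into records and
<BR>keeps the ones that contain a query. The sample data and the helper names <BR>split_records and search_records are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; in DocSearch the text would come from the
# file named in $doc, and the delimiter from $recdelim.
my $recdelim = '&nbsp;';
my $text = join $recdelim,
    '1991 Nissan Truck, red',
    '1994 Ford Ranger, blue',
    '1992 Nissan Truck, white';

# Split the document wherever the delimiter appears and drop blank pieces.
# \Q...\E makes split treat the delimiter as literal text, not a pattern.
sub split_records {
    my ($doc_text, $delim) = @_;
    return grep { /\S/ } split /\Q$delim\E/, $doc_text;
}

# Keep only the records that contain the query (literal, case-insensitive).
sub search_records {
    my ($query, @records) = @_;
    return grep { /\Q$query\E/i } @records;
}

my @records = split_records($text, $recdelim);
my @hits    = search_records('nissan', @records);
print "$_\n" for @hits;
```

<BR>With the sample data above, the two Nissan records are printed and the <BR>Ford record is skipped.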
<BR>
<BR>Next we ReadParse() our information from the HTML form that was used as a <BR>front end to our CGI. Then to simplify things
<BR>later, we go ahead and set the variable $query to be the term we are <BR>searching for.
<BR>
<BR>$query = $input{'term'};
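<BR>
<BR>To make the form-parsing step concrete without depending on cgi-lib.pl, <BR>here is a rough stand-in written for this illustration. The helper
<BR>parse_query and the sample query string are hypothetical; a real script <BR>would let ReadParse() fill the hash from the CGI environment instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for cgi-lib.pl's ReadParse(): decode a URL-encoded
# query string such as "term=used+truck" into a hash of field => value.
sub parse_query {
    my ($qs) = @_;
    my %input;
    for my $pair (split /[&;]/, $qs) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX encodes one byte
        }
        $input{$key} = $value;
    }
    return %input;
}

# Assumed sample input; 'term' is the form field the article uses.
my %input = parse_query('term=used+Nissan+truck&max=10');
my $query = $input{'term'};
print "Searching for: $query\n";
```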
<BR>
<BR>This step can be repeated for each query item you would like to use to narrow <BR>your search. If you want any of these items to be
<BR>optional, just add a line like this in your code.
<BR>
<BR>if ($query eq "") {
<BR> $query = " ";
<BR>}
<BR>
<BR>A single space will match almost any record you search, since nearly every <BR>record contains at least one space.
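<BR>
<BR>The effect of that fallback can be sketched as follows. This is an <BR>illustration under assumptions, not DocSearch's own code: the second form
<BR>field 'city', the sample records, and the helpers term_or_default and <BR>matches_all are all hypothetical, and terms are matched literally to keep
<BR>the sketch simple.

```perl
use strict;
use warnings;

# Return the search term, or a single space when the field was left empty.
sub term_or_default {
    my ($value) = @_;
    return (defined $value && $value ne '') ? $value : ' ';
}

# A record matches only if it contains every term (literal, case-insensitive).
sub matches_all {
    my ($record, @terms) = @_;
    for my $term (@terms) {
        return 0 unless $record =~ /\Q$term\E/i;
    }
    return 1;
}

# Hypothetical form input: 'term' is filled in, 'city' was left blank.
my %input = (term => 'Perl', city => '');
my @terms = map { term_or_default($input{$_}) } qw(term city);

my @records = (
    'Perl developer, Dallas',
    'Java developer, Austin',
    'Perl admin, Houston',
);
print "$_\n" for grep { matches_all($_, @terms) } @records;
```

<BR>Because the blank field falls back to a space, it filters nothing out, and <BR>only the 'term' field narrows the results to the two Perl records.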
<BR>
<BR>Now comes a very important step. We need to make sure that any <BR>metacharacters are escaped. Perl's binding operator (=~) matches against a regular expression, in which
<BR>metacharacters have special meanings that change how the match behaves. We want to make sure that any <BR>characters that are entered into the form are not
<BR>