:cost 0.00
:cost-unit :free
:maintainer "wais_help@tpci.com"
:subjects "Everything you need to know about Linux"
:description "If you need to know something about Linux, it's here."</FONT></PRE>
<P>You'll want to edit this file when you set up freeWAIS because the default descriptions are rather sparse and not very useful.
<BR>
<BR>
<A NAME="E69E252"></A>
<H4 ALIGN=CENTER>
<CENTER>
<FONT SIZE=4 COLOR="#FF0000"><B>The waisindex Command</B></FONT></CENTER></H4>
<BR>
<P>The waisindex command provides a number of options, some of which you have seen earlier in this chapter. The primary waisindex options of interest to most users are the following:
<BR>
<UL>
<LI>The -a option appends data to an existing index file (used to update index files instead of regenerating them each time a new document is added).
<BR>
<BR>
<LI>The -contents option indexes the file contents (default action).
<BR>
<BR>
<LI>The -d option gives the filename root for index files (for example, -d /usr/wais/foo names all index files as /usr/wais/foo.xxx).
<BR>
<BR>
<LI>The -e option gives the name of the log file for error information (the default is stderr, which is usually the console; specify -s to send messages to /dev/null instead).
<BR>
<BR>
<LI>The -export option adds the host name and TCP port to descriptions for easier Internet access.
<BR>
<BR>
<LI>The -l option gives the level of log messages. Valid values are 0 = no log; 1 = log only high priority errors and warnings; 5 = log medium priority errors and warnings as well as index filename information; and 10 = log every event.
<BR>
<BR>
<LI>The -M option links multiple types of files.
<BR>
<BR>
<LI>The -mem option limits memory usage during indexing (the higher the number specified, the faster the indexing process and the more memory used).
<BR>
<BR>
<LI>The -nocontents option prevents a file's contents from being indexed (only the document header and filename are indexed).
<BR>
<BR>
<LI>The -nopairs option instructs waisindex not to index adjacent capitalized words together.
<BR>
<BR>
<LI>The -nopos option ignores the location of keywords in a document when determining scores.
<BR>
<BR>
<LI>The -pairs option indexes adjacent capitalized words as a single entry.
<BR>
<BR>
<LI>The -pos option determines scores based on locations of keywords (proximity of keywords increases scores).
<BR>
<BR>
<LI>The -r option activates recursive subdirectory indexing.
<BR>
<BR>
<LI>The -register option registers your indexes with the WAIS Directory of Services.
<BR>
<BR>
<LI>The -stdin option reads filenames from standard input (normally the keyboard) instead of from the command line.
<BR>
<BR>
<LI>The -stop option indicates a file containing stopwords (words too common to be indexed), usually defined in src/ir/stoplist.c.
<BR>
<BR>
<LI>The -t option indicates the data file type.
<BR>
<BR>
<LI>The -T option sets the type of data to whatever follows, such as HTML, text, or ps.
<BR>
<BR>
</UL>
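<P>As a rough illustration of how these options combine, the following sketch assembles a single waisindex command line from several of the options listed above and prints it instead of running it. The helper name build_index_cmd and the paths /usr/wais/foo, /tmp/waisindex.log, and /usr/waisdata are assumptions chosen for this example, not freeWAIS defaults; check the flags against the usage message of your own build before relying on them.

```shell
#!/bin/sh
# Sketch only: builds a waisindex command line and prints it.
# All paths are illustrative assumptions, not freeWAIS defaults.

build_index_cmd() {
    index_root=$1   # filename root for the index files (-d)
    log_file=$2     # error log file (-e)
    data_dir=$3     # directory whose files are indexed recursively (-r)

    # -l 5 logs medium-priority messages; -t text treats files as plain text.
    # The * stays literal here because it is inside double quotes.
    echo "waisindex -d $index_root -e $log_file -l 5 -t text -r $data_dir/*"
}

cmd=$(build_index_cmd /usr/wais/foo /tmp/waisindex.log /usr/waisdata)
echo "$cmd"
```

<P>Printing the command first makes it easy to review the option set before committing to a long indexing run.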
<P>You must tell the waisindex program what type of information is in a file, or it may not be able to generate an index properly. freeWAIS defines many file types; to display them, enter the command with no arguments:
<BR>
<BR>
<PRE>
<FONT COLOR="#000080">waisindex</FONT></PRE>
<P>Although many different types are supported by freeWAIS, only a few are really in common use. The most common file types supported by freeWAIS are the following:
<BR>
<UL>
<LI>The filename type is the same as text except that the filename is used as the headline.
<BR>
<BR>
<LI>The first_line type is the same as text except that the first line in the file is used as the headline.
<BR>
<BR>
<LI>The ftp type contains FTP code that users can use to retrieve information from another machine.
<BR>
<BR>
<LI>The GIF type is for GIF images, one image per file. The filename is used as the headline.
<BR>
<BR>
<LI>The HTML type is for HTML source code (usually used for WWW browsers). The headline is taken from the HTML code.
<BR>
<BR>
<LI>The mail_or_rmail type indexes the mbox mailbox contents as individual items.
<BR>
<BR>
<LI>The mail_digest type indexes standard e-mail as individual messages. The subject field is the headline.
<BR>
<BR>
<LI>The netnews type is standard USENET news, each article a separate item. The subject field is the headline.
<BR>
<BR>
<LI>The one_line type indexes each line in a document separately.
<BR>
<BR>
<LI>The PICT type is for a PICT image, one image per file. The filename is used as the headline.
<BR>
<BR>
<LI>The ps type is a PostScript file with one document per file.
<BR>
<BR>
<LI>The text type indexes the file as one document. The pathname is used as the headline.
<BR>
<BR>
<LI>The TIFF type is for TIFF images, one image per file. The filename is used as the headline.
<BR>
<BR>
</UL>
<P>To tell waisindex the type of file to be examined, use the -t option followed by the proper type. For example, to index standard ASCII text, you could use the command:
<BR>
<BR>
<PRE>
<FONT COLOR="#000080">waisindex -t text -r /usr/waisdata/*</FONT></PRE>
<P>This command indexes all the files in /usr/waisdata recursively, assuming they are all ASCII files.
<BR>
<BLOCKQUOTE>
<BLOCKQUOTE>
<HR ALIGN=CENTER>
<BR>
<NOTE>After a document has been indexed, later changes to it are not reflected in the WAIS index until a complete reindex is performed. The -a option only appends new documents; it does not update existing index entries. To pick up changes, run the full indexing process again, and do so at regular intervals as a matter of course.</NOTE>
<BR>
<HR ALIGN=CENTER>
</BLOCKQUOTE></BLOCKQUOTE>
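<P>One way to decide when a complete reindex is due is to compare the documents against a marker file that is touched after each full run. The sketch below is an assumption-laden illustration of that idea: the function name needs_reindex, the marker file last_index, and the throwaway demo directory are all hypothetical, and the waisindex run itself is left as a comment.

```shell
#!/bin/sh
# Sketch only: decide whether a complete reindex is due.
# The marker file and directory layout are assumptions for illustration.

needs_reindex() {
    data_dir=$1     # directory holding the indexed documents
    stamp=$2        # hypothetical marker touched after each full reindex

    # No marker yet means no reindex has been recorded: report stale.
    [ -f "$stamp" ] || return 0

    # Any document modified after the marker means the index is stale.
    changed=$(find "$data_dir" -type f -newer "$stamp" | head -n 1)
    [ -n "$changed" ]
}

# Demonstration against a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/data"
echo "first draft" > "$demo/data/violins.txt"

if needs_reindex "$demo/data" "$demo/last_index"; then
    first_check=stale
else
    first_check=fresh
fi

# A real job would run a full waisindex (without -a) here, then record it:
touch "$demo/last_index"

if needs_reindex "$demo/data" "$demo/last_index"; then
    second_check=stale
else
    second_check=fresh
fi

echo "$first_check $second_check"
rm -rf "$demo"
```

<P>A script along these lines could be run from cron so that the full reindex happens only when documents have actually changed.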
<BR>
<A NAME="E69E253"></A>
<H4 ALIGN=CENTER>
<CENTER>
<FONT SIZE=4 COLOR="#FF0000"><B>Getting Fancy</B></FONT></CENTER></H4>
<BR>
<P>You can provide some extra features for users of your freeWAIS service in a number of ways. Although this section is by no means exhaustive, it shows two easily implemented features that make a WAIS site more attractive.
<BR>
<P>To begin, suppose you want to make video, graphics, or audio available on a particular subject. As an example, imagine that your site deals with musical instruments and you have lots of documents on violins. You may want to provide an audio clip of a violin being played, a video of the making of a violin body, or a graphic image of a Stradivarius violin. To make these extra files available, give all the files the same root filename but different extensions. For example, if your primary document on violins is called violins.TEXT, you may have the following files in the WAIS directories:
<BR>
<TABLE BORDERCOLOR=#000040 BORDER=1 CELLSPACING=2 WIDTH="100%" CELLPADDING=2 >
<TR>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
violins.TEXT
</FONT>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
Document describing violins</FONT>
<TR>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
violins.TIFF
</FONT>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
Image of a Stradivarius</FONT>
<TR>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
violins.MPEG
</FONT>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
Video of the making of a violin body</FONT>
<TR>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
violins.MIDI
</FONT>
<TD VALIGN=top BGCOLOR=#80FFFF ><FONT COLOR=#000080>
MIDI file of a violin being played</FONT>
</TABLE><P>All these files should have the same root name (violins) but different types (recognized by waisindex). Then you have to associate the multimedia files with the document file. You can do this with the following command:
<BR>
<BR>
<PRE>
<FONT COLOR="#000080">waisindex -d violin -M TEXT,TIFF,MPEG,MIDI -export /usr/waisdata/violin/*</FONT></PRE>
<P>This tells waisindex that all four types of files are to be handled. When a user searches for the keyword violin, all four types of files will be matched, and options on the browser may let them play, view, or hear the non-text components.
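<P>Because the association depends on every file sharing the same root name, it can help to confirm that the full set exists before indexing. The following sketch checks for each companion file and prints the waisindex command only when nothing is missing. The function name missing_types and the temporary demo directory are assumptions for illustration; the demo deliberately omits the MIDI file to show the failure case.

```shell
#!/bin/sh
# Sketch only: check that every companion file exists before indexing.
# Directory and file names are illustrative assumptions.

missing_types() {
    dir=$1
    root=$2
    shift 2
    missing=""
    for ext in "$@"; do
        # Each companion file shares the root name and differs only in type.
        [ -f "$dir/$root.$ext" ] || missing="$missing $ext"
    done
    # Print the missing types without the leading space.
    echo "${missing# }"
}

# Demonstration: create three of the four files, leaving MIDI out.
demo=$(mktemp -d)
touch "$demo/violins.TEXT" "$demo/violins.TIFF" "$demo/violins.MPEG"

gaps=$(missing_types "$demo" violins TEXT TIFF MPEG MIDI)
if [ -z "$gaps" ]; then
    # Printed rather than executed, so the sketch runs without freeWAIS.
    echo "waisindex -d violin -M TEXT,TIFF,MPEG,MIDI -export $demo/*"
else
    echo "missing: $gaps"
fi
rm -rf "$demo"
```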
<BR>
<P>Another common feature is the use of synonyms to account for different methods of specifying a subject. For example, a scientist may use the keyword feline, but a non-scientist may use cat. You want to be able to match these two words to the same thing. You can do this through a file called SOURCE.syn, which is automatically read by the search engine when it is working. The SOURCE.syn file has the following format:
<BR>
<BR>
<PRE>
<FONT COLOR="#000080">word synonym [synonym ...]</FONT></PRE>
<P>Here, word is the word to be used to search the databases, and synonym is the word(s) that should match it. For example, if you are dealing with domestic pets in your WAIS site, you may have the following entries in the SOURCE.syn file:
<BR>
<PRE>
<FONT COLOR="#000080">cat feline
dog canine hound pooch
bird parrot budgie</FONT></PRE>
<P>The synonym file can be very useful when people use different terms to refer to the same thing. An easy way to check whether you need synonyms is to set the <A NAME="I2"></A>logging option for waisindex to 10 for a while and see what words people are using on your site. Don't leave it on too long, as the log files can become enormous with even a little traffic.
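<P>To make the file format concrete, the sketch below writes a small synonym file and uses awk to map a query term back to the word at the start of its line. This only illustrates the word-then-synonyms layout; it is not how the freeWAIS search engine itself resolves synonyms, and the function name canonical_word and the temporary directory are assumptions for this example.

```shell
#!/bin/sh
# Sketch only: mimic the word/synonym lookup with awk.
# This illustrates the file layout, not freeWAIS internals.

demo=$(mktemp -d)
printf '%s\n' \
    'cat feline' \
    'dog canine hound pooch' \
    'bird parrot budgie' > "$demo/SOURCE.syn"

canonical_word() {
    syn_file=$1
    query=$2
    # Field 1 is the search word; the remaining fields are its synonyms.
    # If the query appears anywhere on a line, print that line's first word;
    # otherwise fall back to the query itself.
    awk -v q="$query" '
        { for (i = 1; i <= NF; i++) if ($i == q) { print $1; found = 1; exit } }
        END { if (!found) print q }
    ' "$syn_file"
}

pooch=$(canonical_word "$demo/SOURCE.syn" pooch)
ferret=$(canonical_word "$demo/SOURCE.syn" ferret)
echo "$pooch $ferret"
rm -rf "$demo"
```

<P>A lookup like this can also double as a quick sanity check that a term you saw in the logs is already covered by an existing entry.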
<BR>
<BR>
<A NAME="E68E236"></A>
<H3 ALIGN=CENTER>
<CENTER>
<FONT SIZE=5 COLOR="#FF0000"><B>Summary</B></FONT></CENTER></H3>
<BR>
<P>Now that WAIS is up and running on your server, you can go about the process of building your index files and letting others access your server. WAIS is quite easy to manage, and offers a good way of letting other users access your system's documents. The alternative approach, for text-based systems, is Gopher, which you examine in the next chapter.
<P ALIGN=LEFT>
</td>
</tr>
</table>
<!-- begin footer information -->
</body></html>