<HTML>

<HEAD>

<TITLE>Linux Unleashed, Third Edition: Configuring a WAIS Site</TITLE>

<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!--ISBN=0672313723//-->

<!--TITLE=Linux Unleashed, Third Edition//-->

<!--AUTHOR=Tim Parker//-->

<!--PUBLISHER=Macmillan Computer Publishing//-->

<!--IMPRINT=Sams//-->

<!--CHAPTER=49//-->

<!--PAGES=807-809//-->

<!--UNASSIGNED1//-->

<!--UNASSIGNED2//-->



<CENTER>

<TABLE BORDER>

<TR>

<TD><A HREF="805-806.html">Previous</A></TD>

<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>

<TD><A HREF="809-812.html">Next</A></TD>

</TR>

</TABLE>

</CENTER>

<P><BR></P>

<H4 ALIGN="LEFT"><A NAME="Heading6"></A><FONT COLOR="#000077">WAIS Index Files</FONT></H4>

<P>The freeWAIS index files are not usually readable by a system user (although one or two files can be read with some success). Usually, <TT>waisindex</TT> creates seven index files, although the number may vary depending on requirements. Each index file has a specific file extension to show its purpose, based on a root name (specified on the <TT>waisindex</TT> command line, or defaulting to <TT>index</TT>). The index files and their purposes are as follows:</P>

<DL>

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.doc</TT> A document file that contains a table with the filename, a headline (title) from the file, the location of the first and last characters of an entry, the length of the document, the number of lines in the document, and the time and date the document was created.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.dct</TT> A dictionary file that contains a list of every unique word in the files cross-indexed to the inverted file.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.fn</TT> A filename file that contains a table with a list of the filenames, the date they were created in the index, and the type of file.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.hl</TT> A headline file that contains a table of all headlines (titles). The headline is displayed in the search output when a match occurs.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.inv</TT> An inverted file that contains a table associating every unique word in all the files with a pointer to the files themselves and the word&#146;s importance (determined by how close the word is to the start of the file, the number of times the word occurs in the document, and the percentage of times the word appears in the document).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.src</TT> A source description file that contains descriptions of the information indexed, including the host name and IP address, the port watched by WAIS, the source file name, any cost information for the service, the headline of the service, a description of the source, and the email address of the administrator. The source description file can be edited with any ASCII text editor. We will look at this file in a little more detail shortly.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>index.status</TT> A status file containing user-defined information.

</DL>
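<P>As a rough illustration of the naming scheme above, the following Python sketch derives the expected index filenames from a root name and reports which of them are absent from a directory. The function names are hypothetical, and the extension list simply mirrors the seven files described here; an actual <TT>waisindex</TT> run may produce a different set.</P>

<!-- CODE //-->

<PRE>

```python
from pathlib import Path

# Extensions taken from the list above; waisindex may create more or fewer.
INDEX_EXTENSIONS = ("doc", "dct", "fn", "hl", "inv", "src", "status")


def expected_index_files(root="index"):
    """Return the index filenames expected for a given root name."""
    return [f"{root}.{ext}" for ext in INDEX_EXTENSIONS]


def missing_index_files(directory, root="index"):
    """Return the expected index files that are absent from a directory."""
    base = Path(directory)
    return [name for name in expected_index_files(root)
            if not (base / name).exists()]
```

</PRE>

<!-- END CODE //-->

<P>Calling <TT>missing_index_files(".")</TT> after an indexing run is one quick way to spot an incomplete index, assuming the default root name <TT>index</TT> was used.</P>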

<P>The source description file is a standard ASCII file that is read by <TT>waisindex</TT> at intervals to see whether information has changed. If the changes are significant, <TT>waisindex</TT> updates its internal information. A sample source file looks like this:</P>

<!-- CODE //-->

<PRE>

(:source
  :version 2
  :ip-address "147.120.0.10"
  :ip-name "wizard.tpci.com"
  :tcp-port 210
  :database-name "Linux stuff"
  :cost 0.00
  :cost-unit :free
  :maintainer "wais_help@tpci.com"
  :subjects "Everything you need to know about Linux"
  :description "If you need to know something about Linux, it's here."
)

</PRE>

<!-- END CODE //-->
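<P>Because the source description file is plain ASCII, its fields are easy to inspect with a short script. The Python sketch below is a simplified reader, not the parser freeWAIS itself uses: it assumes one <TT>:key value</TT> pair per line, and the function name and the embedded sample text are illustrative only.</P>

<!-- CODE //-->

<PRE>

```python
# A shortened copy of the sample above, embedded so the sketch is runnable.
SAMPLE = '''(:source
  :version 2
  :ip-address "147.120.0.10"
  :ip-name "wizard.tpci.com"
  :tcp-port 210
  :database-name "Linux stuff"
  :cost 0.00
  :maintainer "wais_help@tpci.com"
)'''


def parse_source_description(text):
    """Collect ':key value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip "(:source", ")" and blank lines
        key, _, value = line[1:].partition(" ")
        # Tolerate a stray trailing colon on the key and drop the quotes.
        fields[key.rstrip(":")] = value.strip().strip('"')
    return fields


fields = parse_source_description(SAMPLE)
print(fields["ip-name"], fields["tcp-port"])
```

</PRE>

<!-- END CODE //-->

<P>All values are kept as strings; a caller that needs the port as a number would convert <TT>fields["tcp-port"]</TT> itself.</P>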

<P>You should edit this file when you set up freeWAIS because the default descriptions are rather sparse and useless.

</P>

<H4 ALIGN="LEFT"><A NAME="Heading7"></A><FONT COLOR="#000077">The waisindex Command</FONT></H4>

<P>The <TT>waisindex</TT> command allows a number of options, some of which you have seen earlier in this chapter. The following list contains the primary <TT>waisindex</TT> options of interest to most users:</P>

<DL>

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-a</TT> Appends data to an existing index file (used to update index files instead of regenerating them each time a new document is added).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-contents</TT> Indexes the file contents (default action).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-d</TT> Gives the filename root for index files (for example, <TT>-d /usr/wais/foo</TT> names all index files as <TT>/usr/wais/foo.<I>xxx</I></TT>).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-e</TT> Gives the name of the log file for error information (default is <TT>stderr</TT>&#151;usually the console&#151;although you can specify <TT>-s</TT> for <TT>/dev/null</TT>).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-export</TT> Adds the host name and TCP port to descriptions for easier Internet access.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-l</TT> Gives the level of log messages. Valid values are as follows:

<DL>

<DD><TT>0</TT>, no log

<DD><TT>1</TT>, log only high priority errors and warnings

<DD><TT>5</TT>, log medium priority errors and warnings, as well as index filename information

<DD><TT>10</TT>, log every event

</DL>

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-M</TT> Links multiple types of files.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-mem</TT> Limits memory usage during indexing (the higher the number specified, the faster the indexing process and the more memory used).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-nocontents</TT> Prevents a file&#146;s contents from being indexed (indexes only the document header and filename).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-nopairs</TT> Instructs <TT>waisindex</TT> not to index adjacent capitalized words together.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-nopos</TT> Ignores the location of keywords in a document when determining scores.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-pairs</TT> Indexes adjacent capitalized words as a single entry.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-pos</TT> Determines scores based on locations of keywords (proximity of keywords increases scores).

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-r</TT> Indexes subdirectories recursively.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-register</TT> Registers your indexes with the WAIS Directory of Services.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-stdin</TT> Reads a filename from standard input (normally the keyboard) instead of taking it from the command line.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-stop</TT> Indicates a file containing stopwords (words too common to be indexed), usually defined in <TT>src/ir/stoplist.c</TT>.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-t</TT> Data file type indicator.

<DD><B>&#149;</B>&nbsp;&nbsp;<TT>-T</TT> Sets the type of data to whatever follows.

</DL>
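<P>To show how several of these options might combine, the following Python sketch assembles a <TT>waisindex</TT> argument list without running it. It uses only options from the list above (<TT>-d</TT>, <TT>-a</TT>, <TT>-l</TT>, and <TT>-r</TT>); the helper name, the paths, and the ordering of the arguments are assumptions, so check the usage message of your own <TT>waisindex</TT> before relying on it.</P>

<!-- CODE //-->

<PRE>

```python
import shlex


def build_waisindex_command(index_root, files, append=False,
                            recursive=False, log_level=None):
    """Assemble a waisindex argument list from options described above."""
    args = ["waisindex", "-d", index_root]
    if append:
        args.append("-a")  # update an existing index instead of rebuilding
    if log_level is not None:
        args += ["-l", str(log_level)]  # 0, 1, 5, or 10 per the list above
    if recursive:
        args.append("-r")  # descend into subdirectories
    args += list(files)
    return args


# Hypothetical paths, shown only to illustrate the resulting command line.
command = build_waisindex_command("/usr/wais/foo", ["/usr/docs"],
                                  append=True, recursive=True, log_level=1)
print(shlex.join(command))
```

</PRE>

<!-- END CODE //-->

<P>Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if a path contains spaces.</P>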

<P>The <TT>waisindex</TT> program has to be told the type of information in a file; otherwise it may not be able to generate an index properly. Many file types are currently defined with freeWAIS, and you can display them by entering this command with no argument:</P>

<!-- CODE SNIP //-->

<PRE>

waisindex

</PRE>

<!-- END CODE SNIP //-->







<!-- begin footer information -->





</body></html>
