.TH GLIMPSEINDEX l "November 10, 1997"
.SH NAME
\fIglimpseindex 4.1\fP - index whole file systems to be searched by glimpse
.SH OVERVIEW
\fIGlimpse\fP (which stands for GLobal IMPlicit SEarch)
is a popular UNIX indexing and query system that allows you to search through
a large set of files very quickly.
Glimpseindex is the indexing program for glimpse.
Glimpse supports most of \fIagrep\fP's options
(\fIagrep\fP is our powerful version of \fIgrep\fP)
including approximate matching (e.g., finding misspelled words),
Boolean queries, and even some limited forms of regular expressions.
It is used in the same way, except that you don't have to
specify file names.
So, if you are looking for a \fIneedle\fP
anywhere in your file system, all you have to do is say
\fIglimpse needle\fR
and all lines containing \fIneedle\fP will appear preceded
by the file name.
See man glimpse for details on how to use glimpse.
.LP
Glimpseindex provides three indexing options: a tiny index (2-3% of
the total size of all files), a small index (7-8%) and a medium-size
index (20-30%).
Search times are normally better with larger indexes
(although unless files are quite large, the small index is just
about as good as the medium one).
To index all your files, you say
\fIglimpseindex ~\fR
for tiny index (where ~ stands for the home directory),
\fIglimpseindex -o ~\fR for small index, and
\fIglimpseindex -b ~\fR for medium.
.LP
Mail glimpse-request@cs.arizona.edu to be added to the glimpse mailing list.
Mail glimpse@cs.arizona.edu to report bugs, ask questions, discuss tricks
for using glimpse, etc.
(this is a moderated mailing list with very little
traffic, mostly announcements).
An HTML version of these manual pages can be found at
http://glimpse.cs.arizona.edu/glimpseindexhelp.html
Also, see the glimpse home pages at
http://glimpse.cs.arizona.edu/
.SH SYNOPSIS
.B glimpseindex [\fB\-abEfFiInostT \-w \fInumber\fP \-dD \fIfilename(s)\fP \-H \fIdirectory\fP \-M \fInumber\fP \-S \fInumber\fP\fR ]
\fIdirectory_name[s]\fR
.SH INTRODUCTION
\fIGlimpseindex\fP builds an index of all text files in all
the directories specified and all their subdirectories (recursively).
It is also possible to build several separate indexes (possibly
even overlapping).
The simplest way to index your files is to say
.LP
\fIglimpseindex -o ~\fP
.LP
The index consists of several files (described in detail below),
all with the prefix \fI.glimpse_\fR stored in the user's home directory
(unless otherwise specified with the -H option).
Files with one of the following suffixes are not indexed:
".o", ".gz", ".Z", ".z", ".hqx", ".zip", ".tar".
(Unless the -z option is used, see below.)
In addition, glimpseindex attempts to determine whether a file
is a text file and does not index files that it thinks are not text files.
Numbers are not indexed unless the -n option is used.
It is possible to prevent specified files from being
indexed by adding their names to the .glimpse_exclude file (described below).
The -o option builds a larger index than without it
(typically about 7-8% vs. 2-3% without -o)
allowing for a faster search (1-5 times faster).
The -b option builds an even larger index and allows an even faster search
some of the time (-b is helpful mostly when large files are
present).
There is an incremental indexing option \fI-f\fR,
which updates an existing index by determining which files
have been created or modified since the index was built and
adding them to the index (see -f).
Glimpseindex is reasonably fast, taking about 20 minutes to index 15,000 files
of about 200MB (on a DEC Alpha 233) and 2-4 minutes
to update an existing index.
(Your mileage may vary.)
It is also possible to increment the index by adding a specific file
(the -a option).
.LP
Once an index is built, searching for \fIpattern\fP is as easy as saying
.LP
\fIglimpse pattern\fR
.LP
(See man glimpse for all glimpse's options and features.)
.SH "A DETAILED DESCRIPTION OF GLIMPSEINDEX"
.LP
Glimpse does not automatically index files.
You have to tell it to do
it.
This can be done manually, but a better way is to set it to run
every night.
It is probably a good idea to run glimpseindex manually
for the first time to be sure it works properly.
The following is a simple script to run glimpseindex every night.
We assume that this script is stored in a file called glimpse.script:
.LP
glimpseindex -o -t -w 5000 ~ >& .glimpse_out
.br
at -m 0300 glimpse.script
.br
(It might be interesting to collect all the outputs of glimpseindex by
changing >& to >>& so that the file .glimpse_out maintains a history.
In this case the file must be created before the first time >>& is used.
If you use ksh, replace '>& .glimpse_out' with '> .glimpse_out 2>&1'.)
.LP
Glimpseindex stores the names of all the files that it indexed
in the file .glimpse_filenames.
Each file is listed by its full path name as obtained at the time
the files were indexed.
For example, /usr1/udi/file1.
Glimpse uses this full name when it performs the search, so the name
must match the current name.
This may become a problem when the indexing and the search
are done from different machines (e.g., through NFS), which may cause
the path names to be different.
For example, /tmp_mnt/R/xxx/xxx/usr1/udi/file1.
(The same is true for several other .glimpse files.
See below.)
.LP
Glimpseindex does not follow symbolic links unless they are
explicitly included in the .glimpse_include file (described below).
.LP
Glimpseindex makes an effort to identify non-text files such as
binary files, compressed files, uuencoded files, PostScript files,
binhex files, etc.
These files are automatically excluded from the index.
In addition, all files whose names end with
`.o', `.gz', `.Z', `.z', `.hqx', `.zip', or `.tar'
will not be indexed (unless they are specifically included
in .glimpse_include - see below).
.LP
The options for glimpseindex are as follows:
.TP
.B \-a
adds the given file[s] and/or directories to an existing index.
Any given directory will be traversed recursively and all files will
be indexed (unless they appear in .glimpse_exclude; see below).
Using this option is generally much faster than indexing everything
from scratch, although in rare cases the index may not be as good.
If for some reason the index is full (which can happen
unless -o or -b is used),
glimpseindex -a will produce
an error message and will exit without changing the original index.
.TP
.B \-b
builds a medium-size index (20-30% of the size of all files),
allowing faster search.
This option forces glimpseindex to store
an exact (byte level) pointer to each occurrence of each word
(except for some very common words
belonging to the stop list).
.TP
.B \-B
uses a hash table that is 4 times bigger (256K entries instead of 64K)
to speed up indexing.
The memory usage will typically increase by about 2 MB.
This option is only for indexing speed; it does not affect the final index.
.TP
.B \-d filename(s)
deletes the given file(s) from the index.
.TP
.B \-D filename(s)
deletes the given file(s) from the list of file names, but not
from the index.
This is much faster than -d, and the file(s) will
not be found by glimpse.
However, the index itself will not become
smaller.
.TP
.B \-E
does not run a check on file types.
Glimpse normally attempts to
exclude non-text files, but this attempt is not always perfect.
With \-E, glimpseindex indexes
all files, except those that are specifically excluded in .glimpse_exclude
and those whose file names end with one of the excluded suffixes.
.TP
.B \-f
incremental indexing.
\fIglimpseindex\fP scans all files
and adds to the index only those files that were created or modified
after the current index was built.
If there is no current index or if this procedure fails, \fIglimpseindex\fP
automatically reverts to the default mode (which is to index everything
from scratch).
This option may create an inefficient index for several reasons,
one of which is that deleted files are not really deleted from the index.
Unless changes are small, mostly additions, and -o is used,
we suggest using the default mode as much as possible.
.TP
.B \-F
Glimpseindex receives the list of files to index from standard input.
.TP
.B \-H directory
Put or update the index and all other .glimpse files (listed below)
in "directory".
The default is the home directory.
When glimpse is run, the -H option must be used to direct glimpse to this
directory, because glimpse assumes that the index is in the home
directory (see also the -H option in glimpse).
.TP
.B \-i
Make .glimpse_include (see GLIMPSEINDEX FILES) take precedence over .glimpse_exclude,
so that, for example, one can exclude everything (by putting *)
and then explicitly include files.
.TP
.B \-I
Instead of indexing, only show (print to standard output)
the list of files that would be indexed.
It is useful for filtering purposes.
("glimpseindex -I dir | glimpseindex -F" is the same as "glimpseindex dir".)
.TP
.B \-M x
Tells glimpseindex to use x MB of memory for temporary tables.
The more memory you allow, the faster glimpseindex will run.
The default is x=2.
The value of x must be a positive integer.
Glimpseindex will need more memory than x for other things,
and glimpseindex may perform some 'forks', so you'll
have to experiment if you want to use this
option.
WARNING:
If x is too large you may run out of swap space.
.TP
.B \-n
Index numbers as well as text.
The default is not to index numbers.
This is useful when searching for dates or other identifying numbers,
but it may make the index very large if there are lots of numbers.
In general, glimpseindex strips away any non-alphabetic character.
For example, the string abc123 will be indexed as abc if the -n option
is not used and as abc123 if it is used.
Glimpse provides warnings (in .glimpse_messages) for all files
in which more than half the words that were added to the index
from that file had digits in them (this is an attempt to identify
data files that should probably not be indexed).
One can use the .glimpse_exclude file to exclude data files or any
other files.
(See GLIMPSEINDEX FILES.)
.TP
.B \-o
Build a small index rather than a tiny one
(meaning 7-9% of the sizes of all files - your mileage may vary)
allowing faster search.
This option forces glimpseindex to allocate
one block per file (a block usually contains many files).
A detailed explanation of how blocks affect glimpse can be
found in the glimpse article.
(See also LIMITATIONS.)
.TP
.B \-R
Recompute .glimpse_filenames_index from .glimpse_filenames.
The file .glimpse_filenames_index speeds up processing.
Glimpseindex usually computes it automatically.
However, if for some reason one wants to change the path names
of the files listed in .glimpse_filenames, then running
glimpseindex -R recomputes .glimpse_filenames_index.
This is useful if the index is computed on one machine,
but is used on another (with the same hierarchy).
The names of the files listed in .glimpse_filenames are used
at run time, so changing them can be done at any time in any way
(as long as only the names, not the content, are changed).
This is not really an option in the regular sense; rather,
it is a program by itself, and it is meant as a post-processing step.
(Available only from version 3.6.)
.TP
.B \-s
supports structured queries.
This option was added to support the
Harvest project and it is applicable mostly in that context.
See STRUCTURED QUERIES below for more information and also
http://harvest.transarc.com for more information
about the Harvest project.
.TP
.B \-S k
The number k determines the size of the \fIstop-list\fP.
The stop-list consists of words that are too common and are not indexed
(e.g., 'the' or 'and').
Instead of having a fixed stop-list, glimpseindex figures out the
words that are too common for every index separately.
The rules are different for the different indexing options.
The tiny index contains all words (the savings from a stop-list are
too small to bother).
For the small index (-o), the number k is a percentage threshold.
A word will be in the stop-list if it appears in at least k% of all files.