both patterns.
\fIglimpse -F 'gnu;\\.c$' 'define;DEFAULT'\fR
will output all lines containing both 'define' and 'DEFAULT'
(anywhere in the line, not necessarily in order) in
files whose name contains 'gnu' and ends with .c.
\fIglimpse '{political,computer};science'\fR will match 'political science'
or 'science of computers'.
The NOT operation works only together with the -W option, and it
generally applies only to the whole file rather than to individual records.
Its output may sometimes seem counterintuitive.
Use with care.
\fIglimpse -W 'fame;~glory'\fR will output all lines containing 'fame'
in all files that contain 'fame' but do not contain 'glory'.
This is the most common use of NOT, and in this case it works
as expected.
\fIglimpse -W '~{fame;glory}'\fR will be limited to files that do
not contain both words, and will output all lines containing one
of them.
.TP
\fBWild cards\fP
The symbol '#' is used to denote a sequence
of any number (including 0)
of arbitrary characters (see LIMITATIONS).
The symbol # is equivalent to .* in egrep.
In fact, .* will work too, because it is a valid regular expression
(see below), but unless this is part of an actual regular expression,
# will work faster.
(Currently glimpse is experiencing some problems with #.)
.TP
\fBCombination of exact and approximate matching\fP
Any pattern inside angle brackets <> must match the text exactly even
if the match is with errors.
For example, <mathemat>ics matches
mathematical with one error (replacing the last s with an a), but
mathe<matics> does not match mathematical no matter how many errors are
allowed.
(This option is buggy at the moment.)
.TP
\fBRegular expressions\fP
Since the index is word based, a regular expression must match
words that appear in the index for glimpse to find it.
Glimpse first strips all non-alphabetic characters from the regular
expression, and searches the index for all remaining words.
It then applies the regular expression matching algorithm to the
files found in the index.
For example, \fIglimpse\fP 'abc.*xyz' will search the index
for all files that contain both 'abc' and 'xyz', and then
search directly for 'abc.*xyz' in those files.
(If you use glimpse \-w 'abc.*xyz', then 'abcxyz' will not be found, because glimpse
will think that abc and xyz need to match whole words.)
The syntax of regular expressions in \fBglimpse\fP is in general the same as
that for \fBagrep\fP.
The union operation `|', Kleene closure `*',
and parentheses () are all supported.
Currently '+' is not supported.
Regular expressions are currently limited to approximately 30
characters (generally excluding meta characters).
Some options
(\-d, \-w, \-t, \-x, \-D, \-I, \-S) do not currently work with regular expressions.
The maximal number of errors for regular expressions that use '*'
or '|' is 4.
(See LIMITATIONS.)
.TP
\fBStructured queries\fP
Glimpse supports some form of structured queries using Harvest's SOIF
format.
See STRUCTURED QUERIES below for details.
.SH EXAMPLES
.LP
(Run "glimpse '^glimpse' this-file" to get a list of all examples, some
of which were given earlier.)
.TP
glimpse -F 'haystack.h$' needle
finds all needles in all haystack.h files.
.TP
glimpse -2 -F html Anestesiology
outputs all occurrences of Anestesiology with two errors in files with
html somewhere in their full name.
.TP
glimpse -l -F '\\.c$' variablename
lists the names of all .c files that contain variablename
(the -l option lists file names rather than outputting the matched lines).
.TP
glimpse -F 'mail;1993' 'windsurfing;Arizona'
finds all lines containing \fIwindsurfing\fP and \fIArizona\fP in all files
having `mail' and '1993' somewhere in their full name.
.TP
glimpse -F mail 't.j@#uk'
finds all mail addresses (search only files with mail somewhere in
their name) from the uk, where the login name ends with
t.j, where the . stands for any one character.
(This is very useful to find a login name of someone whose middle name
you don't know.)
.TP
glimpse -F mbox -h -G . > MBOX
concatenates all files whose name matches `mbox' into one big one.
.SH "SEARCHING IN COMPRESSED FILES"
.LP
Glimpse includes an optional new compression program,
called \fIcast\fP,
which allows glimpse (and agrep) to search the compressed files
without having to decompress them.
The search is actually significantly
faster when the files are compressed.
However, we have not tested
\fIcast\fP as thoroughly as we would have liked, and a mishap in
a compression algorithm can cause loss of data, so at this point
we recommend using \fIcast\fP very carefully.
We do not support or maintain cast.
(Unless you specifically use \fIcast\fP, the default is to ignore it.)
.SH "GLIMPSEINDEX FILES"
.LP
All files used by glimpse are located at the directory(ies) where
the index(es) is (are) stored and have .glimpse_ as a prefix.
The first two files (.glimpse_exclude and .glimpse_include) are
optionally supplied by the user.
The other files are built and
read by glimpse.
.LP
.IP "\fB.glimpse_exclude\fR"
contains a list of files that glimpseindex is explicitly told to ignore.
In general, the syntax of .glimpse_exclude/include is the same as
that of agrep (or any other grep).
The lines in the .glimpse_exclude
file are matched to the file names, and if they match, the files are excluded.
Notice that agrep matches parts of the string;
for example, agrep /ftp/pub will match /home/ftp/pub and /ftp/pub/whatever.
So, if you want to exclude /ftp/pub/core, you just list
it, as is, in the .glimpse_exclude file.
If you put "/home/ftp/pub/cdrom" in .glimpse_exclude, every filename
that matches that string will be excluded, meaning all files
below it.
You can use ^ to indicate the beginning of a file name, and $ to
indicate the end of one, and you can use * and ? in the usual way.
For example, /ftp/*html will exclude /ftp/pub/foo.html, but will
also exclude /home/ftp/pub/html/whatever; if you want to exclude
files that start with /ftp and end with html use ^/ftp*html$
Notice that putting a * at the beginning or at the end is redundant
(in fact, in this case glimpseindex will remove the * when it does the indexing).
No other meta characters are allowed in .glimpse_exclude
(e.g., don't use .* or # or |).
Lines with * or ? must have no more than 30 characters.
Notice that, although the index itself will not be indexed,
the list of file names (.glimpse_filenames) will be indexed
unless it is explicitly listed in .glimpse_exclude.
.IP "\fB.glimpse_filters\fR"
See the description above for the -z option.
.IP "\fB.glimpse_include\fR"
contains a list of files that glimpseindex
is explicitly told to \fIinclude\fP in the index even though they may look
like non-text files.
Symbolic links are followed by glimpseindex only if they are
specifically included here.
If a file is in both .glimpse_exclude and .glimpse_include it will be
excluded.
.IP "\fB.glimpse_filenames\fP"
contains the list of all indexed file names, one per line.
This is an ASCII file that can also be used with agrep to search
for a file name, which gives a fast find command.
For example,
.br
glimpse 'count#\\.c$' ~/.glimpse_filenames
.br
will output the names of all (indexed) .c files that have 'count' in
their name (including anywhere on the path from the index).
Setting the following alias in the .login file may be useful:
.br
alias findfile 'glimpse -h \!:1 ~/.glimpse_filenames'
.IP "\fB.glimpse_index\fP"
contains the index.
The index consists of lines, each starting with a
word followed by a list of block numbers (unless the -o or -b
options are used, in which case each word is followed by an offset
into the file .glimpse_partitions where all pointers are kept).
The block/file numbers are stored in binary form, so
this is not an ASCII file.
.IP "\fB.glimpse_messages\fP"
contains the output of the -w option (see above).
.IP "\fB.glimpse_partitions\fP"
contains the partition of the indexed space into blocks
and, when the index is built with the -o or -b options, some part of the
index.
This file is used internally by glimpse and it is
a non-ASCII file.
.IP "\fB.glimpse_statistics\fP"
contains some statistics about the makeup of the index.
Useful for
some advanced applications and customization of glimpse.
.IP "\fB.glimpse_turbo\fP"
An added data structure (used under glimpseindex -o or -b only) that helps to speed up
queries significantly for large indexes.
Its size is 0.25MB.
Glimpse will work without it if needed.
.SH "STRUCTURED QUERIES"
Glimpse can search for Boolean combinations of "attribute=value" terms
by using the Harvest SOIF parser library (in glimpse/libtemplate).
To search this way, the index must be made by using the -s option of
glimpseindex (this can be used in conjunction with other glimpseindex
options).
For glimpse and glimpseindex to recognize "structured" files,
they must be in SOIF format.
In this format, each value is prefixed by
an attribute-name with the size of the value (in bytes) present in "{}"
after the name of the attribute.
For example, the following lines are part of an SOIF file:
.br
.nf
type{17}: Directory-Listing
md5{32}: 3858c73d68616df0ed58a44d306b12ba
.fi
Any string can serve as an attribute name.
Glimpse "pattern;type=Directory-Listing" will search for "pattern"
only in files whose type is "Directory-Listing".
The file itself is considered to be
one "object" and its name/url appears as the first attribute with an
"@" prefix; e.g., @FILE { http://xxx... }.
The scope of Boolean operations changes from records
(lines) to whole files when structured queries are used in glimpse
(since individual query terms can look at different attributes and they
may not be "covered" by the record/line).
Note that glimpse can only
search for patterns in the value parts of the SOIF file: there are some
attributes (like the TTL, MD5, etc.) that are interpreted by Harvest's
internal routines.
See http://harvest.cs.colorado.edu/harvest/user-manual/ for more detailed
information on the SOIF format.
.SH "REFERENCES"
.IP 1.
U. Manber and S. Wu,
"GLIMPSE: A Tool to Search Through Entire File Systems,"
\fIUsenix Winter 1994 Technical Conference\fP
(best paper award),
San Francisco (January 1994), pp. 23\-32.
Also, Technical Report #TR 93-34, Dept. of Computer Science,
University of Arizona, October 1993 (a postscript file
is available by anonymous ftp at
ftp://ftp.cs.arizona.edu/reports/1993/TR93-34.ps).
.IP 2.
S. Wu and U. Manber,
"Fast Text Searching Allowing Errors,"
\fICommunications of the ACM\fP
\fB35\fP (October 1992), pp.
83\-91.
.SH "SEE ALSO"
.BR agrep (1),
.BR ed (1),
.BR ex (1),
.BR glimpseindex (1),
.BR glimpseserver (1),
.BR grep (1),
.BR sh (1),
.BR csh (1).
.SH LIMITATIONS
.LP
The index of glimpse is word based.
A pattern that contains more than
one word cannot be found in the index.
The way glimpse overcomes this
weakness is by splitting any multi-word pattern into its set of words
and looking for all of them in the index.
For example, \fBglimpse 'linear programming'\fR will first consult the index
to find all files containing both \fIlinear\fP and \fIprogramming\fP,
and then apply agrep to find the combined pattern.
This is usually an effective solution, but it can be slow for
cases where both words are very common, but their combination is not.
.LP
As was mentioned in the section on PATTERNS above, some characters
serve as meta characters for glimpse and need to be
preceded by '\\' to search for them.
The most common
examples are the characters '.' (which stands for a wild card),
and '*' (the Kleene closure).
So, "glimpse ab.de" will match abcde, but "glimpse ab\\.de"
will not, and "glimpse ab*de" will not match ab*de, but
"glimpse ab\\*de" will.
The meta character - is translated automatically to a hyphen
unless it appears between [] (in which case it denotes a range of
characters).
.LP
The index of glimpse stores all patterns in lower case.
When glimpse searches the index it first converts
all patterns to lower case, finds the appropriate files,
and then searches the actual files using the original
patterns.
So, for example, \fIglimpse ABCXYZ\fR will first find all
files containing abcxyz in any combination of lower and upper
cases, and then search these files directly, so only the
right cases will be found.
One problem with this approach is discovering misspellings
that are caused by wrong cases.
For example, \fIglimpse -B abcXYZ\fR will first search the
index for the best match to abcxyz (because the pattern is
converted to lower case); it will find that there are matches
with no errors, and will go to those
files to search them
directly, this time with the original upper cases.
If the closest match is, say, AbcXYZ, glimpse may miss it,
because it doesn't expect an error.
Another problem is speed.
If you search for "ATT", it will look
at the index for "att".
Unless you use -w to match the whole word,
glimpse may have to search all files containing, for example, "Seattle"
which has "att" in it.
.LP
There is no size limit for simple patterns and simple patterns
within Boolean expressions.
More complicated patterns, such as regular expressions,
are currently limited to approximately 30 characters.
Lines are limited to 1024 characters.
Records are limited to 48K, and may be truncated if they are larger
than that.
The limit of record length can be changed by modifying the parameter
Max_record in agrep.h.
.LP
Glimpseindex does not index words of size > 64.
.SH BUGS
.LP
In some rare cases, regular expressions using * or # may not match correctly.
.LP
A query that contains no alphanumeric characters is not
recommended (unless glimpse is used as agrep and the file names
are provided).
This is an understatement.
.LP
The notion of "match to the whole word" (the \-w option) can be tricky
sometimes.
For example, glimpse -w 'word$' will not match 'word'
appearing at the end of a line, because the extra '$' makes the pattern
more than just one simple word.
The same thing can happen with ^ and with _.
To be on the safe side,
use the -w option only when the patterns are actual words.
.LP
Please send bug reports or comments to glimpse@cs.arizona.edu.
.SH DIAGNOSTICS
Exit status is 0 if any matches are found,
1 if none, 2 for syntax errors or inaccessible files.
.SH AUTHORS
Udi Manber and Burra Gopal, Department of Computer Science,
University of Arizona, and Sun Wu, the National Chung-Cheng University,
Taiwan.
(Email: glimpse@cs.arizona.edu)