
📄 glimpse.1

📁 harvest is a robot for downloading HTML web pages
.TH GLIMPSE l "November 10, 1997"
.SH NAME
\fIglimpse 4.1\fP - search quickly through entire file systems
.SH OVERVIEW
\fIGlimpse\fP (which stands for GLobal IMPlicit SEarch)
is a very popular UNIX indexing and query system that allows you to search through a large set of files very quickly.
Glimpse supports most of \fIagrep\fP's options
(\fIagrep\fP is our powerful version of \fIgrep\fP)
including approximate matching (e.g., finding misspelled words),
Boolean queries, and even some limited forms of regular expressions.
It is used in the same way, except that you don't have to
specify file names.
So, if you are looking for a \fIneedle\fP
anywhere in your file system, all you have to do is say
\fIglimpse needle\fR
and all lines containing \fIneedle\fP will appear preceded
by the file name.
.LP
To use glimpse you first need to index your files with glimpseindex.
For example, \fIglimpseindex -o ~\fR  will index everything at or below
your home directory.  See man glimpseindex for more details.
.LP
Glimpse is also available for web sites, as a set of tools called \fIWebGlimpse\fP.
(The old glimpseHTTP is no longer supported and is not recommended.)
See http://glimpse.cs.arizona.edu/webglimpse/ for more
information.
.LP
Glimpse includes all of agrep and can be used instead of agrep
by giving a file name(s) at the end of the command.
This will cause glimpse to ignore the index and run agrep as usual.
For example, \fIglimpse -1 pattern file\fR is the same as \fIagrep -1 pattern file\fR.
Agrep is distributed as a self-contained package within glimpse,
and can be used separately.
We added a new option to agrep:  -r searches recursively the
directory and everything below it (see agrep options below);
it is used only when glimpse reverts to agrep.
.LP
Mail glimpse-request@cs.arizona.edu to be added to the glimpse mailing list.
Mail glimpse@cs.arizona.edu to report bugs, ask questions, discuss tricks
for using glimpse, etc.
(this is a moderated mailing list with very little
traffic, mostly announcements).
HTML version of these manual pages
can be found in
http://glimpse.cs.arizona.edu/glimpsehelp.html
Also, see the glimpse home pages in
http://glimpse.cs.arizona.edu/
.SH SYNOPSIS
.B glimpse
\- [almost all letters]
.I pattern
.SH INTRODUCTION
We start with simple ways to use glimpse and describe all the
options in detail later on.
Once an index is built, using glimpseindex,
searching for \fIpattern\fP is as easy as saying
.LP
\fIglimpse pattern\fR
.LP
The output of glimpse is similar to that of \fIagrep\fP (or any other
grep).
The pattern can be any agrep legal pattern including a regular
expression or a Boolean query (e.g., searching for Tucson AND Arizona
is done by \fIglimpse 'Tucson;Arizona'\fR).
.LP
The speed of glimpse depends mainly on the number and sizes
of the files that
contain a match and only to a second degree on the total size of all
indexed files.  If the pattern is reasonably uncommon, then all
matches will be reported in a few seconds even if the indexed files
total 500MB or more.  Some information on how glimpse works and a
reference to a detailed article are given below.
.LP
Most of agrep (and other grep's) options are supported, including
approximate matching.
For example,
.LP
\fIglimpse -1 'Tuson;Arezona'\fR
.LP
will output all lines containing both patterns allowing
one spelling error in any of the patterns
(either insertion, deletion, or
substitution), which in this case is definitely needed.
.LP
\fIglimpse -w -i 'parent'\fR
.LP
specifies case insensitive (\-i) and match on complete words (\-w).
So 'Parent' and 'PARENT' will match, 'parent/child' will match,
but 'parenthesis' or 'parents' will not match.
(Starting at version 3.0, glimpse can be much faster when these
two options are specified, especially for very large indexes.
You may want to set an alias especially for "glimpse -w -i".)
.LP
The -F option provides a pattern that must match the file name.
For example,
.LP
\fIglimpse -F '\\.c$' needle\fR
.LP
will find the pattern \fIneedle\fP in all files whose name
ends with .c.
(Glimpse will first check its index to determine which files may
contain the pattern and then run agrep on the file names to further
limit the search.)
The -F option \fIshould not\fR be put at the end after the main pattern
(e.g., "glimpse needle -F hay" is incorrect).
.SH "A Detailed Description of All the Options of Glimpse"
.TP
.B \-\fI#\fP
\fI#\fP is an integer between 1 and 8
specifying the maximum number of errors
permitted in finding the approximate matches (the default is zero).
Generally, each insertion, deletion, or substitution counts as one error.
It is possible to adjust the relative cost of insertions,
deletions and substitutions (see -I -D and -S options).
Since the index stores only lower case characters, errors of
substituting upper case with lower case may be missed
(see LIMITATIONS).
Allowing errors in the match requires more time and can slow down
the match by a factor of 2-4.
Be very careful when specifying more than one error, as the number
of matches tends to grow very quickly.
.TP
.B \-a
prints attribute names.
This option applies only to Harvest SOIF structured
data (used with glimpseindex -s).
(See http://harvest.transarc.com for more information
about the Harvest project.)
.TP
.B \-A
used for glimpse internals.
.TP
.B \-b
prints the byte offset (from the beginning of the file)
of the end of each match.
The first character in a file has offset 0.
.TP
.B \-B
Best match mode.  (Warning: -B sometimes misses matches.  It is safer
to specify the number of errors explicitly.)
When \-B is specified and no exact matches are found, glimpse
will continue to search until the closest matches (i.e., the ones
with minimum number of errors)
are found, at which point the following message will be shown:
"the best match contains x errors, there are y matches, output them? (y/n)"
This message refers to the number of matches found in the index.
There may be many more matches in the actual text (or there may be none
if -F is used to filter files).
When the \-#, \-c, or \-l options are specified, the \-B option is ignored.
In general, \-B may be slower than \-#, but not by very much.
Since the index stores only lower case characters, errors of
substituting upper case with lower case may be missed
(see LIMITATIONS).
.TP
.B \-c
Display only the count of matching records.
Only files with count > 0
are displayed.
.TP
.B \-C
tells glimpse to send its queries to \fIglimpseserver\fP.
.TP
.B \-d "'\fIdelim\fP'"
Define \fIdelim\fP to be the separator between two records.
The default value is '$', namely a record is by default
a line.
\fIdelim\fP can be a string of size at most 8
(with possible use of ^ and $), but not
a regular expression.
Text between two \fIdelim\fP's, before the first \fIdelim\fP,
and after the last \fIdelim\fP is considered as one record.
For example, -d '$$' defines paragraphs as records and -d '^From\ '
defines mail messages as records.
\fIglimpse\fP matches each record separately.
\fBThis option does not currently work with regular expressions.\fP
The -d option is especially useful for Boolean AND queries,
because the patterns need not appear in the same line but in the
same record.
For example, \fIglimpse -F mail -d '^From\ ' 'glimpse;arizona;announcement'\fR
will output all mail messages (in their entirety) that have
the 3 patterns anywhere in the message (or the header),
assuming that files with 'mail' in their name contain mail
messages.
If you want the scope of the record to be the whole file,
use the -W option.
\fBGlimpse warning\fP:
Use this option with care.  If the delimiter is set to
match mail messages, for example, and glimpse finds the pattern in a regular file, it may not find the delimiter and will therefore output the whole file.
(The -t option - see below - can be used to put the \fIdelim\fP at
the end of the record.)
\fBPerformance Note:\fP
Agrep (and glimpse) resorts to more complex search when the \-d
option is used.
The search is slower and unfortunately no more than
32 characters can be used in the pattern.
.TP
.B \-D\fIk\fP
Set the cost of a deletion to \fIk\fP (\fIk\fP is a positive integer).
This option does not currently work with regular expressions.
.TP
.BI \-e " pattern"
Same as a simple
.I pattern
argument, but useful when the
.I pattern
begins with a
.RB ` \- '.
.TP
.B \-E
prints the lines in the index (as they appear in the index)
which match the pattern.  Used mostly for debugging and maintenance of the index.
This is not an option that a user needs to know about.
.TP
.B \-f  \fIfile_name\fR
this option has a different meaning for agrep than for glimpse:
In glimpse, only the files whose names are
listed in \fIfile_name\fP are matched.
(The file names have to appear as in .glimpse_filenames.)
In agrep, the file_name contains the list of the patterns that are searched.
(Starting at version 3.6, this option for glimpse is much faster
for large files.)
.TP
.B \-F  \fIfile_pattern\fR
limits the search to those files whose name (including the whole
path) matches \fIfile_pattern\fP.
This option can be used in a variety of applications to provide
limited search even for one large index.
If \fIfile_pattern\fP matches a directory, then all files with this directory on
their path will be considered.  To limit the search to actual file
names, use $ at the end of the pattern.  \fIfile_pattern\fP can be a
regular expression and even a Boolean pattern.
This option is implemented by running agrep \fIfile_pattern\fP on the list of file names obtained from the index.
Therefore, searching the
index itself takes the same amount of time, but limiting the
second phase of the search to only a few files can speed up the
search significantly.
For example,
.sp 1
glimpse -F 'src#\\.c$' needle
.sp 1
will search for needle in all .c files with src somewhere along the
path.
The -F \fIfile_pattern\fP must appear before the search pattern
(e.g., glimpse needle -F '\\.c$' will not work).
It is possible to use some of agrep's options when matching file names.  In this case all options as well as the
file_pattern should be in quotes.  (-B and -v do not work very well
as part of a file_pattern.)
For example,
.sp
glimpse -F '-1 \\.html' pattern
.sp
will allow one spelling error when matching .html to the file names
(so ".htm" and ".shtml" will match as well).
.sp
glimpse -F '-v \\.c$' counter
.sp
will search for 'counter' in all files \fIexcept\fP for .c files.
.TP
.B \-g
prints the file number (its position in the .glimpse_filenames
file) rather than its name.
.TP
.B \-G
Output the (whole) files that contain a match.
.TP
.B \-h
Do not display filenames.
.TP
.B \-H  \fIdirectory_name\fR
searches for the index and the other .glimpse files in
\fIdirectory_name\fP.  The default is the home directory.
This option is useful, for example, if several different indexes are maintained for different archives (e.g., one for mail messages, one
for source code, one for articles).
.TP
.B \-i
Case-insensitive search \(em e.g., "A" and "a" are considered equivalent.
Glimpse's index stores all patterns in lower case (see LIMITATIONS below).
\fBPerformance Note:\fP
When \-i is used together with the \-w option, the search may become much faster.
It is recommended to have \-i and \-w as defaults, for example,
through an alias.
We use the following alias in our .cshrc file
.br
alias glwi 'glimpse -w -i'
.TP
.B \-I\fIk\fP
Set the cost of an insertion to \fIk\fP (\fIk\fP is a positive integer).
This option does not currently work with regular expressions.
.TP
.B \-j
If the index was constructed with the -t option, then \-j will output the files' last modification dates in addition to
everything else.
There are no major performance penalties for this option.
.TP
.B \-J \fIhost_name\fP
used in conjunction with glimpseserver (\-C) to connect to one particular server.
.TP
.B \-k
No symbol in the pattern is treated as a meta character.
For example, glimpse -k 'a(b|c)*d' will find
the occurrences of a(b|c)*d whereas glimpse 'a(b|c)*d' will find substrings that match the regular expression 'a(b|c)*d'.
(The only exception is ^ at the beginning of the pattern and $ at the
end of the pattern, which are still interpreted in the usual way.  Use \\^ or \\$ if you need them verbatim.)
.TP
.B \-K \fIport_number\fP
used in conjunction with glimpseserver (\-C) to connect to one particular server at the specified TCP port number.
.TP
.B \-l
Output only the names of the files that contain a match.
This option differs from the \-N option in that the files
themselves \fIare\fP searched, but the matching lines are
not shown.
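.LP
The error counting described under the \-# option (each insertion, deletion, or substitution counts as one error) can be illustrated with a short Python sketch.
It is not the algorithm glimpse or agrep actually uses; it is a plain dynamic-programming check under the assumption that all three error types cost 1, and the names min_errors and line_matches are hypothetical.

```python
def min_errors(pattern: str, text: str) -> int:
    """Fewest insertions, deletions, or substitutions needed to make
    `pattern` equal to some substring of `text` (each costs 1)."""
    m = len(pattern)
    # col[j]: errors needed to match pattern[:j] against a substring
    # ending at the current text position (initially the empty prefix).
    col = list(range(m + 1))
    best = col[m]
    for ch in text:
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for j in range(1, m + 1):
            cost = 0 if pattern[j - 1] == ch else 1
            new[j] = min(
                col[j - 1] + cost,  # exact match or substitution
                col[j] + 1,         # extra character in the text
                new[j - 1] + 1,     # pattern character missing from the text
            )
        col = new
        best = min(best, col[m])
    return best


def line_matches(line: str, patterns: list[str], max_errors: int) -> bool:
    """Boolean AND: every pattern must occur in the line within the error budget."""
    return all(min_errors(p, line) <= max_errors for p in patterns)


if __name__ == "__main__":
    # Mirrors the idea of: glimpse -1 'Tuson;Arezona'
    print(line_matches("Tucson, Arizona", ["Tuson", "Arezona"], 1))  # True
    print(line_matches("Tucson, Arizona", ["Tuson", "Arezona"], 0))  # False
```

With a budget of one error, "Tuson" matches "Tucson" (one missing character) and "Arezona" matches "Arizona" (one substitution); with a budget of zero neither does.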

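.LP
The record semantics described under the \-d option, combined with a Boolean AND query, can be sketched in Python as follows.
This is only an illustration of the idea, not glimpse's code. It assumes the delimiter is a literal prefix at the start of a line (as in \-d '^From\ '), that the delimiter line begins a record, and that patterns are exact, case-sensitive substrings. The names split_records and and_query are hypothetical.

```python
def split_records(text: str, line_prefix: str = "From ") -> list[str]:
    """Split text into records; a new record starts at each line that
    begins with `line_prefix`. Text before the first delimiter is also
    treated as one record."""
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(line_prefix) and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def and_query(text: str, patterns: list[str], line_prefix: str = "From ") -> list[str]:
    """Return whole records that contain every pattern somewhere inside them."""
    return [
        record
        for record in split_records(text, line_prefix)
        if all(p in record for p in patterns)
    ]


if __name__ == "__main__":
    mbox = (
        "From alice\n"
        "glimpse announcement\n"
        "sent from arizona\n"
        "From bob\n"
        "glimpse only\n"
    )
    # Only the first message contains all three patterns, so only it is
    # returned, in its entirety.
    for record in and_query(mbox, ["glimpse", "arizona", "announcement"]):
        print(record, end="")
```

Because each record is matched as a unit, the three patterns may sit on different lines of the same message, which is the case the line-based default cannot express.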