📄 glimpse.1

📁 harvest is a robot that downloads HTML web pages
.TP
.B \-L x | x:y | x:y:z
if one number is given, it is a limit on the total number of matches.
Glimpse outputs only the first x matches.  If \-l is used (i.e., only file names
are sought), then the limit is on the number of files;
otherwise, the limit is on the number of records.
If two numbers are given (x:y), then y is an added limit on the total
number of files.
If three numbers are given (x:y:z), then z is an added limit on the
number of matches per file.
If any of the x, y, or z is set to 0, it means to ignore it
(in other words 0 = infinity in this case);  for example,
\-L 0:10 will output all matches to the first 10 files that
contain a match.
This option is particularly
useful for servers that need to limit the amount of
output provided to clients.
.TP
.B \-m
used for glimpse internals.
.TP
.B \-M
used for glimpse internals.
.TP
.B \-n
Each matching record (line) is prefixed by its record (line) number in the file.
\fBPerformance Note:\fP
To compute the record/line number, agrep needs to search for all
record delimiters (or line breaks), which can slow down the search.
.TP
.B \-N
searches only the index (so the search is faster).
If -o or -b are used then the result is the number of files
that have a potential match plus a prompt to ask if you want to
see the file names.
(If \-y is used, then there is no prompt and the names of the
files will be shown.)
This could be a way to get the matching file names without even having
access to the files themselves.
However, because only the index is searched, some potential matches
may not be real matches.
In other words, with \-N you will not miss any file but you may get
extra files.
For example, since the index stores everything in lower case,
a case-sensitive query may match a file that has only a case-insensitive
match.
Boolean queries may match a file that has all the keywords
but not in the same line (indexing with \-b allows glimpse to
figure out whether the keywords are close, but it cannot figure out
from the index whether they are exactly on the same line or in the same record
without looking at the file).
If the index was not built with \-o or \-b, then this option outputs
the number of \fIblocks\fP matching the pattern.  This is useful as an
indication of how long the search will take.
All files are partitioned into usually 200-250 blocks.
The file \fB.glimpse_statistics\fP contains the total number of blocks
(or \fBglimpse -N a\fP will give a pretty good estimate; only blocks with no
occurrences of 'a' will be missed).
.TP
.B \-o
the opposite of \-t: the delimiter
is not output at the tail, but at the beginning of the matched record.
.TP
.B \-O
the file names are not printed before every matched record;
instead, each filename is printed just once,
and all the matched records within it are printed after it.
.TP
.B \-p
(from version 4.0B1 only) Supports reading compressed
sets of filenames.
The -p option allows you to utilize compressed `neighborhoods'
(sets of filenames) to limit your search, without uncompressing them.
Added mostly for WebGlimpse.
The usage is:
.br
"-p filename:X:Y:Z"
where "filename" is the file with compressed neighborhoods, X is an
offset into that file (usually 0, must be a multiple of sizeof(int)),
Y is the length glimpse must access from that file (if 0, then whole file;
must be a multiple of sizeof(int)), and Z must be 2 (it indicates
that "filename" has the sparse-set representation of compressed
neighborhoods: the other values are for internal use only).
Note that
any colon ":" in filename must be escaped using a backslash \e.
.TP
.B \-P
used for glimpse internals.
.TP
.B \-q
prints the offsets of the beginning and end of each matched record.
The difference between \-q and \-b is that \-b prints the offsets
of the actual matched string, while \-q prints the offsets of the
whole record where the match occurred.
The output format is @x{y}, where x is the beginning offset
and y is the end offset.
.TP
.B \-Q
when used together with \-N glimpse not only displays the filename where
the match occurs, but the exact occurrences (offsets) as seen in the
index.  This option is relevant only if the index was built
with -b;  otherwise, the offsets are not available in the index.
This option is ignored unless it is used with \-N.
.TP
.B \-r
This option is an agrep option and it will be ignored in glimpse,
unless glimpse is used with a file name at the end which makes it
run as agrep.
If the file name is a directory name, the \-r option will search
(recursively) the whole directory and everything below it.
(The glimpse index will not be used.)
.TP
.B \-R \fIk\fP
defines the maximum size (in bytes) of a record.
The maximum value (which is the default) is 48K.
Defining the maximum to be lower than the default may speed
up some searches.
.TP
.B \-s
Work silently, that is, display nothing except error messages.
This is useful for checking the error status.
.TP
.B \-S\fIk\fP
Set the cost of a substitution to \fIk\fP (\fIk\fP is a positive integer).
This option does not currently work with regular expressions.
.TP
.B \-t
Similar to the \-d option, except that the delimiter is assumed
to appear at the \fIend\fP of the record.
Glimpse will output the record starting from the end of
.I delim
to (and including) the next
.I delim.
(See warning for the \-d option.)
.TP
.B \-T directory
Use \fIdirectory\fP as a place where temporary files are built.
(Glimpse produces some small temporary files usually in /tmp.)
This option is useful mainly in the context of structured queries
for the Harvest
project, where the temporary files may be non-trivial,
and the /tmp directory may not have enough space for them.
.TP
.B \-U
(starting at version 4.0B1) Interprets an index created
with the -X or the -U option in glimpseindex.
Useful mostly for WebGlimpse or similar web applications.
When glimpse outputs matches, it
will display the filename, the URL, and the title automatically.
.TP
.B \-v
(This option is an agrep option and it will be ignored in glimpse,
unless glimpse is used with a file name at the end which makes it
run as agrep.)
Output all records/lines that do \fInot\fP contain a match.
(Glimpse does not support the NOT operator yet.)
.TP
.B \-V
prints the current version of glimpse.
.TP
.B \-w
Search for the pattern as a word \(em i.e., surrounded by non-alphanumeric
characters.  For example,
\fIglimpse -w car\fR will match car, but not characters and not
car10.
The non-alphanumeric characters \fImust\fP
surround the match;  they cannot be counted as errors.
This option does not work with regular expressions.
\fBPerformance Note:\fP
When \-w is used together with the \-i option, the search may become much faster.
The \-w will not work with $, ^, and _ (see BUGS below).
It is recommended to have \-i and \-w as defaults, for example,
through an alias.  We use the following alias in our .cshrc file
.br
alias glwi 'glimpse -w -i'
.TP
.B \-W
The default for Boolean AND queries is that they cover one record
(the default for a record is one line) at a time.
For example, glimpse 'good;bad' will output all lines containing
both 'good' and 'bad'.
The \-W option changes the scope of Booleans to be the whole file.
Within a file glimpse will output all matches to any of the patterns.
So, glimpse -W 'good;bad' will output all lines containing 'good'
\fIor\fP 'bad', but only in files that contain both patterns.
The NOT operator '~' can be used only with \-W.
It is described later on.
The OR operator is essentially unaffected (unless it is
in combination with the other Boolean operations).
For structured queries, the scope is always the whole attribute
or file.
.TP
.B \-x
The pattern must match the whole line.
(This option is translated to -w when the index is searched
and it is used only when the actual text is searched.
It is of limited use in glimpse.)
.TP
.B \-X
(from version 4.0B1 only) Output the names of files that
contain a match even if these files have been deleted since the
index was built.
Without this option glimpse will simply ignore these files.
.TP
.B \-y
Do not prompt.
Proceed with the match as if the answer to any prompt is y.
Servers (or any other scripts) using glimpse will probably want
to use this option.
.TP
.B \-Y \fIk\fP
If the index was constructed with the -t option, then \-Y \fIk\fP
will output only matches to files that were created or
modified within the last \fIk\fP days.
There are no major performance penalties for this option.
.TP
.B \-z
Allow customizable filtering, using the file .glimpse_filters to run
the programs listed there for each match.  The best example is
compress/decompress.
If .glimpse_filters includes the line
.br
*.Z   uncompress <
.br
(separated by tabs)
then before indexing any file that matches the pattern "*.Z" (same
syntax as the one for .glimpse_exclude) the command listed is
executed first (assuming input is from stdin, which is why uncompress
needs <) and its output (assuming it goes to stdout) is indexed.
The file itself is not changed (i.e., it stays compressed).
Then if glimpse -z is used, the same program is used on these files
on the fly.  Any program can be used (we run 'exec').  For example,
one can filter out parts of files that should not be indexed.
Glimpseindex tries to apply all filters in .glimpse_filters in the
order they are given.
For example, if you want to uncompress a file and then extract
some part of it, put the decompression command (the example above)
first and then another line that specifies the extraction.
Note that this can slow down the search because the filters need to
be run before files are searched.
(See also glimpseindex.)
.TP
.B \-Z
No op.  (It's useful for glimpse's internals. Trust us.)
.LP
The characters
.RB ` $ ',
.RB `^ ',
.RB ` \(** ',
.RB ` [ ' ,
.RB ` ] ' ,
.RB ` \s+2^\s0 ',
.RB ` | ',
.RB ` ( ',
.RB ` ) ',
.RB ` ! ',
and
.RB ` \e '
can cause unexpected results when included in the
.IR pattern ,
as these characters are also meaningful
to the shell.  To avoid these problems, enclose the entire
pattern in single quotes, i.e., 'pattern'.
Do not use double quotes (").
.ne 4
.SH PATTERNS
.LP
\fIglimpse\fP supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of strings, wild cards,
and regular expressions (see LIMITATIONS).
.TP
\fBStrings   \fP
Strings are any sequence of characters, including the special symbols
`^' for beginning of line and `$' for end of line.
The following special characters (
.RB ` $ ',
.RB `^ ',
.RB ` \(** ',
.RB ` [ ' ,
.RB ` \s+2^\s0 ',
.RB ` | ',
.RB ` ( ',
.RB ` ) ',
.RB ` ! ',
and
.RB ` \e '
) as well as the following meta characters special to glimpse (and agrep):
.RB ` ; ',
.RB ` , ',
.RB ` # ',
.RB ` < ',
.RB ` > ',
.RB ` - ',
and
.RB ` . ',
should be preceded by `\\' if they are to be matched as regular
characters.  For example, \\^abc\\\\ corresponds to the string ^abc\\,
whereas ^abc corresponds to the string abc at the beginning of a
line.
.TP
\fBClasses of characters\fP
a list of characters inside [] (in order) corresponds to any character
from the list.  For example, [a-ho-z] is any character between a and h
or between o and z.  The symbol `^' inside [] complements the list.
For example, [^i-n] denotes any character in the character set except
the characters 'i' through 'n'.
The symbol `^' thus has two meanings, but this is consistent with
egrep.
The symbol `.' (don't care) stands for any symbol (except for the
newline symbol).
.TP
\fBBoolean operations\fP
.B Glimpse
supports an `AND' operation denoted by the symbol `;',
an `OR' operation denoted by the symbol `,',
a limited version of a `NOT' operation (starting at version 4.0B1)
denoted by the symbol `~',
or any combination.  For example,
\fIglimpse 'pizza;cheeseburger'\fR will output all lines containing
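
The scope rule for Boolean AND queries given under the -W option can be sketched in Python. By default both keywords must occur in the same record (line). With -W, lines containing either keyword are reported, but only from files that contain both. The function name `and_query`, the in-memory file table, and the plain substring test are assumptions made for illustration only; this is a rough model of the documented behavior, not glimpse's implementation.

```python
def and_query(files, keywords, whole_file=False):
    """Model a Boolean AND query such as 'good;bad'.

    files      -- dict mapping a file name to its list of lines (hypothetical input)
    keywords   -- the AND-ed keywords
    whole_file -- False: record (line) scope, the default;
                  True: whole-file scope, as with -W
    Returns a list of (file name, line) pairs.
    """
    hits = []
    for name, lines in files.items():
        if whole_file:
            # -W: the file as a whole must contain every keyword ...
            text = "\n".join(lines)
            if not all(k in text for k in keywords):
                continue
            # ... and then every line matching ANY keyword is reported.
            hits += [(name, ln) for ln in lines if any(k in ln for k in keywords)]
        else:
            # Default: a single line must contain every keyword.
            hits += [(name, ln) for ln in lines if all(k in ln for k in keywords)]
    return hits


# Made-up sample data, not taken from any real index.
sample = {
    "a.txt": ["good and bad", "only good", "neither"],
    "b.txt": ["only good here"],
}
default_hits = and_query(sample, ["good", "bad"])
w_hits = and_query(sample, ["good", "bad"], whole_file=True)
```

Here `default_hits` holds only the line of a.txt that contains both words, while `w_hits` also includes the "only good" line of a.txt; b.txt is skipped in both cases because it never mentions 'bad'.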
