📄 agrep.1
字号:
.TH AGREP l "Jan 17, 1992".SH NAMEagrep \- search a file for a string or regular expression, with approximate matching capabilities.SH SYNOPSIS.B agrep[.B \-#cdehiklnpstvwxBDGIS].I pattern [ -f.I patternfile ][.IR filename ".\|.\|. ]" .SH DESCRIPTION.B agrepsearches the input.IR filenames(standard input is the default, but see a warning under LIMITATIONS) for records containing strings which either \fIexactly\fP or \fIapproximately\fP match a pattern. A record is by default a line, but it can be defined differently usingthe -d option (see below).Normally, each record found is copied to the standard output.Approximate matching allows finding records that contain the pattern with several errors including substitutions, insertions, anddeletions.For example, Massechusets matches Massachusetts with two errors(one substitution and one insertion). Running .B agrep-2 Massechusets foo outputs all lines in foo containing any string withat most 2 errors from Massechusets..LP.B agrep supports many kinds of queries including arbitrary wild cards, sets of patterns, and in general,regular expressions.See PATTERNS below.It supports most of the options supported by the .B grepfamily plus several more (but it is not 100% compatible with grep).For more information on the algorithms used by agrep seeWu and Manber, "Fast Text Searching With Errors,"Technical report #91-11, Department of Computer Science, Universityof Arizona, June 1991 (available by anonymous ftp from cs.arizona.eduin agrep/agrep.ps.1), andWu and Manber,"Agrep -- A Fast Approximate Pattern Searching Tool",To appear in USENIX Conference 1992 January (available by anonymous ftpfrom cs.arizona.edu in agrep/agrep.ps.2)..LPAs with the rest of the \fBgrep\fP family, the characters.RB ` $ ',.RB `^ ',.RB ` \(** ',.RB ` [ ' ,.RB ` ] ' ,.RB ` \s+2^\s0 ',.RB ` | ',.RB ` ( ',.RB ` ) ',.RB ` ! ',and.RB ` \e 'can cause unexpected results when included in the.IR pattern ,as these characters are also meaningfulto the shell. To avoid these problems, one should always enclose the entirepattern argument in single quotes, i.e., 'pattern'.Do not use double quotes (")..LPWhen.B agrepis applied to more than one inputfile, the name of the file is displayedpreceding each line which matchesthe pattern. The filename is not displayedwhen processing a singlefile, so if you actually want the filenameto appear, use.B /dev/nullas a second file in the list..SH OPTIONS.TP.B \-\fI#\fP\fI#\fP is a non-negative integer (at most 8)specifying the maximum number of errorspermitted in finding the approximate matches (defaults to zero).Generally, each insertion, deletion, or substitution counts as one error.It is possible to adjust the relative cost of insertions,deletions and substitutions (see -I -D and -S options)..TP.B \-cDisplay only the count of matching records..TP.B \-d "'\fIdelim\fP'"Define \fIdelim\fP to be the separator between two records.The default value is '$', namely a record is by defaulta line.\fIdelim\fP can be a string of size at most 8(with possible use of ^ and $), but nota regular expression.Text between two \fIdelim\fP's, before the first \fIdelim\fP,and after the last \fIdelim\fP is considered as one record.For example, -d '$$' defines paragraphs as records and -d '^From\ 'defines mail messages as records..B agrepmatches each record separately.This option does not currently work with regular expressions..TP.BI \-e " pattern"Same as a simple.I patternargument, but useful when the.I patternbegins with a.RB ` \- '..TP.BI \-f " patternfile".I patternfile contains a set of (simple) patterns.The output is all lines that match at least one of the patterns in .I patternfile.Currently, the \-f option works only for exact match and for simplepatterns (any meta symbol is interpreted as a regular character);it is compatible only with \-c, \-h, \-i, \-l, \-s, \-v, \-w, and \-x options.see LIMITATIONS for size bounds..TP.B \-hDo not display filenames..TP.B \-iCase-insensitive search \(em e.g., "A" and "a" are considered equivalent..TP.B \-kNo symbol in the pattern is treated as a meta character. For example, agrep -k 'a(b|c)*d' foo will find the occurrences of a(b|c)*d in foo whereas agrep 'a(b|c)*d' foowill find substrings in foo that match the regular expression 'a(b|c)*d'..TP.B \-lList only the files that contain a match.This option is useful for looking for files containing a certain pattern.For example, " agrep -l 'wonderful' * " will list the names of thosefiles in current directory that contain the word 'wonderful'..TP.B \-nEach line that is printed is prefixed by its record number in the file..TP.B \-pFind records in the text that contain a supersequence of the pattern.For example,\fB agrep \-p DCS foowill match "Department of Computer Science.".TP.B \-s Work silently, that is, display nothing except error messages.This is useful for checking the error status..TP.B \-t Output the record starting from the end of .I delimto (and including) the next.I delim.This is useful for cases where .I delimshould come at the end of the record..TP.B \-vInverse mode \(em display only those records that.I do notcontain the pattern..TP.B \-wSearch for the pattern as a word \(em i.e., surrounded by non-alphanumericcharacters. The non-alphanumeric .B mustsurround the match; they cannot be counted as errors.For example, .B agrep-w -1 car will match cars, but not characters..TP.B \-xThe pattern must match the whole line..TP.B \-yUsed with \-B option. When \-y is on, agrep will always output the best matches without giving a prompt..TP.B \-BBest match mode.When \-B is specified and no exact matches are found, agrepwill continue to search until the closest matches (i.e., the oneswith minimum number of errors)are found, at which point the following message will be shown:"the best match contains x errors, there are y matches, output them? (y/n)"The best match mode is not supported for standard input, e.g.,pipeline input.When the \-#, \-c, or \-l options are specified, the \-B option is ignored.In general, \-B may be slower than \-#, but not by very much..TP.B \-D\fIk\fPSet the cost of a deletion to \fIk\fP (\fIk\fP is a positive integer).This option does not currently work with regular expressions..TP.B \-GOutput the files that contain a match..TP.B \-I\fIk\fPSet the cost of an insertion to \fIk\fP (\fIk\fP is a positive integer).This option does not currently work with regular expressions..TP.B \-S\fIk\fPSet the cost of a substitution to \fIk\fP (\fIk\fP is a positive integer).This option does not currently work with regular expressions..ne 4.SH PATTERNS.LP\fIagrep\fP supports a large variety of patterns, including simplestrings, strings with classes of characters, sets of strings, wild cards, and regular expressions..TP\fBStrings\fPany sequence of characters, including the special symbols`^' for beginning of line and `$' for end of line.The special characters listed above (.RB ` $ ',.RB `^ ',.RB ` \(** ',.RB ` [ ' ,.RB ` \s+2^\s0 ',.RB ` | ',.RB ` ( ',.RB ` ) ',.RB ` ! ',and.RB ` \e ') should be preceded by `\\' if they are to be matched as regularcharacters. For example, \\^abc\\\\ corresponds to the string ^abc\\,whereas ^abc corresponds to the string abc at the beginning of aline..TP\fBClasses of characters\fPa list of characters inside [] (in order) corresponds to any characterfrom the list. For example, [a-ho-z] is any character between a and hor between o and z. The symbol `^' inside [] complements the list.For example, [^i-n] denote any character in the character set exceptcharacter 'i' to 'n'.The symbol `^' thus has two meanings, but this is consistent withegrep.The symbol `.' (don't care) stands for any symbol (except for thenewline symbol)..TP\fBBoolean operations\fP.B agrep supports an `and' operation `;' and an `or' operation `,',but not a combination of both. For example, 'fast;network' searchesfor all records containing both words..TP\fBWild cards\fPThe symbol '#' is used to denote a wild card. # matches zero or anynumber of arbitrary characters. For example, ex#e matches example. The symbol # is equivalent to .* in egrep.In fact, .* will work too, because it is a valid regular expression(see below), but unless this is part of an actual regular expression,# will work faster. .TP\fBCombination of exact and approximate matching\fPany pattern inside angle brackets <> must match the text exactly evenif the match is with errors. For example, <mathemat>ics matchesmathematical with one error (replacing the last s with an a), butmathe<matics> does not match mathematical no matter how many errors weallow..TP\fBRegular expressions\fPThe syntax of regular expressions in \fBagrep\fP is in general the same asthat for \fBegrep\fP. The union operation `|', Kleene closure `*',and parentheses () are all supported.Currently '+' is not supported.Regular expressions are currently limited to approximately 30characters (generally excluding meta characters). Some options(\-d, \-w, \-f, \-t, \-x, \-D, \-I, \-S) do not currently work with regular expressions.The maximal number of errors for regular expressions that use '*'or '|' is 4..SH EXAMPLES.LP.TPagrep -2 -c ABCDEFG foo gives the number of lines in file foo that contain ABCDEFGwithin two errors..TPagrep -1 -D2 -S2 'ABCD#YZ' foooutputs the lines containing ABCD followed, within arbitrarydistance, by YZ, with up to one additional insertion(-D2 and -S2 make deletions and substitutions too "expensive")..TPagrep -5 -p abcdefghij /usr/dict/wordsoutputs the list of all words containing at least 5 of the first 10letters of the alphabet \fIin order\fR. (Try it: any list startingwith academia and ending with sacrilegious must mean something!).TP agrep -1 'abc[0-9](de|fg)*[x-z]' foooutputs the lines containing, within up to one error, the stringthat starts with abc followed by one digit, followed by zero or morerepetitions of either de or fg, followed by either x, y, or z..TPagrep -d '^From\ ' 'breakdown;internet' mboxoutputs all mail messages (the pattern '^From\ ' separates mail messagesin a mail file) that contain keywords 'breakdown' and 'internet'..TPagrep -d '$$' -1 '<word1> <word2>' foofinds all paragraphs that contain word1 followed by word2 with oneerror in place of the blank. In particular, if word1 is the last word in a line and word2is the first word in the next line, then the space will besubstituted by a newline symbol and it will match.Thus, this is a way to overcome separation by a newline.Note that -d '$$' (or another delim which spans more than one line)is necessary, because otherwise agrep searchesonly one line at a time..TPagrep '^agrep' <this manual>outputs all the examples of the use of agrep in this man pages..PD.SH "SEE ALSO".BR ed (1),.BR ex (1),.BR grep (1V),.BR sh (1),.BR csh (1)..SH BUGS/LIMITATIONSAny bug reports or comments will be appreciated! Please mail them to sw@cs.arizona.edu or udi@cs.arizona.edu.LPRegular expressions do not support the '+' operator (match 1 or moreinstances of the preceding token). These can be searched for by usingthis syntax in the pattern:.sp.in 1.0i\&'\fIpattern\fB(\fIpattern\fB)*\fR'.in.sp(search for strings containing one instance of the pattern, followed by 0 ormore instances of the pattern)..LPThe following can cause an infinite loop:.B agreppattern * > output_file.If the number of matches is high, they may be deposited inoutput_file before it is completely read leading to more matches ofthe pattern within output_file (the matches are against the wholedirectory). It's not clear whether this is a "bug" (grep will do thesame), but be warned..LPThe maximum size of the .I patternfileis limited to be 250Kb, and the maximum number of patternsis limited to be 30,000..LPStandard input is the default if no input file is given.However, if standard input is keyed in directly (as opposed to througha pipe, for example) agrep may not work for some non-simple patterns..LPThere is no size limit for simple patterns.More complicated patterns are currently limited to approximately 30 characters.Lines are limited to 1024 characters.Records are limited to 48K, and may be truncated if they are largerthan that.The limit of record length can be changed by modifying the parameter Max_record in agrep.h..SH DIAGNOSTICSExit status is 0 if any matches are found,1 if none, 2 for syntax errors or inaccessible files..SH AUTHORSSun Wu and Udi Manber, Department of Computer Science, University ofArizona, Tucson, AZ 85721. {sw|udi}@cs.arizona.edu.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -