<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 27] 27.8 glimpse and agrep </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:44:23Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch27_01.htm" TITLE="27. Searching Through Files">
<LINK REL="prev" HREF="ch27_07.htm" TITLE="27.7 grepping for a List of Patterns ">
<LINK REL="next" HREF="ch27_09.htm" TITLE="27.9 New greps Are Much Faster "></HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000"><DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch27_07.htm" TITLE="27.7 grepping for a List of Patterns "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 27.7 grepping for a List of Patterns " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 27<BR>Searching Through Files</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch27_09.htm" TITLE="27.9 New greps Are Much Faster "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 27.9 New greps Are Much Faster " BORDER="0"></A></TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV><DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-7350">27.8 glimpse and agrep </A></H2>
<TABLE CLASS="para.programreference" BORDER="1"><TR><TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="glimpse">glimpse</A><BR></TH>
<TD VALIGN="TOP"><A CLASS="indexterm" NAME="AUTOID-30228"></A><A CLASS="indexterm" NAME="AUTOID-30230"></A><EM CLASS="emphasis">Glimpse</EM> is an indexing and query system that lets you search huge amounts of text (for example, all of your files) very quickly. If you're looking for the word <EM CLASS="emphasis">something</EM>, just type <CODE CLASS="literal">glimpse something</CODE>; all matching lines will appear with the filename at the start.</TD></TR></TABLE>
<P CLASS="para">Before you use <EM CLASS="emphasis">glimpse</EM>, you need to index your files by running <EM CLASS="emphasis">glimpseindex</EM>. You'll probably want to run it every night from <SPAN CLASS="link"><EM CLASS="emphasis">cron</EM> (<A CLASS="linkend" HREF="ch40_12.htm" TITLE="Periodic Program Execution: The cron Facility ">40.12</A>)</SPAN>. Even so, your searches will miss files that have been added since the last <EM CLASS="emphasis">glimpseindex</EM> run. But other than that problem (which can't be avoided in an indexed system like this), <EM CLASS="emphasis">glimpse</EM> is fantastic, especially because it's (usually) so fast.</P>
<P CLASS="para">The speed depends on the size of the index file you build: a bigger index makes the searches faster. But even with the smallest index file, I can search my entire 70-megabyte email archive, on a fairly slow workstation, in less than 30 seconds. With faster CPUs and disks, the search could be much quicker. One weakness is search patterns that could match many files. Such searches can take a lot of time, so <EM CLASS="emphasis">glimpse</EM> will print a warning and ask if you want to continue the search. (After <EM CLASS="emphasis">glimpse</EM> checks its index for possible matches, it 
runs <EM CLASS="emphasis">agrep</EM> on the possibly matching files to check them and get the exactly matching records.)</P>
<P CLASS="para"><EM CLASS="emphasis">agrep</EM> is one of the nicer additions to the <EM CLASS="emphasis">grep</EM> family. It's not only one of the faster greps around; it also has the unique feature that it will look for approximate matches. It's also record-oriented rather than line-oriented. <EM CLASS="emphasis">Glimpse</EM> calls <EM CLASS="emphasis">agrep</EM>, but you can also use <EM CLASS="emphasis">agrep</EM> without using <EM CLASS="emphasis">glimpse</EM>. The three most significant features of <EM CLASS="emphasis">agrep</EM> that are not supported by the rest of the <EM CLASS="emphasis">grep</EM> family are:</P>
<OL CLASS="orderedlist"><LI CLASS="listitem"><P CLASS="para">The ability to search for approximate patterns, with a user-definable level of accuracy. For example, </P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -2 homogenos foo</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">will find "homogeneous" as well as any other word that can be obtained from "homogenos" with at most 2 substitutions, insertions, or deletions.</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -B homogenos foo</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">uses best-match mode: if there is no exact match, <EM CLASS="emphasis">agrep</EM> looks for the closest ones and generates a message of the form:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">best match has 2 errors, there are 5 matches, output them? 
(y/n)</PRE></BLOCKQUOTE></P><P CLASS="para"></P></LI>
<LI CLASS="listitem"><P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-30274"></A><A CLASS="indexterm" NAME="AUTOID-30278"></A><A CLASS="indexterm" NAME="AUTOID-30280"></A><EM CLASS="emphasis">agrep</EM> is record-oriented rather than just line-oriented; a record is by default a line, but it can be user-defined with the <EM CLASS="emphasis">-d</EM> option, which specifies a pattern that will be used as a record delimiter. For example, </P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -d '^From ' 'pizza' mbox</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">outputs all <SPAN CLASS="link">mail messages (<A CLASS="linkend" HREF="ch01_33.htm" TITLE="UNIX Networking and Communications ">1.33</A>)</SPAN> (delimited by a line beginning with <EM CLASS="emphasis">From</EM> and a space) in the file <EM CLASS="emphasis">mbox</EM> that contain the keyword <EM CLASS="emphasis">pizza</EM>. Another example: </P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -d '$$' </B></CODE><CODE CLASS="replaceable"><I>pattern</I></CODE><CODE CLASS="userinput"><B> foo</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">will output all paragraphs (separated by an empty line) that contain <EM CLASS="emphasis">pattern</EM>.</P></LI>
<LI CLASS="listitem"><P CLASS="para"><EM CLASS="emphasis">agrep</EM> allows queries that combine multiple patterns with AND or OR logic. For example, </P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -d '^From ' 'burger,pizza' mbox</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">outputs all mail messages containing at least one of the two keywords (<CODE CLASS="literal">,</CODE> stands for OR).</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -d '^From ' 'good;pizza' mbox</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">outputs all mail messages containing both 
keywords (<CODE CLASS="literal">;</CODE> stands for AND).</P></LI></OL>
<P CLASS="para">Putting these options together, you can write queries like:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>agrep -d '$$' -2 '&lt;CACM&gt;;</B></CODE><CODE CLASS="replaceable"><I>TheAuthor</I></CODE><CODE CLASS="userinput"><B>;Curriculum;&lt;198[5-9]&gt;' bib</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by <EM CLASS="emphasis">TheAuthor</EM> dealing with Curriculum. Two errors are allowed, but they cannot be in either CACM or the year. (The &lt;&gt; brackets forbid errors in the pattern between them.)</P>
<P CLASS="para">Other <EM CLASS="emphasis">agrep</EM> features include searching for regular expressions (with or without errors); unlimited wildcards; limiting the errors to only insertions or only substitutions or any combination; weighting the kinds of errors differently (so that each deletion counts as, say, 2 substitutions or 3 insertions); restricting parts of the query to be exact and parts to be approximate; and many more.</P>
<P CLASS="para">Email <EM CLASS="emphasis">glimpse-request@cs.arizona.edu</EM> to be added to the <EM CLASS="emphasis">glimpse</EM> mailing list. Email <EM CLASS="emphasis">glimpse@cs.arizona.edu</EM> to report bugs, ask questions, discuss tricks for using glimpse, etc. (This is a moderated mailing list with very little traffic, mostly announcements.)</P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">JP</SPAN>, <SPAN CLASS="authorinitials">SW, UM</SPAN></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch27_07.htm" TITLE="27.7 grepping for a List of Patterns "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 27.7 grepping for a List of Patterns " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch27_09.htm" TITLE="27.9 New greps Are Much Faster "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 27.9 New greps Are Much Faster " BORDER="0"></A></TD></TR>
<TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">27.7 grepping for a List of Patterns </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">27.9 New greps Are Much Faster </TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><IMG SRC="gifs/smnavbar.gif" SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map"><AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf"><AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell"><AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor"><AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk"><AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell"><AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System"></MAP></DIV></BODY></HTML>