<HTML><!--Distributed by F --><HEAD>
<TITLE>[Chapter 16] 16.26 Finding Text Files with findtext </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:38:03Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch16_01.htm" TITLE="16. Where Did I Put That?">
<LINK REL="prev" HREF="ch16_25.htm" TITLE="16.25 Listing Files by Age and Size ">
<LINK REL="next" HREF="ch16_27.htm" TITLE="16.27 newer: Print the Name of the Newest File ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav">
<H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch16_25.htm" TITLE="16.25 Listing Files by Age and Size "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 16.25 Listing Files by Age and Size " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 16<BR>Where Did I Put That?</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch16_27.htm" TITLE="16.27 newer: Print the Name of the Newest File "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 16.27 newer: Print the Name of the Newest File " BORDER="0"></A></TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1">
<H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-4310">16.26 Finding Text Files with findtext </A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-17844"></A><A CLASS="indexterm" NAME="AUTOID-17847"></A><A CLASS="indexterm" NAME="AUTOID-17850"></A><A CLASS="indexterm" NAME="AUTOID-17854"></A>Some of my directories - my
<SPAN CLASS="link"><EM CLASS="emphasis">bin</EM> (<A CLASS="linkend" HREF="ch04_02.htm" TITLE="A bin Directory for Your Programs and Scripts ">4.2</A>)</SPAN>,
for instance - have some text files (like shell scripts and
documentation) as well as non-text files (executable binary files, compressed
files, archives, etc.).
If I'm trying to find a certain file - with
<SPAN CLASS="link"><EM CLASS="emphasis">grep</EM> (<A CLASS="linkend" HREF="ch27_01.htm#UPT-ART-7420" TITLE="Different Versions of grep ">27.1</A>)</SPAN>
or a
<SPAN CLASS="link">pager (<A CLASS="linkend" HREF="ch25_03.htm" TITLE="Using more to Page Through Files ">25.3</A>, <A CLASS="linkend" HREF="ch25_04.htm" TITLE='The "less" Pager: More than "more"'>25.4</A>)</SPAN>
- the non-text files can print garbage on my screen.
I want some way to say "only look at the files that have text in them."</P>
<P CLASS="para">The <EM CLASS="emphasis">findtext</EM> shell script does that.
It runs
<A CLASS="indexterm" NAME="AUTOID-17863"></A><SPAN CLASS="link"><EM CLASS="emphasis">file</EM> (<A CLASS="linkend" HREF="ch25_08.htm" TITLE="Finding File Types ">25.8</A>)</SPAN>
to guess what's in each file.
It prints only the filenames of text files.</P>
<P CLASS="para">So, for example, instead of typing:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
% <CODE CLASS="userinput"><B>egrep something *</B></CODE>
</PRE></BLOCKQUOTE></P>
<P CLASS="para">I type:</P>
<P CLASS="para"><TABLE CLASS="screen.co" BORDER="1"><TR><TH VALIGN="TOP"><PRE CLASS="calloutlist"><A CLASS="co" HREF="ch09_16.htm" TITLE="9.16 Command Substitution ">`...`</A> </PRE></TH><TD VALIGN="TOP"><PRE CLASS="screen">
% <CODE CLASS="userinput"><B>egrep something `findtext *`</B></CODE>
</PRE></TD></TR></TABLE></P>
<P CLASS="para">Here's the script, then some explanation of how to set it up on your system:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
#!/bin/sh
# PIPE OUTPUT OF file THROUGH sed TO PRINT FILENAMES FROM LINES
# WE LIKE.  NOTE: DIFFERENT VERSIONS OF file RETURN DIFFERENT
# MESSAGES.  CHECK YOUR SYSTEM WITH strings /usr/bin/file OR
# cat /etc/magic AND ADAPT THIS.
/usr/bin/file "$@" |
sed -n '
/MMDF mailbox/b print
/Interleaf ASCII document/b print
/PostScript document/b print
/Frame Maker MIF file/b print
/c program text/b print
/fortran program text/b print
/assembler program text/b print
/shell script/b print
/c-shell script/b print
/shell commands/b print
/c-shell commands/b print
/English text/b print
/ascii text/b print
/\[nt\]roff, tbl, or eqn input text/b print
/executable .* script/b print
b
:print
s/:<KBD CLASS="keycap">[TAB]</KBD>.*//p'
</PRE></BLOCKQUOTE></P>
<P CLASS="para">The script is simple: It runs <EM CLASS="emphasis">file</EM> on the command-line arguments.
The output of <EM CLASS="emphasis">file</EM> looks like this:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
COPY2PC:             directory
Ex24348:             empty
FROM_consult.tar.Z:  compressed data block compressed 16 bits
GET_THIS:            ascii text
hmo:                 English text
msg:                 English text
1991.ok:             [nt]roff, tbl, or eqn input text
</PRE></BLOCKQUOTE></P>
<P CLASS="para">The output is piped to a
<SPAN CLASS="link"><EM CLASS="emphasis">sed</EM> (<A CLASS="linkend" HREF="ch34_24.htm" TITLE="Quick Reference: sed ">34.24</A>)</SPAN>
script that selects the lines that seem to be from text files.
Each matching line branches to the <CODE CLASS="literal">print</CODE> label, where the script strips off everything after the filename
(starting at the colon) and prints the filename.
A line that matches none of the patterns reaches the bare <CODE CLASS="literal">b</CODE> command, which skips to the end of the script, so nothing is printed for it.</P>
<P CLASS="para">Different versions of <EM CLASS="emphasis">file</EM> produce different output.
Some versions also read an <EM CLASS="emphasis">/etc/magic</EM> file.
To find the kinds of names your
<EM CLASS="emphasis">file</EM> calls text files,
use commands like:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
% <CODE CLASS="userinput"><B>strings /usr/bin/file &gt; possible</B></CODE>
% <CODE CLASS="userinput"><B>cat /etc/magic &gt;&gt; possible</B></CODE>
% <CODE CLASS="userinput"><B>vi possible</B></CODE>
</PRE></BLOCKQUOTE></P>
<P CLASS="para">The <EM CLASS="emphasis">possible</EM> file will have a list of descriptions that <EM CLASS="emphasis">strings</EM>
found in the <EM CLASS="emphasis">file</EM> binary; some of them are for text files.
If your system has an <EM CLASS="emphasis">/etc/magic</EM> file, it will have lines like these:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
0    long     0x1010101    MMDF mailbox
0    string   &lt;!OPS        Interleaf ASCII document
0    string   %!           PostScript document
0    string   &lt;MIFFile     Frame Maker MIF file
</PRE></BLOCKQUOTE></P>
<P CLASS="para">Save the descriptions of text-type files from the right-hand column.</P>
<P CLASS="para">Then, turn each line of your edited <EM CLASS="emphasis">possible</EM> file into a <EM CLASS="emphasis">sed</EM> command:</P>
<P CLASS="para"><TABLE CLASS="screen.co" BORDER="1"><TR><TH VALIGN="TOP"><PRE CLASS="calloutlist"><A CLASS="co" HREF="ch34_19.htm" TITLE="34.19 Making Edits Everywhere Except... ">b print</A> </PRE></TH><TD VALIGN="TOP"><PRE CLASS="screen">
<CODE CLASS="userinput"><B>/</B></CODE><CODE CLASS="replaceable"><I>description</I></CODE><CODE CLASS="userinput"><B>/b print</B></CODE>
</PRE></TD></TR></TABLE></P>
<P CLASS="para">Watch for special characters in the <EM CLASS="emphasis">file</EM> descriptions.
I had to handle two special cases in the last two pattern lines of the script above:</P>
<UL CLASS="itemizedlist">
<LI CLASS="listitem"><P CLASS="para">I had to change the string <CODE CLASS="literal">executable %s script</CODE>
from our <EM CLASS="emphasis">file</EM> command to <CODE CLASS="literal">/executable .* script/b print</CODE>
in the <EM CLASS="emphasis">sed</EM> script.
That's because our <EM CLASS="emphasis">file</EM> command replaces <CODE CLASS="literal">%s</CODE> with a name
like <CODE CLASS="literal">/bin/ksh</CODE>.</P></LI>
<LI CLASS="listitem"><P CLASS="para">Characters that <EM CLASS="emphasis">sed</EM> treats as regular expression metacharacters, such as the brackets in <CODE CLASS="literal">[nt]roff</CODE>, need to be escaped with backslashes.
I used <CODE CLASS="literal">\[nt\]roff</CODE> in the script.</P></LI>
</UL>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-17934"></A>If you have <SPAN CLASS="link"><EM CLASS="emphasis">perl</EM> (<A CLASS="linkend" HREF="ch37_01.htm#UPT-ART-5560" TITLE="What We Do and Don't Tell You About Perl ">37.1</A>)</SPAN>,
you can make a simpler version of this script, since <EM CLASS="emphasis">perl</EM> has a built-in
test for whether or not a file is a text file.
Perl picks a "text file" by checking
the first block or so for strange control codes or metacharacters.
If there are too many (more than 10% in older versions of Perl; the Perl 5 documentation gives the threshold as about 30%), it's not a text file.
You can't tune the Perl script to, for example, skip a certain kind of
file by type.
But the Perl version is simpler!
It looks like this:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
% <CODE CLASS="userinput"><B>perl -le '-T &amp;&amp; print while $_ = shift' *</B></CODE>
</PRE></BLOCKQUOTE></P>
<TABLE CLASS="para.programreference" BORDER="1"><TR><TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="csh_init">csh_init</A><BR><A CLASS="programreference" HREF="examples/index.htm" TITLE="sh_init">sh_init</A><BR></TH><TD VALIGN="TOP">If you want to put that into an
<SPAN CLASS="link">alias (<A CLASS="linkend" HREF="ch10_02.htm" TITLE="Aliases for Common Commands ">10.2</A>)</SPAN>,
the C shell's
<SPAN CLASS="link">quoting problems (<A CLASS="linkend" HREF="ch47_02.htm" TITLE="C Shell Programming Considered Harmful ">47.2</A>, <A CLASS="linkend" HREF="ch08_15.htm" TITLE="Differences Between Bourne and C Shell Quoting ">8.15</A>)</SPAN>
make it tough to do.
Thanks to
<SPAN CLASS="link"><EM CLASS="emphasis">makealias</EM> (<A CLASS="linkend" HREF="ch10_08.htm" TITLE="Fix Quoting in csh Aliases with makealias and quote ">10.8</A>)</SPAN>,
though, here's an alias that does the job:</TD></TR></TABLE>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">
alias findtext 'perl -le '\''-T &amp;&amp; print while $_ = shift'\'' *'
</PRE></BLOCKQUOTE></P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">JP</SPAN></P></DIV>
</DIV>
<DIV CLASS="htmlnav">
<P></P>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch16_25.htm" TITLE="16.25 Listing Files by Age and Size "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 16.25 Listing Files by Age and Size " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch16_27.htm" TITLE="16.27 newer: Print the Name of the Newest File "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 16.27 newer: Print the Name of the Newest File " BORDER="0"></A></TD>
</TR>
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">16.25 Listing Files by Age and Size </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">16.27 newer: Print the Name of the Newest File </TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<IMG SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map"><AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf"><AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell"><AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor"><AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk"><AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell"><AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System"></MAP>
</DIV>
</BODY></HTML>