<HTML>
<HEAD>
<TITLE>UNIX Unleashed unx06.htm</TITLE>
<LINK REL="ToC" HREF="index.htm">
<LINK REL="Next" HREF="unx07.htm">
<LINK REL="Previous" HREF="unx05.htm"></HEAD>
<BODY TEXT="#000000" LINK="#0000FF" VLINK="#800080" bgcolor=white>
<P><A HREF="unx05.htm"><IMG SRC="bluprev.gif" WIDTH = 32 HEIGHT = 32 BORDER = 0 ALT="Previous Page"></A>
<A HREF="index.htm"><IMG SRC="blutoc.gif" WIDTH = 32 HEIGHT = 32 BORDER = 0 ALT="TOC"></A>
<A HREF="unx07.htm"><IMG SRC="blunext.gif" WIDTH = 32 HEIGHT = 32 BORDER = 0 ALT="Next Page"></A>
<A HREF="index.htm"><IMG SRC="bluprev.gif" WIDTH = 32 HEIGHT = 32 BORDER = 0 ALT="Home"></A>
</P><UL>
<LI>
<A HREF="#I1">6 — Popular File Tools</A></LI>
<UL>
<UL>
<UL>
<UL>
<LI>
<A HREF="#I3">By Pete Holsberg</A></LI></UL></UL>
<LI>
<A HREF="#I4">Determining the Nature of a File's Contents with file</A></LI>
<LI>
<A HREF="#I5">Browsing Through Text Files with more (page) and pg</A></LI>
<LI>
<A HREF="#I6">Searching for Strings with the grep Family</A></LI>
<UL>
<LI>
<A HREF="#I7">Regular Expressions</A></LI>
<UL>
<LI>
<A HREF="#I8">Regular Expression Characters</A></LI>
<LI>
<A HREF="#I9">A Regular Expression with No Special Characters</A></LI>
<LI>
<A HREF="#I10">Special Characters</A></LI>
<LI>
<A HREF="#I11">Regular Expression Examples</A></LI>
<LI>
<A HREF="#I12">The grep Command</A></LI>
<LI>
<A HREF="#I13">The egrep Command</A></LI>
<LI>
<A HREF="#I14">The fgrep Command</A></LI></UL></UL>
<LI>
<A HREF="#I15">Compressing Files—compress, uncompress, and zcat</A></LI>
<UL>
<LI>
<A HREF="#I16">Printing with pr</A></LI></UL>
<LI>
<A HREF="#I17">Printing Hard Copy Output</A></LI>
<UL>
<LI>
<A HREF="#I18">Requesting To Print</A></LI>
<UL>
<LI>
<A HREF="#I19">The lp Command</A></LI>
<LI>
<A HREF="#I20">The cancel Command</A></LI></UL>
<LI>
<A HREF="#I21">Getting Printer and Print Request Status</A></LI></UL>
<LI>
<A HREF="#I22">Comparing Directories with dircmp</A></LI>
<LI>
<A HREF="#I23">Encrypting a File with the crypt Command</A></LI>
<LI>
<A HREF="#I24">Printing the Beginning or End of a File with head and tail</A></LI>
<LI>
<A HREF="#I25">Pipe Fitting with tee</A></LI>
<LI>
<A HREF="#I26">Updating a File's Time and Date with touch</A></LI>
<LI>
<A HREF="#I27">Splitting Files with split and csplit</A></LI>
<LI>
<A HREF="#I28">Comparing Files with cmp and diff</A></LI>
<UL>
<LI>
<A HREF="#I29">The cmp Command</A></LI>
<LI>
<A HREF="#I30">The diff Command</A></LI></UL>
<LI>
<A HREF="#I31">Summary</A></LI></UL></UL></UL>
<H1 ALIGN="CENTER">
<CENTER><A ID="I1" NAME="I1">
<BR>
<FONT SIZE=5><A ID="I2" NAME="I2"></A><B>6 — Popular File Tools</B>
<BR></FONT></A></CENTER></H1>
<H5 ALIGN="CENTER">
<CENTER><A ID="I3" NAME="I3">
<FONT SIZE=3><B>By Pete Holsberg</B>
<BR></FONT></A></CENTER></H5>
<P>Files are the heart of UNIX. Unlike most other operating systems, UNIX was designed with a simple, yet highly sophisticated, view of files: Everything is a file. Information stored in an area of a disk or memory is a file; a directory is a file; the
keyboard is a file; the screen is a file.
<BR></P>
<P>This single-minded view makes it easy to write tools that manipulate files, because files have no structure: UNIX sees every file merely as a simple stream of bytes. This makes life much simpler for both the UNIX programmer and the UNIX user. The
user benefits from being able to send the contents of a file to a command without going through a complex process of opening the file. Similarly, the user can capture the output of a command in a file without having created that file beforehand.
Perhaps most importantly, the user can send the output of one command directly to the input of another, using memory, rather than a file on disk, as temporary storage. Finally, users benefit from UNIX's unstructured files because they are simply easier to
use than files that must conform to one of several highly structured formats.
<BR></P>
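<P>The short shell sketch below illustrates the three conveniences just described: creating a file simply by redirecting output into it, feeding a file to a command's standard input, and piping one command into another. It is only an illustration; the scratch directory and the file name notes.txt are made up for the example, and the commands are ordinary POSIX utilities.</P>

```shell
#!/bin/sh
# Work in a throwaway directory; notes.txt is a hypothetical file name.
cd "$(mktemp -d)" || exit 1

# 1. Capture a command's output in a file that did not exist before;
#    the shell creates notes.txt on the fly.
printf 'gamma\nalpha\nbeta\n' > notes.txt

# 2. Send a file's contents to a command's standard input; wc never
#    opens notes.txt itself, it just reads a stream of bytes.
wc -l < notes.txt

# 3. Feed one command's output straight into another with a pipe;
#    no intermediate file appears on disk.
sort < notes.txt | head -n 1
```

<P>The second command reports 3 lines (some systems pad the number with spaces), and the pipeline prints alpha, the first line of the sorted stream.</P>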
<H3 ALIGN="CENTER">
<CENTER><A ID="I4" NAME="I4">
<FONT SIZE=4><B>Determining the Nature of a File's Contents with </B><B><I>file</I></B>
<BR></FONT></A></CENTER></H3>
<P>A user, especially a power user, should take a closer look at a file before manipulating it. If you've ever sent a binary file to a printer, you're aware of the mess that can result. Murphy's Law guarantees that every binary file contains a string of
bytes that will do one or more of the following:
<BR></P>
<UL>
<LI>Spew a ream of paper through the printer before you can shut it off, printing just enough on each page to render the paper fit only for the recycling bin
<BR>
<BR></LI>
<LI>Put the printer into a print mode that prints all characters at 1/10 their intended size
<BR>
<BR></LI>
<LI>Lock your keyboard
<BR>
<BR></LI>
<LI>Dump core—that is, create a file consisting of whatever was in memory at that instant of time!
<BR>
<BR></LI></UL>
<P>In a similar way, sending a binary file to the screen can lock the keyboard, put the screen in a mode that changes the displayed character set to one that is clearly not English, dump core, and so on.
<BR></P>
<P>While it's true that many files already stored on the system—and certainly every file you create with a text editor (see Chapter 7)—are text files, many are not. UNIX provides a command, file, that attempts to determine the nature of the
contents of files when you supply their file names as arguments. You can invoke the file command in one of two ways:
<BR></P>
<PRE>file [-h] [-m <I>mfile</I>] [-f <I>ffile</I>] <I>arg(s)</I>
file [-h] [-m <I>mfile</I>] -f <I>ffile</I></PRE>
<P>The file command performs a series of tests on each file in the list of <I>arg(s)</I> or on the list of files whose names are contained in the file <I>ffile</I>. If the file being tested is a text file, file examines the first 512 bytes and tries to
determine the language in which it is written. The command bases its identification on the contents of a file called /etc/magic. If you don't like what's in that file, you can use the -m <I>mfile</I> option, replacing <I>mfile</I> with the name of the "magic
file" you'd like to use. (Consult your local magician for suitable spells and potions!) Here are the kinds of text files that UnixWare Version 1.0's file command can identify:
<BR></P>
<UL>
<LI>Empty files
<BR>
<BR></LI>
<LI>SCCS files
<BR>
<BR></LI>
<LI>troff (typesetter runoff) output files
<BR>
<BR></LI>
<LI>Data files
<BR>
<BR></LI>
<LI>C program text files (with or without garbage)
<BR>
<BR></LI>
<LI>FORTRAN program text files (with or without garbage)
<BR>
<BR></LI>
<LI>Assembler program text files (with or without garbage)
<BR>
<BR></LI>
<LI>[nt]roff, tbl, or eqn input text (with or without garbage)
<BR>
<BR></LI>
<LI>Command text files (with or without garbage)
<BR>
<BR></LI>
<LI>English text files (with or without garbage)
<BR>
<BR></LI>
<LI>ASCII text files (with or without garbage)
<BR>
<BR></LI>
<LI>PostScript program text files (with or without garbage)
<BR>
<BR></LI></UL>
<P>Don't be concerned if you're not familiar with some of these kinds of text. Many of them are peculiar to UNIX and are explained in later chapters.
<BR></P>
<P>If the file is not text, file looks near the beginning of the file for a magic number: a number or string that is associated with a file type, that is, an arbitrary value that is coupled with a descriptive phrase. Then file consults /etc/magic, a
database of magic numbers and kinds of files (or the file specified as <I>mfile</I>), to determine the file's contents. If the file being tested is a symbolic link, file follows the link and tries to determine the nature of the contents of the file to which
it is linked. The -h option causes file not to follow symbolic links; instead, it identifies the link itself as a symbolic link.
<BR></P>
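<P>As a sketch of both invocation forms, the script below creates a few scratch files and runs file on them. The file names are made up for the example, and the exact wording of each identification depends on the system's magic file. One portability caveat: some modern implementations (GNU file, for one) do not follow symbolic links by default, so on those systems -h merely restates the default.</P>

```shell
#!/bin/sh
# Try both forms of the file command on scratch files.
# All file names below are hypothetical.
cd "$(mktemp -d)" || exit 1

printf 'Hello, world.\n' > greeting.txt   # ordinary English text
: > empty.dat                             # zero-length file
ln -s greeting.txt greeting.lnk           # symbolic link to greeting.txt

# Form 1: name the files to test as arguments.
file greeting.txt empty.dat

# Form 2: list the names, one per line, in a file and pass it with -f.
printf '%s\n' greeting.txt empty.dat > names.lst
file -f names.lst

# -h: report on the link itself instead of following it.
file -h greeting.lnk
```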
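<P>The /etc/magic file contains the table of magic numbers and their meanings. For example, here is an excerpt from UnixWare Version 1.0's /etc/magic file. In each entry, the first column is the byte offset at which file looks, the second tells file how to read the value found there (as a byte, a short integer, or a string), and the third is the magic number itself. A leading &gt; marks a continuation test that is tried only if the preceding entry matched. The uxcore: field refers to a numbered message in the system's message catalog, and the phrase that follows is the file type: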
<BR></P>
<PRE>>16 short 2 uxcore:231 executable
0 string uxcore:648 expanded ASCII cpio archive
0 string uxcore:650 ASCII cpio archive
>1 byte 0235 uxcore:571 compressed data
0 string uxcore:248 current ar archive
0 short 0432 uxcore:256 Compiled Terminfo Entry
0 short 0434 uxcore:257 Curses screen image
0 short 0570 uxcore:259 vax executable
0 short 0510 uxcore:263 x86 executable
0 short 0560 uxcore:267 WE32000 executable
0 string 070701 uxcore:565 DOS executable (EXE)
0 string 070707 uxcore:566 DOS built-in
0 byte 0xe9 uxcore:567 DOS executable (COM)
0 short 0520 uxcore:277 mc68k executable
0 string uxcore:569 core file (Xenix)
0 byte 0x80 uxcore:280 8086 relocatable (Microsoft)</PRE>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="caution.gif" WIDTH = 37 HEIGHT = 35><B>CAUTION:</B> Human beings cannot read any of the files listed in this excerpt, so you should not send any of these files to the screen or the printer. The same is true for any of the previously listed text
files that have garbage.
<BR></NOTE>
<HR ALIGN=CENTER>
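<P>To make the magic-number idea concrete, here is a toy classifier written as a shell function. It is a minimal sketch, not how file is actually implemented: the hypothetical function magic_type reads the first four bytes of a file with od and compares them against a tiny, hand-written signature table that stands in for /etc/magic. Its compress signature (bytes 1f 9d) agrees with the second byte tested by the excerpt's &gt;1 byte 0235 entry, because octal 0235 is hexadecimal 9d.</P>

```shell
#!/bin/sh
# Toy magic-number classifier: a sketch of the idea behind file,
# NOT its real implementation. The signature table is a hypothetical
# stand-in for /etc/magic.

magic_type() {
    # Read up to the first 4 bytes as hex pairs, e.g. "1f9d4142".
    sig=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    case $sig in
        "")         echo "empty" ;;
        1f9d*)      echo "compressed data" ;;          # compress output
        1f8b*)      echo "gzip compressed data" ;;
        7f454c46*)  echo "ELF executable" ;;           # 0x7f 'E' 'L' 'F'
        2321*)      echo "command text (#! script)" ;; # '#' '!'
        2521*)      echo "PostScript text" ;;          # '%' '!'
        *)          echo "data" ;;
    esac
}

# Classify every file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(magic_type "$f")"
done
```

<P>Note that, like the real file, the sketch looks only at a few leading bytes; a file that merely happens to begin with those bytes is misidentified.</P>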
<H3 ALIGN="CENTER">
<CENTER><A ID="I5" NAME="I5">
<FONT SIZE=4><B>Browsing Through Text Files with </B><B><I>more (page)</I></B><B> and </B><B><I>pg</I></B>
<BR></FONT></A></CENTER></H3>
<P>After you identify a file as a text file that humans can read, you may want to read it. The cat command streams the contents of a file to the screen, but unless you are quick with the Scroll Lock (or equivalent) key, the text flashes by too fast to
read (your speed-reading lessons notwithstanding). UNIX provides a pair of programs that present the contents of a file one screen at a time.
<BR></P>
<P>The more and page programs are almost identical and are discussed here as if they were a single program. The only differences are the following:
<BR></P>
<UL>
<LI>page clears the screen automatically between pages, but more does not.
<BR>
<BR></LI>
<LI>more provides a two-line overlap from one screen to the next, while page provides only a one-line overlap.
<BR>
<BR></LI></UL>
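<P>The overlap difference is easiest to see with a little arithmetic. The hypothetical shell function screens below is not part of more or page; it simply lists which line numbers each successive screen would show, assuming a fixed number of text lines per screen (22 here, an arbitrary choice) and ignoring the prompt line and any lines that wrap.</P>

```shell
#!/bin/sh
# Hypothetical helper, not part of more or page: print the line range
# shown on each screen for a file of TOTAL lines, with LINES text
# lines per screen and OVERLAP lines repeated from screen to screen.
screens() {
    total=$1 lines=$2 overlap=$3
    # The overlap must be smaller than a screen or we never advance.
    [ "$overlap" -lt "$lines" ] || return 1
    start=1
    while :; do
        end=$((start + lines - 1))
        [ "$end" -gt "$total" ] && end=$total
        echo "$start-$end"
        [ "$end" -ge "$total" ] && break
        # Back up OVERLAP lines so they appear again on the next screen.
        start=$((end - overlap + 1))
    done
}

screens 50 22 2   # more-style: two-line overlap
screens 50 22 1   # page-style: one-line overlap
```

<P>For a 50-line file, the two-line overlap yields screens 1-22, 21-42, and 41-50, while the one-line overlap yields 1-22, 22-43, and 43-50.</P>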
<P>Both more and page have several commands, many of which take a numerical argument that controls the number of times the command is actually executed. You issue these commands while more or page is displaying a file (the command-line syntax for starting either program follows), and none of
them is echoed to the screen. Table 6.1 lists the major commands.
<BR></P>
<PRE><B>more</B> [-<B>cdflrsuw</B>] [-<I>lines</I>] [+<I>linenumber</I>] [+/<I>pattern</I>] [<I>file(s)</I>]
<B>page</B> [-<B>cdflrsuw</B>] [-<I>lines</I>] [+<I>linenumber</I>] [+/<I>pattern</I>] [<I>file(s)</I>]</PRE>
<UL>
<LH><B>Table 6.1. Commands for </B><B>more (page)</B>
<BR></LH></UL>
<TABLE BORDER>
<TR>
<TD>
<PRE><I>Command</I>
<BR></PRE>
<TD>
<PRE><I>Meaning</I>
<BR></PRE>
<TR>
<TD>
<P><I>n</I>Spacebar</P>
<TD>
<P>If no positive number is entered, display the next screenful. If an <I>n</I> value is entered, display <I>n</I> more lines.</P>
<TR>
<TD>
<P><I>n</I>Return</P>
<TD>
<P>If no positive number is entered, display another line. If an <I>n</I> value is entered, display <I>n</I> more lines. (Depending on your keyboard, you can press either the Return or Enter key.)</P>
<TR>
<TD>
<P><I>n</I>^D, <I>n</I>d</P>
<TD>
<P>If no positive number is entered, scroll down 11 more lines. If an <I>n</I> value is entered, scroll down <I>n</I> lines and make <I>n</I> the new scroll size.</P>
<TR>
<TD>
<P><I>n</I>z</P>
<TD>
<P>Same as <I>n</I>Spacebar, except that if an <I>n</I> value is entered, it becomes the new number of lines per screenful.</P>
<TR>
<TD>
<P><I>n</I>^B, <I>n</I>b</P>
<TD>
<P>Skip back <I>n</I> screenfuls and then print a screenful.</P>
<TR>
<TD>
<P>q, Q</P>
<TD>
<P>Exit from more or page.</P>
<TR>
<TD>
<P>=</P>
<TD>