<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 35] 35.10 Splitting Files by Context: csplit </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:48:09Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch35_01.htm" TITLE="35. You Can't Quite Call This Editing">
<LINK REL="prev" HREF="ch35_09.htm" TITLE="35.9 Splitting Files at Fixed Points: split ">
<LINK REL="next" HREF="ch35_11.htm" TITLE="35.11 Hacking on Characters with tr ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch35_09.htm" TITLE="35.9 Splitting Files at Fixed Points: split "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 35.9 Splitting Files at Fixed Points: split " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 35<BR>You Can't Quite Call This Editing</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch35_11.htm" TITLE="35.11 Hacking on Characters with tr "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 35.11 Hacking on Characters with tr " BORDER="0"></A></TD>
</TR></TABLE>&nbsp;<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-2890">35.10 Splitting Files by Context: csplit </A></H2>
<TABLE CLASS="para.programreference" BORDER="1"><TR><TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="csplit">csplit</A><BR></TH>
<TD VALIGN="TOP"><A CLASS="indexterm" NAME="UPT-ART-2890-IX-CSPLIT-PROGRAM"></A>Like
<SPAN CLASS="link"><EM CLASS="emphasis">split</EM> (<A CLASS="linkend" HREF="ch35_09.htm" TITLE="Splitting Files at Fixed Points: split ">35.9</A>)</SPAN>,
<EM CLASS="emphasis">csplit</EM> lets you break a file into smaller pieces,
but <EM CLASS="emphasis">csplit</EM> (context split) also allows the file to be broken into
different-sized pieces, according to context. With <EM CLASS="emphasis">csplit</EM>,
you give the locations (line numbers or search patterns)
at which to break each section.
<EM CLASS="emphasis">csplit</EM> comes with System V, but there are also
freely available versions.</TD></TR></TABLE>
<P CLASS="para">Let's look at search patterns first.
Suppose you have an outline consisting of three main sections. You could
create a separate file for each section by typing:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit outline /I./ /II./ /III./</B></CODE>
28                    <I CLASS="lineannotation">number of characters in each file</I>
415                   .
372                   .
554                   .
% <CODE CLASS="userinput"><B>ls</B></CODE>
outline
xx00   <I CLASS="lineannotation"> outline title, etc.</I>
xx01   <I CLASS="lineannotation"> Section I</I>
xx02   <I CLASS="lineannotation"> Section II</I>
xx03   <I CLASS="lineannotation"> Section III</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">This command creates four new files (<EM CLASS="emphasis">outline</EM> remains intact).
<EM CLASS="emphasis">csplit</EM> displays the character counts for each file.
Note that the first file (<EM CLASS="emphasis">xx00</EM>) contains any text up to <EM CLASS="emphasis">but not including</EM> the first pattern,
and that <EM CLASS="emphasis">xx01</EM> contains the first section, as you'd
expect. This is why the naming scheme begins with <EM CLASS="emphasis">00</EM>.
(Even if <EM CLASS="emphasis">outline</EM> had begun immediately with a <CODE CLASS="literal">I.</CODE>,
<EM CLASS="emphasis">xx01</EM> would still contain Section I, but <EM CLASS="emphasis">xx00</EM> would be empty in this
case.)</P>
<P CLASS="para">If you don't want to save the text that occurs before a specified pattern,
use a percent sign as the pattern delimiter:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit outline %I.% /II./ /III./</B></CODE>
415
372
554
% <CODE CLASS="userinput"><B>ls</B></CODE>
outline
xx00 <I CLASS="lineannotation"> Section I</I>
xx01 <I CLASS="lineannotation"> Section II</I>
xx02 <I CLASS="lineannotation"> Section III</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">The preliminary text file has been suppressed, and
the created files now begin where the actual outline starts (the file
numbers no longer match the section numbers, however).</P>
<P CLASS="para">Let's make some further refinements. We'll use the <EM CLASS="emphasis">-s</EM> option to suppress the display of the character counts,
and we'll use the <EM CLASS="emphasis">-f</EM> option to specify a file prefix other than the conventional <EM CLASS="emphasis">xx</EM>:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit -s -f part. outline /I./ /II./ /III./</B></CODE>
% <CODE CLASS="userinput"><B>ls</B></CODE>
outline
part.00
part.01
part.02
part.03</PRE></BLOCKQUOTE></P>
<P CLASS="para">There's still a slight problem, though.
In search patterns, a period is a <SPAN CLASS="link">metacharacter (<A CLASS="linkend" HREF="ch26_10.htm" TITLE="Pattern Matching Quick Reference with Examples ">26.10</A>)</SPAN>
that matches any single character, so the pattern <CODE CLASS="literal">/I./</CODE> may inadvertently
match words like <EM CLASS="emphasis">Introduction</EM>. We need to escape the period with a
backslash; however, the backslash has meaning both to the pattern and to
the shell, so in fact, we need either to use a double backslash or to surround
the pattern in
<SPAN CLASS="link">quotes (<A CLASS="linkend" HREF="ch08_14.htm" TITLE="Bourne Shell Quoting ">8.14</A>)</SPAN>.
A subtlety, yes, but one that can drive you crazy
if you don't remember it. Our command line becomes:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit -s -f part. outline &quot;/I\./&quot; /II./ /III./</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">You can also break a file at repeated occurrences of the same pattern.
Let's say you have a file that describes 50 ways to cook a chicken,
and you want each method stored in a separate file. Each section begins
with a heading: <EM CLASS="emphasis">WAY #1</EM>, <EM CLASS="emphasis">WAY #2</EM>, and so on. To divide the file,
use <EM CLASS="emphasis">csplit</EM>'s repeat argument:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit -s -f cook. fifty_ways /^WAY/ &quot;{49}&quot;</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">This command splits the file at the first occurrence of <EM CLASS="emphasis">WAY</EM>,
and the number in braces tells <EM CLASS="emphasis">csplit</EM> to repeat the split 49 more times.
Note that a caret is used to match the beginning of the line and that
the C shell requires quotes around the
<SPAN CLASS="link">braces (<A CLASS="linkend" HREF="ch09_05.htm" TITLE="Build Strings with {&nbsp;} ">9.5</A>)</SPAN>.
Counting the piece that precedes the first <EM CLASS="emphasis">WAY</EM> heading (<EM CLASS="emphasis">cook.00</EM>, which is empty if the file starts with <EM CLASS="emphasis">WAY #1</EM>), the command has created
51 files:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>ls cook.*</B></CODE>
cook.00
cook.01
  ...
cook.49
cook.50</PRE></BLOCKQUOTE></P>
<P CLASS="para">Quite often, when you want to split a file repeatedly,
you don't know or don't care how many files will be created; you just want to make sure that the necessary number of splits takes place.
In this case, it makes sense to specify a repeat count that is slightly higher
than what you need (maximum is 99). Unfortunately, if you tell <EM CLASS="emphasis">csplit</EM> to
create more files than it's able to, this produces an &quot;out of range&quot; error.
Furthermore, when <EM CLASS="emphasis">csplit</EM> encounters an error, it exits by removing any files it created along the way. (A bug, if you ask me.) This is where the
<EM CLASS="emphasis">-k</EM> option comes in.
Specify <EM CLASS="emphasis">-k</EM> to <EM CLASS="emphasis">k</EM>eep the files around, even when the &quot;out of range&quot;
message occurs.</P>
<P CLASS="para"><EM CLASS="emphasis">csplit</EM> allows you to break a file at some number of lines above or below
a given search pattern. For example, to break a file at the line that is five lines below the one containing <EM CLASS="emphasis">Sincerely,</EM> you could type:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit -s -f letter. all_letters /Sincerely/+5</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">This situation might arise if you have a series of business letters
strung together in one file. Each letter begins differently, but each one begins five lines after the previous letter's <EM CLASS="emphasis">Sincerely</EM> line.
Here's another example, adapted from AT&amp;T's UNIX <EM CLASS="emphasis">User's Reference Manual</EM>:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit -s -k -f routine. prog.c '%main(%' '/^}/+1' '{99}'</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">The idea is that the file <EM CLASS="emphasis">prog.c</EM> contains a group of C routines,
and we want to place each one in a separate file (<EM CLASS="emphasis">routine.00</EM>, <EM CLASS="emphasis">routine.01</EM>, etc.). The first pattern uses <CODE CLASS="literal">%</CODE>
because we want to discard anything before <EM CLASS="emphasis">main</EM>. The next argument
says, &quot;Look for a closing brace at the beginning of a line (the conventional
end of a routine) and split on the following line (the assumed beginning of
the next routine).&quot; Repeat this split up to 99 times, using <EM CLASS="emphasis">-k</EM> to preserve the created files.[4]</P>
<BLOCKQUOTE CLASS="footnote"><P CLASS="para">[4] In this case, the repeat can actually occur only 98 times, since we've already specified two arguments and the maximum number is 100.</P></BLOCKQUOTE>
<P CLASS="para">The <EM CLASS="emphasis">csplit</EM> command takes line-number arguments in addition to patterns.
You can say:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit stuff 50 373 955</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">to create files split at some arbitrary line numbers.
In that example, the new file <EM CLASS="emphasis">xx00</EM>
will have lines 1-49 (49 lines total), <EM CLASS="emphasis">xx01</EM> will have lines
50-372 (323 lines total), <EM CLASS="emphasis">xx02</EM> will have
lines 373-954 (582 lines total), and <EM CLASS="emphasis">xx03</EM> will hold the rest of <EM CLASS="emphasis">stuff</EM>.</P>
<P CLASS="para"><EM CLASS="emphasis">csplit</EM> works like <EM CLASS="emphasis">split</EM> if you repeat a line-number argument. The command:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>csplit top_ten_list 10 &quot;{18}&quot;</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">breaks the list at 19 points (lines 10, 20, and so on through 190), giving 20 segments of 10 lines each.[5]<A CLASS="indexterm" NAME="AUTOID-40665"></A><A CLASS="indexterm" NAME="AUTOID-40666"></A></P>
<BLOCKQUOTE CLASS="footnote"><P CLASS="para">[5] Not really. The first file contains only nine lines (1-9); the rest
contain 10. In this case, you're better off saying <CODE CLASS="literal">split&nbsp;-10&nbsp;top_ten_list</CODE>.</P></BLOCKQUOTE>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">DG</SPAN></P></DIV></DIV>
</BODY></HTML>
