<HTML><!--Distributed by F --><HEAD>
<TITLE>[Chapter 36] 36.7 Sorting Multiline Entries </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:48:43Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch36_01.htm" TITLE="36. Sorting">
<LINK REL="prev" HREF="ch36_06.htm" TITLE="36.6 Miscellaneous sort Hints ">
<LINK REL="next" HREF="ch36_08.htm" TITLE="36.8 lensort: Sort Lines by Length ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav">
<H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap">
<AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book">
</MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch36_06.htm" TITLE="36.6 Miscellaneous sort Hints "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 36.6 Miscellaneous sort Hints " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 36<BR>Sorting</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch36_08.htm" TITLE="36.8 lensort: Sort Lines by Length "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 36.8 lensort: Sort Lines by Length " BORDER="0"></A></TD>
</TR>
</TABLE>
&nbsp;
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
</DIV>
<DIV CLASS="SECT1">
<H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-0043">36.7 Sorting Multiline Entries </A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-41790"></A>There's one limitation to <EM CLASS="emphasis">sort</EM>: it works a line at a time.
If you want to sort a file with multiline entries, you're in tough shape.
For example, let's say you have a list of addresses:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">Doe, John and Jane
30 Anywhere St
Anytown, New York
10023

Buck, Jane and John
40 Anywhere St
Nowheresville, Alaska
90023</PRE></BLOCKQUOTE></P>
<TABLE CLASS="para.programreference" BORDER="1">
<TR>
<TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="chunksort">chunksort</A><BR></TH>
<TD VALIGN="TOP">How would you sort these? Certainly not with <EM CLASS="emphasis">sort</EM> alone; whatever you
do, you'll end up with a mish-mash of unmatched addresses, names, and
zip codes.
The <EM CLASS="emphasis">chunksort</EM> script will do the trick.
Here's the part of the script that does the real work:<A CLASS="indexterm" NAME="AUTOID-41800"></A></TD>
</TR>
</TABLE>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen"># completely empty lines separate records.
gawk '{
    gsub(/\n/,&quot;\1&quot;);
    print $0 &quot;\1&quot;
}' RS= $files |
sort $sortopts |
tr '\1' '\12'</PRE></BLOCKQUOTE></P>
<P CLASS="para">The script starts with a lot of option processing that we don't show
here&nbsp;- it's incredibly thorough, and allows you to use any <EM CLASS="emphasis">sort</EM>
option except <EM CLASS="emphasis">-o</EM>.
It also adds a new <EM CLASS="emphasis">-a</EM> option, which
allows you to sort based on different lines of a multiline entry.
Say you're sorting an address file like the one above, where the zip code is on the
fourth line of each entry.
The command <CODE CLASS="literal">chunksort&nbsp;-a&nbsp;+3</CODE> would
sort the file based on the zip codes, because <CODE CLASS="literal">+3</CODE> tells
<EM CLASS="emphasis">sort</EM> to skip the first three fields.
I'm not sure if this is really useful
(you can't, for example, sort on
the third field of the second line), but it's a nice bit of
additional functionality.</P>
<P CLASS="para">The body of the script (after the option processing) is conceptually
simple.
It uses
<SPAN CLASS="link"><EM CLASS="emphasis">gawk</EM> (<A CLASS="linkend" HREF="ch33_12.htm" TITLE="Versions of awk ">33.12</A>)</SPAN>
to collapse each multiline record into a
single line, with the CTRL-a character marking where the line
breaks were. After this processing, a few addresses from a typical
address list might look like this:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">Doe, John and Jane^A30 Anywhere St^AAnytown, New York^A10023^A
Buck, Jane and John^A40 Anywhere St^ANowheresville, Alaska^A90023^A</PRE></BLOCKQUOTE></P>
<P CLASS="para">Now that we've converted the original file into a list of one-line entries, we
have something that <EM CLASS="emphasis">sort</EM> can handle. So we just use <EM CLASS="emphasis">sort</EM>,
with whatever options were supplied on the command line.
After sorting,
<SPAN CLASS="link"><EM CLASS="emphasis">tr</EM> (<A CLASS="linkend" HREF="ch35_11.htm" TITLE="Hacking on Characters with tr ">35.11</A>)</SPAN>
&quot;unpacks&quot; this single-line
representation, restoring the file to its original form
by converting each CTRL-a back to a newline.
Notice that the <EM CLASS="emphasis">gawk</EM> script added an extra CTRL-a to the
end of each output line&nbsp;- so <EM CLASS="emphasis">tr</EM> outputs an extra newline, plus
the newline from the <EM CLASS="emphasis">gawk</EM> <EM CLASS="emphasis">print</EM> command, to give a blank
line between entries.
(Thanks to Greg Ubben for this improvement.)</P>
<P CLASS="para">There are lots of interesting variations on this script. You can
substitute <EM CLASS="emphasis">grep</EM> for the <EM CLASS="emphasis">sort</EM> command, allowing you to search for multiline entries&nbsp;- for example, to look up addresses in an
address file.
This would require slightly different option
processing, but the script would be essentially the same.</P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">JP</SPAN>, <SPAN CLASS="authorinitials">ML</SPAN></P></DIV>
</DIV>
</BODY></HTML>
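The collapse, sort, and unpack steps described in this section can be tried without the full chunksort script. What follows is a minimal sketch, not the book's script: the function names collapse_records, chunksort_sketch, and chunkgrep_sketch are hypothetical, it assumes a POSIX awk (which treats an empty RS as "blank lines separate records") rather than gawk specifically, and it skips all of the option processing that the real script performs.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, no option checking.

# Read multiline records (separated by empty lines) on stdin and
# write each record as one line, with CTRL-a (octal 001) standing
# in for every line break plus one extra CTRL-a at the end.
collapse_records() {
    awk 'BEGIN { RS = "" }          # paragraph mode: blank lines split records
    {
        gsub(/\n/, "\001")          # interior newlines become CTRL-a
        print $0 "\001"             # trailing CTRL-a yields the blank separator
    }'
}

# Sort whole records. Any arguments are handed to sort unchanged.
chunksort_sketch() {
    collapse_records | sort "$@" | tr '\001' '\n'
}

# The grep variation: print every record that matches the pattern.
# Because a record is one line here, a pattern can also match across
# what were originally separate lines.
chunkgrep_sketch() {
    collapse_records | grep -- "$1" | tr '\001' '\n'
}
```

Assuming the address list from this section is saved in a file named addresses (an illustrative name), "chunksort_sketch < addresses" would print the Buck entry before the Doe entry, and "chunkgrep_sketch Alaska < addresses" would print only the Buck entry. Both functions leave a blank line after each entry, as the real script does.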
