<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 36] 36.4 Confusion with White Space Field Delimiters </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:48:39Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch36_01.htm" TITLE="36. Sorting">
<LINK REL="prev" HREF="ch36_03.htm" TITLE="36.3 Changing the Field Delimiter ">
<LINK REL="next" HREF="ch36_05.htm" TITLE="36.5 Alphabetic and Numeric Sorting ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch36_03.htm" TITLE="36.3 Changing the Field Delimiter "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 36.3 Changing the Field Delimiter " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 36<BR>Sorting</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch36_05.htm" TITLE="36.5 Alphabetic and Numeric Sorting "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 36.5 Alphabetic and Numeric Sorting " BORDER="0"></A></TD>
</TR></TABLE>&nbsp;<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-2924">36.4 Confusion with White Space Field Delimiters </A></H2>
<P CLASS="para">One would hope that a simple task like sorting would be relatively unambiguous. Unfortunately, it isn't. The behavior of <EM CLASS="emphasis">sort</EM> can be very puzzling. I'll try to straighten out some of the confusion; at the same time, I'll be leaving myself open to abuse by the real <EM CLASS="emphasis">sort</EM> experts. I hope you appreciate this! Seriously, though: if we find any new wrinkles to the story, we'll add them in the next edition.</P>
<P CLASS="para">The trouble with <EM CLASS="emphasis">sort</EM> is figuring out where one field ends and another begins. It's simplest if you can <SPAN CLASS="link">specify an explicit field delimiter (<A CLASS="linkend" HREF="ch36_03.htm" TITLE="Changing the Field Delimiter ">36.3</A>)</SPAN>. This makes it easy to tell where fields end and begin. But by default, <EM CLASS="emphasis">sort</EM> uses white space characters (tabs and spaces) to separate fields, and the rules for interpreting white space field delimiters are unfortunately complicated. As I see them, they are:</P>
<UL CLASS="itemizedlist">
<LI CLASS="listitem"><P CLASS="para">The first white space character you encounter is a &quot;field delimiter&quot;; it marks the end of the old field and the beginning of the next field.</P></LI>
<LI CLASS="listitem"><P CLASS="para">Any white space character following a field delimiter is <EM CLASS="emphasis">part of</EM> the new field. That is, if you have two or more white space characters in a row, the first one is used as a field delimiter, and isn't sorted. The remainder <EM CLASS="emphasis">are</EM> sorted, as part of the next field.</P></LI>
<LI CLASS="listitem"><P CLASS="para">Every field has at least one non-whitespace character, unless you're at the end of the line.
(That is: null fields only occur when you've reached the end of a line.)</P></LI>
<LI CLASS="listitem"><P CLASS="para">Not all white space is equal. Sorting is done according to the <SPAN CLASS="link">ASCII (<A CLASS="linkend" HREF="ch51_03.htm" TITLE="ASCII Characters: Listing and Getting Values ">51.3</A>)</SPAN> collating sequence. Therefore, TABs are sorted before spaces.</P></LI>
</UL>
<P CLASS="para">Here is a silly but instructive example that demonstrates most of the hard cases. We'll sort the file <EM CLASS="emphasis">sortme</EM>, which is:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">        apple   Fruit shipments
20      beta    beta test sites
 5              Something or other</PRE></BLOCKQUOTE></P>
<P CLASS="para">All is not as it seems: <SPAN CLASS="link"><EM CLASS="emphasis">cat -t -v</EM> (<A CLASS="linkend" HREF="ch25_06.htm" TITLE="What's in That White Space? ">25.6</A>, <A CLASS="linkend" HREF="ch25_07.htm" TITLE="Show Non-Printing Characters with cat -v or od -c ">25.7</A>)</SPAN> shows that the file really looks like this:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">^Iapple^IFruit shipments
20^Ibeta^Ibeta test sites
 5^I^ISomething or other</PRE></BLOCKQUOTE></P>
<P CLASS="para"><CODE CLASS="literal">^I</CODE> indicates a tab character. Before showing you what <EM CLASS="emphasis">sort</EM> does with this file, let's break it into fields, being very careful to apply the rules above.
In the table, we use quotes to show exactly where each field begins and ends:</P>
<TABLE CLASS="informaltable">
<THEAD CLASS="thead">
<TR CLASS="row" VALIGN="TOP"><TH CLASS="entry" ALIGN="LEFT" ROWSPAN="1" COLSPAN="1">Field</TH><TH CLASS="entry" ALIGN="LEFT" ROWSPAN="1" COLSPAN="1">0</TH><TH CLASS="entry" ALIGN="LEFT" ROWSPAN="1" COLSPAN="1">1</TH><TH CLASS="entry" ALIGN="LEFT" ROWSPAN="1" COLSPAN="1">2</TH><TH CLASS="entry" ALIGN="LEFT" ROWSPAN="1" COLSPAN="1">3</TH></TR>
</THEAD>
<TBODY CLASS="tbody">
<TR CLASS="row" VALIGN="TOP"><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">Line</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1"></TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1"></TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1"></TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1"></TD></TR>
<TR CLASS="row" VALIGN="TOP"><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">1</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;^Iapple&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;Fruit&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;shipments&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">null (no more data)</TD></TR>
<TR CLASS="row" VALIGN="TOP"><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">2</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;20&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;beta&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;beta&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;test&quot;</TD></TR>
<TR CLASS="row" VALIGN="TOP"><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">3</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot; 5&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;^ISomething&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;or&quot;</TD><TD CLASS="entry" ROWSPAN="1" COLSPAN="1">&quot;other&quot;</TD></TR>
</TBODY>
</TABLE>
<P CLASS="para">OK, now let's try some <EM CLASS="emphasis">sort</EM> commands; I've added annotations on the right, showing what character the &quot;sort&quot; was based on.
First, we'll sort on field zero (that is, the first field in each line):</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>sort sortme</B></CODE> <I CLASS="lineannotation">sort on field zero</I>
        apple   Fruit shipments <I CLASS="lineannotation">field 0, first character: TAB</I>
 5              Something or other <I CLASS="lineannotation">field 0, first character: SPACE</I>
20      beta    beta test sites <I CLASS="lineannotation">field 0, first character: 2</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">As I noted earlier, a TAB precedes a space in the collating sequence. Everything is as expected. Now let's try another, this time sorting on field 1 (the second field):</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>sort +1 sortme</B></CODE> <I CLASS="lineannotation">sort on field 1</I>
 5              Something or other <I CLASS="lineannotation">field 1, first character: TAB</I>
        apple   Fruit shipments <I CLASS="lineannotation">field 1, first character: F</I>
20      beta    beta test sites <I CLASS="lineannotation">field 1, first character: b</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">Again, the initial TAB causes &quot;Something or other&quot; to appear first. &quot;Fruit shipments&quot; precedes &quot;beta&quot; because in the ASCII table, uppercase letters precede lowercase letters. Now, let's sort on the next field:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>sort +2 sortme</B></CODE> <I CLASS="lineannotation">sort on field 2</I>
20      beta    beta test sites <I CLASS="lineannotation">field 2, first character: b</I>
 5              Something or other <I CLASS="lineannotation">field 2, first character: o</I>
        apple   Fruit shipments <I CLASS="lineannotation">field 2, first character: s</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">No surprises here.
And finally, sort on field 3 (the &quot;fourth&quot; field):</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>sort +3 sortme</B></CODE> <I CLASS="lineannotation">sort on field 3</I>
        apple   Fruit shipments <I CLASS="lineannotation">field 3, NULL</I>
 5              Something or other <I CLASS="lineannotation">field 3, first character: o</I>
20      beta    beta test sites <I CLASS="lineannotation">field 3, first character: t</I></PRE></BLOCKQUOTE></P>
<P CLASS="para">The only surprise here is that the NULL field gets sorted first. That's really no surprise, though: an empty field compares lower than any field that contains characters, so we should expect it to come first.</P>
<P CLASS="para">OK, this was a silly example. But it was a difficult one; a casual understanding of what <EM CLASS="emphasis">sort</EM> &quot;ought to do&quot; won't explain any of these cases. Which leads to another point. If someone tells you to sort some terrible mess of a data file, you could be heading for a nightmare. But often, you're not just sorting; you're also <EM CLASS="emphasis">designing</EM> the data file you want to sort. If you get to design the format for the input data, a little bit of care will save you lots of headaches. If you have a choice, <EM CLASS="emphasis">never</EM> allow TABs in the file. And be careful of leading spaces; a word with an extra space before it will be sorted <EM CLASS="emphasis">before</EM> other words.
Therefore, use an explicit delimiter between fields (like a colon), or use the <EM CLASS="emphasis">-b</EM> option (and an explicit sort field), which tells <EM CLASS="emphasis">sort</EM> to ignore initial white space.</P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">ML</SPAN></P></DIV></DIV>
</BODY></HTML>
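The closing advice of section 36.4 (use an explicit delimiter, or use -b together with an explicit sort field) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's own example: it assumes a POSIX sort that accepts -t, -b and -k, the sample records and shell variable names are made up, and LC_ALL=C is set so that collation follows plain ASCII order as the text assumes. The +N field notation used in the section is obsolete on many current systems; -k counts fields from 1, so -k2,2 selects what the section calls field 1.

```shell
#!/bin/sh
# Sketch only: the records and variable names are hypothetical.
# LC_ALL=C selects byte-order (ASCII) collation for repeatable results.

# 1. Explicit delimiter: -t: splits fields on colons,
#    and -k2,2 restricts the sort key to the second field.
by_colon=$(printf '%s\n' 'a:pear:5' 'b:apple:20' 'c:fig:7' |
    LC_ALL=C sort -t: -k2,2)
printf '%s\n' "$by_colon"      # b:apple:20, c:fig:7, a:pear:5

# 2. White-space fields: the "x" record has an extra space before "pear".
#    Without -b, the blanks in front of each word are part of the key,
#    so the record with two spaces sorts ahead of the one with a single space.
without_b=$(printf '%s\n' 'y apple' 'x  pear' | LC_ALL=C sort -k2,2)
printf '%s\n' "$without_b"     # x  pear, y apple

#    With -b, leading blanks are skipped when the key is located,
#    so the words themselves are compared.
with_b=$(printf '%s\n' 'y apple' 'x  pear' | LC_ALL=C sort -b -k2,2)
printf '%s\n' "$with_b"        # y apple, x  pear
```

The second and third commands differ only in -b, which makes it easy to see how a stray leading space changes the order.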
