



<HTML>

<HEAD>

<TITLE>Developer.com - Online Reference Library - 0672311739:RED HAT LINUX 2ND EDITION:GNU Project Utilities</TITLE>

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!-- ISBN=0672311739 //-->

<!-- TITLE=RED HAT LINUX 2ND EDITION //-->

<!-- AUTHOR=DAVID PITTS ET AL //-->

<!-- PUBLISHER=MACMILLAN //-->

<!-- IMPRINT=SAMS PUBLISHING //-->

<!-- PUBLICATION DATE=1998 //-->

<!-- CHAPTER=17 //-->

<!-- PAGES=0351-0372 //-->

<!-- UNASSIGNED1 //-->

<!-- UNASSIGNED2 //-->









<P><CENTER>

<a href="0365-0367.html">Previous</A> | <a href="../ewtoc.html">Table of Contents</A> | <a href="0371-0372.html">Next</A>

</CENTER></P>



<A NAME="PAGENUM-368"><P>Page 368</P></A>













<H4><A NAME="ch17_ 18">





The split Command

</A></H4>









<P>The split command is probably one of the handiest

commands for transporting large files around. One of its most common uses is to split up compressed source files (to upload in

pieces or fit on a floppy). The basic syntax is

</P>





<!-- CODE SNIP //-->

<PRE>

split [options] filename [output prefix]

</PRE>

<!-- END CODE SNIP //-->











<P>where the options and output prefix are optional. If no output prefix is given,

split uses the prefix x, and the output files are named

xaa, xab, xac, and so on. By default, split puts

1000 lines in each output file (the last file can have fewer than 1000 lines). Because

1000 lines can add up to very different file sizes, the -b or

--bytes option is often used instead to split by size. The basic syntax is

</P>





<!-- CODE SNIP //-->

<PRE>

-b bytes[bkm]



</PRE>

<!-- END CODE SNIP //-->











<P>or

</P>



<!-- CODE SNIP //-->

<PRE>

--bytes=bytes[bkm]

</PRE>

<!-- END CODE SNIP //-->











<P>where bytes is a number, optionally followed by one of these size multipliers:

</P>



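<TABLE WIDTH="360">

<TR><TD>

b

</TD><TD>

512 bytes

</TD></TR>

<TR><TD>

k

</TD><TD>

1KB (1024 bytes)

</TD></TR>

<TR><TD>

m

</TD><TD>

1MB (1,048,576 bytes)

</TD></TR>

</TABLE>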















<P>Thus,

</P>





<!-- CODE SNIP //-->

<PRE>

split -b1000k JDK.tar.gz

</PRE>

<!-- END CODE SNIP //-->











<P>will split the file JDK.tar.gz into 1000KB pieces. To get output files

named JDK.tar.gz.aa, JDK.tar.gz.ab, and so on, pass JDK.tar.gz. (with the trailing dot) as the output prefix:

</P>





<!-- CODE SNIP //-->

<PRE>

split -b1000k JDK.tar.gz JDK.tar.gz.

</PRE>

<!-- END CODE SNIP //-->











<P>This would create 1000KB files that could be copied to a floppy or uploaded one at a time

over a slow modem link.

</P>
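<P>A minimal sketch of the size-based split, assuming a POSIX shell and a split that accepts the k suffix. The file sample.bin, its 2500KB size, and the scratch directory are stand-ins made up for illustration; they are not from the example above.</P>

```shell
# Work in a scratch directory so the example leaves your files alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for JDK.tar.gz: 2500 KB of zero bytes.
dd if=/dev/zero of=sample.bin bs=1024 count=2500 2>/dev/null

# 1000k means 1000 * 1024 = 1,024,000 bytes per piece.
split -b 1000k sample.bin sample.bin.

# Collect the piece names; the glob expands in suffix order (aa, ab, ac).
set -- sample.bin.*
pieces="$*"
echo "$pieces"

# Two full pieces plus a 500 KB remainder (2,560,000 - 2,048,000 bytes).
first_size=$(cat sample.bin.aa | wc -c)
last_size=$(cat sample.bin.ac | wc -c)
echo "$first_size $last_size"
```

<P>The last piece is simply whatever is left over, so it is usually smaller than the others.</P>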









<P>When the files reach their destination, they can be

joined by using cat:

</P>







<!-- CODE SNIP //-->

<PRE>

cat JDK.tar.gz.* &gt; JDK.tar.gz

</PRE>

<!-- END CODE SNIP //-->
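<P>The round trip can be tried on a small file before trusting it with a real archive. This is a sketch under assumptions: original.txt, the piece. prefix, and the 4KB piece size are invented for the demonstration, and cmp is used here only as a quick byte-for-byte check.</P>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small stand-in archive: 5000 numbered lines of text.
seq 1 5000 > original.txt

# Split into 4 KB pieces, then rejoin them under a new name.
# The shell expands piece.* in sorted order, which matches the
# order in which split wrote the pieces (aa, ab, ac, ...).
split -b 4k original.txt piece.
cat piece.* > rejoined.txt

# cmp -s is silent and exits 0 only when both files are byte-identical.
if cmp -s original.txt rejoined.txt; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "$roundtrip"
```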











<P>A useful command for confirming that a split file has been joined correctly

is the cksum command. Historically, it has been used to confirm that files were

transferred properly over noisy phone lines.

</P>









<P>cksum computes a cyclic redundancy check (CRC) for each filename argument and prints

out the CRC along with the number of bytes in the file and the filename. The easiest way to

compare the CRC for the two files is to get the CRC for the original file:

</P>





<!-- CODE SNIP //-->

<PRE>

cksum JDK.tar.gz &gt; JDK.crc

</PRE>

<!-- END CODE SNIP //-->











<P>and then compare it to the output of cksum for the joined file.

</P>
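<P>One way to do that comparison in a script is sketched below. The file names are hypothetical, and awk is used only to drop the filename field, since the two names differ even when the contents match.</P>

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical original, split and rejoined as described above.
seq 1 5000 > original.txt
split -b 4k original.txt piece.
cat piece.* > rejoined.txt

# cksum prints "CRC byte-count filename"; keep only the first two fields.
crc_original=$(cksum original.txt | awk '{ print $1, $2 }')
crc_rejoined=$(cksum rejoined.txt | awk '{ print $1, $2 }')

if [ "$crc_original" = "$crc_rejoined" ]; then
    verdict="match"
else
    verdict="mismatch"
fi
echo "$verdict"
```

<P>In practice the first cksum would be run where the original lives and the second at the destination, with the saved .crc file carried along with the pieces.</P>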



<A NAME="PAGENUM-369"><P>Page 369</P></A>













<H4><A NAME="ch17_ 19">





Counting Words

</A></H4>









<P>Counting words is a handy thing to be able to do,

and there are many ways to do it. Probably the easiest is the

wc command, which stands for word count, but wc only prints the total number

of characters, words, or lines. What if you need a breakdown by word? It's a good

problem, and one that serves to introduce the next set of GNU text utilities.

</P>
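<P>To see what wc does and does not report, here is a small sketch on three lines of made-up sample text (not the chapter file). It gives totals only, which is why the rest of this section reaches for other tools.</P>

```shell
# Three short lines of sample text (hypothetical input).
sample='the cat sat
on the mat
the end'

# wc -l counts newlines, -w counts words, -c counts bytes.
lines=$(printf '%s\n' "$sample" | wc -l)
words=$(printf '%s\n' "$sample" | wc -w)
bytes=$(printf '%s\n' "$sample" | wc -c)
echo "lines=$lines words=$words bytes=$bytes"
```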









<P>Here are the commands you need:

</P>



<TABLE WIDTH="360">

<TR><TD>

tr

</TD><TD>

Transliterate; changes the first set of characters it is given into the

second set of characters it is given; also deletes characters

</TD></TR>

<TR><TD>

sort

</TD><TD>

Sorts the file (or its standard input)

</TD></TR>

<TR><TD>

uniq

</TD><TD>

Prints out all the unique lines in a file (collapses duplicates into one

line and optionally gives a count)

</TD></TR>

</TABLE>















<P>I used this chapter as the text for this example. First, this line replaces the punctuation,

braces, and so on in the input file with spaces:

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc

</PRE>

<!-- END CODE SNIP //-->











<P>This demonstrates the basic usage of tr:

</P>





<!-- CODE SNIP //-->

<PRE>

tr 'set1' 'set2'

</PRE>

<!-- END CODE SNIP //-->











<P>This takes all the characters in set1 and transliterates them to the characters in

set2. Usually, the characters themselves are used, but the standard C escape sequences work also (as you

will see).

</P>
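<P>A few one-line experiments make the mapping concrete. The input strings are invented for illustration, and the second example relies on GNU tr padding a short set2 with its last character, which other implementations may not do.</P>

```shell
# Each character of set1 maps to the character at the same position in set2.
mapped=$(echo 'aabbcc' | tr 'abc' 'xyz')
echo "$mapped"

# GNU tr pads a shorter set2 with its last character, so every
# punctuation mark listed in set1 becomes a space here.
spaced=$(echo 'Hello, world! (really)' | tr '!?":;[]{}(),.' ' ')
echo "$spaced"

# C-style escapes work too: turn a tab into a space.
untabbed=$(printf 'one\ttwo\n' | tr '\t' ' ')
echo "$untabbed"
```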









<P>I specified set2 as ' ' (the space character) because words separated by those characters need

to remain separate. The next step is to fold uppercase letters to lowercase

because the words To and to, the and The, and

Files and files are really the same word. To do

this, tell tr to change all the capital characters

'A-Z' into lowercase characters 'a-z':

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z'

</PRE>

<!-- END CODE SNIP //-->











<P>I broke the command into two lines, with the pipe character as the last character in the

first line so that the shell (sh, bash, ksh) will do the right thing and use the next line as the

command to pipe to. It's easier to read and cut and paste from an

xterm this way, also. This won't work under csh or

tcsh unless you start one of the preceding shells.

</P>
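<P>The line-continuation behavior is easy to check in isolation. This sketch assumes a Bourne-style shell (sh, bash, or ksh) and uses a made-up input string.</P>

```shell
# A line ending in | is incomplete, so the shell keeps reading and
# treats the next line as the rest of the pipeline.
folded=$(echo 'To the Files' |
    tr 'A-Z' 'a-z')
echo "$folded"
```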









<P>Multiple spaces in the output can be squeezed into single spaces with

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z' | tr -s ' '

</PRE>

<!-- END CODE SNIP //-->
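<P>Here is the same three-stage pipeline on a single made-up sentence, as a sketch of why the squeeze is needed: replacing punctuation with spaces leaves runs of two or more spaces behind.</P>

```shell
line='Files, files (and more files).'

# Punctuation becomes spaces, capitals become lowercase, and
# tr -s collapses each run of spaces into a single space.
cleaned=$(echo "$line" |
    tr '!?":;[]{}(),.' ' ' |
    tr 'A-Z' 'a-z' |
    tr -s ' ')
echo "$cleaned"
```

<P>Note that one trailing space survives, because the closing parenthesis and period were both turned into spaces and then squeezed to one.</P>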











<P>To get a count of how many times each word is used, you need to sort the file. In the

simplest form, the sort command orders whole lines, so you need to have one word per line to get a

good sort. This code deletes all of the tabs (\t) and the newlines

(\n) and then changes all the spaces into newlines:

</P>



<A NAME="PAGENUM-370"><P>Page 370</P></A>







<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z' | tr -s ' ' | tr -d '\t\n' | tr ' ' '\n'

</PRE>

<!-- END CODE SNIP //-->
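<P>One thing worth checking on a small sample is what happens at line ends. In the sketch below (the two-line input is made up), deleting newlines joins the last word of one line to the first word of the next whenever no space separates them. The second command is a variant of my own, not the command used in this chapter, that turns tabs and newlines into spaces instead.</P>

```shell
# Two-line sample; the first line ends with a word and no trailing space.
text='the split command
splits files'

# As in the pipeline above: deleting \n before converting spaces
# fuses "command" and "splits" into one token.
fused=$(printf '%s\n' "$text" | tr -s ' ' | tr -d '\t\n' | tr ' ' '\n')
echo "$fused"

# Variant (an assumption, not the chapter's command): map tabs and
# newlines to spaces first, then squeeze, then break on spaces.
separate=$(printf '%s\n' "$text" | tr '\t\n' '  ' | tr -s ' ' | tr ' ' '\n')
echo "$separate"
```

<P>Whether this matters depends on the input; a file whose lines all end in a space or punctuation mark would not show the difference.</P>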











<P>Now you can sort the output, so simply tack on the

sort command:

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z' | tr -s ' ' | tr -d '\t\n' | tr ' ' '\n' | sort

</PRE>

<!-- END CODE SNIP //-->











<P>You could eliminate all the repeats at this point by giving

sort the -u (unique) option, but you need a count of the repeats, so use the

uniq command. By default, the uniq command prints out &quot;the unique lines in a sorted file, discarding all but one of a run of matching

lines&quot; (from the uniq man page). uniq requires sorted input because it only compares consecutive lines. To

get uniq to print out how many times a word occurs, give it the

-c (count) option:

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z' | tr -s ' ' | tr -d '\t\n' |
tr ' ' '\n' | sort | uniq -c

</PRE>

<!-- END CODE SNIP //-->
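<P>The reason uniq needs sorted input shows up clearly on a six-word list. This is a sketch with invented input; awk is there only to strip the leading padding that uniq -c puts in front of each count.</P>

```shell
words='to
the
files
the
to
the'

# Without sort, no two identical words are adjacent, so uniq removes nothing.
unsorted_lines=$(printf '%s\n' "$words" | uniq | wc -l)
echo "$unsorted_lines"

# After sort, identical words sit next to each other and uniq -c counts them.
counts=$(printf '%s\n' "$words" | sort | uniq -c | awk '{ print $1, $2 }')
echo "$counts"

# sort -u gives the distinct words but no counts.
distinct=$(printf '%s\n' "$words" | sort -u)
echo "$distinct"
```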













<P>Next, you need to sort the output again because it is

ordered by word, not by count. This time, to get

sort to compare numeric values instead of strings

and print the largest number first, give sort the

-n (numeric) and -r (reverse) options:

</P>





<!-- CODE SNIP //-->

<PRE>

tr '!?&quot;:;[]{}(),.' ' ' &lt; ~/docs/ch16.doc |
tr 'A-Z' 'a-z' | tr -s ' ' | tr -d '\t\n' |
tr ' ' '\n' | sort | uniq -c | sort -rn

</PRE>

<!-- END CODE SNIP //-->













<P>The first few lines (ten, actually; I piped the output to

head) look like this:

</P>





<!-- CODE //-->

<PRE>

389 the

164 to

127 of

115 is

115 and

111 a

 80 files

 70 file

 69 in

 65 `

</PRE>

<!-- END CODE //-->











<P>Note that the tenth most common word is the single quote character, but I said we took

care of the punctuation with the very first tr. Well, I lied (sort of); we took care of all the

characters that would fit between quotes, and a single quote won't fit. So why not just backslash

escape that sucker? Well, not all shells will handle that properly.

</P>









<P>So what's the solution?

</P>









<P>The solution is to use the predefined character classes in

tr. The tr command knows several character classes, and the punctuation class is one of them.

Here is a complete list (names and definitions) of class names, from the man page for

tr:

</P>





<TABLE WIDTH="360">

<TR><TD>

alnum

</TD><TD>

Letters and digits

</TD></TR>

<TR><TD>

alpha

</TD><TD>

Letters

</TD></TR>

</TABLE>
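<P>As a sketch of how a class is used, the punctuation class is written [:punct:] on the tr command line. The sentence below is made up, and the pipeline is a shortened single-line version of the word count, not the exact command from this chapter.</P>

```shell
text="Don't split the file; the file's fine. The end."

# [:punct:] covers every punctuation character, including the single
# quote that cannot appear inside a single-quoted set.
ranked=$(printf '%s\n' "$text" |
    tr '[:punct:]' ' ' |
    tr 'A-Z' 'a-z' |
    tr -s ' ' |
    tr ' ' '\n' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }' | head -2)
echo "$ranked"
```

<P>Replacing the apostrophe with a space does split contractions and possessives into fragments (a stray t and s in this sample), so deleting it with tr -d may suit some texts better.</P>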



















<!-- begin footer information -->





</body></html>