<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Text Processing Commands</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+">
<LINK REL="HOME" TITLE="Advanced Bash-Scripting Guide" HREF="index.html">
<LINK REL="UP" TITLE="External Filters, Programs and Commands" HREF="external.html">
<LINK REL="PREVIOUS" TITLE="Time / Date Commands" HREF="timedate.html">
<LINK REL="NEXT" TITLE="File and Archiving Commands" HREF="filearchiv.html">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
<LINK REL="stylesheet" HREF="common/kde-common.css" TYPE="text/css">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="Content-Language" CONTENT="en">
<LINK REL="stylesheet" HREF="common/kde-localised.css" TYPE="text/css" TITLE="KDE-English">
<LINK REL="stylesheet" HREF="common/kde-default.css" TYPE="text/css" TITLE="KDE-Default">
</HEAD>
<BODY CLASS="SECT1" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#AA0000" VLINK="#AA0055" ALINK="#AA0000" STYLE="font-family: sans-serif;">
<DIV CLASS="NAVHEADER"><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR><TH COLSPAN="3" ALIGN="center">Advanced Bash-Scripting Guide: An in-depth exploration of the art of shell scripting</TH></TR>
<TR><TD WIDTH="10%" ALIGN="left" VALIGN="bottom"><A HREF="timedate.html" ACCESSKEY="P">Prev</A></TD><TD WIDTH="80%" ALIGN="center" VALIGN="bottom">Chapter 15. External Filters, Programs and Commands</TD><TD WIDTH="10%" ALIGN="right" VALIGN="bottom"><A HREF="filearchiv.html" ACCESSKEY="N">Next</A></TD></TR>
</TABLE><HR ALIGN="LEFT" WIDTH="100%"></DIV>
<DIV CLASS="SECT1"><H1 CLASS="SECT1"><A NAME="TEXTPROC"></A>15.4. Text Processing Commands</H1>
<DIV CLASS="VARIABLELIST"><P><B><A NAME="TPCOMMANDLISTING1"></A>Commands affecting text and text files</B></P>
<DL>
<DT><A NAME="SORTREF"></A><B CLASS="COMMAND">sort</B></DT>
<DD><P>File sort utility, often used as a filter in a pipe.
This command sorts a <I CLASS="FIRSTTERM">text stream</I> or file forwards or backwards, or according to various keys or character positions. Using the <TT CLASS="OPTION">-m</TT> option, it merges presorted input files. The <I CLASS="FIRSTTERM">info page</I> lists its many capabilities and options. See <A HREF="loops.html#FINDSTRING">Example 10-9</A>, <A HREF="loops.html#SYMLINKS">Example 10-10</A>, and <A HREF="contributed-scripts.html#MAKEDICT">Example A-8</A>.</P></DD>
<DT><A NAME="TSORTREF"></A><B CLASS="COMMAND">tsort</B></DT>
<DD><P><I CLASS="FIRSTTERM">Topological sort</I>. It reads pairs of whitespace-separated strings, where each pair means that the first string must come before the second. It then outputs an ordering of all the strings that satisfies every such constraint. The original purpose of <B CLASS="COMMAND">tsort</B> was to sort a list of dependencies for an obsolete version of the <I CLASS="FIRSTTERM">ld</I> linker in an <SPAN CLASS="QUOTE">"ancient"</SPAN> version of UNIX.</P><P>The results of a <I CLASS="FIRSTTERM">tsort</I> will usually differ markedly from those of the standard <B CLASS="COMMAND">sort</B> command, above.</P></DD>
<DT><A NAME="UNIQREF"></A><B CLASS="COMMAND">uniq</B></DT>
<DD><P>This filter removes duplicate lines from a sorted file. Because it compares only adjacent lines, duplicates that are not next to each other are left in place.
It is often seen in a pipe coupled with <A HREF="textproc.html#SORTREF">sort</A>.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">cat list-1 list-2 list-3 | sort | uniq &#62; final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.</PRE></TD></TR></TABLE></P>
<P>The useful <TT CLASS="OPTION">-c</TT> option prefixes each output line with the number of times that line occurred consecutively in the input.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>cat testfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>uniq -c testfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">      1 This line occurs only once.
      2 This line occurs twice.
      3 This line occurs three times.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>sort testfile | uniq -c | sort -nr</B></TT>
<TT CLASS="COMPUTEROUTPUT">      3 This line occurs three times.
      2 This line occurs twice.
      1 This line occurs only once.</TT>
</PRE></TD></TR></TABLE></P>
<P>The <TT CLASS="USERINPUT"><B>sort INPUTFILE | uniq -c | sort -nr</B></TT> command string produces a <I CLASS="FIRSTTERM">frequency of occurrence</I> listing for the <TT CLASS="FILENAME">INPUTFILE</TT> file. The <TT CLASS="OPTION">-nr</TT> options to <B CLASS="COMMAND">sort</B> cause a reverse numerical sort, so the most frequent lines come first. This template is useful for analyzing log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.</P>
<DIV CLASS="EXAMPLE"><HR><A NAME="WF"></A><P><B>Example 15-12.
Word Frequency Analysis</B></P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command line.
ARGS=1
E_BADARGS=65
E_NOFILE=66

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi



########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.

#  Arun Giridhar suggests modifying the above to:
#  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
#  This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
#  As he explains it:
#  "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
#  As Frank Wang explains, the above is equivalent to
#+       . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+       . . . | sort | uniq -c | sort -k1nr -k2
########################################################

exit 0

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
#+   other whitespace.</PRE></TD></TR></TABLE><HR></DIV>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>cat testfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>./wf.sh testfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">      6 this
      6 occurs
      6 line
      3 times
      3 three
      2 twice
      1 only
      1 once</TT>
</PRE></TD></TR></TABLE></P></DD>
<DT><A NAME="EXPANDREF"></A><B CLASS="COMMAND">expand</B>, <B CLASS="COMMAND">unexpand</B></DT>
<DD><P>The <B CLASS="COMMAND">expand</B> filter converts tabs to spaces. It is often used in a pipe.</P><P>The <B CLASS="COMMAND">unexpand</B> filter converts spaces to tabs, reversing the effect of <B CLASS="COMMAND">expand</B>. By default it converts only leading blanks; the <TT CLASS="OPTION">-a</TT> option converts blanks throughout the line.</P></DD>
<DT><A NAME="CUTREF"></A><B CLASS="COMMAND">cut</B></DT>
<DD><P>A tool for extracting fields from files. It is similar to the <TT CLASS="USERINPUT"><B>print $N</B></TT> command set in <A HREF="awk.html#AWKREF">awk</A>, but more limited. It may be simpler to use <I CLASS="FIRSTTERM">cut</I> in a script than <I CLASS="FIRSTTERM">awk</I>.
Particularly important are the <TT CLASS="OPTION">-d</TT> (delimiter) and <TT CLASS="OPTION">-f</TT> (field specifier) options.</P>
<P>Using <B CLASS="COMMAND">cut</B> to obtain a listing of the mounted filesystems:
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">cut -d ' ' -f1,2 /etc/mtab</PRE></TD></TR></TABLE></P>
<P>Using <B CLASS="COMMAND">cut</B> to list the OS and kernel version:
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">uname -a | cut -d" " -f1,3,11,12</PRE></TD></TR></TABLE></P>
<P>Using <B CLASS="COMMAND">cut</B> to extract the subject lines from an e-mail folder. The <TT CLASS="OPTION">-c10-80</TT> range skips the nine-character <TT CLASS="LITERAL">Subject: </TT> prefix:
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '^Subject:' read-messages | cut -c10-80</B></TT>
<TT CLASS="COMPUTEROUTPUT">Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint</TT></PRE></TD></TR></TABLE></P>
<P>Using <B CLASS="COMMAND">cut</B> to parse a file:
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING"># List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 "$FILENAME")   # Field 1 of each entry is the login name.
do
  echo "$user"
done

# Thanks, Oleg Philon, for suggesting this.</PRE></TD></TR></TABLE></P>
<P><TT CLASS="USERINPUT"><B>cut -d ' ' -f2,3 filename</B></TT> is equivalent to <TT CLASS="USERINPUT"><B>awk -F'[ ]' '{ print $2, $3 }' filename</B></TT>.</P>
<DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>It is even possible to specify a linefeed as a delimiter.
The trick is to actually embed a linefeed (<B CLASS="KEYCAP">RETURN</B>) between the quotes in the command, so that each line of the file is treated as one field.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>cut -d'
' -f3,7,19 testfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is line 3 of testfile.
This is line 7 of testfile.
This is line 19 of testfile.</TT>
</PRE></TD></TR></TABLE></P>
<P>Thank you, Jaka Kranjc, for pointing this out.</P></TD></TR></TABLE></DIV>
<P>See also <A HREF="mathc.html#BASE">Example 15-46</A>.</P></DD>
<DT><A NAME="PASTEREF"></A><B CLASS="COMMAND">paste</B></DT>
<DD><P>Tool for merging different files, line by line, into a single multi-column file. In combination with <A HREF="textproc.html#CUTREF">cut</A>, it is useful for creating system log files.</P></DD>
<DT
