><A NAME="JOINREF"></A><B CLASS="COMMAND">join</B></DT><DD><P>Consider this a special-purpose cousin of <B CLASS="COMMAND">paste</B>. This powerful utility merges two files in a meaningful fashion, essentially creating a simple version of a relational database.</P><P>The <B CLASS="COMMAND">join</B> command operates on exactly two files. It pastes together only those lines that share a common tagged field (usually a numerical label) and writes the result to <TT CLASS="FILENAME">stdout</TT>. For the matchups to work properly, both files must be sorted on the tagged field.</P><P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">File: 1.data

100 Shoes
200 Laces
300 Socks</PRE></TD></TR></TABLE></P><P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">File: 2.data

100 $40.00
200 $1.00
300 $2.00</PRE></TD></TR></TABLE></P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>join 1.data 2.data</B></TT>
<TT CLASS="COMPUTEROUTPUT">File: 1.data 2.data
100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00</TT>
</PRE></TD></TR></TABLE> </P><DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>The tagged field appears only once in the output.</P></TD></TR></TABLE></DIV></DD><DT><A NAME="HEADREF"></A><B CLASS="COMMAND">head</B></DT><DD><P>lists the beginning of a file to <TT CLASS="FILENAME">stdout</TT>. The default is <TT CLASS="LITERAL">10</TT> lines, but the <TT CLASS="OPTION">-n</TT> option sets a different count. The command has a number of interesting options, such as <TT CLASS="OPTION">-c</TT>, which counts bytes rather than lines. <DIV CLASS="EXAMPLE"><HR><A NAME="SCRIPTDETECTOR"></A><P><B>Example 15-13. 
Which files are scripts?</B></P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in the current directory.
do
  [[ -f "$file" ]] || continue   # Skip directories and other non-regular files.
  if [[ $(head -c $TESTCHARS "$file") = "$SHABANG" ]]
  #       head -c 2                        #!
  #  The '-c' option to "head" outputs a specified
  #+ number of bytes, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

#  Exercises:
#  ---------
#  1) Modify this script to take as an optional argument
#+    the directory to scan for scripts
#+    (rather than just the current working directory).
#
#  2) As it stands, this script gives "false positives" for
#+    Perl, awk, and other scripting language scripts.
#     Correct this.</PRE></TD></TR></TABLE><HR></DIV> <DIV CLASS="EXAMPLE"><HR><A NAME="RND"></A><P><B>Example 15-14. Generating 10-digit random numbers</B></P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash
# rnd.sh: Outputs a random number of up to 10 digits
#+        (4 random bytes read as an unsigned 32-bit integer, 0 to 4294967295).

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with the "p" flag to the "s" command,
# outputs only the lines on which the substitution succeeded.



# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------&#62; |

# Assume output up to "sed" --------&#62; |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its &#60;range&#62;&#60;action&#62;s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ it tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  (here "p" is a flag to the "s" command, which is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if the -n option had not been passed, sed would have printed
#+ the line once again.)

#  Now, sed reads the remaining characters and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any &#60;range&#62;, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove every character up to and including
#+ the right-most space, then print the line."
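#  A quick way to check this by hand is to feed sed the sample
#+ output assumed above (1198195154 is only the illustrative value
#+ used in this analysis, not real output from /dev/urandom):
#
#      printf '0000000 1198195154\n0000004\n' | sed -ne '1s/.* //p'
#
#  This should print 1198195154. The second line ("0000004", the final
#+ offset that od prints) lies outside the range "1", so sed prints
#+ nothing for it.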
# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, there are two &#60;range&#62;&#60;action&#62;s (this could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be
#+ (note that its output is padded with leading spaces):
#           head -c4 /dev/urandom | od -An -tu4

exit 0</PRE></TD></TR></TABLE><HR></DIV> See also <A HREF="filearchiv.html#EX52">Example 15-38</A>.</P></DD><DT><A NAME="TAILREF"></A><B CLASS="COMMAND">tail</B></DT><DD><P>lists the (tail) end of a file to <TT CLASS="FILENAME">stdout</TT>. The default is <TT CLASS="LITERAL">10</TT> lines, but the <TT CLASS="OPTION">-n</TT> option sets a different count. It is commonly used to keep track of changes to a system logfile with the <TT CLASS="OPTION">-f</TT> option, which keeps the file open and outputs lines as they are appended to it.</P><DIV CLASS="EXAMPLE"><HR><A NAME="EX12"></A><P><B>Example 15-15. Using <I CLASS="FIRSTTERM">tail</I> to monitor the system log</B></P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash

filename=sys.log

cat /dev/null &#62; "$filename"; echo "Creating / cleaning out file."
#  Creates file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : &#62; filename   and   &#62; filename also work.
tail /var/log/messages &#62; "$filename"
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0</PRE></TD></TR></TABLE><HR></DIV><DIV CLASS="TIP"><TABLE CLASS="TIP" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/tip.png" HSPACE="5" ALT="Tip"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>To list a specific line of a text file, <A HREF="special-chars.html#PIPEREF">pipe</A> the output of <B CLASS="COMMAND">head</B> to <B CLASS="COMMAND">tail -n 1</B>. For example, <TT CLASS="USERINPUT"><B>head -n 8 database.txt | tail -n 1</B></TT> lists the 8th line of the file <TT CLASS="FILENAME">database.txt</TT>.</P><P>To set a variable to a given block of a text file: <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">var=$(head -n "$m" "$filename" | tail -n "$n")

# filename = name of file
# m = number of lines from the beginning of the file to the end of the block
# n = number of lines in the block (counted back from the end of the block)</PRE></TD></TR></TABLE></P></TD></TR></TABLE></DIV><DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Newer implementations of <B CLASS="COMMAND">tail</B> deprecate the older <B CLASS="COMMAND">tail -$LINES filename</B> usage. The standard <B CLASS="COMMAND">tail -n $LINES filename</B> is correct.</P></TD></TR></TABLE></DIV><P>See also <A HREF="moreadv.html#EX41">Example 15-5</A>, <A HREF="filearchiv.html#EX52">Example 15-38</A> and <A HREF="debugging.html#ONLINE">Example 29-6</A>.</P></DD><DT><A NAME="GREPREF"></A><B CLASS="COMMAND">grep</B></DT><DD><P>A multi-purpose file search tool that uses <A HREF="regexp.html#REGEXREF">Regular Expressions</A>. 
It was originally a command/filter in the venerable <B CLASS="COMMAND">ed</B> line editor: <TT CLASS="USERINPUT"><B>g/re/p</B></TT> -- <I CLASS="FIRSTTERM">global - regular expression - print</I>.</P><P><P><B CLASS="COMMAND">grep</B> <TT CLASS="REPLACEABLE"><I>pattern</I></TT> [<TT CLASS="REPLACEABLE"><I>file</I></TT>...]</P>Search the target file(s) for occurrences of <TT CLASS="REPLACEABLE"><I>pattern</I></TT>, where <TT CLASS="REPLACEABLE"><I>pattern</I></TT> may be literal text or a Regular Expression.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '[rst]ystem.$' osinfo.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">The GPL governs the distribution of the Linux operating system.</TT>
</PRE></TD></TR></TABLE> </P><P>If no target file is specified, <B CLASS="COMMAND">grep</B> works as a filter on <TT CLASS="FILENAME">stdin</TT>, as in a <A HREF="special-chars.html#PIPEREF">pipe</A>.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>ps ax | grep clock</B></TT>
<TT CLASS="COMPUTEROUTPUT">765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock</TT>
</PRE></TD></TR></TABLE> </P><P>The <TT CLASS="OPTION">-i</TT> option causes a case-insensitive search.</P><P>The <TT CLASS="OPTION">-w</TT> option matches only whole words.</P><P>The <TT CLASS="OPTION">-l</TT> option lists only the names of the files in which matches were found, not the matching lines.</P><P>The <TT CLASS="OPTION">-r</TT> (recursive) option searches files in the named directories and all subdirectories below them. If no file or directory is given, GNU <B CLASS="COMMAND">grep</B> searches the current working directory.</P><P>The <TT CLASS="OPTION">-n</TT> option lists the matching lines, together with their line numbers.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN">
<TT CLASS="PROMPT">bash$ 
</TT><TT CLASS="USERINPUT"><B>grep -n Linux osinfo.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.</TT>
</PRE></TD></TR></TABLE> </P><P>The <TT CLASS="OPTION">-v</TT> (or <TT CLASS="OPTION">--invert-match</TT>) option <I CLASS="FIRSTTERM">filters out</I> matches, printing only the lines that do not match. <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".</PRE></TD></TR></TABLE></P><P>The <TT CLASS="OPTION">-c</TT> (<TT CLASS="OPTION">--count</TT>) option gives a numerical count of matching lines, rather than listing the matches themselves. <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE