📄 textproc.html
><A NAME="JOINREF"></A><B CLASS="COMMAND">join</B></DT>
<DD><P>Consider this a special-purpose cousin of <B CLASS="COMMAND">paste</B>. This utility merges two files on a shared field, which in effect creates a simple version of a relational database.</P>
<P>The <B CLASS="COMMAND">join</B> command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to <TT CLASS="FILENAME">stdout</TT>. The files to be joined should be sorted according to the tagged field for the matchups to work properly.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">File: 1.data

100 Shoes
200 Laces
300 Socks</PRE></TD></TR></TABLE></P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">File: 2.data

100 $40.00
200 $1.00
300 $2.00</PRE></TD></TR></TABLE></P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>join 1.data 2.data</B></TT>
<TT CLASS="COMPUTEROUTPUT">File: 1.data 2.data

100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00</TT>
</PRE></TD></TR></TABLE></P>
<DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>The tagged field appears only once in the output.</P></TD></TR></TABLE></DIV></DD>
<DT><A NAME="HEADREF"></A><B CLASS="COMMAND">head</B></DT>
<DD><P>lists the beginning of a file to <TT CLASS="FILENAME">stdout</TT>. The default is <TT CLASS="LITERAL">10</TT> lines, but this can be changed with the <TT CLASS="OPTION">-n</TT> option. The <TT CLASS="OPTION">-c</TT> option, used in the examples below, counts bytes rather than lines.
<DIV CLASS="EXAMPLE"><HR><A NAME="SCRIPTDETECTOR"></A><P><B>Example 15-13. Which files are scripts?</B></P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

#  Exercises:
#  ---------
#  1) Modify this script to take as an optional argument
#+    the directory to scan for scripts
#+    (rather than just the current working directory).
#
#  2) As it stands, this script gives "false positives" for
#+    Perl, awk, and other scripting language scripts.
#     Correct this.</PRE></TD></TR></TABLE><HR></DIV>
<DIV CLASS="EXAMPLE"><HR><A NAME="RND"></A><P><B>Example 15-14. Generating 10-digit random numbers</B></P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.



# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------&gt; |

# Assume output up to "sed" --------&gt; |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its &lt;range&gt;&lt;action&gt;s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  ("p" is a flag to the "s" command here, this is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).

#  Now, sed reads the remainder of the characters, and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any &lt;range&gt;, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove any character up to the right-most space,
#+ then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two &lt;range&gt;&lt;action&gt;s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom| od -An -tu4

exit 0</PRE></TD></TR></TABLE><HR></DIV>
See also <A HREF="filearchiv.html#EX52">Example 15-38</A>.</P></DD>
<DT><A NAME="TAILREF"></A><B CLASS="COMMAND">tail</B></DT>
<DD><P>lists the (tail) end of a file to <TT CLASS="FILENAME">stdout</TT>. The default is <TT CLASS="LITERAL">10</TT> lines, but this can be changed.
It is commonly used to keep track of changes to a system logfile, using the <TT CLASS="OPTION">-f</TT> option, which outputs lines as they are appended to the file.</P>
<DIV CLASS="EXAMPLE"><HR><A NAME="EX12"></A><P><B>Example 15-15. Using <I CLASS="FIRSTTERM">tail</I> to monitor the system log</B></P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash

filename=sys.log

cat /dev/null &gt; "$filename"; echo "Creating / cleaning out file."
#  Creates file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : &gt; filename   and   &gt; filename also work.

tail /var/log/messages &gt; "$filename"
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0</PRE></TD></TR></TABLE><HR></DIV>
<DIV CLASS="TIP"><TABLE CLASS="TIP" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/tip.png" HSPACE="5" ALT="Tip"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>To list a specific line of a text file, <A HREF="special-chars.html#PIPEREF">pipe</A> the output of <B CLASS="COMMAND">head</B> to <B CLASS="COMMAND">tail -n 1</B>. For example, <TT CLASS="USERINPUT"><B>head -n 8 database.txt | tail -n 1</B></TT> lists the 8th line of the file <TT CLASS="FILENAME">database.txt</TT>.</P>
<P>To set a variable to a given block of a text file:
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">var=$(head -n "$m" "$filename" | tail -n "$n")

# filename = name of file
# m = line number, counted from the beginning of the file, of the last line of the block
# n = number of lines in the block (taken from the end of those first m lines)</PRE></TD></TR></TABLE></P></TD></TR></TABLE></DIV>
<DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Newer implementations of <B CLASS="COMMAND">tail</B> deprecate the older <B CLASS="COMMAND">tail -$LINES filename</B> usage. The standard <B CLASS="COMMAND">tail -n $LINES filename</B> is correct.</P></TD></TR></TABLE></DIV>
<P>See also <A HREF="moreadv.html#EX41">Example 15-5</A>, <A HREF="filearchiv.html#EX52">Example 15-38</A> and <A HREF="debugging.html#ONLINE">Example 29-6</A>.</P></DD>
<DT><A NAME="GREPREF"></A><B CLASS="COMMAND">grep</B></DT>
<DD><P>A multi-purpose file search tool that uses <A HREF="regexp.html#REGEXREF">Regular Expressions</A>.
It was originally a command/filter in the venerable <B CLASS="COMMAND">ed</B> line editor: <TT CLASS="USERINPUT"><B>g/re/p</B></TT> -- <I CLASS="FIRSTTERM">global - regular expression - print</I>.</P>
<P><P><B CLASS="COMMAND">grep</B> <TT CLASS="REPLACEABLE"><I>pattern</I></TT> [<TT CLASS="REPLACEABLE"><I>file</I></TT>...]</P>Search the target file(s) for occurrences of <TT CLASS="REPLACEABLE"><I>pattern</I></TT>, where <TT CLASS="REPLACEABLE"><I>pattern</I></TT> may be literal text or a Regular Expression.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '[rst]ystem.$' osinfo.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">The GPL governs the distribution of the Linux operating system.</TT>
</PRE></TD></TR></TABLE></P>
<P>If no target file(s) are specified, <B CLASS="COMMAND">grep</B> works as a filter on <TT CLASS="FILENAME">stdin</TT>, as in a <A HREF="special-chars.html#PIPEREF">pipe</A>.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>ps ax | grep clock</B></TT>
<TT CLASS="COMPUTEROUTPUT">765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock</TT>
</PRE></TD></TR></TABLE></P>
<P>The <TT CLASS="OPTION">-i</TT> option causes a case-insensitive search.</P>
<P>The <TT CLASS="OPTION">-w</TT> option matches only whole words.</P>
<P>The <TT CLASS="OPTION">-l</TT> option lists only the files in which matches were found, but not the matching lines.</P>
<P>The <TT CLASS="OPTION">-r</TT> (recursive) option searches files in the current working directory and all subdirectories below it.</P>
<P>The <TT CLASS="OPTION">-n</TT> option lists the matching lines, together with line numbers.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep -n Linux osinfo.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.</TT>
</PRE></TD></TR></TABLE></P>
<P>The <TT CLASS="OPTION">-v</TT> (or <TT CLASS="OPTION">--invert-match</TT>) option <I CLASS="FIRSTTERM">filters out</I> matches.
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".</PRE></TD></TR></TABLE></P>
<P>The <TT CLASS="OPTION">-c</TT> (<TT CLASS="OPTION">--count</TT>) option gives a numerical count of matching lines, rather than actually listing the matches.
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE