<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 43] 43.21 Preprocessing troff Input with sed </TITLE><META NAME="DC.title" CONTENT="UNIX Power Tools"><META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides"><META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc."><META NAME="DC.date" CONTENT="1998-10-23T15:52:07Z"><META NAME="DC.type" CONTENT="Text.Monograph"><META NAME="DC.format" CONTENT="text/html" SCHEME="MIME"><META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN"><META NAME="DC.language" CONTENT="en-US"><META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0"><LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments"><LINK REL="up" HREF="ch43_01.htm" TITLE="43. Printing"><LINK REL="prev" HREF="ch43_20.htm" TITLE="43.20 Displaying a troff Macro Definition "><LINK REL="next" HREF="ch43_22.htm" TITLE="43.22 Converting Text Files to PostScript "></HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000"><DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1><MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP><TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch43_20.htm" TITLE="43.20 Displaying a troff Macro Definition "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 43.20 Displaying a troff Macro Definition " BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 43<BR>Printing</FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch43_22.htm" TITLE="43.22 Converting Text Files to PostScript "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 43.22 Converting Text Files to PostScript " BORDER="0"></A></TD></TR></TABLE> 
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV><DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-7776">43.21 Preprocessing troff Input with sed </A></H2><P CLASS="para"><A CLASS="indexterm" NAME="UPT-ART-7776-IX-COMMANDS-SED-EDITOR"></A><A CLASS="indexterm" NAME="UPT-ART-7776-IX-SED-EDITOR"></A><A CLASS="indexterm" NAME="UPT-ART-7776-IX-TROFF-PROGRAM-PREPROCESSING"></A><A CLASS="indexterm" NAME="UPT-ART-7776-IX-TYPOGRAPHICAL-CHARACTERS"></A><A CLASS="indexterm" NAME="AUTOID-48763"></A>On a typewriter-like device (including a CRT), an em-dash is typed as a pair of hyphens (<CODE CLASS="literal">--</CODE>).[2] In typesetting, it is printed as a single, long dash (&mdash;). <EM CLASS="emphasis">troff</EM> provides a special character name for the em-dash, but it is inconvenient to type <CODE CLASS="literal">\(em</CODE>, and the escape sequence is also inappropriate for use with <EM CLASS="emphasis">nroff</EM>.</P><BLOCKQUOTE CLASS="footnote"><P CLASS="para">[2] Typists often use three hyphens (<CODE CLASS="literal">---</CODE>) for an em-dash, and two (<CODE CLASS="literal">--</CODE>) for the shorter en-dash.</P></BLOCKQUOTE><P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-48774"></A><A CLASS="indexterm" NAME="AUTOID-48776"></A><A CLASS="indexterm" NAME="AUTOID-48779"></A><A CLASS="indexterm" NAME="UPT-ART-7776-IX-TYPESETTING-TYPOGRAPHICAL-CHARACTERS"></A>Similarly, a typesetter provides "curly" quotation marks (&ldquo; and &rdquo;) as opposed to a typewriter's straight quotes (<CODE CLASS="literal">"</CODE>). In standard <EM CLASS="emphasis">troff</EM>, you can substitute two backquote characters (<CODE CLASS="literal">``</CODE>) for an open quote and two frontquote characters (<CODE CLASS="literal">''</CODE>) for a closed quote; these characters print as &ldquo; and &rdquo;. 
But it would be much better if we could just continue to type <CODE CLASS="literal">"</CODE> and have the computer do the dirty work.</P><P CLASS="para">A peculiarity of <EM CLASS="emphasis">troff</EM> is that it generates the space before each word in the font used at the beginning of that word. This means that when we mix a constant-width font such as Courier into running text, we get a noticeably wide space before each constant-width word, which can be distracting for readers. For example: The following<CODE CLASS="literal"> text</CODE> is in<CODE CLASS="literal"> Courier</CODE>; note the<CODE CLASS="literal"> spaces</CODE>. The fix for this is to force <EM CLASS="emphasis">troff</EM> to generate the space in the previous font by inserting the zero-width character (<CODE CLASS="literal">\&amp;</CODE>) before each constant-width font change. As you can imagine, doing this by hand can turn into a large undertaking.</P><P CLASS="para">The solution to each of these problems is to preprocess <EM CLASS="emphasis">troff</EM> input with <SPAN CLASS="link"><EM CLASS="emphasis">sed</EM> (<A CLASS="linkend" HREF="ch34_24.htm" TITLE="Quick Reference: sed ">34.24</A>)</SPAN>. This is an application that shows <EM CLASS="emphasis">sed</EM> in its role as a true stream editor, making edits in a pipeline: edits that are never written back into a file.</P><P CLASS="para">We almost never invoke <EM CLASS="emphasis">troff</EM> directly. Instead, we run it from a script that builds a pipeline of the standard preprocessors (such as <EM CLASS="emphasis">tbl</EM> and <EM CLASS="emphasis">eqn</EM>, when appropriate) plus this special preprocessing with <EM CLASS="emphasis">sed</EM>.</P><P CLASS="para">The <EM CLASS="emphasis">sed</EM> commands themselves are fairly simple.</P><P CLASS="para">The following command changes two consecutive hyphens into an em-dash:</P><P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">s/--/\\(em/g</PRE></BLOCKQUOTE></P><P CLASS="para">We double the backslashes in the replacement string for <CODE CLASS="literal">\(em</CODE>, since the backslash has a special 
meaning to <EM CLASS="emphasis">sed</EM>: in the replacement, <CODE CLASS="literal">\\</CODE> produces one literal backslash.</P><P CLASS="para">However, there may be cases in which we don't want this substitution command to be applied. What if someone is using hyphens to draw a horizontal line? We can refine the script to exclude lines containing three or more consecutive hyphens. To do this, we use the <SPAN CLASS="link">! address modifier (<A CLASS="linkend" HREF="ch34_19.htm" TITLE="Making Edits Everywhere Except... ">34.19</A>)</SPAN>:</P><P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">/---/!s/--/\\(em/g</PRE></BLOCKQUOTE></P><P CLASS="para">It may take a moment to penetrate this syntax. What's different is that we use a pattern address to restrict the lines that are affected by the substitute command, and we use <CODE CLASS="literal">!</CODE> to reverse the sense of the pattern match. It says, simply, "If you find a line containing three consecutive hyphens, don't apply the edit." On all other lines, the substitute command will be applied.</P><P CLASS="para">Similarly, to deal with the font change problem, we can use <EM CLASS="emphasis">sed</EM> to search for all strings matching <CODE CLASS="literal">\f(CW</CODE>, <CODE CLASS="literal">\f(CI</CODE>, and <CODE CLASS="literal">\f(CB</CODE>, and insert <CODE CLASS="literal">\&amp;</CODE> before them. 
This can be written as follows:</P><P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">s/\\f(C[WIB]/\\\&amp;&amp;/g</PRE></BLOCKQUOTE></P><P CLASS="para">To deal with the open and closed quote problem, the script needs to be more involved because there are many separate cases that must be accounted for. You need to make <EM CLASS="emphasis">sed</EM> smart enough to change double quotes to open quotes only at the beginning of words and to change them to closed quotes only at the end of words. Such a script might look like the one below, which could obviously be shortened by judicious application of <SPAN CLASS="link"><CODE CLASS="literal">\([...]\)</CODE> (<A CLASS="linkend" HREF="ch34_10.htm" TITLE="Referencing Portions of a Search String ">34.10</A>)</SPAN> regular expression syntax, but it is shown in its long form for effect.</P><P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">s/^"/``/
s/"$/''/
s/"? /''? /g
s/"?$/''?/
s/ "/ ``/g
s/" /'' /g
s/<KBD CLASS="keycap">[TAB]</KBD>"/<KBD CLASS="keycap">[TAB]</KBD>``/g
s/"<KBD CLASS="keycap">[TAB]</KBD>/''<KBD CLASS="keycap">[TAB]</KBD>/g
s/")/'')/g
s/"]/'']/g
s/("/(``/g
s/\["/\[``/g
s/";/'';/g
s/":/'':/g
s/,"/,''/g
s/",/'',/g
s/\."/.\\\&amp;''/g
s/"\./''.\\\&amp;/g
s/"\\(em/''\\(em/g
s/\\(em"/\\(em``/g</PRE></BLOCKQUOTE></P><TABLE CLASS="para.programreference" BORDER="1"><TR><TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="cleanup.sed">cleanup.sed</A><BR></TH><TD VALIGN="TOP">The preceding code shows the kind of contortions you need to go through to capture all the possible situations in which quotation marks appear. The solution to the other problems mentioned earlier in the article is left for your imagination. If you prefer, a more complete "typesetting preprocessor" script written in <EM CLASS="emphasis">sed</EM>, and suitable for integration into a <EM CLASS="emphasis">troff</EM> environment (perhaps with a bit of tweaking), can be found on the disc.</TD></TR></TABLE><P CLASS="para">In addition to the changes described above, it tightens 
up the spacing of ellipses (...), and <SPAN CLASS="link">doesn't do anything between certain pairs of <EM CLASS="emphasis">troff</EM> macros (<A CLASS="linkend" HREF="ch34_19.htm" TITLE="Making Edits Everywhere Except... ">34.19</A>)</SPAN>.<A CLASS="indexterm" NAME="AUTOID-48843"></A><A CLASS="indexterm" NAME="AUTOID-48844"></A><A CLASS="indexterm" NAME="AUTOID-48845"></A><A CLASS="indexterm" NAME="AUTOID-48846"></A><A CLASS="indexterm" NAME="AUTOID-48847"></A></P><DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">TOR,</SPAN></P></DIV></DIV><DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch43_20.htm" TITLE="43.20 Displaying a troff Macro Definition "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 43.20 Displaying a troff Macro Definition " BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch43_22.htm" TITLE="43.22 Converting Text Files to PostScript "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 43.22 Converting Text Files to PostScript " BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">43.20 Displaying a troff Macro Definition </TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">43.22 Converting Text Files to PostScript </TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><IMG SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation"><MAP NAME="map"><AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf"><AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell"><AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor"><AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk"><AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell"><AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System"></MAP></DIV></BODY></HTML>