.\" This file generated automatically by mtex2man(1)
.nh
.TH "pstotext" "1"
.SH "NAME"
pstotext \- extract ASCII text from a PostScript or PDF file
.SH "SYNTAX"
\fBpstotext [option|pathname]...\fR
.PP
where option includes:
.PP
.PD 0
.RS 0
.TP 6
\-cork
.TP 6
\-landscape
.TP 6
\-landscapeOther
.TP 6
\-portrait
.TP 6
\-
.TP 6
\-output file
.TP 6
\-gs command
.TP 6
\-debug
.TP 6
\-bboxes
.RE
.PD
.PP
.SH "DESCRIPTION"
\fBpstotext\fR reads one or more PostScript or PDF files, and writes to
standard output a representation of the plain text that would be
displayed if the PostScript file were printed.  As is described in the
DETAILS section below, this representation is only an approximation.
Nevertheless, it is often useful for information retrieval (e.g.,
running grep(1) or building a full\-text index) or to recover the text
from a PostScript file whose source you have lost.
.PP
\fBpstotext\fR calls Ghostscript, and requires Aladdin Ghostscript
version 3.51 or newer.  Ghostscript must be invokable on the current
search path as gs.  Alternatively, you can use the \-gs option to
specify the command (pathname and options) to run Ghostscript.  For
example, on Windows you might use \-gs "c:\\gs\\gswin32c.exe
\-Ic:\\gs;c:\\gs\\fonts".
.PP
\fBpstotext\fR reads and processes its command line from left to right,
ignoring the case of options.  When it encounters a pathname, it opens
the file and expects to find a PostScript job or PDF document to
process.  The option \- means to read and process a PostScript job from
standard input.  If no \- or pathname arguments are encountered,
\fBpstotext\fR reads a PostScript job from standard input. (PDF
documents require random access, hence cannot be read from standard
input.)
You can use the \-output option to specify an output file
(remember to invoke it \fIbefore\fR the input file); otherwise
\fBpstotext\fR writes to standard output.
.PP
The option \-cork is only relevant for PostScript files produced by
dvips from TeX or LaTeX documents; it tells \fBpstotext\fR to use the
Cork encoding (known as T1 in LaTeX) rather than the old TeX text
encoding (known as OT1 in LaTeX). Unfortunately files produced by
dvips don't distinguish which font encodings were used.
.PP
The options \-landscape and \-landscapeOther should be used for
documents that must be rotated 90 degrees clockwise or
counterclockwise, respectively, in order to be readable.
.PP
The options \-debug and \-bboxes are mostly of use for the maintainers
of \fBpstotext\fR.  \-debug shows Ghostscript output and error
messages. \-bboxes outputs one word per line with bounding box
information.
.SH "DETAILS"
\fBpstotext\fR does its work by telling Ghostscript to load a
PostScript library that causes it to write to its standard output
information about each string rendered by a PostScript job or PDF
document.  This information includes the characters of the string, and
enough additional information to approximate the string's bounding
rectangle.  \fBpstotext\fR post\-processes this information and outputs
a sequence of words delimited by space, newline, and formfeed.
.PP
\fBpstotext\fR outputs words in the same sequence as they are rendered
by the document.  This usually, but not always, follows the order that
a human would read the words on a page.  Within this sequence, words
are separated by either space or newline depending on whether or not
they fall on the same line.  Each page is terminated with a formfeed.
If you use the incorrect option from the set {\-portrait, \-landscape,
\-landscapeOther}, \fBpstotext\fR is likely to substitute newline for
space.
.PP
A PostScript job or PDF document often renders one word as several
strings in order to get correct spacing between particular pairs of
characters.
\fBpstotext\fR does its best to assemble these strings
back into words, using a simple heuristic: strings separated by a
distance of less than 0.3 times the minimum of the average character
widths in the two strings are considered to be part of the same word.
Note that this typically causes leading and trailing punctuation
characters to be included with a word.
.PP
The PostScript language provides a flexible encoding scheme by which
character codes in strings select specific characters (symbols), so a
PostScript job is free to use any character code.  On the other hand,
\fBpstotext\fR always translates to the ISO 8859\-1 (Latin\-1) character
code, which is an extension to ASCII covering most of the Western
European languages.  When a character isn't present in ISO 8859\-1,
\fBpstotext\fR uses a sequence of characters, e.g., "\-\-\-" for em dash
or "A\\226" for Abreve.  \fBpstotext\fR can be fooled by a font whose
Encoding vector doesn't follow Adobe's conventions, but it contains
heuristics allowing it to handle a wide variety of misbehaving fonts.
.PP
(\fBpstotext\fR no longer translates hyphen (\\255) to minus (\\055).)
.SH "AUTHOR"
Andrew Birrell (PostScript libraries), Paul McJones (application),
Russell Lang (Windows and OS/2 adaptation), and Hunter Goatley (VMS
adaptation).
.SH "SEE ALSO"
\fBpstotext\fR incorporates technology originally developed for the
Virtual Paper project at SRC; see
http://www.research.digital.com/SRC/virtualpaper/.
.PP
As mentioned above, \fBpstotext\fR invokes Ghostscript.  See gs(1) or
http://www.cs.wisc.edu/~ghost/.
.SH "COPYRIGHT"
.PP
Copyright 1995\-8 Digital Equipment Corporation.
.br
Distributed only by permission.
.br
See file pstotext.txt for details.
.br
.BR
.PP
.EX
Last modified on Sat Feb  5 21:00:00 AEST 2000 by rjl
     modified on Fri Jun  5 14:02:37 PDT 1998 by mcjones
     modified on Wed Jun  7 17:47:56 PDT 1995 by birrell
.EE
.PP
This file was generated automatically by mtex software; see the
mtex home page at http://www.research.digital.com/SRC/mtex/.
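.SH "EXAMPLES"
The word\-assembly heuristic described in DETAILS can be sketched in
Python as follows.  This is an illustration only, not the code that
\fBpstotext\fR runs: the function name join_strings, the (text, x,
width) tuple layout, and the sample coordinates are all assumptions
made for the example.
.PP
.EX
FACTOR = 0.3  # threshold factor quoted in DETAILS


def join_strings(strings, factor=FACTOR):
    """Group one line of rendered strings into words.

    strings is a list of (text, x, width) tuples in rendering order,
    where x is the left edge and width the advance of the string.
    This tuple layout is an assumption of the sketch.
    """
    words = []
    current = ""
    prev_end = None
    prev_avg = None
    for text, x, width in strings:
        if not text:
            continue
        avg = width / len(text)  # average character width
        if prev_end is not None:
            threshold = factor * min(prev_avg, avg)
            # Same word when the gap (x minus prev_end) is below
            # the threshold.
            same_word = x < prev_end + threshold
        else:
            same_word = False
        if same_word:
            current += text
        else:
            if current:
                words.append(current)
            current = text
        prev_end = x + width
        prev_avg = avg
    if current:
        words.append(current)
    return words


sample = [("Wo", 0.0, 12.0), ("rd", 12.5, 10.0), ("next", 28.0, 20.0)]
print(join_strings(sample))
.EE
.PP
With the sample input, the first two strings are 0.5 units apart, which
is less than 0.3 times the smaller of their average character widths
(5.0), so they are joined into the single word "Word".  The third string
starts 5.5 units after the second ends and therefore begins a new word.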
