% Copyright (C) 1991, 1995, 1996, 1998 Aladdin Enterprises. All rights reserved.
%
% This file is part of Aladdin Ghostscript.
%
% Aladdin Ghostscript is distributed with NO WARRANTY OF ANY KIND. No author
% or distributor accepts any responsibility for the consequences of using it,
% or for whether it serves any particular purpose or works at all, unless he
% or she says so in writing. Refer to the Aladdin Ghostscript Free Public
% License (the "License") for full details.
%
% Every copy of Aladdin Ghostscript must include a copy of the License,
% normally in a plain ASCII text file named PUBLIC. The License grants you
% the right to copy, modify and redistribute Aladdin Ghostscript, but only
% under certain conditions described in the License. Among other things, the
% License requires that the copyright notice and this notice be preserved on
% all copies.
% $Id: ps2ascii.ps $
% Extract the ASCII text from a PostScript file. Nothing is displayed.
% Instead, ASCII information is written to stdout. The idea is similar to
% Glenn Reid's `distillery', only a lot more simple-minded, and less robust.
% If SIMPLE is defined, just the text is written, with a guess at line
% breaks and word spacing. If SIMPLE is not defined, lines are written
% to stdout as follows:
%
% F <height> <width> (<fontname>)
% Indicate the font height and the width of a space.
%
% P
% Indicate the end of the page.
%
% S <x> <y> (<string>) <width>
% Display a string.
%
% <width> and <height> are integer dimensions in units of 1/720".
% <x> and <y> are integer coordinates, in units of 1/720", with the origin
% at the lower left.
% <string> and <fontname> are strings represented with the standard
% PostScript escape conventions.
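%
% For example, a page that shows a single string might produce output of
% the following shape. The numbers and the font name here are purely
% illustrative assumptions, not taken from a real run:
%
% F 120 33 (Times-Roman)
% S 720 7200 (Hello, world) 585
% P
%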
% If COMPLEX is defined, the following additional types of lines are
% written to stdout.
%
% C <r> <g> <b>
% Indicate the current color.
%
% I <x> <y> <width> <height>
% Note the presence of an image.
%
% R <x> <y> <width> <height>
% Fill a rectangle.
%
% <r>, <g>, and <b> are RGB values expressed as integers between 0 and 1000.
%
% Note that future versions of this program (in COMPLEX mode) may add
% other output elements, so programs parsing the output should be
% prepared to ignore elements that they do not recognize.
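%
% For example, in COMPLEX mode the output might additionally contain lines
% of the following shape (again, the values are illustrative assumptions
% only):
%
% C 1000 0 0
% R 720 720 1440 10
% I 720 3600 2880 2160
%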
% Note that this code will only work in all cases if systemdict is writable
% and if `binding' the definitions of operators defined as procedures
% is deferred. For this reason, it is normally invoked with
% gs -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT ps2ascii.ps
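% As a sketch, assuming a hypothetical input file named input.ps, SIMPLE
% mode could be selected by defining SIMPLE with a -d switch and naming
% the input file after this one:
% gs -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -dSIMPLE ps2ascii.ps input.ps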
% Thanks to:
% J Greely <jgreely@cis.ohio-state.edu> for improvements to this code;
% Jerry Whelan <jerryw@abode.ccd.bnl.gov> for motivating other improvements;
% David M. Jones <dmjones@theory.lcs.mit.edu> for improvements noted below.
%% Additional modifications by David M. Jones
%% (dmjones@theory.lcs.mit.edu), December 23, 1997
%%
%% (a) Rewrote forall loop at the end of .show.write. This fixes a
%% stack leakage problem, but the changes are more significant
%% than that.
%%
%% .char.map includes the names of all characters in the
%% StandardEncoding, ISOLatin1Encoding, OT1Encoding and
%% T1Encoding vectors. Thus, if the Encoding vector for the
%% current font contains a name that is not in .char.map, it's
%% redundant to check if the Encoding vector is equal to one of
%% the known vectors. Previous versions of ps2ascii would give
%% up at this point, and substitute an asterisk (*) for the
%% character. I've taken the liberty of instead using the
%% OT1Encoding vector to translate the character, on the grounds
%% that in the cases I'm most interested in, a font without a
%% useful Encoding vector was most likely created by a DVI to PS
%% converter such as dvips or DVILASER (and OT1Encoding is
%% largely compatible with StandardEncoding anyway). [Note that
%% this does not make my earlier changes to support dvips (see
%% fix (a) under my 1996 changes) completely obsolete, since
%% there's additional useful information I can extract in that
%% case.]
%%
%% Overall, this should provide better support for some documents
%% (e.g., DVILASER documents will no longer be translated into a
%% series of *'s) without breaking any other documents any worse
%% than they already were broken.
%%
%% (b) Fixed two bugs in dvips.df-tail: (1) changed "dup 127" to "dup
%% 128" to fix fencepost error, and (2) gave each font its own
%% FontName rather than having all fonts share the same name.
%%
%% (c) Added one further refinement to the heuristic for detecting
%% paragraph breaks: do not ever start a new paragraph after a
%% line ending in a hyphen.
%%
%% (d) Added a bunch of missing letters from the T1Encoding,
%% OT1Encoding and ISOLatin1Encoding vectors to .letter.chars to
%% improve hyphen-elimination algorithm. This still won't help
%% if there's no useful Encoding vector.
%%
%% NOTE: A better solution to the problem of missing Encoding vectors
%% might be to redefine definefont to check whether the Encoding
%% vector is sensible and, if not, replace it by a default. This
%% would alleviate the need for constant tests in the .show.write
%% loop, as well as automatically solving the problem noted in fix
%% (d) above, and the similar problem with .break.chars. This should
%% be investigated. Also, the hyphen-elimination algorithm really
%% needs to be looked at carefully and rethought.
%%* Modifications to ps2ascii.ps by David M. Jones
%%* (dmjones@theory.lcs.mit.edu), June 25-July 8, 1996
%%* Modifications:
%%*
%%* (a) added code to give better support for dvips files by providing
%%* FontBBox's, FontName's and Encoding vectors for downloaded
%%* bitmap fonts. This is done by using dvips's start-hook to
%%* overwrite the df-tail and D procedures that dvips uses to
%%* define its Type 3 bitmap fonts. Thus, this change should
%%* provide better support for dvips-generated PS files without
%%* affecting the handling of other documents.
%%*
%%* (b) Fixed two bugs that could potentially affect any PS file, not
%%* just those created by dvips: (1) added missing "get" operator
%%* in .show.write and (2) fixed bug that caused a hyphen at the
%%* end of a line to be replaced by a space rather than being
%%* deleted. Note that the first bug was a source of stack
%%* leakage, causing ps2ascii to run out of operand stack space
%%* occasionally.
%%*
%%* Search for "%%* BF" to find these modifications.
%%*
%%* (c) Improved the heuristic for determining whether a line break
%%* has occurred and whether a line break represents a paragraph
%%* break. Previously, any change in the vertical position caused
%%* a line break; now a line break is only registered if the
%%* change is larger than the height of the current font. This
%%* means that superscripts, subscripts, and such things as
%%* shifted accents generated by TeX won't cause line breaks.
%%* Paragraph-recognition is now done by comparing the indentation
%%* of the new line to the indentation of the previous line and by
%%* comparing the vertical distance between the new line and the
%%* previous line to the vertical distance between the previous
%%* line and its predecessor.
%%*
%%* (d) Added a hook for renaming the files where stdout and stderr
%%* go.
%%*
%%* In general, my additions or changes to the code are described in
%%* comments beginning with "%%*". However, there are numerous other
%%* places where I have either re-formatted code or added comments to
%%* the code while I was trying to understand it. These are usually
%%* not specially marked.
%%*
/QUIET true def
systemdict wcheck { systemdict } { userdict } ifelse begin
/.max where { pop } { /.max { 2 copy lt { exch } if pop } bind def } ifelse
/COMPLEX dup where { pop true } { false } ifelse def
/SIMPLE dup where { pop true } { false } ifelse def
/setglobal where
{ pop currentglobal /setglobal load true setglobal }
{ { } }
ifelse
% Define a way to store and retrieve integers that survives save/restore.
/.i.string0 (0               ) def
/.i.string .i.string0 length string def
/.iget { cvi } bind def
/.iput { exch //.i.string exch copy cvs pop } bind def
/.inew { //.i.string0 dup length string copy } bind def
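% A minimal usage sketch of the three procedures above (illustrative only;
% /.demo is a hypothetical name that is not used elsewhere in this file):
% /.demo .inew def % allocate a fresh integer cell (a string)
% .demo 42 .iput % store 42 in the cell
% .demo .iget % push the stored value, 42
% The value lives in the contents of a string, which restore does not
% roll back, so it survives save/restore.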
% We only want to redefine operators if they are defined already.
/codef { 1 index where { pop def } { pop pop } ifelse } def
% Redefine the end-of-page operators.
/erasepage { } codef
/copypage { SIMPLE { (\014) } { (P\n) } ifelse //print } codef
/showpage { copypage erasepage initgraphics } codef
% Redefine the fill operators to detect rectangles.
/.orderrect % <llx> <lly> <urx> <ury> .orderrect <llx> <lly> <w> <h>
{ % Ensure llx <= urx, lly <= ury.
1 index 4 index lt { 4 2 roll } if
dup 3 index lt { 3 1 roll exch } if
exch 3 index sub exch 2 index sub
} odef
/.fillcomplex
{ % Do a first pass to see if the path is all rectangles in
% the output coordinate system. We don't worry about overlapping
% rectangles that might be partially not filled.
% Stack: mark llx0 lly0 urx0 ury0 ... true mark x0 y0 ...
mark true mark
% Add a final moveto so we pick up any trailing unclosed subpath.
0 0 itransform moveto
{ .coord counttomark 2 gt
{ counttomark 4 gt { .fillcheckrect } { 4 2 roll pop pop } ifelse }
if
}
{ .coord }
{ cleartomark not mark exit }
{ counttomark -2 roll 2 copy counttomark 2 roll .fillcheckrect }
pathforall cleartomark
{ .showcolor counttomark 4 idiv
{ counttomark -4 roll .orderrect
(R ) //print .show==4
}
repeat pop
}
{ cleartomark
}
ifelse
} odef
/.fillcheckrect
{ % Check whether the current subpath is a rectangle.
% If it is, add it to the list of rectangles being accumulated;
% if not, exit the .fillcomplex loop.
% The subpath has not been closed.
% Stack: as in .fillcomplex, + newx newy
counttomark 10 eq { 9 index 9 index 4 2 roll } if
counttomark 12 ne { cleartomark not mark exit } if
12 2 roll
% Check for the two possible forms of rectangles:
% x0 y0 x0 y1 x1 y1 x1 y0 x0 y0
% x0 y0 x1 y0 x1 y1 x0 y1 x0 y0
9 index 2 index eq 9 index 2 index eq and
10 index 9 index eq
{ % Check for first form.
7 index 6 index eq and 6 index 5 index eq and 3 index 2 index eq and
}
{ % Check for second form.
9 index 8 index eq and
8 index 7 index eq and 5 index 4 index eq and 4 index 3 index eq and
}
ifelse not { cleartomark not mark exit } if
% We have a rectangle.
pop pop pop pop 4 2 roll pop pop 8 4 roll
} odef
/eofill { COMPLEX { .fillcomplex } if newpath } codef
/fill { COMPLEX { .fillcomplex } if newpath } codef
/rectfill { gsave newpath .rectappend fill grestore } codef
/ueofill { gsave newpath uappend eofill grestore } codef
/ufill { gsave newpath uappend fill grestore } codef
% Redefine the stroke operators to detect rectangles.
/rectstroke
{ gsave newpath
dup type dup /arraytype eq exch /packedarraytype eq or
{ dup length 6 eq { exch .rectappend concat } { .rectappend } ifelse }
{ .rectappend }
ifelse stroke grestore
} codef
/.strokeline % <fromx> <fromy> <tox> <toy> .strokeline <tox> <toy>
% Note: fromx and fromy are in output coordinates;
% tox and toy are in user coordinates.
{ .coord 2 copy 6 2 roll .orderrect
% Add in the line width. Assume square or round caps.
currentlinewidth 2 div dup .dcoord add abs 1 max 5 1 roll
4 index add 4 1 roll 4 index add 4 1 roll
4 index sub 4 1 roll 5 -1 roll sub 4 1 roll
(R ) //print .show==4
} odef
/.strokecomplex
{ % Do a first pass to see if the path is all horizontal and vertical
% lines in the output coordinate system.
% Stack: true mark origx origy curx cury
true mark null null null null
{ .coord 6 2 roll pop pop pop pop 2 copy }
{ .coord 1 index 4 index eq 1 index 4 index eq or
{ 4 2 roll pop pop }
{ cleartomark not mark exit }
ifelse
}
{ cleartomark not mark exit }
{ counttomark -2 roll 2 copy counttomark 2 roll
1 index 4 index eq 1 index 4 index eq or
{ pop pop 2 copy }
{ cleartomark not mark exit }
ifelse
}
pathforall cleartomark
0 currentlinewidth .dcoord 0 eq exch 0 eq or and
% Do the second pass to write out the rectangles.
% Stack: origx origy curx cury
{ .showcolor null null null null
{ 6 2 roll pop pop pop pop 2 copy .coord }
{ .strokeline }
{ }
{ 3 index 3 index .strokeline }
pathforall pop pop pop pop
}
if
} odef
/stroke { COMPLEX { .strokecomplex } if newpath } codef
/ustroke
{ gsave newpath
dup length 6 eq { exch uappend concat } { uappend } ifelse
stroke grestore
} codef
% The image operators must read the input and note the dimensions.
% Eventually we should redefine these to detect 1-bit-high all-black images,
% since this is how dvips does underlining (!).
/.noteimagerect % <width> <height> <matrix> .noteimagerect -
{ COMPLEX
{ gsave setmatrix itransform 0 0 itransform
grestore .coord 4 2 roll .coord .orderrect
(I ) //print .show==4
}
{ pop pop pop
}
ifelse
} odef
/colorimage where
{ pop /colorimage
{ 1 index
{ dup 6 add index 1 index 6 add index 2 index 5 add index }
{ 6 index 6 index 5 index }
ifelse .noteimagerect gsave nulldevice //colorimage grestore
} codef
} if
/.noteimage % Arguments as for image[mask]
{ dup type /dicttype eq
{ dup /Width get 1 index /Height get 2 index /ImageMatrix get }
{ 4 index 4 index 3 index }
ifelse .noteimagerect
} odef
/image { .noteimage gsave nulldevice //image grestore } codef
/imagemask { .noteimage gsave nulldevice //imagemask grestore } codef
% Output the current color if necessary.
/.color.r .inew def
.color.r -1 .iput % make sure we write the color at the beginning
/.color.g .inew def
/.color.b .inew def
/.showcolor
{ COMPLEX
{ currentrgbcolor
1000 mul round cvi
3 1 roll 1000 mul round cvi
exch 1000 mul round cvi
% Stack: b g r
dup //.color.r .iget eq
2 index //.color.g .iget eq and
3 index //.color.b .iget eq and
{ pop pop pop
}
{ (C ) //print
dup //.color.r exch .iput .show==only
( ) //print dup //.color.g exch .iput .show==only
( ) //print dup //.color.b exch .iput .show==only
(\n) //print
}
ifelse
}
if
} bind def
% Redefine `show'.
% Set things up so our output will be in tenths of a point, with origin at
% lower left. This isolates us from the peculiarities of individual devices.
/.show.ident.matrix matrix def
/.show.ident { % - .show.ident <scale> <matrix>
% //.show.ident.matrix defaultmatrix
% % Assume the original transformation is well-behaved.
% 0.1 0 2 index dtransform abs exch abs .max /.show.scale exch def
% 0.1 dup 3 -1 roll scale
gsave initmatrix
% Assume the original transformation is well-behaved.
0.1 0 dtransform abs exch abs .max
0.1 dup scale .show.ident.matrix currentmatrix
grestore
} bind def
/.coord { % <x> <y> .coord <x'> <y'>
transform .show.ident exch pop itransform
exch round cvi exch round cvi
} odef
/.dcoord { % <dx> <dy> .dcoord <dx'> <dy'>
% Transforming distances is trickier, because
% the coordinate system might be rotated.
.show.ident pop 3 1 roll
exch 0 dtransform
dup mul exch dup mul add sqrt
2 index div round cvi
exch 0 exch dtransform
dup mul exch dup mul add sqrt
3 -1 roll div round cvi
} odef
% Remember the current X, Y, and height.
/.show.x .inew def
/.show.y .inew def
/.show.height .inew def
% Remember the last character of the previous string; if it was a
% hyphen preceded by a letter, we didn't output the hyphen.
/.show.last (\000) def
% Remember the current font.
/.font.name 130 string def
/.font.name.length .inew def
/.font.height .inew def
/.font.width .inew def
%%* Also remember indentation of current line and previous vertical
%%* skip
/.show.indent .inew def
/.show.dy .inew def
% We have to redirect stdout somehow....
/.show.stdout { (%stdout) (w) file } bind def
% Make sure writing will work even if a program uses =string.
/.show.string =string length string def
/.show.=string =string length string def
/.show==only
{ //=string //.show.=string copy pop
dup type /stringtype eq
{ dup length //.show.string length le
{ dup rcheck { //.show.string copy } if
} if
} if
.show.stdout exch write==only
//.show.=string //=string copy pop
} odef
/.show==4
{ 4 -1 roll .show==only ( ) //print
3 -1 roll .show==only ( ) //print
exch .show==only ( ) //print
.show==only (\n) //print
} odef
/.showwidth % Same as stringwidth, but disable COMPLEX so that
% we don't try to detect rectangles during BuildChar.
{ COMPLEX
{ /COMPLEX false def stringwidth /COMPLEX true def }
{ stringwidth }
ifelse
} odef
/.showfont % <string> .showfont <string>
{ gsave
% Try getting the height and width of the font from the FontBBox.
currentfont /FontBBox .knownget not { {0 0 0 0} } if
aload pop % llx lly urx ury
exch 4 -1 roll % lly ury urx llx
sub % lly ury dx
3 1 roll exch % dx ury lly
sub % dx dy
2 copy .max 0 ne
{ currentfont /FontMatrix get dtransform
}
{ pop pop
% Fonts produced by dvips, among other applications, have
% BuildChar procedures that bomb out when given unexpected
% characters, and there is no way to determine whether a given
% character will do this. So for Type 1 fonts, we measure a
% typical character ('X'); for others, we punt.
currentfont /FontType get 1 eq
{ (X) .showwidth pop dup 1.3 mul
}
{ % No safe way to get the character size. Punt.
0 0
}
ifelse
}
ifelse .dcoord exch
currentfont /FontName .knownget not { () } if
dup type /stringtype ne { //.show.string cvs } if
grestore
% Stack: height width fontname
SIMPLE
{ pop pop //.show.height exch .iput }
{ 2 index //.font.height .iget eq
2 index //.font.width .iget eq and
1 index //.font.name 0 //.font.name.length .iget getinterval eq and
{ pop pop pop
}