<html>
<head>
<title>Microsoft Word 97 Binary File Format</title>
</head>
<body background="../jpg/di1.JPG">
<h1>Microsoft Word 97 Binary File Format</h1>
<p>Microsoft Word 97 (aka version 8) for Windows and Macintosh. From the Office book,
found in the Microsoft Office Development section in the <a
href="http://premium.microsoft.com/msdn/library/">MSDN Online Library</a>. HTMLified June
1998. Revised Aug 1 1998, added missing Definitions section. </p>
<h2>Contents</h2>
<ul>
<li><a href="#01">Note</a> </li>
<li><a href="#02">Word and docfiles</a> </li>
<li><a href="#defin">Definitions</a> </li>
<li><a href="#03">Naming conventions</a> </li>
<li><a href="#04">Format of the summary info stream in a Word file</a> </li>
<li><a href="#05">Format of the main stream in a Word non-complex file</a> </li>
<li><a href="#06">Format of the main stream in a complex file</a> </li>
<li><a href="#07">Format of the table stream</a> </li>
<li><a href="#08">Format of the data stream</a> </li>
<li><a href="#09">FIB</a> </li>
<li><a href="#10">Text</a> </li>
<li><a href="#11">Character and paragraph formatting properties</a> </li>
<li><a href="#12">Bin tables</a> </li>
<li><a href="#13">Stylesheet</a> <ul>
<li><a href="#14">STSHI</a> </li>
<li><a href="#15">STD</a> </li>
</ul>
</li>
<li><a href="#16">List tables</a> <ul>
<li><a href="#17">LST records and the rglst</a> </li>
<li><a href="#18">List names and the sttblistnames</a> </li>
<li><a href="#19">LFO records and the pllfo</a> </li>
<li><a href="#20">Paragraph list formatting</a> </li>
</ul>
</li>
<li><a href="#21">sprm definitions</a> </li>
<li><a href="#22">Complex file format</a> <ul>
<li><a href="#23">Algorithm to determine the bounds of a paragraph containing a certain
character in a complex file</a> </li>
<li><a href="#24">Algorithm to determine paragraph properties for a paragraph in a complex
file</a> </li>
<li><a href="#25">Algorithm to determine table properties for a table row in a complex file</a>
</li>
<li><a href="#26">Algorithm to determine the character properties of a character in a
complex file</a> </li>
<li><a href="#27">Algorithm to determine the section properties of a section in a complex
file</a> </li>
<li><a href="#28">Algorithm to determine the PIC of a picture in a complex file</a> </li>
</ul>
</li>
<li><a href="#29">Footnotes &amp; endnotes</a> </li>
<li><a href="#30">Headers and footers</a> </li>
<li><a href="#31">Page table</a> </li>
<li><a href="#32">Glossary files</a> </li>
<li><a href="#33">Routing slip</a> </li>
<li><a href="#34">AutoSummary</a> </li>
<li><a href="#35">sttbfassoc (table of associated strings)</a> </li>
<li><a href="#36">Structure definitions</a> <ul>
<li><a href="#37">Annotation reference descriptor (ATRD)</a> </li>
<li><a href="#38">Autonumbered list data descriptor (ANLD)</a> </li>
<li><a href="#39">Autonumber level descriptor (ANLV)</a> </li>
<li><a href="#40">AutoSummary analysis (ASUMY)</a> </li>
<li><a href="#41">AutoSummary info (ASUMYI)</a> </li>
<li><a href="#42">Bin table entry (BTE)</a> </li>
<li><a href="#43">Break descriptor (BKD)</a> </li>
<li><a href="#44">Bookmark first descriptor (BKF)</a> </li>
<li><a href="#45">Bookmark lim descriptor (BKL)</a> </li>
<li><a href="#46">Border code (BRC)</a> </li>
<li><a href="#47">Border code for Windows Word 1.0 (BRC10)</a> </li>
<li><a href="#48">Character properties (CHP)</a> </li>
<li><a href="#49">Character property exceptions (CHPX)</a> </li>
<li><a href="#50">Date and time (internal date format) (DTTM)</a> </li>
<li><a href="#51">Drop cap specifier (DCS)</a> </li>
<li><a href="#52">Drawing object grid (DOGRID)</a> </li>
<li><a href="#53">Document typography info (DOPTYPOGRAPHY)</a> </li>
<li><a href="#54">Field descriptor (FLD)</a> </li>
<li><a href="#55">File shape address (FSPA)</a> </li>
<li><a href="#56">Font family name (FFN)</a> </li>
<li><a href="#57">File information block (FIB)</a> </li>
<li><a href="#58">Footnote reference descriptor (FRD)</a> </li>
<li><a href="#59">Formatted disk page for CHPXs (CHPX FKP)</a> </li>
<li><a href="#60">Formatted disk page for PAPXs (PAPX FKP)</a> </li>
<li><a href="#61">List level (on file) (LVLF)</a> </li>
<li><a href="#62">Line spacing descriptor (LSPD)</a> </li>
<li><a href="#63">List data (on file) (LSTF)</a> </li>
<li><a href="#64">List format override (LFO)</a> </li>
<li><a href="#65">List format override for a single level (LFOLVL)</a> </li>
<li><a href="#66">Outline list data (OLST)</a> </li>
<li><a href="#67">Number revision mark data (NUMRM)</a> </li>
<li><a href="#68">Page descriptor (PGD)</a> </li>
<li><a href="#69">Paragraph height (PHE)</a> </li>
<li><a href="#70">Paragraph properties (PAP)</a> </li>
<li><a href="#71">Paragraph property exceptions (PAPX)</a> </li>
<li><a href="#72">Picture descriptor (on file) (PICF)</a> </li>
<li><a href="#73">Piece descriptor (PCD)</a> </li>
<li><a href="#74">Plex of CPs stored in file (PLCF)</a> </li>
<li><a href="#75">Property modifier (variant 1) (PRM)</a> </li>
<li><a href="#76">Property modifier (variant 2) (PRM)</a> </li>
<li><a href="#77">Routing slip (RS)</a> </li>
<li><a href="#78">Routing recipient (RR)</a> </li>
<li><a href="#79">Section descriptor (SED)</a> </li>
<li><a href="#80">Section properties (SEP)</a> </li>
<li><a href="#81">Section property exceptions (SEPX)</a> </li>
<li><a href="#82">Shading descriptor (SHD)</a> </li>
<li><a href="#83">Tab descriptor (TBD)</a> </li>
<li><a href="#84">Table cell descriptors (TC)</a> </li>
<li><a href="#85">Table autoformat look specifier (TLP)</a> </li>
<li><a href="#86">Table properties (TAP)</a> </li>
<li><a href="#87">Textbox story (FTXBXS)</a> </li>
<li><a href="#88">Work book (WKB)</a> </li>
</ul>
</li>
<li><a href="#89">Appendix A - Reading a Macintosh PICT graphic</a> </li>
<li><a href="#90">Appendix B - Calculation of font (ftc) and language (lid)</a> </li>
</ul>
<a name="01">
<h2>Note</h2>
</a>
<p>Many of the structures written in Word files differ slightly from the corresponding
structures Word uses internally. The file-specific version of a structure is typically
named by adding a leading or (more often) trailing F. For example, Word internally uses
a PLC (plex of CPs), but writes a PLCF (plex of CPs in file) to files. Many discussions in
this document use the name of the internal structure when the file-specific structure is
what is really meant. The reader should remember that the name of a seemingly
undefined structure type may simply be missing a leading or trailing F.</p>
<a name="02">
<h2>Word and docfiles</h2>
</a>
<p>Word 97 is an OLE 2.0 application. A Word binary file is a docfile, and Word binary data
is written into streams within the docfile using the OLE 2.0 docfile APIs. These streams
are stored in the file as linked lists of file blocks, so their data cannot be reliably
accessed using the operating system's file-open APIs. To access data within a Word binary
file, the file must be both opened and read using the appropriate OLE 2.0 docfile APIs.</p>
<p>A Word docfile consists of a main stream, a summary information stream, a table stream,
a data stream, and 0 or more object storages which contain private data for OLE 2.0 objects
embedded within the Word document. The summary information stream is described in
<a href="#04">Format of the summary info stream in a Word file</a>. The object storages
contain binary data for embedded objects. Word has no knowledge of the contents of these
storages; this information is accessed and manipulated through the OLE 2.0 APIs. </p>
<p>The majority of this document describes the contents of the main stream and the table
stream.</p>
<a name="defin">
<h2>Definitions</h2>
</a>
<p><b>OLE 2.0:</b></p>
<p>Object Linking and Embedding 2.0.</p>
<p><b>API (application programming interface):</b></p>
<p>A set of libraries, functions, definitions, etc. which describe an interface to a
programming environment or model.</p>
<p><b>Docfile:</b></p>
<p>An OLE 2.0 compatible multi-stream file. Word files are docfiles.</p>
<p><b>Page (or sector):</b></p>
<p>A 512-byte segment of the main stream within a Word docfile that begins on a 512-byte
boundary. (Bytes 0-511 are in page 0, bytes 512-1023 are in page 1, etc.) In Word data
structures, an unsigned two-byte integer page number is given the acronym <b>PN</b> (for <b>p</b>age
<b>n</b>umber).</p>
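<p>The page arithmetic above can be sketched in a few lines of Python. This is only an
illustration: the function names are hypothetical, and reading the two-byte PN as
little-endian is an assumption that this section does not state.</p>

```python
PAGE_SIZE = 512  # bytes per page (sector), as defined above


def page_of(offset):
    """Return the page number (PN) that contains a main-stream byte offset."""
    return offset // PAGE_SIZE


def page_start(pn):
    """Return the main-stream byte offset at which page `pn` begins."""
    return pn * PAGE_SIZE


def read_pn(buf, pos):
    """Read an unsigned two-byte PN at `pos`.

    Little-endian byte order is an assumption, not taken from the text above.
    """
    return int.from_bytes(buf[pos:pos + 2], "little", signed=False)


# Bytes 0-511 fall in page 0; byte 512 is the first byte of page 1.
print(page_of(511), page_of(512), page_start(1))
```

<p>Because a PN is only two bytes wide, it can address at most 65536 pages under this scheme.</p>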
<p><b>Document:</b></p>
<p>A named, multi-linked list of data structures, representing an ordered stream of text
with properties that was produced by a user of Microsoft Word.</p>
<p><b>Stream:</b></p>
<p>The physical encoding of a Word document's text and sub data structures in a random
access stream within a docfile.</p>
<p><b>Main stream:</b></p>
<p>The stream within a Word docfile containing the bulk of Word's binary data.</p>
<p><b>Table stream:</b></p>
<p>The stream within a Word docfile containing the various PLCFs and tables that describe
a document's structures.</p>
<p><b>Data stream:</b></p>
<p>The stream within a Word docfile containing various data attached to characters in
the main stream, for example binary data describing in-line pictures and/or form fields.</p>
<p><b>Summary information stream:</b></p>
<p>The stream within a Word docfile containing the document summary information.</p>
<p><b>Object storage:</b></p>
<p>A storage containing binary data for an embedded OLE 2.0 object.</p>
<p><b>CP (character position):</b></p>
<p>A four-byte integer giving the position of a character of text within the
logical text stream of a document.</p>
<p><b>FC (file character position):</b></p>
<p>A four-byte integer which is the byte offset of a character (or other object) from the
beginning of a stream of the docfile. Before a file has been edited (i.e., in a full-saved
Word document), <b>CP</b>s can be transformed into <b>FC</b>s by adding the <b>FC</b>
coordinate of the beginning of a document's text stream to the <b>CP</b>. After a file has been
edited (i.e., in a fast-saved Word document), the mapping from <b>CP</b> to <b>FC</b> is
recorded in the <b>piece table</b> (see below).</p>
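<p>As a minimal sketch, the full-saved mapping described above reduces to a single addition.
The names <code>cp_to_fc</code>, <code>fc_to_cp</code> and <code>fc_text_start</code> are
hypothetical, and the code takes the definition at face value: it does not account for
character width and does not handle fast-saved files, which need the piece table.</p>

```python
def cp_to_fc(cp, fc_text_start):
    """Map a CP to an FC in a full-saved (unedited) document.

    `fc_text_start` is a hypothetical name for the FC at which the
    document's text begins. Fast-saved files are out of scope here:
    their CP-to-FC mapping is recorded in the piece table.
    """
    return fc_text_start + cp


def fc_to_cp(fc, fc_text_start):
    """Inverse of cp_to_fc under the same full-saved assumption."""
    return fc - fc_text_start


# Illustrative values only: text assumed to start at byte offset 1024.
fc = cp_to_fc(10, 1024)
print(fc, fc_to_cp(fc, 1024))
```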
<p><b>XCHAR (extended character set):</b></p>
<p>A data type which defines a "character". Each XCHAR corresponds to a
character in the document, where "character" is defined as a glyph, regardless
of whether it is a single-byte or double-byte character. With Word6/FE, Word95/FE,
Word97/all and future versions of Word, this is defined as a 16-bit integer corresponding
to the Unicode character code of the glyph.</p>
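<p>To illustrate the 16-bit XCHAR definition, the sketch below splits a byte buffer into
two-byte units and decodes them as text. The function name is hypothetical, and both the
little-endian byte order and the use of Python's <code>utf-16-le</code> codec are
assumptions rather than statements from this section.</p>

```python
def decode_xchars(raw):
    """Split raw bytes into 16-bit XCHAR values and decode them as text.

    Assumes little-endian 16-bit units holding Unicode character codes.
    """
    if len(raw) % 2:
        raise ValueError("XCHAR data must contain an even number of bytes")
    codes = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    text = raw.decode("utf-16-le")
    return codes, text


# Four characters occupy eight bytes when each XCHAR is 16 bits wide.
sample = "Word".encode("utf-16-le")
codes, text = decode_xchars(sample)
print(len(sample), codes, text)
```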