topic.txt

From the source code of the open-source compiler Open Watcom, version 1.6.0

THE WINHELP |TOPIC INTERNAL FILE
================================

Okay, this is the biggie.  |TOPIC is where all of the actual help
text is stored, along with spacing directives, font changes, and hotlinks.
There's a lot of stuff going on in here.  The following topics will be
covered:
	
	The |TOPIC page structure
	Compression and phrase replacement
	Nodes and linked lists
	Topic Nodes
	Text Nodes
	Offsets within the |TOPIC file
	
The last section should be read only after understanding the first three.
The terms "Extended" and "Character" offsets are defined in that section.
	
The |TOPIC page structure
-------------------------

The text and other data in the |TOPIC file are broken down into 4KB pages.
The first 12 bytes of each page are always a page header, with the remaining
4084 bytes reserved for data.  Thus page headers can always be found at
4KB intervals within the |TOPIC file.

A page header has the following structure:
	
	Bytes		Meaning
	-----------------------
	0-3		Offset of last node in previous page
	4-7		Offset of first node in this page
	8-11		Offset of last topic node in previous page
	
'Nodes' are linked list structures used by WinHelp and are described
later on.  These three fields are used to properly connect the data in
separate pages back together again.  Note the following things:
	
	-- Pages need not be full.  If the last page is not full, it
	   will extend to the end of the |TOPIC file (the |TOPIC file
	   need not be a multiple of 4KB in size).  If any other page
	   is not full, the page will still extend to the whole 4KB, but
	   some of the data at the end will be garbage.  The linked list
	   structures and the header of the next page must be used to
	   reconstruct the true end of the page.
	-- The last node in a page need not be contained entirely within
	   that page; it may spill onto the next page.  This is true even
	   if the first page is not full.  Again, the page header of the
	   next page can be used to determine how much of the node has
	   spilled over.  (And thus where the node ends in the first page.)
	
In the first page, the first field is set to 0xFFFFFFFF and the last field
is set to 0x00000000.  All three offsets are "Extended" offsets.
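
As a rough illustration of the page header described above, here is a
minimal Python sketch.  It assumes the header is three unsigned 4-byte
fields in little-endian byte order (the usual order for Windows files; this
text does not state it outright).  The names PageHeader, parse_page_header
and looks_like_first_page are made up for this example.

```python
import struct
from typing import NamedTuple

PAGE_SIZE = 4096    # pages are 4KB
HEADER_SIZE = 12    # three 4-byte fields


class PageHeader(NamedTuple):
    last_node_prev_page: int        # Extended offset
    first_node_this_page: int       # Extended offset
    last_topic_node_prev_page: int  # Extended offset


def parse_page_header(page: bytes) -> PageHeader:
    """Unpack the 12-byte header found at the start of a |TOPIC page."""
    if len(page) < HEADER_SIZE:
        raise ValueError("page is shorter than the 12-byte header")
    # "<III" = little-endian, three unsigned 32-bit values (an assumption).
    return PageHeader(*struct.unpack_from("<III", page, 0))


def looks_like_first_page(header: PageHeader) -> bool:
    """Check the two sentinel values the text gives for the first page."""
    return (header.last_node_prev_page == 0xFFFFFFFF
            and header.last_topic_node_prev_page == 0)
```

Since headers sit at 4KB intervals, the header of page n would be read from
byte n * PAGE_SIZE of the |TOPIC file.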

Compression and Phrase replacement
----------------------------------

If the .HLP file is uncompressed, the |TOPIC data is simply hacked into
pages with an eye for filling all but the last page.  Compression makes
things trickier since 4KB of compressed data may expand into 5KB or 6KB of 
uncompressed data.  

If the help file is compiled with compression set to MEDIUM or HIGH,
everything in each page except the 12-byte header is compressed via
the LZ77 algorithm described in compress.txt.  If a page is not full
(and is not the last page) the entire 4084 bytes will still be in
compressed format.  Each page must be decompressed before anything can
be read, including the linked list structures and formatting directives.

However, each page is compressed separately.  This permits incremental
decompression:  to read a certain piece of data, you only need to decompress
all the data before it in that page, not any of the data in previous pages.
This complicates things for the compiler; see the section on offsets.

Phrase replacement is handled differently; a byte with a value between 0x01
and 0x0A in the uncompressed help text is always the first byte of a two
byte code referring to some phrase from the |Phrases file.  See phrases.txt
for more info on that.
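
To make the two-byte phrase codes concrete, here is a small Python sketch
that only locates them in already-decompressed text.  It takes the range
0x01 to 0x0A from the paragraph above at face value and does not try to
look anything up in |Phrases (that mapping is covered in phrases.txt).  The
function name find_phrase_codes is hypothetical.

```python
from typing import List, Tuple


def find_phrase_codes(text: bytes) -> List[Tuple[int, int, int]]:
    """Return (position, first_byte, second_byte) for each phrase code."""
    found = []
    pos = 0
    while pos < len(text):
        byte = text[pos]
        if 0x01 <= byte <= 0x0A and pos + 1 < len(text):
            # First byte of a two-byte code: record it and skip both bytes,
            # so the second byte is never mistaken for ordinary text.
            found.append((pos, byte, text[pos + 1]))
            pos += 2
        else:
            pos += 1
    return found
```

For example, find_phrase_codes(b"Hi\x01\x05there") reports one code at
position 2 made of the bytes 0x01 and 0x05.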

So just to sum up so far, the algorithm for reading all of the text stored
in a help file is:
	1)	Read in a page.
	2)	If compression is being used, decompress the page.
	3)	Use the header of the next page and the info on this
		page to determine where the page ends.
	4)	Add the decompressed data up to the true end of the
		page to your uncompressed data pool.
	5)	Repeat for each page.
Then go through the text, throw away the linked list data and other crud,
and add the phrases back in.  Simple, huh?
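
The page loop above might be sketched in Python roughly as follows.  This
is only an outline under stated assumptions: the decompress argument is a
hypothetical callable standing in for the LZ77 routine from compress.txt,
and step 3 (trimming each page to its true end) is left to the caller,
since it needs the linked list data.  The name iter_pages is made up.

```python
from typing import Callable, Iterator, Optional, Tuple

PAGE_SIZE = 4096    # pages are 4KB
HEADER_SIZE = 12    # each page starts with a 12-byte header


def iter_pages(
    topic: bytes,
    decompress: Optional[Callable[[bytes], bytes]] = None,
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (raw_header, body) for each page of a |TOPIC file image."""
    for start in range(0, len(topic), PAGE_SIZE):
        # The last page may be short; slicing copes with that.
        page = topic[start:start + PAGE_SIZE]
        header, body = page[:HEADER_SIZE], page[HEADER_SIZE:]
        if decompress is not None:
            # Only the body is compressed, never the header.
            body = decompress(body)
        yield header, body
```

For an uncompressed file, list(iter_pages(data)) gives the headers and
bodies directly; with compression you would pass your own decompressor as
the second argument.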

Nodes and Linked Lists
-----------------------

It's about time I explained what these linked list structures actually are.
All of the help text and other information is stored in linked list nodes
consisting of a 21-byte header followed by two variable-length blocks
of data.  The headers have the following format:

	Bytes		Meaning
	-----------------------
	0-3		Size of the entire node
	4-7		Size of second data block AFTER PHRASE
			REPLACEMENT
	8-11		Offset of previous node (Extended offset)
	12-15		Offset of next node (Extended offset)
	16-19		Offset of second data block within this node
			(i.e., size of header and first block)
	20		Record Type
	
There are three known values for the Record Type:  0x02 for topic header
info (referred to here as 'Topic node'), 0x20 for displayable info,
and 0x23 for a table of displayable info (both 0x20 and 0x23 are referred
to as 'Data nodes').

Immediately following the header are the two data blocks, which have
different meaning depending on the Record Type.
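
Here is a minimal Python sketch of reading one node from a buffer of
decompressed data.  It assumes little-endian unsigned fields, and that the
"size of the entire node" counts the 21-byte header as well as both data
blocks.  Both are my reading of the table, not something it spells out.
The names NodeHeader, parse_node and RECORD_TYPES are made up.

```python
import struct
from typing import NamedTuple, Tuple

# Five unsigned 32-bit fields plus one byte = 21 bytes, no padding.
NODE_HEADER = struct.Struct("<IIIIIB")

RECORD_TYPES = {0x02: "topic", 0x20: "text", 0x23: "table"}


class NodeHeader(NamedTuple):
    node_size: int             # size of the entire node
    block2_expanded_size: int  # second block size after phrase replacement
    prev_node: int             # Extended offset
    next_node: int             # Extended offset
    block2_offset: int         # header size + first block size
    record_type: int


def parse_node(buf: bytes, pos: int = 0) -> Tuple[NodeHeader, bytes, bytes]:
    """Split the node starting at buf[pos] into header and two blocks."""
    hdr = NodeHeader(*NODE_HEADER.unpack_from(buf, pos))
    # First block runs from the end of the header to block2_offset.
    block1 = buf[pos + NODE_HEADER.size:pos + hdr.block2_offset]
    # Second block runs from block2_offset to the end of the node
    # (assuming node_size includes the header).
    block2 = buf[pos + hdr.block2_offset:pos + hdr.node_size]
    return hdr, block1, block2
```

The record_type field then tells you how to interpret the two blocks.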

Topic Nodes
-----------

A Record Type of 0x02 in a node header indicates a topic header node.  Each
topic begins with one of these nodes.  The first data block of a type 0x02
node contains a 28-byte record of information pertaining to the entire
topic:
	Bytes		Meaning
	-----------------------
	0-3		Size of all nodes making up this topic
	4-7		Offset of previous topic in Browse sequence
			(Character Offset!)
	8-11		Offset of next topic in Browse sequence
			(Character Offset!)
	12-15		Topic Number, assigned by compiler
	16-19		Offset of first node in non-scrolling region
			of text for this topic (Extended offset)
	20-23		Offset of first node in scrolling region of
			text for this topic (Extended offset)
	24-27		Offset of next topic header node (Extended offset)
	
Note the two Character offsets; I believe these are the only places Character
offsets are used within the |TOPIC file itself.

The second data area for a type 0x02 record contains one or more
zero-separated ASCII strings.  The first string is always the title of the
topic; this matches the title stored in the |TTLBTREE file.  Note the History
window gets its titles from here, *not* from the |TTLBTREE file.  Following
the title string are the strings of all the macros, if any, to be executed
when the topic is displayed.  The last string in this list is not
zero-terminated.
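
A Python sketch of pulling a topic node apart, given its two data blocks.
It assumes little-endian unsigned fields and that the second block has
already had any phrase replacement undone.  The names TopicRecord and
parse_topic_node are invented for this example.

```python
import struct
from typing import List, NamedTuple, Tuple

TOPIC_RECORD = struct.Struct("<7I")  # seven 4-byte fields = 28 bytes


class TopicRecord(NamedTuple):
    topic_size: int          # size of all nodes making up this topic
    prev_browse: int         # Character offset
    next_browse: int         # Character offset
    topic_number: int
    non_scrolling_node: int  # Extended offset
    scrolling_node: int      # Extended offset
    next_topic_node: int     # Extended offset


def parse_topic_node(
    block1: bytes, block2: bytes
) -> Tuple[TopicRecord, str, List[str]]:
    """Return (record, title, macros) for a type 0x02 node."""
    record = TopicRecord(*TOPIC_RECORD.unpack_from(block1, 0))
    # Strings are separated by zero bytes; the last one has no terminator,
    # so a plain split gives exactly one entry per string.
    strings = [s.decode("ascii", errors="replace")
               for s in block2.split(b"\x00")]
    return record, strings[0], strings[1:]
```

A node whose second block is b"Contents\x00Foo()" would come back with the
title "Contents" and a one-element macro list.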

Text Nodes   /*   Hang on, folks :-(   */
----------

If the Record Type is 0x20 or 0x23, the node contains displayable text.
0x20 indicates normal text while 0x23 indicates a WinHelp table.  The two
types of nodes have similar formats.  In each case, the first data block
contains formatting information, font changes, hotlinks, and the like,
while the second data area contains text and references to the first
data area.

The format of the first data block is VERY complex, with a large number
of optional or variable length components.  I don't understand all of it
myself yet, but here goes:

First size word:  The first two bytes give TWICE the size of the data
                  stored in the first data block NOT INCLUDING this field
		  and the next.  For no apparent reason, the highest bit
		  of this field is always set, so the bytes "1C 80" (the
		  word 0x801C) mean a size of 0xE, not 0x400E bytes.
Second size byte: The next byte contains TWICE the size of the data
		  in the second data block.  However, if the lowest bit
		  of this value is set, the size could not fit in a byte
		  so you must read in the next byte as well to get the true
		  size.  E.g., "3C" means a size of 0x1E bytes, while
		  "3D" followed by "01" means a size of (0x13D/2) or 0x9E
		  bytes.
# of columns:	  The next byte will contain the number of columns in the
		  table, if this is a 0x23 record.  In a 0x20 record this
		  byte is set to zero.
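
The three fields above can be decoded along these lines.  This Python
sketch follows the description literally (mask off the high bit, then
halve; read a second byte only when the low bit is set) and reproduces the
worked examples from the text.  The function name read_text_block_prefix is
hypothetical.

```python
from typing import Tuple


def read_text_block_prefix(block1: bytes) -> Tuple[int, int, int, int]:
    """Return (size1, size2, columns, bytes_consumed) for a text node."""
    # First size word: two bytes, low byte first, high bit always set.
    word = block1[0] | (block1[1] << 8)
    size1 = (word & 0x7FFF) // 2
    pos = 2

    # Second size: one byte, or two if the lowest bit is set.
    raw = block1[pos]
    pos += 1
    if raw & 1:
        raw |= block1[pos] << 8
        pos += 1
    size2 = raw // 2

    # Number of columns: zero in a 0x20 record.
    columns = block1[pos]
    pos += 1
    return size1, size2, columns, pos
```

With the bytes 1C 80 3C 00 this gives sizes 0xE and 0x1E, no columns, and
4 bytes consumed; with 1C 80 3D 01 03 the second size becomes 0x9E and
5 bytes are consumed.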
		  
In a 0x23 node, the next thing is a list of 6-byte records, one for each
column:
	Bytes		Meaning
	-----------------------
	0-1		Column number (0xFFFF if this is the last record)
	2-5		Unknown (sigh) (width?)
