Note:
Code has only been tested with MS Visual C++ 6 (although it should work under MinGW too).
==============================================================================
Microsoft Compiler Type Sizes
-----------------------------
char, unsigned char, signed char    1 byte
    typedef unsigned char  UCHAR;
    typedef unsigned char  BYTE;
short, unsigned short               2 bytes
    typedef unsigned short USHORT;
    typedef unsigned short WORD;
int, unsigned int                   4 bytes
    typedef unsigned int   UINT;
long, unsigned long                 4 bytes
    typedef unsigned long  ULONG;
    typedef unsigned long  DWORD;
float                               4 bytes
double                              8 bytes
long double                         8 bytes
Notes: UINT and ULONG appear to be the same size and as "fast" as each other; this
makes sense as the CPU is 32-bit.
LZSS Implementation
===================
Jonathan Bennett (c)2002-2003, jon@hiddensoft.com
All code contained is free to use. Just let me know if you have found it useful.
Check the "lzss.txt" file for information on the algorithm; this file just details the
technical changes.
v0.10
=====
First implementation of the LZSS routine. Coded to be as readable as something
this complicated can be.
The way matches are searched for is called "greedy parsing" and is very, very slow.
A 2 meg file takes around 1m45s to compress on a 366MHz CPU :(
v0.11
=====
(NOTE: It was found that this is slightly incorrect -- see v1.00)
First optimization is in the compression. We use a number of bits to code the
(offset, len) pair. For example, 0-4095 (12 bits) for the window size and 0-63 (6 bits)
for the match size. Each literal byte encoded is 9 bits. In this example we are using
18 bits for an (offset, len) pair, so it is only worth encoding matches of 3 bytes or
more.
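As a quick worked version of that arithmetic (the constants below are just the example
values above, not taken from the source code):

    // Bit cost of an (offset, len) pair vs. plain literals, using the example sizes.
    const int offsetBits  = 12;                       // offsets coded in 12 bits (0-4095)
    const int lengthBits  = 6;                        // lengths coded in 6 bits (0-63)
    const int pairBits    = offsetBits + lengthBits;  // 18 bits per (offset, len) pair
    const int literalBits = 9;                        // 8 data bits + 1 flag bit per literal

    // Two literals cost 18 bits (break-even with a pair), so the smallest match
    // worth coding as a pair is 3 bytes.
    const int minMatchLen = pairBits / literalBits + 1;  // = 3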
So, if our minimum match length is 3, then the (offset, len) pair will never use the
values 0,1 and 2. So, why don't we use this to increase the size of the numbers we
can encode into (offset, len)? Instead of 0-4095 we have 2-4097, and instead of
0-63 we have 2-65. It's effectively free bits! On a 1MB file this improved the compression
by a couple of percent. Not much, but hey!
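Roughly, the bias looks like this when writing and reading the length field (illustrative
function names, using the corrected figures from v1.00 below):

    const unsigned int MIN_MATCH = 3;

    // Writing: a real match length of 3..66 is stored in the 6-bit field as 0..63.
    unsigned int EncodeLength(unsigned int realLen)  { return realLen - MIN_MATCH; }

    // Reading: add the bias back to recover the real length.
    unsigned int DecodeLength(unsigned int codedLen) { return codedLen + MIN_MATCH; }

    // The changelog applies the same idea to the offset field, giving 3-4098 in 12 bits.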
The window size and match lengths are now definable at run-time.
v0.20
=====
Speed optimizations were done by using a chaining hash table rather than greedy parsing.
Compression time for a 2 meg file was reduced from 1m45s to 5s with only a 1% loss of compression!
Almost usable now. At this point I think the "hash table" related patents may kick in...
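In outline, the idea is something like this (the 3-byte hash and the container are
illustrative guesses, not the actual implementation):

    #include <vector>

    const int HASH_BITS = 13;
    const int HASH_SIZE = 1 << HASH_BITS;

    // Hash the next three bytes; only earlier positions that share this hash
    // need to be searched for a match, instead of the whole window.
    unsigned int Hash3(const unsigned char *p)
    {
        return ((p[0] << 6) ^ (p[1] << 3) ^ p[2]) & (HASH_SIZE - 1);
    }

    // One chain of previous positions per hash value.
    std::vector<unsigned int> g_chains[HASH_SIZE];

    void AddPosition(const unsigned char *data, unsigned int pos)
    {
        g_chains[Hash3(data + pos)].push_back(pos);
    }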
v0.21
=====
General changes.
- Compression and decompression code split into two classes so that you can include
just one of them in your programs to save a couple of kilobytes.
- When a value is added to the hash table and there is no more room in the chain,
the oldest entry is replaced (see the sketch after this list). This will not improve
compression much (0.5%), but it does mean that matches are closer to the position
being searched, which may help in the future.
- Hash table converted into a one-way linked list (rather than two-way) to reduce the
memory needed.
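A rough sketch of the bounded chain with oldest-entry replacement (the container here is
illustrative; the source keeps its own one-way linked list):

    #include <deque>

    // Keep at most 'chainLimit' positions per hash; when the chain is full the
    // oldest entry is dropped, so the chain always holds the nearest candidates.
    void AddToChain(std::deque<unsigned int> &chain, unsigned int pos, unsigned int chainLimit)
    {
        if (chain.size() >= chainLimit)
            chain.pop_front();              // discard the oldest (most distant) position
        chain.push_back(pos);               // newest position goes on the end
    }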
v0.30
=====
"lazy evaluation" implemented. Strangely enough, this speeded things up a little, and of
course the compression ratio was increased a little.
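In outline, the lazy evaluation step is something like this (FindLongestMatch, EmitLiteral
and EmitPair are stand-ins, not names from the source):

    struct Match { unsigned int offset; unsigned int len; };

    const unsigned int MIN_MATCH_LEN = 3;

    Match FindLongestMatch(const unsigned char *data, unsigned int pos, unsigned int dataLen);
    void  EmitLiteral(unsigned char c);
    void  EmitPair(unsigned int offset, unsigned int len);

    // Before coding the match at 'pos', peek at 'pos + 1'; if that match is longer,
    // emit one literal now and let the longer match be coded on the next step.
    unsigned int CompressStep(const unsigned char *data, unsigned int pos, unsigned int dataLen)
    {
        Match cur  = FindLongestMatch(data, pos, dataLen);
        Match next = FindLongestMatch(data, pos + 1, dataLen);

        if (cur.len >= MIN_MATCH_LEN && next.len <= cur.len)
        {
            EmitPair(cur.offset, cur.len);
            return pos + cur.len;           // continue after the matched bytes
        }
        EmitLiteral(data[pos]);             // no usable match, or a better one follows
        return pos + 1;
    }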
It seems that when using small hash tables (chain limit ~2), the most efficient window
and match bit sizes are 12-13 and 4. This is because with a small chain limit (although fast)
you don't get the extra compression that a larger window gives. For good compression
(rather than best speed) a good value for the chain limit is window size / 256.
Here are my results for a 2 meg binary .exe file:
Test 1
------
A hash chain limit of 2 was used.
Original file size is 2345984 bytes
Window Bits  Match Bits  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         12           4          1315650          74488      3 secs
         13           4          1321970          81032      3 secs
         12           5          1326140          74632      3 secs
         13           5          1330782          81032      3 secs
         11           4          1331880          69552      3 secs
         14           4          1337788          90304      3 secs
         12           3          1344038          74176      3 secs
         14           5          1344550          90328      3 secs
         11           5          1345478          69596      3 secs
         11           3          1353626          68912      3 secs
         13           3          1354798          80968      3 secs
         10           4          1359530          65112      3 secs
         15           4          1366410          95872      3 secs
         15           5          1371068          95864      3 secs
         10           5          1374808          65272      3 secs
         10           3          1376084          64504      3 secs
         14           3          1375580          90264      3 secs
         15           3          1408496          95856      3 secs
The first thing that is obvious from these results is that in every case
the best "match bits" size for a given window size is 4 - we will
concentrate on this size from now on.
Test 2
------
A hash chain limit of 16 was used.
Original file size is 2345984 bytes
Window Bits  Match Bits  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         13           4          1211858         169312      4 secs
         14           4          1212150         221680      5 secs
         12           4          1218880         145928      4 secs
         11           4          1247888         122264      4 secs
         15           4          1223296         305760      5 secs
         10           4          1284376         113680      4 secs
Interesting: again, 13 puts up a good performance, along with 11, 12 and 14.
Test 3
------
A hash chain limit of 32 was used.
Original file size is 2345984 bytes
Window Bits  Match Bits  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         14           4          1194912         249712      6 secs
         13           4          1197928         196312      5 secs
         15           4          1202494         358056      6 secs
         12           4          1207556         160728      4 secs
         11           4          1238556         139168      4 secs
         10           4          1276936         132920      4 secs
Again 12, 13 and 14 are giving the best compression (13 is my fave at this
point due to it being quicker than the others). I'll now drop 10 and 11 from
the testing.
Test 4
------
A hash chain limit of 64 was used.
Original file size is 2345984 bytes
Window Bits  Match Bits  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         14           4          1185580         280816      7 secs
         15           4          1190544         391360      8 secs
         13           4          1190618         212616      5 secs
         12           4          1202214         170272      5 secs
The pattern seems to be that 13 is a good general window size, giving both good
compression and speed, but a window size of 14 starts giving better compression
(at the expense of speed) as the hash chain limit reaches 32 and above. Let's do
a very large hash table to confirm the trend.
Test 5
------
A hash chain limit of 1024 was used.
Original file size is 2345984 bytes
Window Bits  Match Bits  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         14           4          1172642         357056     25 secs
         13           4          1181234         261240     19 secs
Yes, once over a hash chain limit of 32, a window size of 14 bits seems
to be the way to go. Let's see at what point raising the hash limit
becomes a waste.
Test 6
------
Window Bits  Hash Limit  Compressed Size  Hash Mem Used  Time Taken
-----------  ----------  ---------------  -------------  ----------
         14         128          1179972         305960     12 secs
         14         256          1176368         320112     15 secs
         14         512          1174192         339536     21 secs
         14        1024          1172642         357056     25 secs
         14        2048          1171360         370504     46 secs
Based on these tests, it looks like having fully variable window and match bit
sizes is a bit of a waste of time; a few preset options would seem to be the
way to go:
Compression Level   Window Bits   Match Bits   Hash Limit
-----------------   -----------   ----------   ----------
Fast        (0)          12            4             2
Normal      (1)          13            4             8
Slow        (2)          13            4            16
Very slow   (3)          14            4            32
Ultra slow  (4)          14            4           128
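In code, those presets might look something like this (the struct and array names are
illustrative):

    struct SLevel { int nWindowBits; int nMatchBits; int nHashLimit; };

    static const SLevel g_levels[] =
    {
        { 12, 4,   2 },     // 0: Fast
        { 13, 4,   8 },     // 1: Normal
        { 13, 4,  16 },     // 2: Slow
        { 14, 4,  32 },     // 3: Very slow
        { 14, 4, 128 }      // 4: Ultra slow
    };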
v1.00
=====
For release.
Compression options as detailed above implemented.
Bug fixed in the minimum length and "adjust" matching:
"So, if our minimum match length is 3, then the (offset, len) pair will never use the
values 0,1 and 2. So, why don't we use this to increase the size of the numbers we
can encode into (offset, len)? Instead of 0-4095 we have 2-4097, and instead of
0-63 we have 2-65. It's effectively free bits! On a 1MB file this improved the compression
by a couple of percent. Not much, but hey!"
This is not true; the ranges should be 3-4098 and 3-66.
v1.01
=====
Redid hashing functions to increase speed. Also added a "malloc" buffer so that
the constant malloc/free calls don't take up as much time.
Changed the algorithm ID to "JB01".
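The reusable buffer idea is roughly this (an illustrative sketch, not the actual class
from the source):

    #include <cstddef>
    #include <cstdlib>

    // Grab one block up front and only grow it when a larger request arrives,
    // instead of calling malloc/free on every compression call.
    class CWorkBuffer
    {
    public:
        CWorkBuffer() : m_pBuf(0), m_nSize(0) { }
        ~CWorkBuffer() { free(m_pBuf); }

        unsigned char *Get(size_t nNeeded)
        {
            if (nNeeded > m_nSize)
            {
                unsigned char *p = (unsigned char *)realloc(m_pBuf, nNeeded);
                if (!p)
                    return 0;               // caller handles the failure
                m_pBuf  = p;
                m_nSize = nNeeded;
            }
            return m_pBuf;
        }

    private:
        unsigned char *m_pBuf;
        size_t         m_nSize;
    };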