changelog.txt
Window bits are fixed at 13 and match length bits at 4. The compression level is now user controlled
by the "hash chain" limit.
// The compression level parameter (0-4) gives the following results:
// Compression Level Window Bits Match Bits Hash Limit
// ----------------- ----------- ---------- ----------
// Fast (0) 13 4 1
// Normal (1) 13 4 4
// Slow (2) 13 4 16
// Very slow (3) 13 4 64
// Ultra slow (4) 13 4 256
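
The "Hash Limit" is the tuning knob here: positions whose first few bytes hash to the same value are
kept on a chain, and the limit caps how many links the match finder will walk before giving up. A
rough sketch of that idea (the names and layout are illustrative, not the actual code):

#include <cstdint>
#include <vector>

// Illustrative only: walk the chain of earlier positions that share the same hash,
// but give up after 'hashLimit' links.  A higher limit finds better matches
// (better compression) at the cost of speed.
static int FindBestMatch(const uint8_t *buf, int pos, int bufEnd,
                         const std::vector<int> &prevLink, int chainHead,
                         int hashLimit, int maxLen, int &bestOffset)
{
    int bestLen = 0;
    int cur = chainHead;                      // most recent position with this hash
    for (int links = 0; cur >= 0 && links < hashLimit; ++links)
    {
        int len = 0;
        while (len < maxLen && pos + len < bufEnd && buf[cur + len] == buf[pos + len])
            ++len;
        if (len > bestLen)
        {
            bestLen    = len;
            bestOffset = pos - cur;
        }
        cur = prevLink[cur];                  // follow the chain to an older position
    }
    return bestLen;
}
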
v1.02 (Single file Calgary Corpus = 3.389 bpb or 57.64%)
=====
Minor changes
V1.10 (Single file Calgary Corpus = 3.151 bpb or 60.61%)
=====
Based on work I was doing with LZP, I changed the way that match lengths are stored.
Instead of using a fixed number of bits, the LZP way is used: small numbers of bits
for small match lengths (more common) and larger numbers of bits for "freaky" longer
matches. The change gained 0.25 bpb (Calgary Corpus), which is about 3% - a pretty
good improvement! Also, because longer matches are possible, a nice side effect is
that the entire routine is faster (twice as fast on one 10MB file I tried!)
Not bad for what amounted to an extra 10 lines of code! (See CompressedStreamWriteMatchLen())
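
For reference, a rough sketch of an escalating length code in that LZP style - the group sizes here
are made up, and this is not the exact layout used by CompressedStreamWriteMatchLen(): short lengths
cost only a couple of bits, and each "escape" value says a bigger group follows.

// Stand-in for the real bit-oriented output stream.
struct BitWriter
{
    void WriteBits(unsigned /*value*/, int /*numBits*/) { /* ... */ }
};

static void WriteMatchLenSketch(BitWriter &out, unsigned len)
{
    if (len < 3)  { out.WriteBits(len, 2); return; }   // 2 bits cover lengths 0-2
    out.WriteBits(3, 2);                               // 3 = escape, longer length follows
    len -= 3;
    if (len < 7)  { out.WriteBits(len, 3); return; }   // next 7 lengths cost 2+3 bits
    out.WriteBits(7, 3);                               // 7 = escape again
    len -= 7;
    out.WriteBits(len, 8);                             // rare "freaky" long matches
}
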
- Routine for finding matches tidied up.
- Lazy evaluation removed because it was very messy (only lost 0.01 bpb), may reimplement
in the future
V1.20 (Single file Calgary Corpus = 3.159 bpb or 60.51%, 11.72s (PII366))
=====
No direct compression changes:
- Data is now compressed in small blocks so that large files can be handled
- CompressFile and CompressMem functions implemented
- Until finished, the algorithm ID will be BETA
// The start of the compression stream will contain "other" items apart from data.
// The data is compressed in blocks, the compressed then uncompressed size of the block
// is output followed by the compressed block itself. This is repeated until a compressed
// block size of 0 is reached
//
// Algorithm ID 4 bytes
// Total Uncompressed Size 4 bytes (1 ULONG)
// ...
// Compressed Block1 Size 4 bytes (1 ULONG)
// Uncompressed Block1 Size 4 bytes (1 ULONG)
// Compressed Block1 Data nn bytes (where nn=compressed block size)
// ...
// Compressed Blockn Size 4 bytes (1 ULONG)
// Uncompressed Blockn Size 4 bytes (1 ULONG)
// Compressed Blockn Data nn bytes (where nn=compressed block size)
// ...
// 00 00 00 00 4 bytes (1 ULONG)
Obviously, this means there are redundant values in the data stream, which hurts compression.
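
A sketch of emitting that block framing - WriteULONG, the byte order, and the "other" header items
elided above are placeholders rather than the real routines:

#include <cstdint>
#include <cstdio>

// Placeholder: the real code may write the ULONG differently (endianness etc.)
static void WriteULONG(FILE *f, uint32_t v) { fwrite(&v, sizeof(v), 1, f); }

// Each block: compressed size, uncompressed size, then the compressed data.
static void WriteBlock(FILE *f, const uint8_t *compData,
                       uint32_t compSize, uint32_t uncompSize)
{
    WriteULONG(f, compSize);
    WriteULONG(f, uncompSize);
    fwrite(compData, 1, compSize, f);
}

// A compressed block size of 0 (the 00 00 00 00 above) ends the stream.
static void WriteEndOfStream(FILE *f) { WriteULONG(f, 0); }
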
V1.21 (Single file Calgary Corpus = 2.934 bpb or 63.33%, 31.24s (PII366)) (or 3.2bpb/58% in 5s)
=====
- Data and Compression blocks total 1MB of memory.
- Lazy eval of +1 reintroduced now that the code is a lot tidier :) (see the sketch below)
- The number of bits used for the window length varies depending on file size, which
helps compression on both small and large files
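
The "+1" lazy evaluation, roughly (FindMatch / EmitLiteral / EmitMatch are hypothetical stand-ins
for the real routines): before committing to a match at the current position, check the next
position as well; if it matches better, emit one literal now and take the later match.

#include <cstdint>

int  FindMatch(const uint8_t *buf, int pos, int &offset);   // longest match at pos
void EmitLiteral(uint8_t byte);
void EmitMatch(int len, int offset);

void LazyStep(const uint8_t *buf, int &pos)
{
    int offHere = 0, offNext = 0;
    int lenHere = FindMatch(buf, pos, offHere);
    int lenNext = FindMatch(buf, pos + 1, offNext);

    if (lenNext > lenHere)
    {
        EmitLiteral(buf[pos]);      // next position matches better, defer the match
        pos += 1;
    }
    else
    {
        EmitMatch(lenHere, offHere);
        pos += lenHere;
    }
}
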
V1.22 (Single file Calgary Corpus = 2.883 bpb or 63.96%, 31.32s (PII366)) (or 3.5bpb/56% in 3s)
=====
- Added huffman coding for literals. It generates the tree based on the frequency of
the last 8192 literals written. Therefore the table does not need to be transmitted,
as the decompressor just does the same. The tree is updated every 8192 literals, which
helped compression (see the sketch after this entry).
- Memory and speed tweaked.
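
Sketch of the synchronisation trick: both sides start from the same initial tree, count literals
identically, and rebuild at the same deterministic points, so no table ever needs to be sent.
RebuildHuffmanTree is a placeholder for the real tree construction, and resetting the counts after
each rebuild is just one way to make the tree reflect the most recent 8192 literals.

#include <cstdint>

const int kRebuildInterval = 8192;              // the interval described above

struct LiteralModel
{
    uint32_t freq[256] = {};
    int      sinceRebuild = 0;

    // Called for every literal by BOTH the compressor and the decompressor,
    // so their trees change at exactly the same points in the stream.
    void Update(uint8_t literal)
    {
        ++freq[literal];
        if (++sinceRebuild == kRebuildInterval)
        {
            RebuildHuffmanTree(freq);           // placeholder for the real builder
            for (uint32_t &f : freq) f = 0;     // next tree reflects the next 8192 only
            sinceRebuild = 0;
        }
    }

    static void RebuildHuffmanTree(const uint32_t (&freq)[256]);
};
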
V1.23 (Single file Calgary Corpus = 2.826 bpb or 64.67%, 20.61s (PII366)) (or 3.8bpb/52% in 3s)
=====
- Minor changes to the huffman code
- Number of bits used for window length increases from the start of a block until == 15
(Helps compression at the start of a block/small files)
- Some hash/linked list improvements (memory pruned periodically to save memory)
- Match length coding tweaked for slightly better compression in most cases
- Changed "Uncompress" to "Decompress" - it was bugging me.
V1.24 (Single file Calgary Corpus = 2.786 bpb or 65.17%, 22.06s (PII366)) (or 3.7bpb/53% in 3s)
=====
- Changed the max match length to 256 and huffman encoded it along with the literals
as the same 512 character alphabet. Mind-blowing stuff.
V1.25 (Single file Calgary Corpus = 2.757 bpb or 65.54%, 20.62s (PII366)) (or 3.6bpb/53% in 3s)
=====
- 32 literal codes are used to describe the range of the match length, and then additional
bits are used to "finish it off". Similar to the LZP method I was using but more
cunning. :) This results in a huffman alphabet of 288 symbols:
Code More Bits? Length
---- ---------- ------
256 0 0
257 0 1
258 0 2
259 0 3
260 0 4
261 0 5
262 0 6
263 0 7
264 1 8-9
265 1 10-11
266 1 12-13
267 1 14-15
268 2 16-19
269 2 20-23
270 2 24-27
271 2 28-31
272 3 32-39
273 3 40-47
274 3 48-55
275 3 56-63
276 4 64-79
277 4 80-95
278 4 96-111
279 4 112-127
280 5 128-159
281 5 160-191
282 5 192-223
283 5 224-255
284 6 256-319
285 6 320-383
286 6 384-447
287 6 448-511
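
The mapping from a match length onto those 32 codes follows a regular doubling pattern, so it can be
computed rather than looked up. A rough sketch (EmitHuffman and EmitRawBits are hypothetical
stand-ins for the real output routines):

void EmitHuffman(int symbol);                 // writes the huffman code for a symbol
void EmitRawBits(unsigned value, int count);  // writes 'count' raw bits

void WriteLengthCode(unsigned length)         // length is the table value, 0-511
{
    if (length < 8)
    {
        EmitHuffman(256 + (int)length);       // codes 256-263, no extra bits
        return;
    }
    int      extraBits = 1;
    unsigned groupBase = 8;                   // lowest length in the current group
    while (length >= groupBase * 2)           // groups: 8-15, 16-31, ... 256-511
    {
        groupBase *= 2;
        ++extraBits;
    }
    unsigned offsetInGroup = length - groupBase;
    EmitHuffman(260 + 4 * extraBits + (int)(offsetInGroup >> extraBits));
    EmitRawBits(offsetInGroup & ((1u << extraBits) - 1), extraBits);
}
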
V1.26 (Single file Calgary Corpus = 2.706 bpb or 66.18%, 20.82s (PII366)) (or 3.3bpb/58% in 3s)
=====
- Similar trick this time. A new alphabet (second huffman tree) is used to code
the match offsets. 30 codes are used to represent the MSB of the offset
(values based on an article on PKZIP).
Code More Bits? Offset
---- ---------- ------
0 0 0
1 0 1
2 0 2
3 0 3
4 1 4-5
5 1 6-7
6 2 8-11
7 2 12-15
8 3 16-23
9 3 24-31
10 4 32-47
11 4 48-63
12 5 64-95
13 5 96-127
14 6 128-191
15 6 192-255
16 7 256-383
17 7 384-511
18 8 512-767
19 8 768-1023
20 9 1024-1535
21 9 1536-2047
22 10 2048-3071
23 10 3072-4095
24 11 4096-6143
25 11 6144-8191
26 12 8192-12287
27 12 12288-16383
28 13 16384-24575
29 13 24576-32767
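
The offset alphabet works the same way, shown here in the decode direction as a sketch (ReadRawBits
is a hypothetical stand-in for the real bit reader): given one of the 30 codes plus its "more bits",
rebuild the offset 0-32767.

unsigned ReadRawBits(int count);         // hypothetical: reads 'count' raw bits

unsigned DecodeOffset(int code)          // code 0-29, already huffman-decoded
{
    if (code < 4)
        return (unsigned)code;           // offsets 0-3, no extra bits
    int      extraBits = code / 2 - 1;   // codes 4-5 use 1 bit ... 28-29 use 13 bits
    unsigned base      = 1u << (extraBits + 1);
    if (code & 1)                        // odd code = upper half of the group
        base += 1u << extraBits;
    return base + ReadRawBits(extraBits);
}
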
V1.27 (Single file Calgary Corpus = 2.706 bpb or 66.18%, 20.82s (PII366)) (or 3.3bpb/58% in 3s)
=====
- Added a callback function for monitoring progress and current compression ratio :)
- Added flexible ways of compressing from mem/file to mem/file
- The callback function can abort compression by returning 0 (return 1 to continue) - see the sketch below
- Changed hash table code, uses a little more memory but is twice as quick
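
A hypothetical shape for such a callback (the real signature may well differ), showing the
return-0-to-abort convention:

#include <cstdio>

typedef int (*CompressCallback)(unsigned bytesIn, unsigned bytesOut);

// Example callback: print progress and the current ratio, never abort.
int MyCallback(unsigned bytesIn, unsigned bytesOut)
{
    double ratio = bytesIn ? 100.0 * (1.0 - (double)bytesOut / bytesIn) : 0.0;
    printf("%u bytes in, %u bytes out (%.1f%%)\n", bytesIn, bytesOut, ratio);
    return 1;                            // return 0 here to abort compression
}
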
V1.28 (Single file Calgary Corpus = 2.599 bpb or 67.51%, 16.67s (PII366)) (or 3.1bpb/60% in 4s)
=====
- Improved string matching routine
V1.29 (Single file Calgary Corpus = 2.627 bpb or 67.17%, 12.82s (PII366)) (or 3.2bpb/59% in 3s)
=====
- Hash tables and string matching rewritten (again!) Much better on large files (although stats
above don't really show this. 50% speed improvement on a 9meg file).
- Using larger values for HuffmanDelay as decompression was too slow (LZ77 is supposed to have
fast decompression! :) )
- Added memory allocation testing
- Huffman codes limited to 16 bits
V1.30 (Single file Calgary Corpus = 2.572 bpb or 67.85%, 10.53s (P4 2Ghz)) (or 3.0bpb/61% in 1s)
=====
- Got a faster computer so reworked things for better compression sacrificing speed...
- Added memory allocation testing to decompress routines (d'oh)
- inlined some functions (minor speed improvement)
- Improved the initial huffman generation
- Tweaked the compression level parameters
- Changed source code into a similar format that I'm currently using for AutoIt v3
v1.40 (Single file Calgary Corpus = 2.549 bpb or 68.14%, 6.68s (P4 2Ghz)) (or 3.1bpb/61% in 1s)
=====
- Rewrote many things; the most important is that a "circular/ring" buffer is now used and files are no longer
split into "blocks", which helps compression (see the sketch below). Buffers reduced from 2meg to 256KB.
- Fixed very bad and stupid bug with match offsets (crash city under the correct circumstances)
- Hash tables once again rewritten - now uses half the memory (although still too much at 700KB) and faster
- Tweaked huffman code generation (and fixed a bug, smallest frequencies were not always selected... :( )
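
A minimal sketch of the circular/ring buffer idea (sizes and names are illustrative, not the actual
code): a power-of-two buffer indexed with a mask, so old data is overwritten in place and the input
never has to be chopped into independent blocks.

#include <cstdint>
#include <vector>

class RingBuffer
{
public:
    explicit RingBuffer(size_t size)        // size must be a power of two
        : m_buf(size), m_mask(size - 1), m_pos(0) {}

    void Put(uint8_t byte)                  // newest byte overwrites the oldest
    {
        m_buf[m_pos & m_mask] = byte;
        ++m_pos;
    }

    uint8_t At(size_t absolutePos) const    // read by absolute stream position
    {
        return m_buf[absolutePos & m_mask];
    }

private:
    std::vector<uint8_t> m_buf;
    size_t               m_mask;
    size_t               m_pos;
};
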
v1.40a
======
- Fixed error when trying to compress files less than 4 bytes long (bad crash)