changelog.txt
Window bits are fixed at 13 and match length bits at 4. The compression level is now user controlled
by the "hash chain" limit.
// The compression level parameter (0-4) gives the following results:
// Compression Level Window Bits Match Bits Hash Limit
// ----------------- ----------- ---------- ----------
// Fast (0) 13 4 1
// Normal (1) 13 4 4
// Slow (2) 13 4 16
// Very slow (3) 13 4 64
// Ultra slow (4) 13 4 256
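
The "Hash Limit" is the tuning knob here: positions whose first few bytes hash to the same value are
kept on a chain, and the limit caps how many links the match finder will walk before giving up. A
rough sketch of that idea (the names and layout are illustrative, not the actual code):

#include <cstdint>
#include <vector>

// Illustrative only: walk the chain of earlier positions that share the same hash,
// but give up after 'hashLimit' links.  A higher limit finds better matches
// (better compression) at the cost of speed.
static int FindBestMatch(const uint8_t *buf, int pos, int bufEnd,
                         const std::vector<int> &prevLink, int chainHead,
                         int hashLimit, int maxLen, int &bestOffset)
{
    int bestLen = 0;
    int cur = chainHead;                      // most recent position with this hash
    for (int links = 0; cur >= 0 && links < hashLimit; ++links)
    {
        int len = 0;
        while (len < maxLen && pos + len < bufEnd && buf[cur + len] == buf[pos + len])
            ++len;
        if (len > bestLen)
        {
            bestLen    = len;
            bestOffset = pos - cur;
        }
        cur = prevLink[cur];                  // follow the chain to an older position
    }
    return bestLen;
}
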
v1.02 (Single file Calgary Corpus = 3.389 bpb or 57.64%)
=====
Minor changes
V1.10 (Single file Calgary Corpus = 3.151 bpb or 60.61%)
=====
Based on work I was doing with LZP, I changed the way that match lengths are stored.
Instead of using a fixed number of bits, the LZP way is used: small numbers of bits
for small match lengths (more common) and larger numbers of bits for "freaky" longer
matches. The change gained 0.25 bpb (Calgary Corpus), which is about 3% - a pretty
good improvement! Also, because longer matches are possible, a nice side effect is
that the entire routine is faster (twice as fast on one 10MB file I tried!)
Not bad for what amounted to an extra 10 lines of code! (See CompressedStreamWriteMatchLen())
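
For reference, a rough sketch of an escalating length code in that LZP style - the group sizes here
are made up, and this is not the exact layout used by CompressedStreamWriteMatchLen(): short lengths
cost only a couple of bits, and each "escape" value says a bigger group follows.

// Stand-in for the real bit-oriented output stream.
struct BitWriter
{
    void WriteBits(unsigned /*value*/, int /*numBits*/) { /* ... */ }
};

static void WriteMatchLenSketch(BitWriter &out, unsigned len)
{
    if (len < 3)  { out.WriteBits(len, 2); return; }   // 2 bits cover lengths 0-2
    out.WriteBits(3, 2);                               // 3 = escape, longer length follows
    len -= 3;
    if (len < 7)  { out.WriteBits(len, 3); return; }   // next 7 lengths cost 2+3 bits
    out.WriteBits(7, 3);                               // 7 = escape again
    len -= 7;
    out.WriteBits(len, 8);                             // rare "freaky" long matches
}
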
- Routine for finding matches tidied up.
- Lazy evaluation removed because it was very messy (only lost 0.01 bpb), may reimplement
in the future
V1.20 (Single file Calgary Corpus = 3.159 bpb or 60.51%, 11.72s (PII366))
=====
No direct compression changes:
- Data is now compressed in small blocks so that large files can be handled
- CompressFile and CompressMem functions implemented
- Until finished, the algorithm ID will be BETA
// The start of the compression stream will contain "other" items apart from data.
// The data is compressed in blocks, the compressed then uncompressed size of the block
// is output followed by the compressed block itself. This is repeated until a compressed
// block size of 0 is reached
//
// Algorithm ID 4 bytes
// Total Uncompressed Size 4 bytes (1 ULONG)
// ...
// Compressed Block1 Size 4 bytes (1 ULONG)
// Uncompressed Block1 Size 4 bytes (1 ULONG)
// Compressed Block1 Data nn bytes (where nn=compressed block size)
// ...
// Compressed Blockn Size 4 bytes (1 ULONG)
// Uncompressed Blockn Size 4 bytes (1 ULONG)
// Compressed Blockn Data nn bytes (where nn=compressed block size)
// ...
// 00 00 00 00 4 bytes (1 ULONG)
Obviously, this means there are redundant values in the data stream, which hurts compression.
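
A sketch of emitting that block framing - WriteULONG, the byte order, and the "other" header items
elided above are placeholders rather than the real routines:

#include <cstdint>
#include <cstdio>

// Placeholder: the real code may write the ULONG differently (endianness etc.)
static void WriteULONG(FILE *f, uint32_t v) { fwrite(&v, sizeof(v), 1, f); }

// Each block: compressed size, uncompressed size, then the compressed data.
static void WriteBlock(FILE *f, const uint8_t *compData,
                       uint32_t compSize, uint32_t uncompSize)
{
    WriteULONG(f, compSize);
    WriteULONG(f, uncompSize);
    fwrite(compData, 1, compSize, f);
}

// A compressed block size of 0 (the 00 00 00 00 above) ends the stream.
static void WriteEndOfStream(FILE *f) { WriteULONG(f, 0); }
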
V1.21 (Single file Calgary Corpus = 2.934 bpb or 63.33%, 31.24s (PII366)) (or 3.2bpb/58% in 5s)
=====
- Data and Compression blocks total 1MB of memory.
- Lazy eval of +1 reintroduced now that the code is a lot tidier :) (see the sketch below)
- The number of bits used for the window length varies depending on file size, which
helps compression on both small and large files
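
The "+1" lazy evaluation, roughly (FindMatch / EmitLiteral / EmitMatch are hypothetical stand-ins
for the real routines): before committing to a match at the current position, check the next
position as well; if it matches better, emit one literal now and take the later match.

#include <cstdint>

int  FindMatch(const uint8_t *buf, int pos, int &offset);   // longest match at pos
void EmitLiteral(uint8_t byte);
void EmitMatch(int len, int offset);

void LazyStep(const uint8_t *buf, int &pos)
{
    int offHere = 0, offNext = 0;
    int lenHere = FindMatch(buf, pos, offHere);
    int lenNext = FindMatch(buf, pos + 1, offNext);

    if (lenNext > lenHere)
    {
        EmitLiteral(buf[pos]);      // next position matches better, defer the match
        pos += 1;
    }
    else
    {
        EmitMatch(lenHere, offHere);
        pos += lenHere;
    }
}
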
V1.22 (Single file Calgary Corpus = 2.883 bpb or 63.96%, 31.32s (PII366)) (or 3.5bpb/56% in 3s)
=====
- Added huffman coding for literals. It generates the tree based on the frequency of
the last 8192 literals written. Therefore the table does not need to be transmitted,
as the decompressor just does the same. The tree is updated every 8192 literals, which
helped compression (see the sketch after this entry).
- Memory and speed tweaked.
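
Sketch of the synchronisation trick: both sides start from the same initial tree, count literals
identically, and rebuild at the same deterministic points, so no table ever needs to be sent.
RebuildHuffmanTree is a placeholder for the real tree construction, and resetting the counts after
each rebuild is just one way to make the tree reflect the most recent 8192 literals.

#include <cstdint>

const int kRebuildInterval = 8192;              // the interval described above

struct LiteralModel
{
    uint32_t freq[256] = {};
    int      sinceRebuild = 0;

    // Called for every literal by BOTH the compressor and the decompressor,
    // so their trees change at exactly the same points in the stream.
    void Update(uint8_t literal)
    {
        ++freq[literal];
        if (++sinceRebuild == kRebuildInterval)
        {
            RebuildHuffmanTree(freq);           // placeholder for the real builder
            for (uint32_t &f : freq) f = 0;     // next tree reflects the next 8192 only
            sinceRebuild = 0;
        }
    }

    static void RebuildHuffmanTree(const uint32_t (&freq)[256]);
};
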
V1.23 (Single file Calgary Corpus = 2.826 bpb or 64.67%, 20.61s (PII366)) (or 3.8bpb/52% in 3s)
=====
- Minor changes to the huffman code
- Number of bits used for window length increases from the start of a block until == 15
(Helps compression at the start of a block/small files)
- Some hash/linked list improvements (memory pruned periodically to save memory)
- Match length coding tweaked for slightly better compression in most cases
- Changed "Uncompress" to "Decompress" - it was bugging me.
V1.24 (Single file Calgary Corpus = 2.786 bpb or 65.17%, 22.06s (PII366)) (or 3.7bpb/53% in 3s)
=====
- Changed the max match length to 256 and huffman encoded it along with the literals
as the same 512 character alphabet. Mind-blowing stuff.
V1.25 (Single file Calgary Corpus = 2.757 bpb or 65.54%, 20.62s (PII366)) (or 3.6bpb/53% in 3s)
=====
- 32 literal codes are used to describe the range of the match length, and then additional
bits are used to "finish it off". Similar to the LZP method I was using but more
cunning. :) This results in a huffman alphabet of 288 symbols:
Code More Bits? Length
---- ---------- ------
256 0 0
257 0 1
258 0 2
259 0 3
260 0 4
261 0 5
262 0 6
263 0 7
264 1 8-9
265 1 10-11
266 1 12-13
267 1 14-15
268 2 16-19
269 2 20-23
270 2 24-27
271 2 28-31
272 3 32-39
273 3 40-47
274 3 48-55
275 3 56-63
276 4 64-79
277 4 80-95
278 4 96-111
279 4 112-127
280 5 128-159
281 5 160-191
282 5 192-223
283 5 224-255
284 6 256-319
285 6 320-383
286 6 384-447
287 6 448-511
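
The mapping from a match length onto those 32 codes follows a regular doubling pattern, so it can be
computed rather than looked up. A rough sketch (EmitHuffman and EmitRawBits are hypothetical
stand-ins for the real output routines):

void EmitHuffman(int symbol);                 // writes the huffman code for a symbol
void EmitRawBits(unsigned value, int count);  // writes 'count' raw bits

void WriteLengthCode(unsigned length)         // length is the table value, 0-511
{
    if (length < 8)
    {
        EmitHuffman(256 + (int)length);       // codes 256-263, no extra bits
        return;
    }
    int      extraBits = 1;
    unsigned groupBase = 8;                   // lowest length in the current group
    while (length >= groupBase * 2)           // groups: 8-15, 16-31, ... 256-511
    {
        groupBase *= 2;
        ++extraBits;
    }
    unsigned offsetInGroup = length - groupBase;
    EmitHuffman(260 + 4 * extraBits + (int)(offsetInGroup >> extraBits));
    EmitRawBits(offsetInGroup & ((1u << extraBits) - 1), extraBits);
}
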
V1.26 (Single file Calgary Corpus = 2.706 bpb or 66.18%, 20.82s (PII366)) (or 3.3bpb/58% in 3s)
=====
- Similar trick this time. A new alphabet (second huffman tree) is used to code
the match offsets. 30 codes are used to represent the MSB of the offset
(values based on an article on PKZIP).
Code More Bits? Offset
---- ---------- ------
0 0 0
1 0 1
2 0 2
3 0 3
4 1 4-5
5 1 6-7
6 2 8-11
7 2 12-15
8 3 16-23
9 3 24-31
10 4 32-47
11 4 48-63
12 5 64-95
13 5 96-127
14 6 128-191
15 6 192-255
16 7 256-383
17 7 384-511
18 8 512-767
19 8 768-1023
20 9 1024-1535
21 9 1536-2047
22 10 2048-3071
23 10 3072-4095
24 11 4096-6143
25 11 6144-8191
26 12 8192-12287
27 12 12288-16383
28 13 16384-24575
29 13 24576-32767
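
The offset alphabet works the same way, shown here in the decode direction as a sketch (ReadRawBits
is a hypothetical stand-in for the real bit reader): given one of the 30 codes plus its "more bits",
rebuild the offset 0-32767.

unsigned ReadRawBits(int count);         // hypothetical: reads 'count' raw bits

unsigned DecodeOffset(int code)          // code 0-29, already huffman-decoded
{
    if (code < 4)
        return (unsigned)code;           // offsets 0-3, no extra bits
    int      extraBits = code / 2 - 1;   // codes 4-5 use 1 bit ... 28-29 use 13 bits
    unsigned base      = 1u << (extraBits + 1);
    if (code & 1)                        // odd code = upper half of the group
        base += 1u << extraBits;
    return base + ReadRawBits(extraBits);
}
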
V1.27 (Single file Calgary Corpus = 2.706 bpb or 66.18%, 20.82s (PII366)) (or 3.3bpb/58% in 3s)
=====
- Added a callback function for monitoring progress and current compression ratio :)
- Added flexible ways of compressing from mem/file to mem/file
- The callback function can abort compression by returning 0 (return 1 to continue) - see the sketch below
- Changed hash table code, uses a little more memory but is twice as quick
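
A hypothetical shape for such a callback (the real signature may well differ), showing the
return-0-to-abort convention:

#include <cstdio>

typedef int (*CompressCallback)(unsigned bytesIn, unsigned bytesOut);

// Example callback: print progress and the current ratio, never abort.
int MyCallback(unsigned bytesIn, unsigned bytesOut)
{
    double ratio = bytesIn ? 100.0 * (1.0 - (double)bytesOut / bytesIn) : 0.0;
    printf("%u bytes in, %u bytes out (%.1f%%)\n", bytesIn, bytesOut, ratio);
    return 1;                            // return 0 here to abort compression
}
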
V1.28 (Single file Calgary Corpus = 2.599 bpb or 67.51%, 16.67s (PII366)) (or 3.1bpb/60% in 4s)
=====
- Improved string matching routine
V1.29 (Single file Calgary Corpus = 2.627 bpb or 67.17%, 12.82s (PII366)) (or 3.2bpb/59% in 3s)
=====
- Hash tables and string matching rewritten (again!) Much better on large files (although stats
above don't really show this. 50% speed improvement on a 9meg file).
- Using larger values for HuffmanDelay as decompression was too slow (LZ77 is supposed to have
fast decompression! :) )
- Added memory allocation testing
- Huffman codes limited to 16 bits
V1.30 (Single file Calgary Corpus = 2.572 bpb or 67.85%, 10.53s (P4 2Ghz)) (or 3.0bpb/61% in 1s)
=====
- Got a faster computer so reworked things for better compression sacrificing speed...
- Added memory allocation testing to decompress routines (d'oh)
- inlined some functions (minor speed improvement)
- Improved the initial huffman generation
- Tweaked the compression level parameters
- Changed source code into a similar format that I'm currently using for AutoIt v3
v1.40 (Single file Calgary Corpus = 2.549 bpb or 68.14%, 6.68s (P4 2Ghz)) (or 3.1bpb/61% in 1s)
=====
- Rewrote many things; the most important is that a "circular/ring" buffer is now used and files are no longer
split into "blocks", which helps compression (see the sketch below). Buffers reduced from 2meg to 256KB.
- Fixed very bad and stupid bug with match offsets (crash city under the correct circumstances)
- Hash tables once again rewritten - now uses half the memory (although still too much at 700KB) and faster
- Tweaked huffman code generation (and fixed a bug, smallest frequencies were not always selected... :( )
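
A minimal sketch of the circular/ring buffer idea (sizes and names are illustrative, not the actual
code): a power-of-two buffer indexed with a mask, so old data is overwritten in place and the input
never has to be chopped into independent blocks.

#include <cstdint>
#include <vector>

class RingBuffer
{
public:
    explicit RingBuffer(size_t size)        // size must be a power of two
        : m_buf(size), m_mask(size - 1), m_pos(0) {}

    void Put(uint8_t byte)                  // newest byte overwrites the oldest
    {
        m_buf[m_pos & m_mask] = byte;
        ++m_pos;
    }

    uint8_t At(size_t absolutePos) const    // read by absolute stream position
    {
        return m_buf[absolutePos & m_mask];
    }

private:
    std::vector<uint8_t> m_buf;
    size_t               m_mask;
    size_t               m_pos;
};
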
v1.40a
======
- Fixed error when trying to compress files less than 4 bytes long (bad crash)