📄 lpaq6.hpp
字号:
/* lpaq6.cpp file compressor, September 29, 2007.
(C) 2007, Matt Mahoney, matmahoney@yahoo.com
Alexander Ratushnyak, artest@inbox.ru
LICENSE
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details at
Visit <http://www.gnu.org/copyleft/gpl.html>.
To compress: N input output (N=0..9, uses 3 + 3*2^N MB memory)
To decompress: d input output (requires same memory)
For example:
lpaq6 9 foo foo.lpq
compresses foo to foo.lpq using 1.5 GB memory.
lpaq6 d foo.lpq foo
decompresses to foo, also using 1.5 GB memory. Option 0 uses
6 MB memory. The maximum file size is 2 GB.
DESCRIPTION OF LPAQ1:
lpaq1 is a "lite" version of PAQ, about 35 times faster than paq8l
at the cost of some compression (but similar to high end BWT and
PPM compressors). It is a context-mixing compressor combining 7
contexts: orders 1, 2, 3, 4, 6, a lowercase unigram word context
(for ASCII text), and a "match" order, which predicts the next
bit in the last matching context. The independent bit predictions of
the 7 models are combined using one of 80 neural networks (selected by
a small context), then adjusted using 2 SSE stages (order 0 and 1)
and arithmetic coded.
Prediction is bitwise. This means that an order-n context consists
of the last n whole bytes plus any of the 0 to 7 previously coded
bits of the current byte starting with the most significant bit.
The unigram word context consists of a hash of the last (at most) 11
consecutive letters (A-Z, a-z) folded to lower case. The context
does not include any nonalphabetic characters nor any characters
preceding the last nonalphabetic character.
The first 6 contexts (orders 1..4, 6, word) are used to index a
hash table to look up a bit-history represented by an 8-bit state.
The representable states are the same as in paq8l. A state can
either represent all histories up to 4 bits long, or a pair of
0,1 counts plus a flag to indicate the most recent bit. The counts
are bounded by (41,0), (40,1), (12,2), (5,3), (4,4) and likewise
for 1,0. When a count is exceeded, the opposite count is reduced to
approximately preserve the count ratio. The last bit flag is present
only for states whose total count is less than 16. There are 253
possible states.
A bit history is mapped to a probability using an adaptive table
(StateMap). This differs from paq8l in that each table entry includes
a count so that adaptation is rapid at first. Each table entry
has a 22-bit probability (initially p = 0.5) and 10-bit count (initially
n = 0) packed into 32 bits. After bit y is predicted, n is incremented
up to the limit (1023) and the probability is adjusted by
p := p + (y - p)/(n + 0.5). This model is stationary:
p = (n1 + 0.5)/(n + 1), where n1 is the number of times y = 1 out of n.
The "match" model (MatchModel) looks up the current context in a
hash table, first using a longer context, then a shorter one. If
a match is found, then the following bits are predicted until there is
a misprediction. The prediction is computed by mapping the predicted
bit, the length of the match (1..15 or quantized by 4 in 16..62, max 62),
and the last whole byte as context into a StateMap. If no match is found,
then the order 0 context (last 0..7 bits of the current byte) is used
as context to the StateMap.
The 7 predictions are combined using a neural network (Mixer) as in
paq8l, except it is a single level network without MMX code. The
inputs p_i, i=0..6 are first stretched: t_i = log(p_i/(1 - p_i)),
then the output is computed: p = squash(SUM_i t_i * w_i), where
squash(x) = 1/(1 + exp(-x)) is the inverse of stretch(). The weights
are adjusted to reduce the error: w_i := w_i + L * t_i * (y - p) where
(y - p) is the prediction error and L ~ 0.002 is the learning rate.
This is a standard single layer backpropagation network modified to
minimize coding cost rather than RMS prediction error (thus dropping
the factors p * (1 - p) from learning).
One of 80 neural networks are selected by a context that depends on
the 3 high order bits of the last whole byte plus the context order
(quantized to 0, 1, 2, 3, 4, 6, 8, 12, 16, 32). The order is
determined by the number of nonzero bit histories and the length of
the match from MatchModel.
The Mixer output is adjusted by 2 SSE stages (called APM for adaptive
probability map). An APM is a StateMap that accepts both a discrte
context and an input probability, pr. pr is stetched and quantized
to 24 levels. The output is interpolated between the 2 nearest
table entries, and then only the nearest entry is updated. The entries
are initialized to p = pr and n = 6 (to slow down initial adaptation)
with a limit n <= 255. The APM differs from paq8l in that it uses
the new StateMap rapid initial adaptation, does not update both
adjacent table entries, and uses 24 levels instead of 33. The two
stages use a discrete order 0 context (last 0..7 bits) and a hashed order-1
context (14 bits). Each output is averaged with its input weighted
by 1/4.
The output is arithmetic coded. The code for a string s with probability
p(s) is a number between Q and Q+p(x) where Q is the total probability
of all strings lexicographically preceding s. The number is coded as
a big-endian base-256 fraction. A header is prepended as follows:
- "pQ" 2 bytes must be present or decompression gives an error.
- 1 (0x01) version number (other values give an error).
- memory option N as one byte '0'..'9' (0x30..0x39).
- file size as a 4 byte big-endian number.
- arithmetic coded data.
Two thirds of the memory (2 * 2^N MB) is used for a hash table mapping
the 6 regular contexts (orders 1-4, 6, word) to bit histories. A lookup
occurs every 4 bits. The input is a byte-oriented context plus possibly
the first nibble of the next byte. The output is an array of 15 bit
histories (1 byte each) for all possible contexts formed by appending
0..3 more bits. The table entries have the format:
{checksum, "", 0, 1, 00, 10, 01, 11, 000, 100, 010, 110, 001, 101, 011, 111}
The second byte is the bit history for the context ending on a nibble
boundary. It also serves as a priority for replacement. The states
are ordered by increasing total count, where state 0 represents the
initial state (no history). When a context is looked up, the 8 bit
checksum (part of the hash) is compared with 3 adjacent entries, and
if there is no match, the entry with the lowest priority is cleared
and the new checksum is stored.
The hash table is aligned on 64 byte cache lines. A table lookup never
crosses a 64 byte address boundary. Given a 32-bit hash h of the context,
8 bits are used for the checksum and 17 + N bits are used for the
index i. Then the entries i, i XOR 1, and i XOR 2 are tried. The hash h
is actually a collision-free permuation, consisting of multiplying the
input by a large odd number mod 2^32, a 16-bit rotate, and another multiply.
The order-1 context is mapped to a bit history using a 64K direct
lookup table, not a hash table.
One third of memory is used by MatchModel, divided equally between
a rotating input buffer of 2^(N+19) bytes and an index (hash table)
with 2^(N+17) entries. Two context hashes are maintained, a long one,
h1, of length ceil((N+17)/3) bytes and a shorter one, h2, of length
ceil((N+17)/5) bytes, where ceil() is the ceiling function. The index
does not use collision detection. At each byte boundary, if there is
not currently a match, then the bytes before the current byte are
compared with the location indexed by h1. If less than 2 bytes match,
then h2 is tried. If a match of length 1 or more is found, the
match is maintained until the next bit mismatches the predicted bit.
The table is updated at h1 and h2 after every byte.
To compile (g++ 3.4.5, upx 3.00w):
g++ -Wall lpaq1.cpp -O2 -Os -march=pentiumpro -fomit-frame-pointer
-s -o lpaq1.exe
upx -qqq lpaq1.exe
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <math.h>
#include <ctype.h>
#define NDEBUG // remove for debugging
#include <assert.h>
#include "lpaq6.h"
// 8, 16, 32 bit unsigned types (adjust as appropriate)
typedef unsigned char U8;
typedef unsigned short U16;
typedef unsigned int U32;
#ifdef WIKI
#define FB_SIZE 8192
#define TOLIMIT_1a 1023
#define TOLIMIT_1b 512
#define TOLIMIT_2a 1023
#define TOLIMIT_2b 1023
#define SM_FIRST 0
#define APM_FIRST 7
#define SQUARD 4
#define add2order0 (c>>4)
#define upd3_cp\
cp[2]=t1a.get0(hash7(c0*23-(c4&0xc0ffffff)*241 )); /* 3.25 */\
cp[3]=t2a.get0(hash8(c0 -c4*19+(c8&0xc0ff)*19991)); /* 5.25 */\
cp[4]=t3b.get0(hash9(c0*31-c4*59-(c8&0xfcffffff)*59999));/* 7.75 */
#define upd7_cp\
cp[2]=t1b.get1(hash2((c4&0xffffff)*251)); /* order 3 */\
cp[3]=t2b.get1(hash3(c4*127+((c8>>4)&15) )); /* order 4.5 */\
cp[4]=t3a.get1(hash4(c4*197-(c8&0xfcffff)*63331)); /* order 6.75*/
#define upd7_h123\
h2=hash1(c4*59+c8*59999)&HN;\
h1=hash2(c4*73-c8*101+cc*421)&HN;
#else
#define FB_SIZE 32768
#define TOLIMIT_1a 108
#define TOLIMIT_1b 256
#define TOLIMIT_2a 200
#define TOLIMIT_2b 368
#define SM_FIRST 0
#define APM_FIRST 8
#define SQUARD 2
#define add2order0 ( (c>>6) + ((c4>>4)&12) )
#define get0 get
#define get1 get
int calcprevfail[256];
#define upd3_cp\
cp[2]=t1a.get(hash7(c0*23-(c4&0xffffff)*251));\
cp[3]=t2a.get(hash8(c0 -c4*59));\
cp[4]=t3b.get(hash9(c0*31-c4*197+(c8& 0xffff)*63331 ));
#define upd7_cp\
cp[2]=t1b.get(hash2((c4&0xffffff)*251));\
cp[3]=t2b.get(hash3(c4*127));\
cp[4]=t3a.get(hash4(c4*197-(c8& 0xffff)*59999));
#define upd7_h123\
t0c1=t0+c1*256;\
prevfail=calcprevfail[fails];\
fails=1;\
h1=h1*(7 <<1)-c-1&HN;\
h2=hash2(c4*73-c8*101+cc*421)&HN;\
h3=hash1(c4*59+c8*59999)&HN;
#endif
U8 file_buf[FB_SIZE+4], *fb_pos=&file_buf[0], *fb_stop=&file_buf[FB_SIZE+4], *fb_len;
U32 fb_done=0, ExeFlag=0;
FILE *uncompressed;
int DP_SHIFT=14;
U32 mem_usage=0;
inline void E8E9_encode(U8 *p, int i) {
for (; i>0; ++p, --i) {
if ( (*p & 0xfe)==0xe8 && (*(p+4)==0 || *(p+4)==0xff)) {
int f=*(U32*)(p+1), a=(f+(fb_done+p-&file_buf[0])) << 7;
*(U32*)(p+1)=a >> 7; // This is NOT big-endian-compatible
}
}
fb_done+=FB_SIZE;
}
inline void E8E9_decode(U8 *p, int i) {
for (; i>0; --p, --i) {
if ( (*p & 0xfe)==0xe8 && (*(p+4)==0 || *(p+4)==0xff)) {
int f=*(U32*)(p+1), a=(f-(fb_done+p-&file_buf[0])) << 7;
*(U32*)(p+1)=a >> 7; // This is NOT big-endian-compatible
}
}
fb_done+=FB_SIZE;
}
inline int do_getc() {
if (fb_pos<fb_stop) return *fb_pos++;
if (fb_stop!=&file_buf[FB_SIZE] || fb_stop==fb_len) return EOF;
fb_pos=&file_buf[0];
*(U32*)fb_pos = *(U32*)fb_stop;
fb_stop=fb_len-FB_SIZE;
if (fb_stop==&file_buf[4]) {
fb_len=fb_stop + fread(&file_buf[4], 1, FB_SIZE, uncompressed);
if (ExeFlag) E8E9_encode(fb_pos, fb_len-8-fb_pos);
fb_stop=fb_len; if (fb_stop>&file_buf[FB_SIZE]) fb_stop=&file_buf[FB_SIZE];
}
return *fb_pos++;
}
inline void do_putc(U8 c) {
if (fb_pos>=fb_stop) {
if (ExeFlag) E8E9_decode(&file_buf[FB_SIZE-1-4], FB_SIZE-4);
fwrite(&file_buf[0], 1, FB_SIZE, uncompressed);
fb_pos=&file_buf[4];
*(U32*)(fb_pos-4) = *(U32*)(fb_stop-4);
}
*fb_pos++=c;
}
inline void do_putc_end() {
int LEN=fb_pos-&file_buf[0];
if (ExeFlag) E8E9_decode(&file_buf[LEN-1-4], LEN-4);
fwrite(&file_buf[0], 1, LEN, uncompressed);
}
// Error handler: print message if any, and exit
void quit(const char* message=0) {
if (message) printf("%s\n", message);
exit(1);
}
// Create an array p of n elements of type T
template <class T> void alloc(T*&p, int n) {
p=(T*)calloc(n, sizeof(T));
mem_usage+=n*sizeof(T);
if (!p) quit("out of memory");
}
///////////////////////////// Squash //////////////////////////////
int squash_t[4096]; //initialized when Encoder is created
#define squash(x) squash_t[(x)+2047]
// return p = 1/(1 + exp(-d)), d scaled by 8 bits, p scaled by 12 bits
int squash_init(int d) {
if (d==2047) return 4095;
if (d==-2047) return 0;
double k=4096/(double(1+exp(-(double(d)/256))));
return int(k);
}
//////////////////////////// Stretch ///////////////////////////////
// Inverse of squash. stretch(d) returns ln(p/(1-p)), d scaled by 8 bits,
// p by 12 bits. d has range -2047 to 2047 representing -8 to 8.
// p has range 0 to 4095 representing 0 to 1.
static int stretch_t[4096]; //initialized when Encoder is created
static int stretch_t2[4096];
static int dt[1024]; // i -> 16K/(i+3)
static int dta[1024]; // i -> 8K/(i+3)
static int calcfails[8192]; //as above, initialized when Encoder is created
static int TextFlag=0;
static int y20; // y<<20
static int c0=1; // last 0-7 bits with leading 1
static int c4=0; // last 4 bytes
static int c1=0; // last two higher 4-bit nibbles
static int bcount=7; // bit count
static U8* buf; // input buffer
static int pos; // number of bytes in buf
///////////////////////// state table ////////////////////////
// State table:
// nex(state, 0) = next state if bit y is 0, 0 <= state < 256
// nex(state, 1) = next state if bit y is 1
//
// States represent a bit history within some context.
// State 0 is the starting state (no bits seen).
// States 1-30 represent all possible sequences of 1-4 bits.
// States 31-252 represent a pair of counts, (n0,n1), the number
// of 0 and 1 bits respectively. If n0+n1 < 16 then there are
// two states for each pair, depending on if a 0 or 1 was the last
// bit seen.
// If n0 and n1 are too large, then there is no state to represent this
// pair, so another state with about the same ratio of n0/n1 is substituted.
// Also, when a bit is observed and the count of the opposite bit is large,
// then part of this count is discarded to favor newer data over old.
static const U8 State_table[512*4]={
1, 3, 4, 7, 8, 9, 11, 15, 16, 17, 18, 20, 21, 22, 26, 31, 32, 32, 32, 32,
34, 34, 34, 34, 34, 34, 36, 36, 36, 36, 38, 41, 42, 42, 44, 44, 46, 46, 48, 48,
50, 53, 54, 54, 56, 56, 58, 58, 60, 60, 62, 62, 50, 67, 68, 68, 70, 70, 72, 72,
74, 74, 76, 76, 62, 62, 64, 83, 84, 84, 86, 86, 44, 44, 58, 58, 60, 60, 76, 76,
78, 78, 80, 93, 94, 94, 96, 96, 48, 48, 88, 88, 80,103,104,104,106,106, 62, 62,
88, 88, 80,113,114,114,116,116, 62, 62, 88, 88, 90,123,124,124,126,126, 62, 62,
98, 98, 90,133,134,134,136,136, 62, 62, 98, 98, 90,143,144,144, 68, 68, 62, 62,
98, 98,100,149,150,150,108,108,100,153,154,108,100,157,158,108,100,161,162,108,
110,165,166,118,110,169,170,118,110,173,174,118,110,177,178,118,110,181,182,118,
120,185,186,128,120,189,190,128,120,193,194,128,120,197,198,128,120,201,202,128,
120,205,206,128,120,209,210,128,130,213,214,138,130,217,218,138,130,221,222,138,
130,225,226,138,130,229,230,138,130,233,234,138,130,237,238,138,130,241,242,138,
130,245,246,138,140,249,250, 80,140,253,254, 80,140,253,254, 80,
2, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78,
80, 82, 84, 86, 88, 90, 92, 94, 96, 98,100,102,104,106,108,110,112,114,116,118,
120,122,124,126,128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158,
160,162,164,166,168,170,172,174,176,178,180,182,184,186,188,190,192,194,196,198,
200,202,204,206,208,210,212,214,216,218,220,222,224,226,228,230,232,234,236,238,
240,242,244,246,248,250,252,254,128,130,132,134,136,138,140,142,144,146,148,150,
152,154,156,158,160,162,164,166,168,170,172,174,176,178,180,182,184,186,188,190,
192,194,196,198,200,202,204,206,208,210,212,214,216,218,220,222,224,226,228,230,
232,234,236,238,240,242,244,246,248,250,252,190,192,130,132,134,136,138,140,142,
144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180,182,
184,186,188,190,192,194,196,198,200,202,204,206,208,210,212,214,216,218,220,222,
224,226,228,230,232,234,236,238,240,242,244,246,248,250,252,254,
1, 16, 17, 18, 38, 40, 42, 70, 46, 63, 95,195,159,191,223,240, 2, 32, 33, 34,
39, 41, 43, 45, 47, 79,111,143,175,207,225,241, 3, 48, 49, 50, 36, 52, 53, 54,
55, 53, 57, 58, 59, 60, 61, 62, 19, 35, 36, 66, 67, 64, 54, 70, 71, 72, 73, 74,
75, 76, 77, 78, 68, 69, 37, 82, 83, 84, 80, 81, 87, 88, 89, 6, 91, 92, 93, 94,
85, 86, 4, 98, 99,100,101, 96, 97,104,105,106,107,108,109,110,102,103, 20,114,
115,116,117,118,112,113, 6,122,123,124,125,126,119,120, 5,130,131, 70,133,134,
135,128, 97,138,139,140,141,157,136,152, 21,146,147,148,149,150,151,152, 97,145,
155,156,157,158,168,154, 6,162,163,164,165,166,167,168,149,160,161,172,173,174,
170,171, 22,178,179,180,181,182,183,199,185,186,176,177,189,190,187,188, 7,194,
195,196,197,198,199,200,201,202,195,192,226,206,204,220, 23,210,211,212,213,214,
149,216,217,233,219,220,208,209,221,222, 8,226,227,228,229,245,231,232,233,234,
235,236,226,224,238,239, 24,242,243,244,245,246,247,248,249,250,251, 14,253,254,
255,242, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31,
2, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78,
80, 82, 84, 86, 88, 90, 92, 94, 96, 98,100,102,104,106,108,110,112,114,116,118,
120,122,124,126,128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158,
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -