📄 compress.txt
字号:
+======================================================+| Introduction to the losslessy compression schemes || Description of the source files and the methods |+------------------------------------------------------+| From David Bourgin (E-mail: dbourgin@ufrima.imag.fr) || Date: 12/10/95 VERSION: 1.5 |+======================================================+ ------ BE CARE ------This file must be given in a package containing the following files into twodirectories:* French files in codecs.dir/francais directory:lisezmoi,compress.txt,codrle1.c,codrle2.c,codrle3.c,codrle4.c,codhuff.c,codlzw.c,dcodrle1.c,dcodrle2.c,dcodrle3.c,dcodrle4.c,dcodhuff.c,dcodlzw.c* English files in codecs.dir/english directory:readme,compress.txt,codrle1.c,codrle2.c,codrle3.c,codrle4.c,codhuff.c,codlzw.c,dcodrle1.c,dcodrle2.c,dcodrle3.c,dcodrle4.c,dcodhuff.c,dcodlzw.cPlease read the file 'readme' to get more infos about copyrights and thecontents of the files. ---------------------There are several means to compress data. Here, we are only going to deal withthe losslessy schemes. These schemes are also called non-destructive becauseyou always recover the initial data you had, and this, as soon as you need them.With losslessy schemes, you won't never lose any informations (except perhapswhen you store or transmit your data but this is another problem...).In this introduction, we are going to see:- The RLE scheme (with different possible algorithms)- The Huffman schemes (dynamical scheme)- And the LZW schemeFor the novice, a compresser is a program able to read several data (e.g. bytes)in input and to write several data in output. The data you obtain from theoutput (also called compressed data) will - of course - take less space thanthe the input data. This is true in most of cases, if the compresser worksand if the type of the data is correct to be compressed with the given scheme.The codec (coder-decoder) enables you to save space on your hard disk and/orto save the communication costs because you always store/transmit the compresseddata. You'll use the decompresser as soon as you need to recover your initialuseful data. Note that the compressed data are useless if you have notthe decoder...You are doubtless asking "How can I reduce the data size without losing someinformations?". It's easy to answer to this question. I'll only take an example.I'm sure you have heard about the morse. This system established in the 19thcentury use a scheme very close to the huffman one. In the morse you encodethe letters to transmit with two kinds of signs. If you encode these two signpossibilities in one bit, the symbol 'e' is transmitted in a single bit andthe symbols 'y' and 'z' need four bits. Look at the symbols in the text you arereading, you'll fast understand the compression ratio...From there I explain into two parts:- How to change the source codes- What are RLE1, RLE2, RLE3, Huffman, and LZW encoding/decoding********** FIRST PART: Source modifications you can do **********Important: The source codes associated to the algorithms I present arecompletely adaptative on what you need to compress. They all use basicalmacros on the top of the file. Usually the macros to change are:- beginning_of_data- end_of_data- read_byte- read_block- write_byte- write_blockThese allow the programmer to modify only a little part of the headerof the source codes in order to compress as well memory as files.beginning_of_data(): Macro used to set the program so that the next read_byte()call will read the first byte to compress.end_of_data(): Returns a boolean to know whether there is no more bytes to readfrom the input stream. Return 0 if there is no more byte to compress, anothernon-zero value otherwise.read_byte(): Returns a byte read from the input stream if available.write_byte(x): Writes the byte 'x' to the output stream.read_block(...) and write_block(...): Same use as read_byte and write_byte(x)but these macros work on blocks of bytes and not only on a single byte.If you want to compress *from* the memory, before entering in a xxxcodingprocedure ('xxx' is the actual extension to replace with a given codec), youhave to add a pointer set up to the beginning of the zone to compress. Notethat the following pointer 'source_memory_base' is not to add, it is just givenhere to specify a name to the address of the memory zone you are going toencode or decode. That is the same about source_memory_end which can be eithera pointer to create or an existing pointer.unsigned char *source_memory_base, /* Base of the source memory */ *source_memory_end, /* Last address to read.source_memory_end=source_memory_base+source_zone_length-1 */ *source_ptr; /* Used in the xxxcoding procedure */void pre_start(){ source_ptr=source_memory_base; xxxcoding();}end_of_data() and read_byte() are also to modify to compress *from* memory:#define end_of_data() (source_ptr>source_memory_end)#define read_byte() (*(source_ptr++))If you want to compress *to* memory, before entering in a xxxcoding procedure('xxx' is the actual extension to replace with a given codec), you have to adda pointer. Note that the pointer 'dest_memory_base' is not to add, it is justgiven there to specify the address of the destination memory zone you aregoing to encode or decode.unsigned char *dest_memory_base, /* Base of the destination memory */ *dest_ptr; /* Used in the xxxcoding procedure */void pre_start(){ dest_ptr=dest_memory_base; xxxcoding();}Of course, you can combine both from and to memory in the pre_start() procedure.The files dest_file and source_file handled in the main() function areto remove...void pre_start(){ source_ptr=source_memory_base; dest_ptr=dest_memory_base; xxxcoding();}In fact, to write to memory, the problem is in the write_byte(x) procedure.This problem exists because your destination zone can either be a staticzone or a dynamically allocated zone. In the two cases, you have to checkif there is no overflow, especially if the coder is not efficient and mustproduce more bytes than you reserved in memory.In the first case, with a *static* zone, write_byte(x) macro should look likethat:unsigned long int dest_zone_length, current_size;#define write_byte(x) { if (current_size==dest_zone_length) \ exit(1); \ dest_ptr[current_size++]=(unsigned char)(x); \ }In the static version, the pre_start() procedure is to modify as following:void pre_start(){ source_ptr=source_memory_base; dest_ptr=dest_memory_base; dest_zone_length=...; /* Set up to the actual destination zone length */ current_size=0; /* Number of written bytes */ xxxcoding();}Otherwise, dest_ptr is a zone created by the malloc instruction and you can tryto resize the allocated zone with the realloc instruction. Note that I incrementthe zone one kilo-bytes by one kylo-bytes. You have to add two other variables:unsigned long int dest_zone_length, current_size;#define write_byte(x) { if (current_size==dest_zone_length) \ { dest_zone_length += 1024; \ if ((dest_ptr=(unsigned char *)realloc(dest_ptr,dest_zone_length*sizeof(unsigned char)))==NULL) \ exit(1); /* You can't compress in memory \ => I exit but *you* can make a routine to swap on disk */ \ } \ dest_ptr[current_size++]=(unsigned char)(x); \ }With the dynamically allocated version, change the pre_start() routine as following:void pre_start(){ source_ptr=source_memory_base; dest_ptr=dest_memory_base; dest_zone_length=1024; if ((dest_ptr=(unsigned char *)malloc(dest_zone_length*sizeof(unsigned char)))==NULL) exit(1); /* You need at least 1 kb in the dynamical memory ! */ current_size=0; /* Number of written bytes */ xxxcoding(); /* Handle the bytes in dest_ptr but don't forget to free these bytes with: free(dest_ptr); */}The previously given macros work as:void demo() /* The file opening, closing and variables must be set up by the calling procedure */{ unsigned char byte; /* And not 'char byte' (!) */ while (!end_of_data()) { byte=read_byte(); printf("Byte read=%c\n",byte); }}You must not change the rest of the program unless you're really sure andreally need to do it!********** SECOND PART: Encoding/Decoding explainations **********+==========================================================+| The RLE encoding |+==========================================================+RLE is an acronym that stands for Run Length Encoding. You may encounter itas an other acronym: RLC, Run Length Coding.The idea in this scheme is to recode your data with regard to the repetitionframes. A frame is one or more bytes that occurr one or several times.There are several means to encode occurrences. So, you'll have several codecs.For example, you may have a sequence such as:0,0,0,0,0,0,255,255,255,2,3,4,2,3,4,5,8,11Some codecs will only deal with the repetitions of '0' and '255' but some otherwill deal with the repetitions of '0', '255', and '2,3,4'.You have to keep in your mind something important based on this example. A codecwon't work on all the data you will try to compress. So, in case of nonexistence of sequence repetitions, the codecs based on RLE schemes must notdisplay a message to say: "Bye bye". Actually, they will try to encode thesenon repeted data with a value that says "Sorry, I only make a copy of the initalinput". Of course, a copy of the input data with an header in front of this copywill make a biggest output data but if you consider the whole data to compress,the encoding of repeated frames will take less space than the encodingof non-repeated frames.All of the algorithms with the name of RLE have the following look with threeor four values:- Value saying if there's a repetition- Value saying how many repetitions (or non repetition)- Value of the length of the frame (useless if you just encode framewith one byte as maximum length)- Value of the frame to repeat (or not)I gave four algorithms to explain what I say.*** First RLE scheme ***The first scheme is the simpliest I know, and looks like the one used in MACsystem (MacPackBit) and some image file formats such as Targa, PCX, TIFF, ...Here, all compressed blocks begin with a byte, named header, which descriptionis:Bits 7 6 5 4 3 2 1 0Header X X X X X X X XBits 7: Compression status (1=Compression applied) 0 to 6: Number of bytes to handleSo, if the bit 7 is set up to 0, the 0 to 6 bits give the number of bytesthat follow (minus 1, to gain more over compress) and that were not compressed(native bytes). If the bit 7 is set up to 1, the same 0 to 6 bits givethe number of repetition (minus 2) of the following byte.As you see, this method only handle frame with one byte.Additional note: You have 'minus 1' for non-repeated frames because you musthave at least one byte to compress and 'minus 2' for repeated frames because therepetition must be 2, at least.Compression scheme: First byte=Next /\ / \Count the byte Count the occurrence of NON identicaloccurrences bytes (maximum 128 times)(maximum 129 times) and store them in an array | |
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -