📄 readme

📁 source code for arithmatic coding
💻
字号:
Word, Character, Integer, and Bit Based Compression Using Arithmetic Coding.-------------------------------------------------------------------Authors:John CarpinelliAlistair Moffat (alistair@cs.mu.oz.au)Radford Neal (radford@ai.toronto.edu)Wayne SalamonsenLang Stuiver (langs@cs.mu.oz.au)Andrew Turpin (aht@cs.mu.oz.au)Ian Witten (ihw@waikato.ac.nz)Contact:Alistair MoffatDepartment of Computer ScienceUniversity of MelbourneVictoria, Australia 3052.Fax: +61 3 93481184alistair@cs.mu.oz.auhttp://www.cs.mu.oz.au/~alistair/Based on:A. Moffat, R. Neal, I.H. Witten, "Arithmetic Coding Revisted", ACMTransactions on Information Systems, 16(3):256-294, 1998.http://www.cs.mu.oz.au/~alistair/abstracts/mnw98:acmtois.htmlA. Moffat, "An improved data structure for cummulative probabilitytables", Software-Practice and Experience, Software-Practice andExperience, 29(7):647-659, 1999.http://www.cs.mu.oz.au/~alistair/abstracts/mof99:spe.htmlSee also:I.H. Witten, R. Neal, J.G. Cleary, "Arithmetic Coding for DataCompression", Comm ACM., 30(6):520-541, June 1987.ftp://ftp.cpsc.ucalgary.ca/pub/projects/ar.codRadford M. Neal, Low-Precision Arithmetic Coding Implementation.ftp://ftp.cs.toronto.edu/pub/radford/lowp_ac.sharThanks:To Mahesh Naik, Julian Seward, Michael Schindler, Sascha Kratky, TimWentford, and the many other people who have provided feedback onearlier versions of this software.Overview:This package contains C source code for an illustrative implementationof arithmetic coding.  Four different models are included with thepackage: an adaptive zero-order word based model, an adaptivezero-order character based model, and adaptive zero-order (unsigned)integer based model, and an adaptive fixed-context bit-based model.The option of using shift/add operations to effect the necessaryinteger multiplication and division operations is also supported.  Useof the shift/add option may result in speed improvement on somearchitectures.Each of the four programs can be thought of as consisting of threesections:  model, statistics, and coder. Four different models havebeen provided, one statistics module, and one coder module (but with acompile time option to use hardware or software arithmetic).Models:The word model follows a strictly alternating sequence of reading aword then a non-word. After reading each word/non-word it uses ahashtable to find the corresponding ordinal word number which it thenpasses to the statistics module together with the correspondingword/non word context. A context stores the corresponding frequency foreach ordinal symbol. This model illustrates the advantages of the newcoder on large alphabets, and the manipulation of a number of``contexts'' within a single program.The adaptive zero-order character-based model is broadly equivalent tothe one presented in the CACM paper of Witten, Neal and Cleary.Compression is relatively poor.The adaptive zero-order integer-based model is a simple derivative ofthe character-based one, and reads uints from the input rather thanbytes.  Note that each integer must be in the range 1..r+1, where r isthe largest integer seen to date. The first integer in the file must be1.  Think of these integers as being symbol identifiers assigned on thefly in order of first appearance.The bit-based model takes one bit at a time from the input stream andpasses it and the corresponding context into the statistics module.The correct context is chosen based on the previous n bits, where 0 <=n <= 20 is specified as a commandline parameter.  Relatively goodcompression results when n is large, but it runs slowly. The intentionis to illustrate the use of the arithmetic coding routines whenrestricted to a binary alphabet.The zero-order unsigned-integer based model was used in experimentswhen testing the cummulative statistics data structure used in stats.c,and many other coding exeriments (see the WWW site for details).Statistics:The statistics module interfaces the model and the coder.  It isresponsible for probability estimation and generation of the parametersrequired by the coder.  It does so by adaptively counting symboloccurrences, and uses a tree-based structure to maintain cumulativefrequencies.  Accessing the structure takes O(log s) time per symbolnumber s.  Just as important, it is space-economical, and requires justn words of storage compared to the 3n words required by the CACMstructure.Coder:The default coder uses shift/add integer arithmetic in itscalculations, as described in "Arithmetic Coding Revisited". Thecompile time option "-DMULT_DIV" forces the use of multiplications anddivisions.  The low precision that is required allows, on somearchitectures, faster execution with shift/add.One notable feature about the coder is that by using a limitedprecision implementation, the total frequency can be up to 30 bits(assuming 32 bit integers, or more generally n-2 for n-bit integers).That is, the total frequency may reach a value as large as1,073,741,824 before frequencies need be halved. The number of bitsused for frequency counts can be set as a command line argumentallowing one to use fewer bits for more precise coding.  The defaultnumber of bits used for frequency counts is 27.The use of large values for the F_bits parameter also allows fasterexecution, since much of the arithmetic can be done to low precision.Very large values for F_bits do, however, introduce rounding effects inthe coder that may negate the gain of using more accurate modelprobabilities.  See the paper for analysis of these rounding effects.The package:The package consists of 22 files: 9 source files (.c), 5 header files (.h), 1include file (.i), a makefile, 4 man pages (.1), a WHATSNEW file, and thisREADME file.arith_coder.c:	Front end to the program.word.c:		Driving code for the word based compression model.char.c: 	Driving code for the character based compression model.bits.c:		Driving code for the bit based compression model.uint.c:		Driving code for the unsigned integer based compression model.bitio.c:	Input output functions (bit and byte level)stats.c: 	Functions for maintaining frequency counts.arith.c: 	Functions associated with arithmetic coding.  hashtable.c: 	Functions associated with maintaining the hash table              	used to convert words to ordinal word numbersarith.h:	Function prototypes exported from arith.c, #defines used in 		arith.cstats.h:	Function prototypes exported from stats.c, #defines used in		stats.cbitio.h:	Input / output macros (bit and byte level)arith_coder.h: Function prototypes #defines used in arith_coder.c, char.c,               bits.c, word.c.hashtable.h: Function prototypes exported from hashtable.c, #defines used             in hashtable.cunroll.i:	Generic include file to unroll loops / repeat code fragments.arith_coder.1:	Manual pagechar.1, word.1, bits.1, uint.1:		Pointers to arith_coder.1 man page.Usage:  The programs share a common interface (and are in fact the same program).Output is written to stdout. If no input file is specified, input is takenfrom stdin. One of -e or -d must be specified, to encode or decode.For help on usage, type "arith_coder -h".  For more detail refer to the manpage.Here is a sample output of arith_coder -h:Usage: arith_coder [-e [-t s] [-f n] [-b n] [-m n] [-c n] | -d] [-v] [file]-e: Encode    -t s    Encoding method               (char, word, bits, default = char)    -b n    Code bits to use              (27, fixed at compile time)    -f n    Frequency bits to use         (32, fixed at compile time)    -m n    Memory size to use (Mbytes)   (1..255, default = 1)    bits & word    -c n    Bits of context to use        (0..20, default = 16)    bits only-d: Decode-v: Verbose Give timing, compression, and memory usage information.Conditions:These programs are supplied free of charge for research purposes only, andmay not sold or incorporated into any commercial product.  There isABSOLUTELY NO WARRANTY of any sort, nor any undertaking that they are fit forANY PURPOSE WHATSOEVER.  Use them at your own risk.  If you do happen to finda bug, or have modifications to suggest, please report the same to AlistairMoffat at the address above.Have fun!Alistair Moffat,February 1999
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -