📄 lowp_ac.doc
字号:
LOW-PRECISION ARITHMETIC CODING IMPLEMENTATION Radford M. Neal Initial release: 8 July 1991 Documentation update: 16 July 1991 Bug fix: 25 July 1991 Bug fix: 16 Sept 1992 Changes for ANSI C compatibility: 29 October 1992 Bug fix: 19 May 1993This directory contains C source for an implementation of arithmeticcoding using low-precision division. This division can be performedusing shift/add operations, with potential savings in time on anymachine without parallel multiply/divide hardware.The implementation is based on that in the paper of Witten, Neal, andCleary published in the June 1987 Communications of the ACM. Anyonewishing to understand this program is urged to first read that paper.Differences in this version are as follows: 1) The arithmetic coding operations have been fiddled so that the division involved can be done to very low precision. There is a tradeoff between precision and compression performance, but nearly optimal results are obtained with a precision of six bits, and precisions of as low as three bits give reasonable results. A precision of at least two bits is required for correct operation. 2) In order for (1) to be possible, the model is now required to produce "partially normalized" frequencies, in which the total for all symbols is always more than half the maximum allowed total. This is not onerous, at least for the models used here. 3) The model must also now arrainge for the most probable symbol to have index 1. This was always the case, but previously this was solely a matter of time efficiency. Now, failure to do this would impact compression efficiency, though not correct operation. 4) The precision to which symbol frequencies may be held is much higher in this implementation - 27 bits with the default parameter settings. The CACM implementation was restricted to 14 bit frequencies. This is of significance in applications where the number of symbols is large, such as with word-based models. 5) Encode/decode procedures specialized for use with a two-symbol alphabet have been added. These are demonstrated by a simple adaptive image compression program. 6) Various minor modifications and structural changes to the program have been made.Two versions of the coding procedures are provided - one using Cmultiply and divide operators, the other using shifts and adds. Theseversions, and the resulting encode/decode programs, are distinguishedby the suffixes "_mul" and "_sft". Which version is fastest willdepend on the particular machine and compiler used. All encode/decodeprograms simply read from standard input and write to standard output.The file 'tstpic' contains a test picture for the imageencoding/decoding programs. The format of such pictures may bediscerned by examination of this example, and of the program code.A program for calculating a bound on maximum expected redundancy resulting from low-precision division is included. Typical redundancyis much less than this bound.For the multiply/divide version, the requirement that the modelproduce partially normalized frequencies is not really necessary.The program is intended for demonstation purposes. It is not optimizedfor time efficiency, and provides little in the way of error checking.The method used in this program has some resemblences to that presentedby Rissanen and Mohiuddin in "A Multiplication-Free Multialphabet ArithmeticCode", IEEE Transactions on Communications, February, 1989. The mainsimilarities are the following: 1) The idea of constraining the size of the coding region and the range of the occurrence counts so as to allow an approximation. 2) The placement of the most-probable symbol at the top of the coding region.There are a number of significant dissimilarities, however. The detailsof the constraints mentioned above are different. The low-precisionmethod implemented here is more general, giving a smooth trade-off betweencompression performance and speed through choice of precision for themultiplication and division. Other unique features of this code include: 1) Incremental maintenance of partialy-normalized occurrence counts, eliminating the need for such normalization in the coding process, as is the case with the Rissanen and Mohiuddin method. 2) Merging of multiply and divide operations for faster operation with serial arithmetic (not relevant in the less-general Rissanen and Mohiuddin method). 3) A variable-precision computation in order to locate the next symbol in the non-binary decode procedure (also not relevant in the Rissanen and Mohiuddin method).The Rissanen and Mohiuddin method should be somewhat faster than thatused here. Its coding efficiency appears similar to that which would beobtained with this method if divisions are performed to a precision of two bits.The detailed algorithm presented in the paper by Rissanen and Mohiuddinuses the supposedly patented "bit stuffing" procedure. This procedureis _not_ used in this code. This code is public domain, and may be used by anyone for any purpose.I do, however, expect that distributions utilizing this code willinclude an acknowledgement of its source. The program is providedwithout any warranty as to correct operation. The Rissanen andMohiuddin method is said to be patented, and I can offer no guaranteesas to whether use of the code presented here might infringe thosepatents (off hand, this would seem to be a complex question with nodefinitive answer). My amateur understanding of patent law leads me tobelieve that use for research purposes would be permitted under anycircumstances, but I could well be deluded in this regard.Address comments to: Radford Neal Dept. of Computer Science University of Toronto 10 King's College Road Toronto, Ontario, CANADA M5S 1A4 e-mail: radford@ai.toronto.edu
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -