example files.
The difference would be accentuated with more sophisticated models that
predict symbols with probabilities approaching one under certain circumstances
(e.g.\& letter ``u'' following ``q'').
.pp
.ul
Non-adaptive coding
can be performed arithmetically using fixed, pre-specified models like that in
the first part of Figure\ 4.
Compression performance will be better than Huffman coding.
In order to minimize execution time, the total frequency count,
$cum_freq[0]$, should be chosen as a power of two so the divisions
in the bounds calculations (Figure\ 3 lines 91-94 and 190-193) can be done
as shifts.
Encode/decode times of around 60\ $mu$s/90\ $mu$s should then be possible
for an assembly language implementation on a VAX-11/780.
A carefully-written implementation of Huffman coding, using table look-up for
encoding and decoding, would be a bit faster in this application.
.pp
.ul
Compressing black/white images
using arithmetic coding has been investigated by Langdon & Rissanen (1981),
who achieved excellent results using a model which conditioned the probability
of a pixel's being black on a template of pixels surrounding it.
.[
Langdon Rissanen 1981 compression of black-white images
.]
The template contained a total of ten pixels, selected from those above and
to the left of the current one so that they precede it in the raster scan.
This creates 1024 different possible contexts, and for each the probability of
the pixel being black was estimated adaptively as the picture was transmitted.
Each pixel's polarity was then coded arithmetically according to this
probability.
A 20\-30% improvement in compression was attained over earlier methods.
To increase coding speed Langdon & Rissanen used an approximate method
of arithmetic coding which avoided multiplication by representing
probabilities as integer powers of 1/2.
Huffman coding cannot be directly used in this application, as it never
compresses with a two-symbol alphabet.
.pp
.ul
Run-length coding,
a popular method for use with two-valued alphabets, provides another
opportunity for arithmetic coding.
The model reduces the data to a sequence of lengths of runs of the same symbol
(e.g.\& for picture coding, run-lengths of black followed by white followed by
black followed by white ...).
The sequence of lengths must be transmitted.
The CCITT facsimile coding standard (Hunter & Robinson, 1980), for example,
bases a Huffman code on the frequencies with which black and white runs of
different lengths occur in sample documents.
.[
Hunter Robinson 1980 facsimile
.]
A fixed arithmetic code using these same frequencies would give better
performance; adapting the frequencies to each particular document would be
better still.
.pp
.ul
Coding arbitrarily-distributed integers
is often called for when using more sophisticated models of text, image,
or other data.
Consider, for instance, Bentley \fIet al\fR's (1986) locally-adaptive data
compression scheme, in which the encoder and decoder cache the last $N$
different words seen.
.[
Bentley Sleator Tarjan Wei 1986 locally adaptive
%J Communications of the ACM
.]
A word present in the cache is transmitted by sending the integer cache index.
Words not in the cache are transmitted by sending a new-word marker followed
by the characters of the word.
This is an excellent model for text in which words are used frequently over
short intervals and then fall into long periods of disuse.
Their paper discusses several variable-length codings for the integers used
as cache indexes.
Arithmetic coding allows \fIany\fR probability distribution to be used as the
basis for a variable-length encoding, including \(em amongst countless others
\(em the ones implied by the particular codes discussed there.
It also permits use of an adaptive model for cache
indexes, which is desirable if the distribution of cache hits is
difficult to predict in advance.
Furthermore, with arithmetic coding, the code space allotted to the cache
indexes can be scaled down to accommodate any desired probability for the
new-word marker.
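A minimal C sketch of such a word cache (the move-to-front replacement
policy, the tiny capacity, and the names \fIencode_word\fR and
\fINEW_WORD\fR are illustrative assumptions, not Bentley \fIet al\fR's
code; the arithmetic coder that would transmit the returned index is
omitted):

```c
#include <string.h>

#define CACHE_SIZE 4    /* the "N" of the scheme; kept small for illustration */
#define NEW_WORD  (-1)  /* stands in for the new-word marker */

static char cache[CACHE_SIZE][32];  /* sketch only: words under 32 chars */
static int  cached = 0;             /* number of entries currently held  */

/* Return the cache index of word (0 = most recently used) and move it to
   the front, or return NEW_WORD, inserting the word and evicting the
   least recently used entry when the cache is full. */
int encode_word(const char *word)
{
    int i, found = NEW_WORD;

    for (i = 0; i < cached; i++)
        if (strcmp(cache[i], word) == 0) { found = i; break; }

    if (found != NEW_WORD) {
        /* hit: shift entries 0..found-1 down one slot */
        memmove(cache[1], cache[0], (size_t)found * sizeof cache[0]);
    } else {
        /* miss: grow if possible, then shift everything down one slot */
        if (cached < CACHE_SIZE)
            cached++;
        memmove(cache[1], cache[0], (size_t)(cached - 1) * sizeof cache[0]);
    }
    strcpy(cache[0], word);
    return found;
}
```

The returned index is what would be fed to the arithmetic encoder under
whatever distribution is chosen for indexes; a \fINEW_WORD\fR result
would be followed by the characters of the word itself.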
.sh "Acknowledgement"
.pp
Financial support for this work has been provided by the
Natural Sciences and Engineering Research Council of Canada.
.sh "References"
.sp
.in +4n
.[
$LIST$
.]
.in 0
.bp
.sh "APPENDIX: Proof of decoding inequality"
.sp
Using 1-letter abbreviations for $cum_freq$, $symbol$, $low$, $high$, and
$value$, suppose
.LB
$c[s] ~ <= ~~ left f {(v-l+1) times c[0] ~-~ 1} over {h-l+1} right f ~~ < ~ c[s-1]$;
.LE
in other words,
.LB
.ta \n(.lu-\n(.iuR
$c[s] ~ <= ~~ {(v-l+1) times c[0] ~-~ 1} over {r} ~~-~~ epsilon ~~ <= ~ c[s-1] ~-~ 1$,	(1)
.LE
.ta 8n
where $r ~=~ h-l+1$, $0 ~ <= ~ epsilon ~ <= ~ {r-1} over r$.
.sp
(The last inequality of (1) derives from the fact that $c[s-1]$ must be an
integer.) \c
Then we wish to show that $l' ~ <= ~ v ~ <= ~ h'$, where $l'$ and $h'$
are the updated values for $low$ and $high$ as defined below.
.sp
.ta \w'(a) 'u
(a)	$l' ~ == ~~ l ~+~~ left f {r times c[s]} over c[0] right f ~~ mark <= ~~ l ~+~~ {r} over c[0] ~ left [ ~ {(v-l+1) times c[0] ~-~ 1} over {r} ~~ - ~ epsilon ~ right ]$	from (1),
.sp 0.5
$lineup <= ~~ v ~ + ~ 1 ~ - ~ 1 over c[0]$ ,
.sp 0.5
so $l' ~ <= ~~ v$ since both $v$ and $l'$ are integers
and $c[0] > 0$.
.sp
(b)	$h' ~ == ~~ l ~+~~ left f {r times c[s-1]} over c[0] right f ~~-~ 1 ~~ mark >= ~~ l ~+~~ {r} over c[0] ~ left [ ~ {(v-l+1) times c[0] ~-~ 1} over {r} ~~ + ~ 1 ~ - ~ epsilon ~ right ] ~~ - ~ 1$	from (1),
.sp 0.5
$lineup >= ~~ v ~ + ~~ r over c[0] ~ left [ ~ - ~ 1 over r ~+~ 1 ~-~~ {r-1} over r ~ right ] ~~ = ~~ v$.
.bp
.sh "Captions for tables"
.sp
.nf
.ta \w'Figure 1 'u
Table 1	Example fixed model for alphabet {\fIa, e, i, o, u, !\fR}
Table 2	Results for encoding and decoding 100,000-byte files
Table 3	Breakdown of timings for VAX-11/780 assembly language version
Table 4	Comparison of arithmetic and adaptive Huffman coding
.fi
.sh "Captions for figures"
.sp
.nf
.ta \w'Figure 1 'u
Figure 1	(a) Representation of the arithmetic coding process
	(b) Like (a) but with the interval scaled up at each stage
Figure 2	Pseudo-code for the encoding and decoding procedures
Figure 3	C implementation of arithmetic encoding and decoding
Figure 4	Fixed and adaptive models for use with Figure 3
Figure 5	Scaling the interval to prevent underflow
.fi
.bp 0
.ev2
.nr x2 \w'symbol'/2
.nr x3 (\w'symbol'/2)+0.5i+(\w'probability'/2)
.nr x4 (\w'probability'/2)+0.5i
.nr x5 \w'[0.0, '
.nr x1 \n(x2+\n(x3+\n(x4+\n(x5+\w'0.0)'
.nr x0 (\n(.l-\n(x1)/2
.in \n(x0u
.ta \n(x2uC +\n(x3uC +\n(x4u +\n(x5u
\l'\n(x1u'
.sp
	symbol	probability	\0\0range
\l'\n(x1u'
.sp
	\fIa\fR	0.2	[0,	0.2)
	\fIe\fR	0.3	[0.2,	0.5)
	\fIi\fR	0.1	[0.5,	0.6)
	\fIo\fR	0.2	[0.6,	0.8)
	\fIu\fR	0.1	[0.8,	0.9)
	\fI!\fR	0.1	[0.9,	1.0)
\l'\n(x1u'
.sp
.in 0
.FE "Table 1 Example fixed model for alphabet {\fIa, e, i, o, u, !\fR}"
.bp 0
.ev2
.nr x1 0.5i+\w'\fIVAX object program\fR '+\w'100,000 '+\w'time ($mu$s) '+\w'time ($mu$s) '+\w'time ($mu$s) '+\w'time ($mu$s) '+\w'time ($mu$s) '+\w'time ($mu$s)'
.nr x0 (\n(.l-\n(x1)/2
.in \n(x0u
.ta 0.5i +\w'\fIVAX object program\fR 'u +\w'100,000 'u +\w'time ($mu$s) 'u +\w'time ($mu$s) 'u +\w'time ($mu$s) 'u +\w'time ($mu$s) 'u +\w'time ($mu$s) 'u
\l'\n(x1u'
.sp
		\0\0VAX-11/780		\0\0\0Macintosh		\0\0\0\0SUN-3/75
	output	encode	decode	encode	decode	encode	decode
	(bytes)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)
\l'\n(x1u'
.sp
Mildly optimized C implementation
.sp
\fIText file\fR	\057718	\0\0214	\0\0262	\0\0687	\0\0881	\0\0\098	\0\0121
\fIC program\fR	\062991	\0\0230	\0\0288	\0\0729	\0\0950	\0\0105	\0\0131
\fIVAX object program\fR	\073501	\0\0313	\0\0406	\0\0950	\01334	\0\0145	\0\0190
\fIAlphabet\fR	\059292	\0\0223	\0\0277	\0\0719	\0\0942	\0\0105	\0\0130
\fISkew-statistics\fR	\012092	\0\0143	\0\0170	\0\0507	\0\0645	\0\0\070	\0\0\085
.sp
Carefully optimized assembly language implementation
.sp
\fIText file\fR	\057718	\0\0104	\0\0135	\0\0194	\0\0243	\0\0\046	\0\0\058
\fIC program\fR	\062991	\0\0109	\0\0151	\0\0208	\0\0266	\0\0\051	\0\0\065
\fIVAX object program\fR	\073501	\0\0158	\0\0241	\0\0280	\0\0402	\0\0\075	\0\0107
\fIAlphabet\fR	\059292	\0\0105	\0\0145	\0\0204	\0\0264	\0\0\051	\0\0\065
\fISkew-statistics\fR	\012092	\0\0\063	\0\0\081	\0\0126	\0\0160	\0\0\028	\0\0\036
\l'\n(x1u'
.sp 2
.nr x0 \n(.l
.ll \n(.lu-\n(.iu
.fi
.in \w'\fINotes:\fR 'u
.ti -\w'\fINotes:\fR 'u
\fINotes:\fR\ \ \c
Times are measured in $mu$s per byte of uncompressed data.
.sp 0.5
The VAX-11/780 had a floating-point accelerator, which reduces integer
multiply and divide times.
.sp 0.5
The Macintosh uses an 8\ MHz MC68000 with some memory wait states.
.sp 0.5
The SUN-3/75 uses a 16.67\ MHz MC68020.
.sp 0.5
All times exclude I/O and operating system overhead in support of I/O.
VAX and SUN figures give user time from the U\s-2NIX\s+2 \fItime\fR
command; on the Macintosh I/O was explicitly directed to an array.
.sp 0.5
The 4.2BSD C compiler was used for VAX and SUN; Aztec C 1.06g for Macintosh.
.sp
.ll \n(x0u
.nf
.in 0
.FE "Table 2 Results for encoding and decoding 100,000-byte files"
.bp 0
.ev2
.nr x1 \w'\fIVAX object program\fR '+\w'Bounds calculation '+\w'time ($mu$s) '+\w'time ($mu$s)'
.nr x0 (\n(.l-\n(x1)/2
.in \n(x0u
.ta \w'\fIVAX object program\fR 'u +\w'Bounds calculation 'u +\w'time ($mu$s) 'u +\w'time ($mu$s)'u
\l'\n(x1u'
.sp
		encode	decode
		time ($mu$s)	time ($mu$s)
\l'\n(x1u'
.sp
\fIText file\fR	Bounds calculation	\0\0\032	\0\0\031
	Bit shifting	\0\0\039	\0\0\030
	Model update	\0\0\029	\0\0\029
	Symbol decode	\0\0\0\(em	\0\0\045
	Other	\0\0\0\04	\0\0\0\00
	\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
	\0\0104	\0\0135
.sp
\fIC program\fR	Bounds calculation	\0\0\030	\0\0\028
	Bit shifting	\0\0\042	\0\0\035
	Model update	\0\0\033	\0\0\036
	Symbol decode	\0\0\0\(em	\0\0\051
	Other	\0\0\0\04	\0\0\0\01
	\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
	\0\0109	\0\0151
.sp
\fIVAX object program\fR	Bounds calculation	\0\0\034	\0\0\031
	Bit shifting	\0\0\046	\0\0\040
	Model update	\0\0\075	\0\0\075
	Symbol decode	\0\0\0\(em	\0\0\094
	Other	\0\0\0\03	\0\0\0\01
	\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
	\0\0158	\0\0241
\l'\n(x1u'
.in 0
.FE "Table 3 Breakdown of timings for VAX-11/780 assembly language version"
.bp 0
.ev2
.nr x1 \w'\fIVAX object program\fR '+\w'100,000 '+\w'time ($mu$s) '+\w'time ($mu$s) '+\w'100,000 '+\w'time ($mu$s) '+\w'time ($mu$s)'
.nr x0 (\n(.l-\n(x1)/2
.in \n(x0u
.ta \w'\fIVAX object program\fR 'u +\w'100,000 'u +\w'time ($mu$s) 'u +\w'time ($mu$s) 'u +\w'100,000 'u +\w'time ($mu$s) 'u +\w'time ($mu$s)'u
\l'\n(x1u'
.sp
	\0\0\0\0\0\0Arithmetic coding		\0\0\0Adaptive Huffman coding
	output	encode	decode	output	encode	decode
	(bytes)	time ($mu$s)	time ($mu$s)	(bytes)	time ($mu$s)	time ($mu$s)
\l'\n(x1u'
.sp
\fIText file\fR	\057718	\0\0214	\0\0262	\057781	\0\0550	\0\0414
\fIC program\fR	\062991	\0\0230	\0\0288	\063731	\0\0596	\0\0441
\fIVAX object program\fR	\073546	\0\0313	\0\0406	\076950	\0\0822	\0\0606
\fIAlphabet\fR	\059292	\0\0223	\0\0277	\060127	\0\0598	\0\0411
\fISkew-statistics\fR	\012092	\0\0143	\0\0170	\016257	\0\0215	\0\0132
\l'\n(x1u'
.sp 2
.nr x0 \n(.l
.ll \n(.lu-\n(.iu
.fi
.in +\w'\fINotes:\fR 'u
.ti -\w'\fINotes:\fR 'u
\fINotes:\fR\ \ \c
Mildly optimized C implementation used for arithmetic coding
.sp 0.5
U\s-2NIX\s+2 \fIcompact\fR used for adaptive Huffman coding
.sp 0.5
Times are for a VAX-11/780, and exclude I/O and operating system overhead in
support of I/O.
.sp
.ll \n(x0u
.nf
.in 0
.FE "Table 4 Comparison of arithmetic and adaptive Huffman coding"
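The inequality proved in the Appendix can also be exercised numerically.
The sketch below follows the form of the inequalities above rather than
the paper's actual C routines; \fIcheck\fR and its toy cumulative-count
model are illustrative assumptions.
It finds the symbol selected by the decoding condition and confirms that
the updated interval still contains $value$.

```c
/* Numerical check of the Appendix's decoding inequality.  cum[] plays the
   role of cum_freq: cum[0] is the total count, entries decrease, and the
   array ends with 0.  Any model supplied here is a toy example, not one
   of the paper's figures. */
int check(long low, long high, long value, const long *cum)
{
    long r = high - low + 1;
    long target = ((value - low + 1) * cum[0] - 1) / r;  /* the floor in (1) */
    long low2, high2;
    int s;

    /* find the symbol s with cum[s] <= target < cum[s-1] */
    for (s = 1; cum[s] > target; s++)
        ;

    /* updated bounds, as in the decoder's interval-narrowing step */
    low2  = low + (r * cum[s]) / cum[0];
    high2 = low + (r * cum[s - 1]) / cum[0] - 1;

    return low2 <= value && value <= high2;
}
```

Looping such a check over every $value$ in $[low,~high]$ for a given model
exercises both bounds of the proof at once.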