📄 manual_4.html
字号:
If you've incorporated <CODE>libbzip2</CODE> into your own programand are getting problems, please, please, please, check that the parameters you are passing in calls to the library, arecorrect, and in accordance with what the documentation saysis allowable. I have tried to make the library robust againstsuch problems, but I'm sure I haven't succeeded.</P><P>Finally, if the above comments don't help, you'll have to sendme a bug report. Now, it's just amazing how many people will send me a bug report saying something like<TABLE><tr><td> </td><td class=display><pre style="font-family: serif"> bzip2 crashed with segmentation fault on my machine</pre></td></tr></table>and absolutely nothing else. Needless to say, a such a reportis <EM>totally, utterly, completely and comprehensively 100% useless; a waste of your time, my time, and net bandwidth</EM>.With no details at all, there's no way I can possibly beginto figure out what the problem is.</P><P>The rules of the game are: facts, facts, facts. Don't omitthem because "oh, they won't be relevant". At the bare minimum:<TABLE><tr><td> </td><td class=display><pre style="font-family: serif"> Machine type. Operating system version. Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>). Exact version of the compiler used. Flags passed to the compiler.</pre></td></tr></table>However, the most important single thing that will help me isthe file that you were trying to compress or decompress at thetime the problem happened. Without that, my ability to do anythingmore than speculate about the cause, is limited.</P><P>Please remember that I connect to the Internet with a modem, soyou should contact me before mailing me huge files.</P><P><HR SIZE="6"><A NAME="SEC47"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC46"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC48"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.4 Did you get the right package? </H2><!--docid::SEC47::--><P><CODE>bzip2</CODE> is a resource hog. It soaks up large amounts of CPU cyclesand memory. Also, it gives very large latencies. In the worst case, youcan feed many megabytes of uncompressed data into the library beforegetting any compressed output, so this probably rules out applicationsrequiring interactive behaviour.</P><P>These aren't faults of my implementation, I hope, but morean intrinsic property of the Burrows-Wheeler transform (unfortunately). Maybe this isn't what you want.</P><P>If you want a compressor and/or library which is faster, uses lessmemory but gets pretty good compression, and has minimal latency,consider Jean-loupGailly's and Mark Adler's work, <CODE>zlib-1.1.3</CODE> and<CODE>gzip-1.2.4</CODE>. Look for them at</P><P><CODE>http://www.zlib.org</CODE> and<CODE>http://www.gzip.org</CODE> respectively.</P><P>For something faster and lighter still, you might try Markus F X JOberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at<BR> <CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.</P><P>If you want to use the <CODE>bzip2</CODE> algorithms to compress small blocksof data, 64k bytes or smaller, for example on an on-the-fly diskcompressor, you'd be well advised not to use this library. Instead,I've made a special library tuned for that kind of use. It's part of<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux<CODE>ext2</CODE> filesystem. Look at<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.</P><P><HR SIZE="6"><A NAME="SEC48"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC47"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC49"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.5 Testing </H2><!--docid::SEC48::--><P>A record of the tests I've done.</P><P>First, some data sets:<UL><LI>B: a directory containing 6001 files, one for every length in the range 0 to 6000 bytes. The files contain random lowercase letters. 18.7 megabytes.<LI>H: my home directory tree. Documents, source code, mail files, compressed data. H contains B, and also a directory of files designed as boundary cases for the sorting; mostly very repetitive, nasty files. 565 megabytes.<LI>A: directory tree holding various applications built from source: <CODE>egcs</CODE>, <CODE>gcc-2.8.1</CODE>, KDE, GTK, Octave, etc. 2200 megabytes.</UL>The tests conducted are as follows. Each test means compressing (a copy of) each file in the data set, decompressing it andcomparing it against the original.<P>First, a bunch of tests with block sizes and internal buffersizes set very small, to detect any problems with theblocking and buffering mechanisms. This required modifying the source code so as to try to break it.<OL><LI>Data set H, with buffer size of 1 byte, and block size of 23 bytes.<LI>Data set B, buffer sizes 1 byte, block size 1 byte.<LI>As (2) but small-mode decompression.<LI>As (2) with block size 2 bytes.<LI>As (2) with block size 3 bytes.<LI>As (2) with block size 4 bytes.<LI>As (2) with block size 5 bytes.<LI>As (2) with block size 6 bytes and small-mode decompression.<LI>H with buffer size of 1 byte, but normal block size (up to 900000 bytes).</OL>Then some tests with unmodified source code.<OL><LI>H, all settings normal.<LI>As (1), with small-mode decompress.<LI>H, compress with flag <CODE>-1</CODE>.<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing, <CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.<LI>Backwards compatibility: H, <CODE>bzip2-0.9.5</CODE> compressing, <CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.<LI>Bigger tests: A, all settings normal.<LI>As (7), using the fallback (Sadakane-like) sorting algorithm.<LI>As (8), compress with flag <CODE>-1</CODE>, decompress with flag <CODE>-s</CODE>.<LI>H, using the fallback sorting algorithm.<LI>Forwards compatibility: A, <CODE>bzip2-0.1pl2</CODE> compressing, <CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.<LI>Backwards compatibility: A, <CODE>bzip2-0.9.5</CODE> compressing, <CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.<LI>Misc test: about 400 megabytes of <CODE>.tar</CODE> files with <CODE>bzip2</CODE> compiled with Checker (a memory access error detector, like Purify).<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86 platforms.</OL>These tests were conducted on a 225 MHz IDT WinChip machine, runningLinux 2.0.36. They represent nearly a week of continuous computation.All tests completed successfully.<P><HR SIZE="6"><A NAME="SEC49"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC48"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ > ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.6 Further reading </H2><!--docid::SEC49::--><CODE>bzip2</CODE> is not research work, in the sense that it doesn't presentany new ideas. Rather, it's an engineering exercise based on existingideas.<P>Four documents describe essentially all the ideas behind <CODE>bzip2</CODE>:<TABLE><tr><td> </td><td class=example><pre>Michael Burrows and D. J. Wheeler: "A block-sorting lossless data compression algorithm" 10th May 1994. Digital SRC Research Report 124. ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz If you have trouble finding it, try searching at the New Zealand Digital Library, http://www.nzdl.org.Daniel S. Hirschberg and Debra A. LeLewer "Efficient Decoding of Prefix Codes" Communications of the ACM, April 1990, Vol 33, Number 4. You might be able to get an electronic copy of this from the ACM Digital Library.David J. Wheeler Program bred3.c and accompanying document bred3.ps. This contains the idea behind the multi-table Huffman coding scheme. ftp://ftp.cl.cam.ac.uk/users/djw3/Jon L. Bentley and Robert Sedgewick "Fast Algorithms for Sorting and Searching Strings" Available from Sedgewick's web page, www.cs.princeton.edu/~rs</pre></td></tr></table>The following paper gives valuable additional insights into thealgorithm, but is not immediately the basis of any codeused in bzip2.<TABLE><tr><td> </td><td class=example><pre>Peter Fenwick: Block Sorting Text Compression Proceedings of the 19th Australasian Computer Science Conference, Melbourne, Australia. Jan 31 - Feb 2, 1996. ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</pre></td></tr></table>Kunihiko Sadakane's sorting algorithm, mentioned above,is available from:<TABLE><tr><td> </td><td class=example><pre>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz</pre></td></tr></table>The Manber-Myers suffix array constructionalgorithm is described in a paperavailable from:<TABLE><tr><td> </td><td class=example><pre>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps</pre></td></tr></table>Finally, the following paper documents some recent investigationsI made into the performance of sorting algorithms:<TABLE><tr><td> </td><td class=example><pre>Julian Seward: On the Performance of BWT Sorting Algorithms Proceedings of the IEEE Data Compression Conference 2000 Snowbird, Utah. 28-30 March 2000.</pre></td></tr></table></P><P><HR SIZE="6"><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><BR> <FONT SIZE="-1">This document was generatedby <I>Julian Seward</I> on <I>January, 5 2002</I>using <A HREF="http://www.mathematik.uni-kl.de/~obachman/Texi2html"><I>texi2html</I></A></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -