minimum:<PRE>
   Machine type.
   Operating system version.
   Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>).
   Exact version of the compiler used.
   Flags passed to the compiler.
</PRE>

<P>However, the most important single thing that will help me is the
file that you were trying to compress or decompress at the time the
problem happened.  Without that, my ability to do anything more than
speculate about the cause is limited.</P>

<P>Please remember that I connect to the Internet with a modem, so you
should contact me before mailing me huge files.</P>

<H2><A NAME="SEC37" HREF="manual_toc.html#TOC37">Did you get the right package?</A></H2>

<P><CODE>bzip2</CODE> is a resource hog.  It soaks up large amounts of
CPU cycles and memory.  Also, it gives very large latencies.  In the
worst case, you can feed many megabytes of uncompressed data into the
library before getting any compressed output, so this probably rules
out applications requiring interactive behaviour.</P>

<P>These aren't faults of my implementation, I hope, but more an
intrinsic property of the Burrows-Wheeler transform (unfortunately).
Maybe this isn't what you want.</P>
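<P>To make the latency point concrete, the following minimal sketch
drives the library's streaming interface and reports how much output
each call produces.  It is an illustration only, not code from the
package, and it assumes the <CODE>BZ2_</CODE>-prefixed names exported
by bzip2 1.0 and later.  Almost every call yields zero bytes of
output: nothing can be emitted until an entire block has been
accumulated and sorted.</P>

<PRE>
#include <stdio.h>
#include <string.h>
#include <bzlib.h>

int main(void)
{
   bz_stream s;
   char in[4096], out[4096];
   int i, ret;

   memset(&s, 0, sizeof(s));                      /* default allocators */
   if (BZ2_bzCompressInit(&s, 9, 0, 0) != BZ_OK)  /* 900k block size */
      return 1;

   memset(in, 'x', sizeof(in));

   /* Feed roughly 4 megabytes, 4k at a time.  Compressed output
      appears only when a 900k block fills up and gets sorted. */
   for (i = 0; i < 1000; i++) {
      s.next_in = in;  s.avail_in = sizeof(in);
      while (s.avail_in > 0) {
         s.next_out = out;  s.avail_out = sizeof(out);
         BZ2_bzCompress(&s, BZ_RUN);
         printf("call %4d: %u bytes out\n",
                i, (unsigned)(sizeof(out) - s.avail_out));
      }
   }

   /* Flush whatever remains in the final, partial block. */
   do {
      s.next_out = out;  s.avail_out = sizeof(out);
      ret = BZ2_bzCompress(&s, BZ_FINISH);
   } while (ret != BZ_STREAM_END);

   BZ2_bzCompressEnd(&s);
   return 0;
}
</PRE>

<P>Link against the library when building (typically with
<CODE>-lbz2</CODE>).</P>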
<P>If you want a compressor and/or library which is faster, uses less
memory but gets pretty good compression, and has minimal latency,
consider Jean-loup Gailly's and Mark Adler's work,
<CODE>zlib-1.1.2</CODE> and <CODE>gzip-1.2.4</CODE>.  Look for them at
<CODE>http://www.cdrom.com/pub/infozip/zlib</CODE> and
<CODE>http://www.gzip.org</CODE> respectively.</P>

<P>For something faster and lighter still, you might try Markus F X J
Oberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at<BR>
<CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.</P>

<P>If you want to use the <CODE>bzip2</CODE> algorithms to compress small
blocks of data, 64k bytes or smaller, for example on an on-the-fly disk
compressor, you'd be well advised not to use this library.  Instead,
I've made a special library tuned for that kind of use.  It's part of
<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux
<CODE>ext2</CODE> filesystem.  Look at
<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.</P>

<H2><A NAME="SEC38" HREF="manual_toc.html#TOC38">Testing</A></H2>

<P>A record of the tests I've done.</P>

<P>First, some data sets:
<UL>
<LI>B: a directory containing 6001 files, one for every length in the
    range 0 to 6000 bytes.  The files contain random lowercase
    letters.  18.7 megabytes.
<LI>H: my home directory tree.  Documents, source code, mail files,
    compressed data.  H contains B, and also a directory of
    files designed as boundary cases for the sorting; mostly very
    repetitive, nasty files.  445 megabytes.
<LI>A: directory tree holding various applications built from source:
    <CODE>egcs-1.0.2</CODE>, <CODE>gcc-2.8.1</CODE>, KDE Beta 4, GTK,
    Octave, etc.  827 megabytes.
<LI>P: directory tree holding large amounts of source code
    (<CODE>.tar</CODE> files) of the entire GNU distribution, plus a
    couple of Linux distributions.  2400 megabytes.
</UL>

<P>The tests conducted are as follows.  Each test means compressing
(a copy of) each file in the data set, decompressing it and comparing
it against the original; a round-trip check of this shape is sketched
at the end of this section.</P>

<P>First, a bunch of tests with block sizes, internal buffer sizes and
randomisation lengths set very small, to detect any problems with the
blocking, buffering and randomisation mechanisms.  This required
modifying the source code so as to try to break it.
<OL>
<LI>Data set H, with buffer size of 1 byte, and block size of 23 bytes.
<LI>Data set B, buffer sizes 1 byte, block size 1 byte.
<LI>As (2) but small-mode decompression (first 1700 files).
<LI>As (2) with block size 2 bytes.
<LI>As (2) with block size 3 bytes.
<LI>As (2) with block size 4 bytes.
<LI>As (2) with block size 5 bytes.
<LI>As (2) with block size 6 bytes and small-mode decompression.
<LI>H with normal buffer sizes (5000 bytes), normal block size (up to
    900000 bytes), but with the randomisation mechanism running
    intensely (randomising approximately every third byte).
<LI>As (9) with small-mode decompression.
</OL>

<P>Then some tests with unmodified source code.
<OL>
<LI>H, all settings normal.
<LI>As (1), with small-mode decompress.
<LI>H, compress with flag <CODE>-1</CODE>.
<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.
<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing,
    <CODE>bzip2-0.9.0</CODE> decompressing, all settings normal.
<LI>Backwards compatibility: H, <CODE>bzip2-0.9.0</CODE> compressing,
    <CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
<LI>Bigger tests: A, all settings normal.
<LI>P, all settings normal.
<LI>Misc test: about 100 megabytes of <CODE>.tar</CODE> files with
    <CODE>bzip2</CODE> compiled with Purify.
<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86
    platforms.
</OL>

<P>These tests were conducted on a 205 MHz Cyrix 6x86MX machine,
running Linux 2.0.32.  They represent nearly a week of continuous
computation.  All tests completed successfully.</P>
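<P>For illustration, here is the shape of such a round-trip check,
written as a sketch against the memory-to-memory convenience calls of
bzip2 1.0 and later; it is not the harness actually used for the tests
above.  The <CODE>small</CODE> argument to the decompressor corresponds
to the small-mode (<CODE>-s</CODE>) decompression in the lists.</P>

<PRE>
#include <stdlib.h>
#include <string.h>
#include <bzlib.h>

/* Compress a buffer, decompress the result, and compare it
   against the original.  Returns 1 on success, 0 on any failure. */
int roundtrip_ok(char *data, unsigned int len)
{
   /* Compressed data can be slightly larger than the input; the
      documented worst case is the input size plus 1% plus 600 bytes. */
   unsigned int zlen = len + len / 100 + 600;
   unsigned int dlen = len;
   char *z = malloc(zlen);
   char *d = malloc(len > 0 ? len : 1);
   int ok = 0;

   if (z != NULL && d != NULL
       && BZ2_bzBuffToBuffCompress(z, &zlen, data, len,
                                   9, 0, 0) == BZ_OK
       && BZ2_bzBuffToBuffDecompress(d, &dlen, z, zlen,
                                     1 /* small mode */, 0) == BZ_OK
       && dlen == len
       && memcmp(d, data, len) == 0)
      ok = 1;

   free(z);
   free(d);
   return ok;
}
</PRE>

<P>A driver would walk a data set and call <CODE>roundtrip_ok</CODE> on
the contents of each file, flagging any file for which it returns 0.</P>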
<H2><A NAME="SEC39" HREF="manual_toc.html#TOC39">Further reading</A></H2>

<P><CODE>bzip2</CODE> is not research work, in the sense that it doesn't
present any new ideas.  Rather, it's an engineering exercise based on
existing ideas.</P>

<P>Four documents describe essentially all the ideas behind
<CODE>bzip2</CODE>:
<PRE>
Michael Burrows and D. J. Wheeler:
   "A block-sorting lossless data compression algorithm"
   10th May 1994.  Digital SRC Research Report 124.
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
   If you have trouble finding it, try searching at the
   New Zealand Digital Library, http://www.nzdl.org.

Daniel S. Hirschberg and Debra A. LeLewer
   "Efficient Decoding of Prefix Codes"
   Communications of the ACM, April 1990, Vol 33, Number 4.
   You might be able to get an electronic copy of this
   from the ACM Digital Library.

David J. Wheeler
   Program bred3.c and accompanying document bred3.ps.
   This contains the idea behind the multi-table Huffman
   coding scheme.
   ftp://ftp.cl.cam.ac.uk/pub/user/djw3/

Jon L. Bentley and Robert Sedgewick
   "Fast Algorithms for Sorting and Searching Strings"
   Available from Sedgewick's web page,
   www.cs.princeton.edu/~rs
</PRE>

<P>The following paper gives valuable additional insights into the
algorithm, but is not immediately the basis of any code used in
<CODE>bzip2</CODE>.
<PRE>
Peter Fenwick:
   Block Sorting Text Compression
   Proceedings of the 19th Australasian Computer Science Conference,
   Melbourne, Australia.  Jan 31 - Feb 2, 1996.
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
</PRE>

<P>Kunihiko Sadakane's sorting algorithm, mentioned above, is available
from:
<PRE>
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
</PRE>

<P>The Manber-Myers suffix array construction algorithm is described in
a paper available from:
<PRE>
http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
</PRE>
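<P>To give a flavour of the central idea in the Burrows-Wheeler report,
here is a deliberately naive sketch of the forward transform, purely
for illustration: form every rotation of the block, sort them, and keep
the last column.  Production sorters, <CODE>bzip2</CODE>'s included,
compute the same result using the far more sophisticated techniques the
papers above describe.</P>

<PRE>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *g_block;   /* the block being transformed */
static int g_len;

/* Compare the rotations of g_block starting at offsets *a and *b. */
static int rot_cmp(const void *a, const void *b)
{
   int i = *(const int *)a, j = *(const int *)b, k;
   for (k = 0; k < g_len; k++) {
      unsigned char ci = g_block[(i + k) % g_len];
      unsigned char cj = g_block[(j + k) % g_len];
      if (ci != cj) return (int)ci - (int)cj;
   }
   return 0;
}

int main(void)
{
   const char *block = "banana";
   int n = (int)strlen(block), idx[64], i;

   g_block = block;  g_len = n;
   for (i = 0; i < n; i++) idx[i] = i;
   qsort(idx, n, sizeof(int), rot_cmp);

   /* The transform output is the last column of the sorted
      rotation matrix: "nnbaaa" for "banana". */
   for (i = 0; i < n; i++)
      putchar(block[(idx[i] + n - 1) % n]);
   putchar('\n');
   return 0;
}
</PRE>

<P>Notice how like characters cluster together in the output; that
clustering is what makes the later move-to-front and Huffman coding
stages so effective.</P>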
