📄 manual_4.html
字号:
<HTML><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><!-- Created on January, 5 2002 by texi2html 1.64 --><!-- Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author) Karl Berry <karl@freefriends.org> Olaf Bachmann <obachman@mathematik.uni-kl.de> and many others.Maintained by: Olaf Bachmann <obachman@mathematik.uni-kl.de>Send bugs and suggestions to <texi2html@mathematik.uni-kl.de> --><HEAD><TITLE>Untitled Document: 4. Miscellanea</TITLE><META NAME="description" CONTENT="Untitled Document: 4. Miscellanea"><META NAME="keywords" CONTENT="Untitled Document: 4. Miscellanea"><META NAME="resource-type" CONTENT="document"><META NAME="distribution" CONTENT="global"><META NAME="Generator" CONTENT="texi2html 1.64"></HEAD><BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000"><A NAME="SEC43"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_3.html#SEC42"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC44"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H1> 4. Miscellanea </H1><!--docid::SEC43::--><P>These are just some random thoughts of mine. Your mileage mayvary.</P><P><HR SIZE="6"><A NAME="SEC44"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC43"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC45"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.1 Limitations of the compressed file format </H2><!--docid::SEC44::--><CODE>bzip2-1.0</CODE>, <CODE>0.9.5</CODE> and <CODE>0.9.0</CODE>use exactly the same file format as the previousversion, <CODE>bzip2-0.1</CODE>. This decision was made in the interests ofstability. Creating yet another incompatible compressed file formatwould create further confusion and disruption for users.<P>Nevertheless, this is not a painless decision. Developmentwork since the release of <CODE>bzip2-0.1</CODE> in August 1997has shown complexities in the file format which slow downdecompression and, in retrospect, are unnecessary. These are:<UL><LI>The run-length encoder, which is the first of the compression transformations, is entirely irrelevant. The original purpose was to protect the sorting algorithm from the very worst case input: a string of repeated symbols. But algorithm steps Q6a and Q6b in the original Burrows-Wheeler technical report (SRC-124) show how repeats can be handled without difficulty in block sorting.<LI>The randomisation mechanism doesn't really need to be there. Udi Manber and Gene Myers published a suffix array construction algorithm a few years back, which can be employed to sort any block, no matter how repetitive, in O(N log N) time. Subsequent work by Kunihiko Sadakane has produced a derivative O(N (log N)^2) algorithm which usually outperforms the Manber-Myers algorithm.<P> I could have changed to Sadakane's algorithm, but I find it to be slower than <CODE>bzip2</CODE>'s existing algorithm for most inputs, and the randomisation mechanism protects adequately against bad cases. I didn't think it was a good tradeoff to make. Partly this is due to the fact that I was not flooded with email complaints about <CODE>bzip2-0.1</CODE>'s performance on repetitive data, so perhaps it isn't a problem for real inputs.</P><P> Probably the best long-term solution, and the one I have incorporated into 0.9.5 and above, is to use the existing sorting algorithm initially, and fall back to a O(N (log N)^2) algorithm if the standard algorithm gets into difficulties.<LI>The compressed file format was never designed to be handled by a library, and I have had to jump though some hoops to produce an efficient implementation of decompression. It's a bit hairy. Try passing <CODE>decompress.c</CODE> through the C preprocessor and you'll see what I mean. Much of this complexity could have been avoided if the compressed size of each block of data was recorded in the data stream.<LI>An Adler-32 checksum, rather than a CRC32 checksum, would be faster to compute.</UL>It would be fair to say that the <CODE>bzip2</CODE> format was frozenbefore I properly and fully understood the performanceconsequences of doing so.<P>Improvements which I was able to incorporate into0.9.0, despite using the same file format, are:<UL><LI>Single array implementation of the inverse BWT. This significantly speeds up decompression, presumably because it reduces the number of cache misses.<LI>Faster inverse MTF transform for large MTF values. The new implementation is based on the notion of sliding blocks of values.<LI><CODE>bzip2-0.9.0</CODE> now reads and writes files with <CODE>fread</CODE> and <CODE>fwrite</CODE>; version 0.1 used <CODE>putc</CODE> and <CODE>getc</CODE>. Duh! Well, you live and learn.<P></UL>Further ahead, it would be nice to be able to do random access into files. This will require some careful design of compressed file formats.<P><HR SIZE="6"><A NAME="SEC45"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC44"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC46"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.2 Portability issues </H2><!--docid::SEC45::-->After some consideration, I have decided not to useGNU <CODE>autoconf</CODE> to configure 0.9.5 or 1.0.<P><CODE>autoconf</CODE>, admirable and wonderful though it is, mainly assists with portability problems between Unix-likeplatforms. But <CODE>bzip2</CODE> doesn't have much in the wayof portability problems on Unix; most of the difficulties appearwhen porting to the Mac, or to Microsoft's operating systems.<CODE>autoconf</CODE> doesn't help in those cases, and brings in a whole load of new complexity.</P><P>Most people should be able to compile the library and programunder Unix straight out-of-the-box, so to speak, especially if you have a version of GNU C available.</P><P>There are a couple of <CODE>__inline__</CODE> directives in the code. GNU C(<CODE>gcc</CODE>) should be able to handle them. If you're not usingGNU C, your C compiler shouldn't see them at all.If your compiler does, for some reason, see them and doesn'tlike them, just <CODE>#define</CODE> <CODE>__inline__</CODE> to be <CODE>/* */</CODE>. Oneeasy way to do this is to compile with the flag <CODE>-D__inline__=</CODE>, which should be understood by most Unix compilers.</P><P>If you still have difficulties, try compiling with the macro<CODE>BZ_STRICT_ANSI</CODE> defined. This should enable you to build thelibrary in a strictly ANSI compliant environment. Building the programitself like this is dangerous and not supported, since you remove<CODE>bzip2</CODE>'s checks against compressing directories, symbolic links,devices, and other not-really-a-file entities. This could causefilesystem corruption!</P><P>One other thing: if you create a <CODE>bzip2</CODE> binary for publicdistribution, please try and link it statically (<CODE>gcc -s</CODE>). Thisavoids all sorts of library-version issues that others may encounterlater on.</P><P>If you build <CODE>bzip2</CODE> on Win32, you must set <CODE>BZ_UNIX</CODE> to 0 and<CODE>BZ_LCCWIN32</CODE> to 1, in the file <CODE>bzip2.c</CODE>, before compiling.Otherwise the resulting binary won't work correctly.</P><P><HR SIZE="6"><A NAME="SEC46"></A><TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0><TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC45"> < </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC47"> > </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD></TR></TABLE><H2> 4.3 Reporting bugs </H2><!--docid::SEC46::-->I tried pretty hard to make sure <CODE>bzip2</CODE> isbug free, both by design and by testing. Hopefullyyou'll never need to read this section for real.<P>Nevertheless, if <CODE>bzip2</CODE> dies with a segmentationfault, a bus error or an internal assertion failure, itwill ask you to email me a bug report. Experience withversion 0.1 shows that almost all these problems canbe traced to either compiler bugs or hardware problems.<UL><LI>Recompile the program with no optimisation, and see if itworks. And/or try a different compiler.I heard all sorts of stories about various flavoursof GNU C (and other compilers) generating bad code for<CODE>bzip2</CODE>, and I've run across two such examples myself.<P>2.7.X versions of GNU C are known to generate bad code fromtime to time, at high optimisation levels. If you get problems, try using the flags<CODE>-O2</CODE> <CODE>-fomit-frame-pointer</CODE> <CODE>-fno-strength-reduce</CODE>.You should specifically <EM>not</EM> use <CODE>-funroll-loops</CODE>.</P><P>You may notice that the Makefile runs six tests as part ofthe build process. If the program passes all of these, it'sa pretty good (but not 100%) indication that the compiler hasdone its job correctly.<LI>If <CODE>bzip2</CODE> crashes randomly, and the crashes are notrepeatable, you may have a flaky memory subsystem. <CODE>bzip2</CODE>really hammers your memory hierarchy, and if it's a bit marginal,you may get these problems. Ditto if your disk or I/O subsystemis slowly failing. Yup, this really does happen.<P>Try using a different machine of the same type, and see ifyou can repeat the problem.<LI>This isn't really a bug, but ... If <CODE>bzip2</CODE> tellsyou your file is corrupted on decompression, and youobtained the file via FTP, there is a possibility that youforgot to tell FTP to do a binary mode transfer. That absolutelywill cause the file to be non-decompressible. You'll have to transferit again.</UL><P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -