<HTML><HEAD><!-- This HTML file has been created by texi2html 1.51 from manual.texi on 22 October 1998 --><TITLE>bzip2 and libbzip2 - Programming with libbzip2</TITLE></HEAD><BODY>
Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_2.html">previous</A>, <A HREF="manual_4.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC3" HREF="manual_toc.html#TOC3">Programming with <CODE>libbzip2</CODE></A></H1>
<P>This chapter describes the programming interface to <CODE>libbzip2</CODE>.</P>
<P>For general background information, particularly about memory use and performance aspects, you'd be well advised to read Chapter 2 as well.</P>
<H2><A NAME="SEC4" HREF="manual_toc.html#TOC4">Top-level structure</A></H2>
<P><CODE>libbzip2</CODE> is a flexible library for compressing and decompressing data in the <CODE>bzip2</CODE> data format. Although packaged as a single entity, it helps to regard the library as three separate parts: the low-level interface, the high-level interface, and some utility functions.</P>
<P>The structure of <CODE>libbzip2</CODE>'s interfaces is similar to that of Jean-loup Gailly's and Mark Adler's excellent <CODE>zlib</CODE> library.</P>
<H3><A NAME="SEC5" HREF="manual_toc.html#TOC5">Low-level summary</A></H3>
<P>This interface provides services for compressing and decompressing data in memory. 
There's no provision for dealing with files, streams or any other I/O mechanisms, just straight memory-to-memory work. In fact, this part of the library can be compiled without inclusion of <CODE>stdio.h</CODE>, which may be helpful for embedded applications.</P>
<P>The low-level part of the library has no global variables and is therefore thread-safe.</P>
<P>Six routines make up the low-level interface: <CODE>bzCompressInit</CODE>, <CODE>bzCompress</CODE> and <CODE>bzCompressEnd</CODE> for compression, and a corresponding trio <CODE>bzDecompressInit</CODE>, <CODE>bzDecompress</CODE> and <CODE>bzDecompressEnd</CODE> for decompression. The <CODE>*Init</CODE> functions allocate memory for compression/decompression and do other initialisations, whilst the <CODE>*End</CODE> functions close down operations and release memory.</P>
<P>The real work is done by <CODE>bzCompress</CODE> and <CODE>bzDecompress</CODE>. These compress/decompress data from a user-supplied input buffer to a user-supplied output buffer. These buffers can be any size; arbitrary quantities of data are handled by making repeated calls to these functions. This is a flexible mechanism allowing a consumer-pull style of activity, or producer-push, or a mixture of both.</P>
<H3><A NAME="SEC6" HREF="manual_toc.html#TOC6">High-level summary</A></H3>
<P>This interface provides some handy wrappers around the low-level interface to facilitate reading and writing <CODE>bzip2</CODE> format files (<CODE>.bz2</CODE> files). The routines provide hooks to facilitate reading files in which the <CODE>bzip2</CODE> data stream is embedded within some larger-scale file structure, or where there are multiple <CODE>bzip2</CODE> data streams concatenated end-to-end.</P>
<P>For reading files, <CODE>bzReadOpen</CODE>, <CODE>bzRead</CODE>, <CODE>bzReadClose</CODE> and <CODE>bzReadGetUnused</CODE> are supplied. 
For writing files, <CODE>bzWriteOpen</CODE>, <CODE>bzWrite</CODE> and <CODE>bzWriteFinish</CODE> are available.</P>
<P>As with the low-level library, no global variables are used, so the library is per se thread-safe. However, if I/O errors occur whilst reading or writing the underlying compressed files, you may have to consult <CODE>errno</CODE> to determine the cause of the error. In that case, you'd need a C library which correctly supports <CODE>errno</CODE> in a multithreaded environment.</P>
<P>To make the library a little simpler and more portable, <CODE>bzReadOpen</CODE> and <CODE>bzWriteOpen</CODE> require you to pass them file handles (<CODE>FILE*</CODE>s) which have previously been opened for reading or writing respectively. That avoids portability problems associated with file operations and file attributes, whilst not being much of an imposition on the programmer.</P>
<H3><A NAME="SEC7" HREF="manual_toc.html#TOC7">Utility functions summary</A></H3>
<P>For very simple needs, <CODE>bzBuffToBuffCompress</CODE> and <CODE>bzBuffToBuffDecompress</CODE> are provided. These compress or decompress data in memory from one buffer to another buffer in a single function call. You should assess whether these functions fulfill your memory-to-memory compression/decompression requirements before investing effort in understanding the more general but more complex low-level interface.</P>
<P>Yoshioka Tsuneo (<CODE>QWF00133@niftyserve.or.jp</CODE> / <CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>) has contributed some functions to give better <CODE>zlib</CODE> compatibility. These functions are <CODE>bzopen</CODE>, <CODE>bzread</CODE>, <CODE>bzwrite</CODE>, <CODE>bzflush</CODE>, <CODE>bzclose</CODE>, <CODE>bzerror</CODE> and <CODE>bzlibVersion</CODE>. You may find these functions more convenient for simple file reading and writing than those in the high-level interface. These functions are not (yet) officially part of the library, and are not further documented here. If they break, you get to keep all the pieces. 
I hope to document them properly when time permits.</P>
<P>Yoshioka also contributed modifications to allow the library to be built as a Windows DLL.</P>
<H2><A NAME="SEC8" HREF="manual_toc.html#TOC8">Error handling</A></H2>
<P>The library is designed to recover cleanly in all situations, including the worst-case situation of decompressing random data. I'm not 100% sure that it can always do this, so you might want to add a signal handler to catch segmentation violations during decompression if you are feeling especially paranoid. I would be interested in hearing more about the robustness of the library to corrupted compressed data.</P>
<P>The file <CODE>bzlib.h</CODE> contains all definitions needed to use the library. In particular, you should definitely not include <CODE>bzlib_private.h</CODE>.</P>
<P>In <CODE>bzlib.h</CODE>, the various return values are defined. The following list is not intended as an exhaustive description of the circumstances in which a given value may be returned; those descriptions are given later. Rather, it is intended to convey the rough meaning of each return value. The first five values are normal and do not denote an error situation.
<DL COMPACT>
<DT><CODE>BZ_OK</CODE><DD>The requested action was completed successfully.
<DT><CODE>BZ_RUN_OK</CODE>
<DT><CODE>BZ_FLUSH_OK</CODE>
<DT><CODE>BZ_FINISH_OK</CODE><DD>In <CODE>bzCompress</CODE>, the requested flush/finish/nothing-special action was completed successfully.
<DT><CODE>BZ_STREAM_END</CODE><DD>Compression of data was completed, or the logical stream end was detected during decompression.
</DL>
<P>The following return values indicate an error of some kind.
<DL COMPACT>
<DT><CODE>BZ_SEQUENCE_ERROR</CODE><DD>When using the library, it is important to call the functions in the correct sequence and with data structures (buffers, etc.) in the correct states. <CODE>libbzip2</CODE> checks as much as it can to ensure this is happening, and returns <CODE>BZ_SEQUENCE_ERROR</CODE> if not. 
Code which complies precisely with the function semantics, as detailed below, should never receive this value; such an event denotes buggy code which you should investigate.
<DT><CODE>BZ_PARAM_ERROR</CODE><DD>Returned when a parameter to a function call is out of range or otherwise manifestly incorrect. As with <CODE>BZ_SEQUENCE_ERROR</CODE>, this denotes a bug in the client code. The distinction between <CODE>BZ_PARAM_ERROR</CODE> and <CODE>BZ_SEQUENCE_ERROR</CODE> is a bit hazy, but still worth making.
<DT><CODE>BZ_MEM_ERROR</CODE><DD>Returned when a request to allocate memory failed. Note that the quantity of memory needed to decompress a stream cannot be determined until the stream's header has been read. So <CODE>bzDecompress</CODE> and <CODE>bzRead</CODE> may return <CODE>BZ_MEM_ERROR</CODE> even though some of the compressed data has been read. The same is not true for compression; once <CODE>bzCompressInit</CODE> or <CODE>bzWriteOpen</CODE> have successfully completed, <CODE>BZ_MEM_ERROR</CODE> cannot occur.
<DT><CODE>BZ_DATA_ERROR</CODE><DD>Returned when a data integrity error is detected during decompression. Most importantly, this covers the case where the stored and computed CRCs for the data do not match. This value is also returned upon detection of any other anomaly in the compressed data.
<DT><CODE>BZ_DATA_ERROR_MAGIC</CODE><DD>As a special case of <CODE>BZ_DATA_ERROR</CODE>, it is sometimes useful to know when the compressed stream does not start with the correct magic bytes (<CODE>'B' 'Z' 'h'</CODE>); this value is returned in that case. 
<DT><CODE>BZ_IO_ERROR</CODE><DD>Returned by <CODE>bzRead</CODE> and <CODE>bzWrite</CODE> when there is an error reading or writing the compressed file, and by <CODE>bzReadOpen</CODE> and <CODE>bzWriteOpen</CODE> for attempts to use a file whose error indicator is set (i.e. <CODE>ferror(f)</CODE> is nonzero). On receipt of <CODE>BZ_IO_ERROR</CODE>, the caller should consult <CODE>errno</CODE> and/or call <CODE>perror</CODE> to acquire operating-system-specific information about the problem.
<DT><CODE>BZ_UNEXPECTED_EOF</CODE><DD>Returned by <CODE>bzRead</CODE> when the compressed file finishes before the logical end of stream is detected.
<DT><CODE>BZ_OUTBUFF_FULL</CODE><DD>Returned by <CODE>bzBuffToBuffCompress</CODE> and <CODE>bzBuffToBuffDecompress</CODE> to indicate that the output data will not fit into the output buffer provided.
</DL>
<H2><A NAME="SEC9" HREF="manual_toc.html#TOC9">Low-level interface</A></H2>
<H3><A NAME="SEC10" HREF="manual_toc.html#TOC10"><CODE>bzCompressInit</CODE></A></H3>
<PRE>
typedef struct {
      char *next_in;
      unsigned int avail_in;
      unsigned int total_in;

      char *next_out;
      unsigned int avail_out;
      unsigned int total_out;

      void *state;

      void *(*bzalloc)(void *,int,int);
      void (*bzfree)(void *,void *);
      void *opaque;
   } bz_stream;

int bzCompressInit ( bz_stream *strm,
                     int blockSize100k,
                     int verbosity,
                     int workFactor );
</PRE>
<P>Prepares for compression. The <CODE>bz_stream</CODE> structure holds all data pertaining to the compression activity. A <CODE>bz_stream</CODE> structure should be allocated and initialised prior to the call. The fields of <CODE>bz_stream</CODE> comprise the entirety of the user-visible data. <CODE>state</CODE> is a pointer to the private data structures required for compression.</P>
<P>Custom memory allocators are supported via the fields <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE>. 
The value <CODE>opaque</CODE> is passed as the first argument to all calls to <CODE>bzalloc</CODE> and <CODE>bzfree</CODE>, but is otherwise ignored by the library. The call <CODE>bzalloc ( opaque, n, m )</CODE> is expected to return a pointer <CODE>p</CODE> to <CODE>n * m</CODE> bytes of memory, and <CODE>bzfree ( opaque, p )</CODE> should free that memory.</P>
<P>If you don't want to use a custom memory allocator, set <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> to <CODE>NULL</CODE>, and the library will then use the standard <CODE>malloc</CODE>/<CODE>free</CODE> routines.</P>
<P>Before calling <CODE>bzCompressInit</CODE>, the fields <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> should be filled in appropriately, as just described. Upon return, the internal state will have been allocated and initialised, and <CODE>total_in</CODE> and <CODE>total_out</CODE> will have been set to zero. These last two fields are used by the library to inform the caller of the total amount of data passed into and out of the library, respectively. You should not try to change them.</P>
<P>Parameter <CODE>blockSize100k</CODE> specifies the block size to be used for compression. It should be a value between 1 and 9 inclusive, and the actual block size used is 100000 times this figure. 9 gives the best compression but takes the most memory.</P>
<P>Parameter <CODE>verbosity</CODE> should be set to a number between 0 and 4 inclusive. 0 is silent, and greater numbers give increasingly verbose monitoring/debugging output. If the library has been compiled with <CODE>-DBZ_NO_STDIO</CODE>, no such output will appear for any verbosity setting.</P>
<P>Parameter <CODE>workFactor</CODE> controls how the compression phase behaves when presented with worst-case, highly repetitive input data. If compression runs into difficulties caused by repetitive data, some pseudo-random variations are inserted into the block, and compression is restarted. 
Lower values of <CODE>workFactor</CODE> reduce the tolerance of compression to repetitive data. You should set this parameter carefully: too low, and the compression ratio suffers; too high, and your average-to-worst-case compression times can become very large. The default value of 30 gives reasonable behaviour over a wide range of circumstances.</P>
<P>Allowable values range from 0 to 250 inclusive. 0 is a special case, equivalent to using the default value of 30.</P>
<P>Note that the randomisation process is entirely transparent. If the library decides to randomise and restart compression on a block, it does so without comment. Randomised blocks are automatically de-randomised during decompression, so data integrity is never compromised.</P>
<P>Possible return values:
<PRE>
      <CODE>BZ_PARAM_ERROR</CODE>
         if <CODE>strm</CODE> is <CODE>NULL</CODE>
         or <CODE>blockSize100k</CODE> &lt; 1 or <CODE>blockSize100k</CODE> &gt; 9
         or <CODE>verbosity</CODE> &lt; 0 or <CODE>verbosity</CODE> &gt; 4
         or <CODE>workFactor</CODE> &lt; 0 or <CODE>workFactor</CODE> &gt; 250
      <CODE>BZ_MEM_ERROR</CODE>
         if not enough memory is available
      <CODE>BZ_OK</CODE>
         otherwise
</PRE>
<P>Allowable next actions:
<PRE>
      <CODE>bzCompress</CODE>
         if <CODE>BZ_OK</CODE> is returned
      no specific action needed in case of error
</PRE>
<H3><A NAME="SEC11" HREF="manual_toc.html#TOC11"><CODE>bzCompress</CODE></A></H3>
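<P>Before the details of <CODE>bzCompress</CODE>, it may help to see the repeated-call pattern from the low-level summary in miniature. The caller points <CODE>next_in</CODE>/<CODE>avail_in</CODE> at its data and <CODE>next_out</CODE>/<CODE>avail_out</CODE> at an output buffer of whatever size it likes, makes a call, drains whatever was produced, and repeats. This is a minimal sketch, not <CODE>libbzip2</CODE> code, and it does not link against the library. <CODE>demo_stream</CODE>, <CODE>copy_step</CODE> and <CODE>pump</CODE> are hypothetical names, and <CODE>copy_step</CODE> stands in for a <CODE>bzCompress</CODE> call by merely copying bytes, so only the buffer bookkeeping is shown. A real compression loop would also pass an action to <CODE>bzCompress</CODE> and would typically keep calling until it reports <CODE>BZ_STREAM_END</CODE>, since compressed output can still be pending after <CODE>avail_in</CODE> reaches zero.</P>

```c
#include <string.h>

/* Hypothetical stand-in mirroring only the buffer fields of the
   bz_stream typedef documented for bzCompressInit; NOT the bzlib.h type. */
typedef struct {
    char        *next_in;
    unsigned int avail_in;
    unsigned int total_in;
    char        *next_out;
    unsigned int avail_out;
    unsigned int total_out;
} demo_stream;

/* Hypothetical stand-in for one bzCompress call: it performs no
   compression, it just copies as many bytes as fit and advances the
   fields the way a caller expects a stream routine to. */
static void copy_step(demo_stream *s)
{
    unsigned int n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;
    memcpy(s->next_out, s->next_in, n);
    s->next_in  += n;  s->avail_in  -= n;  s->total_in  += n;
    s->next_out += n;  s->avail_out -= n;  s->total_out += n;
}

/* Push in_len bytes through an output buffer of `chunk` bytes, draining
   it into dst after every call. Returns the number of calls made, or -1
   if chunk is 0 or larger than the sketch's 64-byte buffer. */
static int pump(const char *in, unsigned int in_len,
                char *dst, unsigned int chunk)
{
    char buf[64];
    demo_stream s;
    int calls = 0;

    if (chunk == 0 || chunk > sizeof buf) return -1;
    memset(&s, 0, sizeof s);            /* total_in/total_out start at 0 */
    s.next_in  = (char *) in;           /* only read through, never written */
    s.avail_in = in_len;

    while (s.avail_in > 0) {
        unsigned int produced;
        s.next_out  = buf;              /* reuse the same small buffer */
        s.avail_out = chunk;
        copy_step(&s);
        produced = chunk - s.avail_out;
        memcpy(dst, buf, produced);     /* the consumer drains the output */
        dst += produced;
        calls++;
    }
    return calls;
}
```

<P>With the 11-byte input <CODE>"hello world"</CODE> and a 4-byte output buffer, <CODE>pump</CODE> makes three calls (4 + 4 + 3 bytes) and reassembles the input exactly. Changing <CODE>chunk</CODE> changes only the number of calls, which is the flexibility the low-level summary describes.</P>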