📄 406-456.html
字号:
* going to the end of the larger_child descendant chain.*/int FindNextNode( node )int node;{ int next; next = tree[ node ].smaller_child; while ( tree [ next ].larger_child != UNUSED ) next = tree[ next ].larger_child; return( next );}/** This routine performs the classic binary tree deletion algorithm.* If the node to be deleted has a null link in either direction, we* just pull the non-null link up one to replace the existing link.* If both links exist, we instead delete the next link in order, which* is guaranteed to have a null link, then replace the node to be deleted* with the next link.*/void DeleteString( p )int p;{ int replacement; if ( tree[ p ].parent == UNUSED ) return; if ( tree[ p ].larger_child == UNUSED ) ContractNode( p, tree [ p ].smaller_child ); else if (tree[ p ].smaller_child == UNUSED ) ContractNode( p , tree[ p ].larger_child ); else { replacement = FindNextNode( p ); DeleteString( replacement ); ReplaceNode( p , replacement ); }}/** This is where most of the work done by the encoder takes place. This* routine is responsible for adding the new node to the binary tree.* It also has to find the best match among all the existing nodes in* the tree, and return that to the calling routine. To make mattes* even more complicated, if the new_node has a duplicate in the tree,* the old_node is deleted, for reasons of efficiency.*/int AddString( new_node, match_position )int new_node;int *match_position;{ int i; int test_node; int delta; int match_length; int *child; if ( new_node == END_OF_STREAM ) return( 0 ) ; test_node = tree[ TREE_ROOT ].larger_child; match_length = 0; for ( ; ; ) { for ( i = 0 ; i < LOOK_AHEAD_SIZE ; i++ ) { delta = window[ MOD_WINDOW( new_node + 1 ) ] - window[ MOD_WINDOW( test_node + i ) ]; if ( delta != 0 ) break; } if ( i >= match_length ) { match_length = i; *match_position = test_node; if ( match_length >= LOOK_AHEAD_SIZE ) { ReplaceNode( test_node, new_node ); return( match_length ); } } if ( delta >= 0 ) child = &tree[ test_node ].larger_child; else child = &tree[ test_node ].smaller_child; if ( *child == UNUSED ) { *child = new_node; tree[ new_node].parent = test_node; tree[ new_node].larger_child = UNUSED; tree[ new_node].smaller_child = UNUSED; return( match_length ); } test_node = *child; }}/** This section of code and data makes up the blocked I/O portion of the* program. Every token output consists of a single flag bit, followed* by either a single character or a index/length pair. The flag bits* are stored in the first byte of a buffer array, and the characters* and index/length pairs are stored sequentially in the remaining* positions in the array. After every eight output operations, the* first character of the array is full of flag bits, so the remaining* bytes stored in the array can be output. This can be done with a* single fwrite() operation, making for greater efficiency.** All that is needed to implement this is a few routines, plus three* data objects, which follow below. The buffer has the flag bits* packed into its first character, with the remainder consisting of* the characters and index/length pairs, appearing in the order they* were output. The FlagBitMask is used to indicate where the next* flag bit will go when packed into DataBuffer[ 0 ]. Finally, the* BufferOffset is used to indicate where the next token will be stored* in the buffer.*/char DataBuffer[ 17 ];int FlagBitMask;int BufferOffset;/** To initialize the output buffer, we set the FlagBitMask to the first* bit position, can clear DataBuffer[0], which will hold all the* Flag bits. Finally, the BufferOffset is set to 1, which is where the* first character or index/length pair will go.*/void InitOutputBuffer(){ DataBuffer[ 0 ] = 0; FlagBitMask = 1; BufferOffset = 1;}/** This routine is called during one of two different situations. First,* it can potentially be called right after a character or a length/index* pair is added to the DataBuffer[]. If the position of the bit in the* FlagBitMask indicates that it is full, the output routine calls this* routine to flush data into the output file, and reset the output* variables to their initial state. The other time this routine is* called is when the compression routine is ready to exit. If there is* any data in the buffer at that time, it needs to the flushed.** Note that this routine checks carefully to be sure that it doesn't* ever write out more data than was in the original uncompressed file.* It returns a 0 if this happens, which filters back to the compression* program, so that it can abort if this happens.**/int FlushOutputBuffer(){ if ( BufferOffset == 1 ) return( 1 ); Header.compressed_size += BufferOffset; if ( ( Header.compressed_size ) >= Header.original_size ) return( 0 ); if ( fwrite( DataBuffer, 1, BufferOffset, OutputCarFile ) !=BufferOffset ) FatalError( "Error writing compressed data to CAR file" ); InitOutputBuffer(); return( 1 );}/** This routine adds a single character to the output buffer. In this* case, the flag bit is set, indicating that the next character is an* uncompressed byte. After setting the flag and storing the byte,* the flag bit is shifted over, and checked. If it turns out that all* eight bits in the flag bit character are used up, then we have to* flush the buffer and reinitialize the data. Note that if the* FlushOutputBuffer() routine detects that the output has grown larger* than the input, it returns a 0 back to the calling routine.*/int OutputChar( data )int data;{ DataBuffer[ BufferOffset++ ] = (char) data; DataBuffer[ 0 ] |= FlagBitMask; FlagBitMask <<= 1; if ( FlagBitMask == 0x100 ) return( FlushOutputBuffer() ); else return( 1 );}/** This routine is called to output a 12-bit position pointer and a 4-bit* length. The 4-bit length is shifted to the top four bits of the first* of two DataBuffer[] characters. The lower four bits contain the upper* four bits of the 12-bit position index. The next of the two DataBuffer* characters gets the lower eight bits of the position index. After* all that work to store those 16 bits, the FlagBitMask is shifted over,* and checked to see if we have used up all our bits. If we have,* the output buffer is flushed, and the output data elements are reset.* If the FlushOutputBuffer routine detects that the output file has* grown too large, it passes and error return back via this routine,* so that it can abort.*/int OutputPair( position, length )int position;int length;{ DataBuffer[ BufferOffset ] = (char) ( length << 4 ); DataBuffer[ BufferOffest++ ] |= ( position >> 8 ); DataBuffer[ BufferOffset++ ] = (char) ( position & Oxff ); FlagBitMask <<= 1; if ( FlagBitMask == 0x100 ) return( FlushOutputBuffer() ); else return( 1 );}/** The input process uses the same data structures as the blocked output* routines, but it is somewhat simpler, in that it doesn't actually have* to read in a whole block of data at once. Instead, it just reads in* a single character full of flag bits into DataBuffer[0], and passes* individual bits back to the Expansion program when asked for them.* The expansion program is left to its own devices for reading in the* characters, indices, and match lengths. They can be read in* sequentially using normal file I/0.*/void InitInputBuffer(){ FlagBitMask = 1; DataBuffer[ 0 ] = (char) getc( InputCarFile );}/** When the Expansion program wants a flag bit, it calls this routine.* This routine has to keep track of whether or not it has run out of* flag bits. If it has, it has to go back and reinitialize so as to* have a fresh set.*/int InputBit(){ if ( FlagBitMask == 0x00 ) InitInputBuffer(); FlagBitMask <<= 1; return( DataBuffer[ 0 ] & ( FlagBitMask >> 1 ) );}/** This is the compression routine. It has to first load up the look* ahead buffer, then go into the main compression loop. The main loop* decides whether to output a single character or an index/length* token that defines a phrase. Once the character or phrase has been* sent out, another loop has to run. The second loop reads in new* characters, deletes the strings that are overwritten by the new* character, then adds the strings that are created by the new* character. While running it has the additional responsibility of* creating the checksum of the input data, and checking for when the* output data grows too large. The program returns a success or failure* indicator. It also has to update the original_crc and compressed_size* elements in Header data structure.**/int LZSSCompress( input_text_file )FILE *input_text_file;{ int i; int c; int look_ahead_bytes; int current_position; int replace_count; int match_length; int match_position; Header.compressed_size = 0; Header.original_crc = CRC_MASK; InitOutputBuffer(); current_position = 1; for ( i = 0 ; i < LOOK_AHEAD_SIZE ; i++ ) { if ( ( c = getc( input_text_file ) ) == EOF ) break; window[ current_position + i ] = (unsigned char) c; Header.original_crc = UpdateCharacterCRC32( Header.original_crc, c ); } look_ahead_bytes = i; InitTree( current_position ); match_length = 0; match_position = 0; while ( look_ahead_bytes > 0 ) { if ( match_length > look_ahead_bytes ) match_length = look_ahead_bytes; if ( match_length <= BREAK_EVEN ) { replace_count = 1; if ( ! OutputChar( window[ current_position ] ) ) return( 0 ); } else { if ( !OutputPair( match_position, match_length - ( BREAK_EVEN + 1 ) ) ) return( 0 ): replace_count = match_length; } for ( i = 0 ; i < replace_count ; i++ ) { DeleteString( MOD_WINDOW( current_position + LOOK_AHEAD_SIZE ) ); if ( ( c = getc( input_text_file ) ) == EOF ) { look ahead bytes--; } else { Header.original_crc = UpdateCharacterCRC32( Header.original_crc, c ); window[ MOD_WINDOW( current_position + LOOK_AHEAD_SIZE ) ] = (unsigned char) c; } current_position = MOD_WINDOW( current_position + 1 ); if ( current_position == 0 ) putc( '.', stderr ); if ( look_ahead_bytes ) match_length = AddString( current_position, &match_position ); } }; Header.original_crc ^= CRC_MASK; return( FlushOutputBuffer() );}/** This is the expansion routine for the LZSS algorithm. All it has to do* is read in flag bits, decide whether to read in a character or a* index/length pair, and take the appropriate action. It is responsible* for keeping track of the crc of the output data, and must return it* to the calling routine, for verification.*/unsigned long LZSSExpand( output )FILE *output;{ int i; int current_position; int c; int match_length; int match_position; unsigned long crc; unsigned long output_count; output_count = 0; crc = CRC_MASK; InitInputBuffer(); current_position = 1; while ( output_count < Header.original_size ) { if ( InputBit() ) { c = getc( InputCarFile ); putc( c, output ); output_count++; crc = UpdateCharacterCRC32( crc, c ); window[ current_position ] = (unsigned char) c; current_position = MOD_WINDOW( current_position + 1 ); if ( current_position == 0 && output != stdout ) putc( '.', stderr ); } else { match_length = getc( InputCarFile ); match_position = getc( InputCarFile ); match_position |= (match_length & Oxf ) << 8; match_length >>= 4; match_length += BREAK_EVEN; output_count += match_length + 1; for ( i = 0 ; i <= match_length ; i++ ) { c = window[ MOD_WINDOW( match_position + i ) ]; putc( c, output ); crc = UpdateCharacterCRC32( crc, c ); window[ current_position ] = (unsigned char) c; current_position = MOD_WINDOW( current_position + 1 ); if ( current_position == 0 && output != stdout ) putc( '.', stderr ); } } } return( crc ^ CRC_MASK );}/*************************** End of CARMAN.C ***************************/</PRE><!-- END CODE //--><P><BR></P><CENTER><TABLE BORDER><TR><TD><A HREF="403-406.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="../ch13/457-459.html">Next</A></TD></TR></TABLE></CENTER></TD></TR></TABLE></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -