Major Technique: Compression
Version 04/12/99 18:11 - 7
How can you fit a quart of program and data into a pint pot of memory?
* The memory requirements of the code and data exceed the memory available in the system - primary memory, secondary storage, read-only memory, or some combination.
* You need to transmit information across a communications link, and the memory requirements of the information exceed the capacity of the link.
* You cannot THINK SMALL and delete some of the data or code.
* You cannot choose SUITABLE DATA STRUCTURES to reduce the memory requirements further.
Sometimes, when you're programming for a small system, you just don't have enough memory to go around. Most often, the program needs to store more data than the space available, but sometimes the executable code is too large to be stored, especially if there's a lot of program data too.
The other chapters in the book consider ways to reduce our program's requirements for main memory. For example, choosing a suitable DATA STRUCTURE ensures that the right amount of memory is allocated to store the data; SECONDARY STORAGE and READ-ONLY STORAGE move the data out of main memory and store it elsewhere. These techniques have one main limitation: they don't reduce the total amount of storage - of whatever kind - needed to store the underlying data. Rather, they tailor memory allocation to reduce the amount of memory allocated in the program, or displace information that cannot be stored in main memory to secondary storage. Tailoring memory allocation doesn't help if there really isn't enough memory to go around, and displacement may simply move the space problems from primary to secondary storage.
For example, the Strap-It-On wrist-mounted PC needs to store the data for the documents the user is working on. It also needs sound files recorded by the internal microphone, data traces from optional body well-being monitors, and a large amount of executable code downloaded by the user to support "optional applications" (Tetris, Space Invaders, Qix, and Hunt-the-Wumpus). This information exceeds the capacity of the Strap-It-On's primary memory, and will sorely test its secondary memory.
In fact, no matter how much memory a system has, we can guarantee there will be users who need more. Extra storage is expensive, so we're always under pressure to use what we have as effectively as possible.
Therefore: Use compression to reduce the memory required to store the information.
Store the information in a compressed form and decompress it when you need to access it. There is a wide variety of compression algorithms and approaches to choose from, each with different space and time trade-offs. In particular, there are smaller ad-hoc techniques, such as run-length encoding and byte codes, and larger techniques, such as adaptive compression algorithms.
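As an illustration of the simplest ad-hoc technique mentioned above, here is a minimal run-length encoder sketched in Python (the function names and the one-byte (count, value) pair format are my own, not from the text):

```python
def rle_encode(data: bytes) -> bytes:
    """Run-length encode: each run of equal bytes becomes a
    (count, value) pair. Counts are capped at 255 so that they
    fit in a single byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode: expand each (count, value) pair."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        out += bytes([value]) * count
    return bytes(out)
```

Note the trade-off: highly repetitive data (a stretch of silence in a sound file, say) shrinks dramatically, while data with no runs at all doubles in size - a reminder that every compression scheme exploits one specific kind of redundancy.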
For example, the Strap-It-On PC uses a variety of compression techniques to reduce its memory requirements. The optional applications are stored as bytecodes, the sound files and data traces are stored using sequence compression, and document texts are stored using string compression. The device drivers for Strap-It-On's secondary storage use adaptive file compression algorithms to compress all of the user's data files.
Consequences
Your memory requirements decrease because some code or data is compressed.
However: the compression scheme will make the program more complex, requiring programmer effort to implement and discipline to use.
The decompression process reduces time performance and may require extra temporary memory, increasing the possibilities for failure.
The compression process also takes time, although this time can often be spent when the program is written rather than at run-time.
Compressed information may be more difficult to process from within the program. Some compression techniques permit only sequential access to the compressed information, so they cannot be used if the information must be accessed randomly. Even purely sequential access is impractical with some more extreme compression algorithms, so the compressed information must be decompressed before it can be accessed, requiring enough temporary memory to store all the decompressed information, in addition to the temporary memory needed by the decompression algorithm.
Implementation
The key idea behind compression is that most program data contains a large amount of redundancy - information that is included in the data set but which is not strictly required. Character sets commonly used to encode Western languages are classic examples. The ASCII code defines around 100 printable characters, all of which could be stored in seven bits, but most text files use eight-bit bytes to facilitate processing on eight-bit machines. If each eight-bit byte could be replaced by a seven-bit code, this would reduce the number of bits needed to store a text string. The amount of compression is expressed as the compression ratio - the compressed size divided by the decompressed size. Storing characters in seven bits rather than eight-bit bytes would give a compression ratio of 7/8, or 87.5%.
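The seven-bit example above can be made concrete. The following Python sketch (my own illustration; the function names are hypothetical) packs ASCII characters into seven bits each, achieving exactly the 7/8 ratio described:

```python
def pack7(text: str) -> bytes:
    """Pack ASCII characters (codes below 128) into 7 bits each,
    accumulating bits and emitting a byte whenever 8 are ready."""
    bits, nbits = 0, 0
    out = bytearray()
    for ch in text:
        code = ord(ch)
        assert code < 128, "only 7-bit ASCII can be packed"
        bits = (bits << 7) | code
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:  # flush any remaining bits, left-aligned in the last byte
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack7(packed: bytes, count: int) -> str:
    """Recover `count` characters from a pack7 result."""
    bits, nbits = 0, 0
    chars = []
    for byte in packed:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 7 and len(chars) < count:
            nbits -= 7
            chars.append(chr((bits >> nbits) & 0x7F))
    return "".join(chars)
```

Eight characters pack into exactly seven bytes - the 7/8 ratio - but note that the caller must remember the original character count, since the final byte may contain padding bits.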
Techniques that ensure that the result of decompressing some compressed information is exactly the same as the original information (before compression) are known as lossless compression techniques. Lossless compression works by removing redundancy; either removing it altogether, as above, or by storing redundant information once, and replacing further copies with references to the previously stored version.
Lossy compression techniques are sometimes suitable alternatives. Lossy compression produces an approximation to the original information, rather than an exact copy. So it is important that the approximations do not affect the use to which the decompressed data will be put. We can think of lossy compression as making a generalisation of the original input file to increase its redundancy, then compressing the generalised version. The compression patterns in this chapter are lossless; however, several can be adapted to provide lossy compression when it is required.
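To illustrate the "generalise, then compress" view of lossy compression (again a hypothetical sketch, not from the text): quantising samples to a coarser scale discards precision, but makes runs of equal values far more likely, so a following lossless pass compresses much better.

```python
def quantise(samples: list[int], step: int) -> list[int]:
    """Lossy step: round each sample to the nearest multiple of
    `step`. Information is lost, but the output has fewer distinct
    values - more redundancy for a lossless encoder to remove."""
    return [round(s / step) * step for s in samples]
```

Whether the lost precision matters depends entirely on the use: a well-being monitor trace quantised to the sensor's real accuracy loses nothing of value, while the same treatment of executable code would be catastrophic.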
Specialised Patterns
The rest of this chapter contains four specialised patterns that describe a range of compression techniques. The patterns form a sequence, starting with simple patterns which can be implemented without too much programmer effort or time or space cost, and progressing to more complicated patterns. Each of these patterns removes different kinds of redundancy from programs or data, in different ways, and with different consequences for accessing the compressed data.
Figure 1: Compression Patterns
The patterns are as follows:
* STRING COMPRESSION reduces the number of bits used to store each character of a string of text. Most character encodings are partially redundant, because they can store many more characters than are really needed. For example, English text requires only about seventy characters, but commonly used character sets support 120, 256, or even sixty-five thousand characters. Reducing the number of bits used to store common characters reduces the memory required for storing the whole text.
* SEQUENCE COMPRESSION is similar to string compression, but addresses data series or sequences rather than text strings. Rather than storing the absolute value of each data item, Sequence Compression stores deltas - differences between the current item and the previous item. If the values of the series of data items are similar across the series, storing relative item values will require less memory than storing the absolute value of each item. Furthermore, if many consecutive items have exactly the same value, storing the value repeatedly is redundant, so sequence compression stores the value once, along with a count of the number of repetitions.
* FILE COMPRESSION can store large amounts of bulk data very efficiently. File compression is based on adaptive compression algorithms that find common sequences across entire data sets, and then store these sequences only once, typically working at the level of individual bits to remove many kinds of redundancy. File compression algorithms need significant programmer effort to implement (and high levels of specialised skill to implement well); however, many effective adaptive compression algorithms are available in commercial or open-source libraries. Compared with other compression techniques, adaptive compression algorithms have high processor time and temporary memory costs.
* BYTE CODING can store program code in a form that is still executable. Real machine codes include a large amount of redundancy to ensure that they can be efficiently decoded and implemented in hardware. Bytecodes are instruction sets designed to be executed in software, and to minimise memory requirements. Typically you use an interpreter to execute bytecodes, although some environments translate them to machine code, an approach which takes much more programmer effort. Bytecodes can also increase the portability of software, because the interpreter insulates the program from the architecture of the underlying machine - any machine with a suitable interpreter can execute the software.
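The delta idea behind SEQUENCE COMPRESSION above can be sketched in a few lines of Python (an illustrative sketch; the names are my own):

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Store the first item absolutely, then the difference from
    each item to its predecessor. Slowly changing series yield
    mostly small deltas, which fit in fewer bits than the
    original absolute values."""
    deltas, prev = [], 0
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas

def delta_decode(deltas: list[int]) -> list[int]:
    """Invert delta_encode by running a cumulative sum."""
    samples, prev = [], 0
    for d in deltas:
        prev += d
        samples.append(prev)
    return samples
```

A body-monitor trace such as [100, 101, 103, 103, 102] becomes [100, 1, 2, 0, -1]: after the first item, every delta fits comfortably in a single byte, where the absolute values might need two or more.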
The figures below contrast these patterns in two different ways. One shows the typical memory gains you might get from using each pattern against the amount of programming required. The other contrasts the local memory required for decompression against the compression achieved. As they show, you get the greatest benefits from using FILE COMPRESSION, but the run-time overhead required for each use often makes it unsuitable in practice.
See Also
Rather than (or as well as) compressing information, you may be able to store it in SECONDARY STORAGE. On the other hand, THINKING SMALL, making a MEMORY BUDGET, and MAKING THE USER WORRY may make your program small enough that you don't need to use compression.
Known Uses