📄 lzwexp.htm

📁 刚刚看到本站有Visual C++数字图象处理（人民邮电出版社）的电子书
💻 HTM
📖 第 1 页 / 共 2 页
字号:
上一页 12
codestream that we can't translate into a string.

      we need to define something called a &quot;current code&quot;, which i will refer

to as &quot;&lt;code&gt;&quot;, and an &quot;old-code&quot;, which i will refer to as &quot;&lt;old&gt;&quot;. to start

things off, look at the first code. this is now &lt;code&gt;. this code will be in

the intialized string table as the code for a root. output the root to the

charstream. make this code the old-code &lt;old&gt;. *now look at the next code, and

make it &lt;code&gt;. it is possible that this code will not be in the string table,

but let's assume for now that it is. output the string corresponding to &lt;code&gt;

to the codestream. now find the first character in the string you just

translated. call this k. add this to the prefix [...] generated by &lt;old&gt; to

form a new string [...]k. add this string [...]k to the string table, and set

the old-code &lt;old&gt; to the current code &lt;code&gt;. repeat from where i typed the

asterisk, and you're all set. read this paragraph again if you just skimmed

it!!!  now let's consider the possibility that &lt;code&gt; is not in the string

table. think back to compression, and try to understand what happens when you

have a string like p[...]p[...]pq appear in the charstream. suppose p[...] is

already in the string table, but p[...]p is not. the compressor will parse out

p[...], and find that p[...]p is not in the string table. it will output the

code for p[...], and add p[...]p to the string table. then it will get up to

p[...]p for the next string, and find that p[...]p is in the table, as

     the code just added. so it will output the code for p[...]p if it finds

that p[...]pq is not in the table. the decompressor is always &quot;one step

behind&quot; the compressor. when the decompressor sees the code for p[...]p, it

will not have added that code to it's string table yet because it needed the

beginning character of p[...]p to add to the string for the last code, p[...],

to form the code for p[...]p. however, when a decompressor finds a code that

it doesn't know yet, it will always be the very next one to be added to the

string table. so it can guess at what the string for the code should be, and,

in fact, it will always be correct. if i am a decompressor, and i see

code#124, and yet my string table has entries only up to code#123, i can

figure out what code#124 must be, add it to my string table, and output the

string. if code#123 generated the string, which i will refer to here as a

prefix, [...], then code#124, in this special case, will be [...] plus the

first character of [...]. so just add the first character of [...] to the end

of itself. not too bad.  as an example (and a very common one) of this special

case, let's assume we have a raster image in which the first three pixels have

the same color value. that is, my charstream looks like: qqq.... for the sake

of argument, let's say we have 32 colors, and q is the color#12. the

compressor will generate the code sequence 12,32,.... (if you don't know why,

take a minute to understand it.) remember that #32 is not in the initial

table, which goes from #0 to #31. the decompressor will see #12 and translate

it just fine as color q. then it will see #32 and not yet know what that

means. but if it thinks about it long enough, it can figure out that qq should

be entry#32 in the table and qq should be the next string output.  so the

decompression pseudo-code goes something like:</pre>

    <pre>      [1] initialize string table;

     [2] get first code: &lt;code&gt;;

     [3] output the string for &lt;code&gt; to the charstream;

     [4] &lt;old&gt; = &lt;code&gt;;

     [5] &lt;code&gt; &lt;- next code in codestream;

     [6] does &lt;code&gt; exist in the string table?

      (yes: output the string for &lt;code&gt; to the charstream;

            [...] &lt;- translation for &lt;old&gt;;

            k &lt;- first character of translation for &lt;code&gt;;

            add [...]k to the string table;        &lt;old&gt; &lt;- &lt;code&gt;;  )

      (no: [...] &lt;- translation for &lt;old&gt;;

           k &lt;- first character of [...];

           output [...]k to charstream and add it to string table;

           &lt;old&gt; &lt;- &lt;code&gt;

      )

     [7] go to [5];</pre>

    <pre>      again, when you get to step [5] and there are no more codes, you're

finished.  outputting of strings, and finding of initial characters in strings

are efficiency problems all to themselves, but i'm not going to suggest ways

to do them here. half the fun of programming is figuring these things out!

      ---

      now for the gif variations on the theme. in part of the header of a gif

file, there is a field, in the raster data stream, called &quot;code size&quot;. this is

a very misleading name for the field, but we have to live with it. what it is

really is the &quot;root size&quot;. the actual size, in bits, of the compression codes

actually changes during compression/decompression, and i will refer to that

size here as the &quot;compression size&quot;. the initial table is just the codes for

all the roots, as usual, but two special codes are added on top of those.

suppose you have a &quot;code size&quot;, which is usually the number of bits per pixel

in the image, of n. if the number of bits/pixel is one, then n must be 2: the

roots take up slots #0 and #1 in the initial table, and the two special codes

will take up slots #4 and #5. in any other case, n is the number of bits per

pixel, and the roots take up slots #0 through #(2**n-1), and the special codes

are (2**n) and (2**n + 1). the initial compression size will be n+1 bits per

code. if you're encoding, you output the codes (n+1) bits at a time to start

with, and if you're decoding, you grab (n+1) bits from the codestream at a

time.  as for the special codes: &lt;cc&gt; or the clear code, is (2**n), and &lt;eoi&gt;,

or end-of-information, is (2**n + 1). &lt;cc&gt; tells the compressor to re-

initialize the string table, and to reset the compression size to (n+1). &lt;eoi&gt;

means there's no more in the codestream.  if you're encoding or decoding, you

should start adding things to the string table at &lt;cc&gt; + 2. if you're

encoding, you should output &lt;cc&gt; as the very first code, and then whenever

after that you reach code #4095 (hex fff), because gif does not allow

compression sizes to be greater than 12 bits. if you're decoding, you should

reinitialize your string table when you observe &lt;cc&gt;.  the variable

compression sizes are really no big deal. if you're encoding, you start with a

compression size of (n+1) bits, and, whenever you output the code

(2**(compression size)-1), you bump the compression size up one bit. so the

next code you output will be one bit longer. remember that the largest

compression size is 12 bits, corresponding to a code of 4095. if you get that

far, you must output &lt;cc&gt; as the next code, and start over.  if you're

decoding, you must increase your compression size as soon as you write entry

#(2**(compression size) - 1) to the string table. the next code you read will

be one bit longer. don't make the mistake of waiting until you need to add the

code (2**compression size) to the table. you'll have already missed a bit from

the last code.  the packaging of codes into a bitsream for the raster data is

also a potential stumbling block for the novice encoder or decoder. the lowest

order bit in the code should coincide with the lowest available bit in the

first available byte in the codestream. for example, if you're starting with

5-bit compression codes, and your first three codes are, say, &lt;abcde&gt;,

&lt;fghij&gt;, &lt;klmno&gt;, where e, j, and o are bit#0, then your codestream will start

off like:</pre>

    <pre>       byte#0: hijabcde

       byte#1: .klmnofg</pre>

    <pre>      so the differences between straight lzw and gif lzw are: two additional

special codes and variable compression sizes. if you understand lzw, and you

understand those variations, you understand it all!

      just as sort of a p.s., you may have noticed that a compressor has a

little bit of flexibility at compression time. i specified a &quot;greedy&quot; approach

to the compression, grabbing as many characters as possible before outputting

codes. this is, in fact, the standard lzw way of doing things, and it will

yield the best compression ratio. but there's no rule saying you can't stop

anywhere along the line and just output the code for the current prefix,

whether it's already in the table or not, and add that string plus the next

character to the string table. there are various reasons for wanting to do

this, especially if the strings get extremely long and make hashing difficult.

if you need to, do it.

      hope this helps out.----steve blackstock</pre>

    </td>

  </tr>

</table>

</center></div>



<p align="center"><a href="../index.htm">返回</a></p>

</body>

</html>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -