like (6,111001)
    45, similarly, will be coded as (6,101101)
    23  ->  (5,10111)
   -30  ->  (5,00001)
    -8  ->  (4,0111)
     1  ->  (1,1)
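
The mapping from a value to its (category, bits) pair, as used in the list
above, can be sketched like this. It is only a minimal sketch: the function
name encode_value is made up for this illustration, and it assumes the usual
JPEG rule that a negative value is stored as value + (2^category - 1).

```python
def encode_value(value):
    """Return (category, bit string) for one coefficient value."""
    if value == 0:
        return 0, ""                      # category 0 carries no extra bits
    category = abs(value).bit_length()    # number of bits needed for |value|
    # Positive values are written as-is; negative values are offset so that
    # their bit pattern starts with a 0.
    raw = value if value > 0 else value + (1 << category) - 1
    return category, format(raw, "0{}b".format(category))


for v in (57, 45, 23, -30, -8, 1):
    print(v, "->", encode_value(v))
```

The printed pairs should match the ones listed above, e.g. -30 gives (5, '00001').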

And now we'll write the string of pairs again:

   (0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ;
       (2,1), 1 ; (0,0)

The pairs of 2 values enclosed in parentheses can be stored in a single byte,
because each of the 2 values fits in a nibble (the count of preceding zeroes
is never larger than 15, and neither is the category of the numbers [numbers
encoded in a JPG file are in range -32767..32767]).
In this byte, the high nibble is the number of preceding 0s, and the low
nibble is the category of the new nonzero value.

The FINAL step of the encoding consists of Huffman encoding this byte, and then
writing to the JPG file, as a stream of bits, the Huffman code of this byte,
followed by the bit representation of the number itself.

For example, let's say that we have these Huffman codes:
    byte  6 = (0,6)        ---  111000
    byte 69 = (4,5)        ---  1111111110011001
    byte 21 = (1,5)        ---  11111110110
    byte  4 = (0,4)        ---  1011
    byte 33 = (2,1)        ---  11011
    byte  0 = (0,0) = EOB  ---  1010

The final stream of bits written to the JPG file on disk for the previous example
of 63 coefficients (remember that we've skipped the first coefficient) is
      111000 111001  111000 101101  1111111110011001 10111   11111110110 00001
         1011 0111   11011 1   1010
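
As a rough sketch, the stream above can be rebuilt in a few lines. The Huffman
codes in AC_CODES are the hypothetical ones from this example (not a real
table), the input values are the ones recovered from the pairs above, and the
helper names are made up for this illustration.

```python
# Hypothetical Huffman codes from the example above: byte -> code.
AC_CODES = {
    0x06: "111000",            # (0,6)
    0x45: "1111111110011001",  # (4,5) = 69
    0x15: "11111110110",       # (1,5) = 21
    0x04: "1011",              # (0,4)
    0x21: "11011",             # (2,1) = 33
    0x00: "1010",              # (0,0) = EOB
}


def value_bits(value):
    """Return (category, bit string) of a nonzero value."""
    category = abs(value).bit_length()
    raw = value if value > 0 else value + (1 << category) - 1
    return category, format(raw, "0{}b".format(category))


def encode_ac(pairs):
    """pairs = list of (nr_of_previous_0, value); this sketch always ends with EOB."""
    out = []
    for run, value in pairs:
        category, bits = value_bits(value)
        byte = (run << 4) | category      # high nibble = zeroes, low nibble = category
        out.append(AC_CODES[byte] + bits)
    out.append(AC_CODES[0x00])
    return "".join(out)


pairs = [(0, 57), (0, 45), (4, 23), (1, -30), (0, -8), (2, 1)]
stream = encode_ac(pairs)
print(stream)
```

Apart from the spaces, the printed string should be the same stream of bits as the one shown above.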


The encoding of the DC coefficient
-----------------------------------
DC is the coefficient in the quantized vector corresponding to the lowest
frequency in the image (the 0 frequency), and (before quantization) it is
mathematically = (the sum of the 8x8 image samples) / 8 .
(It is proportional to the average value of that block of image samples:
8 times the average.)
It carries a large share of the energy present in the original 8x8 image
block, so it usually takes large values.
The authors of the JPEG standard noticed that the DC coefficients of
consecutive blocks are strongly correlated, so they decided to encode in the
JPG file the difference between the DCs of consecutive 8x8 blocks.
(Note: consecutive 8x8 blocks of the SAME image component, i.e. consecutive
8x8 blocks of Y, or consecutive blocks of Cb, or of Cr.)

Diff = DC_i - DC_{i-1}

So the DC of the current block (DC_i) will be equal to:  DC_i = DC_{i-1} + Diff

In JPG decoding you start from 0: you consider that the DC value before the
first block is 0, i.e. DC_0 = 0.
Then, for every block, you add to the current DC value the value decoded from
the JPG file (the Diff value).
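
A minimal sketch of this differential coding for one image component is shown
below. The function names and the sample DC values are made up for this
illustration.

```python
def dc_to_diffs(dc_values):
    """Encoder side: turn the DC values of consecutive blocks into Diff values."""
    previous = 0                 # the predictor starts from 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def diffs_to_dc(diffs):
    """Decoder side: rebuild the DC values by accumulating the Diff values."""
    dc = 0
    dc_values = []
    for diff in diffs:
        dc += diff
        dc_values.append(dc)
    return dc_values


diffs = dc_to_diffs([100, 104, 98])
print(diffs)                     # the first Diff is the first DC itself
print(diffs_to_dc(diffs))
```

Because neighbouring blocks tend to have similar DC values, most Diff values
are small and fall into low categories, which is the point of the scheme.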

SO, in the JPG file, the first coefficient (the DC coefficient) is actually
stored as a difference, and it is Huffman encoded DIFFERENTLY from the AC coefficients.

Here is how it's done:
(Remember that we now code the Diff value)

As you've seen before, Diff corresponds to a representation made of a category
and its bit-coded representation.
In the JPG file, only the category value is Huffman encoded, like this:

Diff = (category, bit-coded representation)
Then Diff will be coded as (Huffman_code(category) , bit-coded representation)

For example, if Diff is equal to -511, then Diff corresponds to
                    (9, 000000000)
Say that 9 has the Huffman code 1111110
(In the JPG file, there are 2 Huffman tables for an image component: one for DC
and one for AC)

In the JPG file, the bits corresponding to the DC coefficient will be:
	       1111110 000000000
Applied to this example of DC and to the previous example of ACs, for this
vector with 64 coefficients, THE FINAL STREAM OF BITS written in the JPG file
will be:

   1111110 000000000 111000 111001  111000 101101  1111111110011001 10111
       11111110110 00001 1011 0111   11011 1   1010

(In the JPG file, the DC is encoded first, then the ACs)


THE HUFFMAN DECODER (A brief summary) for the 64 coefficients (A Data Unit)
of an image component (For example Y)
-------------------------------------------------------------

So when you decode a stream of bits from the image in the JPG file, you'll do:

Init DC with 0.

1) First, decode the DC coefficient:
         a) Fetch a valid Huffman code (you check whether it exists in the
            Huffman DC table)
         b) See which category this Huffman code corresponds to
         c) Fetch N = category bits, and determine what value is represented
            by (category, the N bits fetched) = Diff
         d) DC += Diff
         e) Write DC in the 64 vector:      " vector[0]=DC "

2) Then decode the 63 AC coefficients:

Init AC_counter with 1 (vector[0] already holds the DC).

------- FOR every AC coefficient UNTIL (EOB_encountered OR AC_counter=64)

       a) Fetch a valid Huffman code (check in the AC Huffman table)
       b) Decode that Huffman code: the Huffman code corresponds to
                   (nr_of_previous_0,category)
[Remember: EOB_encountered = TRUE if (nr_of_previous_0,category) = (0,0) ;
 in that case the rest of the vector stays 0 and the loop ends here]

       c) Fetch N = category bits, and determine what value is represented by
              (category,the N bits fetched) = AC_coefficient
       d) Write in the 64 vector a number of zeroes = nr_of_previous_0
       e) Increment the AC_counter with nr_of_previous_0
       f) Write AC_coefficient in the vector:
                  " vector[AC_counter]=AC_coefficient "
       g) Increment the AC_counter with 1
-----------------
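
The decoder steps above can be sketched as follows. This is only an
illustration under some assumptions: the bits are kept in a Python string
instead of real bytes, the two tables hold just the hypothetical Huffman codes
used in the examples of this doc, and all names (BitReader, decode_data_unit,
...) are made up.

```python
# Hypothetical tables from the examples in this doc: Huffman code -> byte.
DC_TABLE = {"1111110": 9}
AC_TABLE = {
    "111000": 0x06, "1111111110011001": 0x45, "11111110110": 0x15,
    "1011": 0x04, "11011": 0x21, "1010": 0x00,
}


class BitReader:
    def __init__(self, bits):
        self.bits = bits.replace(" ", "")
        self.pos = 0

    def take(self, n):
        if self.pos + n > len(self.bits):
            raise ValueError("ran out of bits")
        chunk = self.bits[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def huffman(self, table):
        code = ""
        while code not in table:          # extend the code bit by bit
            if len(code) == 16:
                raise ValueError("no valid Huffman code found")
            code += self.take(1)
        return table[code]


def decode_value(category, bits):
    if category == 0:
        return 0
    raw = int(bits, 2)
    if bits[0] == "0":                    # a leading 0 marks a negative value
        raw -= (1 << category) - 1
    return raw


def decode_data_unit(reader, previous_dc):
    vector = [0] * 64
    category = reader.huffman(DC_TABLE)
    vector[0] = previous_dc + decode_value(category, reader.take(category))
    counter = 1
    while counter < 64:
        byte = reader.huffman(AC_TABLE)
        run, category = byte >> 4, byte & 0x0F
        if byte == 0x00:                  # EOB: the rest stays 0
            break
        counter += run                    # the skipped entries are already 0
        if counter > 63:
            raise ValueError("too many coefficients")
        # A (15,0) pair has category 0, so it simply yields 16 zeroes here.
        vector[counter] = decode_value(category, reader.take(category))
        counter += 1
    return vector


reader = BitReader("1111110 000000000 111000 111001  111000 101101 "
                   "1111111110011001 10111 11111110110 00001 1011 0111 "
                   "11011 1 1010")
vector = decode_data_unit(reader, 0)
print(vector[:14])
```

Run on the final stream from the DC example, the first entries of the vector
should be -511, 57, 45, four zeroes, 23, 0, -30, -8, two zeroes and 1.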

Next Steps
-----------
So now we have a 64-element vector. We'll do the reverse of the steps presented
in this doc:

1) Dequantize the 64 vector : "for (i=0;i<=63;i++) vector[i]*=quant[i]"
2) Re-order the 64 vector from zig-zag order into an 8x8 block
3) Apply the Inverse DCT transform to the 8x8 block

Repeat the process above [Huffman decoder, steps 1), 2) and 3)] for every
8x8 block of every image component (Y,Cb,Cr).

4) Up-sample if needed
5) Level shift the samples (add 128 to all the 8-bit signed values in the 8x8
blocks resulting from the IDCT transform)
6) Transform YCbCr to RGB

7--- And VOILA ... the JPG image
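
Steps 1), 2), 3) and 5) for a single 8x8 block can be sketched as below. It is
a slow, naive illustration and not how a real decoder computes the IDCT. It
assumes that quant is given in zig-zag order, like the vector, and all
function names are made up.

```python
import math


def zigzag_positions():
    """Return the 64 (row, col) block positions in zig-zag scan order."""
    order = []
    for s in range(15):                   # s = row + col, one anti-diagonal each
        rows = range(max(0, s - 7), min(s, 7) + 1)
        # Even diagonals run upward (row decreasing), odd ones downward.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order


def dequantize_and_unzigzag(vector, quant):
    """Steps 1) and 2): vector and quant are 64-entry lists in zig-zag order."""
    block = [[0] * 8 for _ in range(8)]
    for i, (r, c) in enumerate(zigzag_positions()):
        block[r][c] = vector[i] * quant[i]
    return block


def idct_and_level_shift(block):
    """Steps 3) and 5): naive inverse DCT, then +128 and clamp to 0..255."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            s = 0.0
            for v in range(8):
                for u in range(8):
                    s += (c(u) * c(v) * block[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = min(255, max(0, round(s / 4) + 128))
    return out


# A block with only DC = 8: every sample is 8 / 8 = 1 before the level shift.
block = dequantize_and_unzigzag([8] + [0] * 63, [1] * 64)
samples = idct_and_level_shift(block)
print(samples[0])
```

The DC-only block agrees with the remark that DC = (sum of the 64 samples) / 8:
64 samples equal to 1 give DC = 8.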


The JPEG markers and/or how the image information is organized in the JPG file
(The Byte level)
--------------------------------------------------------------------------------
NOTE: The JPEG/JFIF file format uses Motorola format for words, NOT Intel format,
i.e. high byte first, low byte last (ex: the word FFA0 will be written in
the JPEG file in the order: FF at the lower offset, A0 at the higher offset)

The JPG standard specifies that the JPEG file is composed mostly of pieces called
"segments".
A segment is a stream of bytes with length <= 65535. The beginning of a segment
is specified with a marker.
A marker = 2 bytes, beginning with 0xFF (the C hexadecimal notation for 255)
and ending with a byte different from 0 and 0xFF.
Ex: 'FFDA' , 'FFC4', 'FFC0'.
Each marker has a meaning: the second byte (different from 0 and 0xFF) specifies
what that marker does.
For example, there is a marker which specifies that you should start the decoding
process; it is called (in the JPG standard's terminology):
        SOS=Start Of Scan = 'FFDA'

Another marker, called DQT = Define Quantization Table = 0xFFDB, does what its
name says: it specifies that in the JPG file, after the marker (and after 3 more
bytes, more on this later), there follow 64 bytes = the coefficients of the
quantization table.
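
A rough sketch of walking through the segments of a JPG file is given below.
It relies on two facts from the JPEG standard: the 2 bytes after a marker hold
the segment length (high byte first, counting the 2 length bytes themselves),
and SOI, EOI and RSTn have no length field. The function name and the tiny
test file built by hand are made up for this illustration.

```python
import struct

STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))   # SOI, EOI, RST0..RST7


def list_segments(data):
    """Return a list of (second marker byte, payload), stopping at SOS or EOI."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected 0xFF at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                # filling byte: skip it and look again
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, b""))
            if marker == 0xD9:
                break
            continue
        (length,) = struct.unpack(">H", data[pos:pos + 2])   # high byte first
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == 0xDA:                # SOS: the Huffman coded data follows
            break
    return segments


# Hand-made example: SOI, one DQT segment (3 bytes + 64 coefficients), EOI.
data = (b"\xFF\xD8" + b"\xFF\xDB" + struct.pack(">H", 67) + b"\x00"
        + bytes(range(1, 65)) + b"\xFF\xD9")
for marker, payload in list_segments(data):
    print("FF%02X" % marker, len(payload))
```

In the DQT segment of this example, the "3 bytes" are the 2 length bytes plus
1 byte of table information, and the 64 coefficients come right after them.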

If, during the processing of the JPG file, you encounter a 0xFF followed by a
byte different from 0 (I've told you that the second byte of a marker is not 0),
and this byte has no marker meaning (you cannot find a marker corresponding to
that byte), then the 0xFF byte you encountered must be ignored and skipped.
(In some JPGs, sequences of consecutive 0xFF bytes are used for filling purposes
and must be skipped)

You see that whenever you encounter 0xFF, you check the next byte to see whether
that 0xFF has a marker meaning or must be skipped.
What happens if we actually need to encode the 0xFF byte in the JPG file
as a *usual* byte (not a marker or a filling byte)?
(Say that we need to write a Huffman code which begins with 11111111 (8 bits of
1) at a byte alignment)
The standard says that we simply make the next byte 0, and write the sequence
'FF00' in the JPG file.
So when your JPG decoder meets the 2-byte 'FF00' sequence, it should treat it
as a single usual byte: 0xFF.
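
On the decoder side, removing the stuffed 0 bytes could look like this minimal
sketch. The function name is made up, and for simplicity it stops at the first
0xFF that is not followed by 0 instead of deciding what that marker means.

```python
def unstuff(data):
    """Return (usual bytes, offset where a marker or filling 0xFF was found)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            out.append(0xFF)      # 'FF00' stands for one usual 0xFF byte
            i += 2
        elif byte == 0xFF:
            break                 # a marker (or filling byte) starts here
        else:
            out.append(byte)
            i += 1
    return bytes(out), i


payload, offset = unstuff(b"\x12\xFF\x00\x34\xFF\xD9")
print(payload.hex(), offset)
```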

Another thing: you realise that these markers are byte aligned in the JPG file.
What happens if, while Huffman encoding and inserting bits into the JPG file's
bytes, you have not finished filling a byte but you need to write a marker,
which is byte aligned?
For the byte alignment of the markers, you SET THE REMAINING BITS UNTIL THE
BEGINNING OF THE NEXT BYTE TO 1, then you write the marker at the next byte.
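
On the encoder side, both rules (the 'FF00' sequence and the filling with 1
bits before a marker) can be put into a small bit writer. This is just a
sketch with a made-up class name; the bits are passed as strings of '0' and
'1' to keep it short.

```python
class BitWriter:
    def __init__(self):
        self.out = bytearray()
        self.acc = 0              # bits collected for the current byte
        self.nbits = 0            # how many bits are in acc

    def _emit(self):
        self.out.append(self.acc)
        if self.acc == 0xFF:      # a usual 0xFF byte is followed by a 0 byte
            self.out.append(0x00)
        self.acc = 0
        self.nbits = 0

    def write(self, bits):
        for bit in bits:
            self.acc = (self.acc << 1) | int(bit)
            self.nbits += 1
            if self.nbits == 8:
                self._emit()

    def flush(self):
        """Fill the unfinished byte with 1 bits."""
        if self.nbits:
            pad = 8 - self.nbits
            self.acc = (self.acc << pad) | ((1 << pad) - 1)
            self._emit()

    def marker(self, second_byte):
        self.flush()              # markers must start at a byte boundary
        self.out += bytes([0xFF, second_byte])


writer = BitWriter()
writer.write("1010")              # only 4 bits: the byte is not finished
writer.marker(0xD9)               # EOI
print(writer.out.hex())
```

Here the 4 written bits are completed with 1111, so the byte before the marker
is 10101111.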

A short explanation of some important markers found in a JPG file.
-------------------------------------------------------------------

SOI = Start Of Image = 'FFD8'
 This marker must be present in any JPG file *once* at the beginning of the file.
(Any JPG file starts with the sequence FFD8.)
EOI = End Of Image = 'FFD9'
  Similar to SOI: any JPG file ends with FFD9.

RSTi = FFDi (where i is in range 0..7)  [ RST0 = FFD0, RST7=FFD7]
     = Restart Markers
These restart markers are used for resync. At regular intervals, they appear
in the JPG stream of bytes, during the decoding process (after SOS)
(They appear in the order: RST0 -- interval -- RST1 -- interval -- RST2 --...
                      ...-- RST6 -- interval -- RST7 -- interval -- RST0 --...
)
(Obs: A lot of JPGs don't have restart markers)

The problem with these markers is that they interrupt the normal bit order in
the JPG's Huffman encoded bitstream.
Remember that for the byte alignment of the markers the remaining bits are set
to 1, so at regular intervals your decoder has to skip the useless filling
bits (those set to 1) and the RST markers.
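
One simple way to deal with this, sketched below, is to cut the Huffman coded
bytes into intervals at the RST markers before decoding the bits. The function
name is made up; the sketch also checks that the markers come in the order
RST0, RST1, ..., RST7, RST0, ...

```python
def split_at_restarts(scan):
    """Split the Huffman coded bytes of a scan at the RST0..RST7 markers."""
    intervals = []
    start = 0
    i = 0
    expected = 0
    while i + 1 < len(scan):
        if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
            if scan[i + 1] - 0xD0 != expected:
                raise ValueError("restart marker out of order")
            intervals.append(scan[start:i])
            expected = (expected + 1) % 8
            i += 2
            start = i
        else:
            i += 1
    intervals.append(scan[start:])
    return intervals


# Each interval still ends with its filling bits and still contains any
# 'FF00' sequences; those are handled when the bits are read.
# (Per the JPEG standard, a decoder also resets its DC values to 0 at every
# restart marker.)
parts = split_at_restarts(b"\x11\x22\xFF\xD0\x33\xFF\x00\xFF\xD1\x44")
print([p.hex() for p in parts])
```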

-------
Markers...
-------
At the end of this doc, I've included a very well written technical explanation
of the JPEG/JFIF file format, written by Oliver Fromme, the author of the QPEG
viewer.
There you'll find a pretty good and complete definition for the markers.

But, anyway, here is a list of markers you should check:

SOF0 = Start Of Frame 0          = FFC0
SOS  = Start Of Scan             = FFDA
APP0 = the marker used to identify a JPG file
       which uses the JFIF specification = FFE0
COM  = Comment                   = FFFE
DNL  = Define Number of Lines    = FFDC
DRI  = Define Restart Interval   = FFDD
DQT  = Define Quantization Table = FFDB
DHT  = Define Huffman Table      = FFC4

The Huffman table stored in a JPG file
---------------------------------------
Here is how JPEG implements the Huffman tree: instead of a tree, it defines
a table in the JPG file after the DHT (Define Huffman Table) marker.
NOTE: The length of the Huffman codes is restricted to 16 bits.

Basically there are 2 types of Huffman tables in a JPG file : one for DC and
one for AC (actually there are 4 Huffman tables: 2 for DC,AC of luminance
       and 2 for DC,AC of chrominance)

They are stored in the JPG file in the same format, which consists of:
1) 16 bytes :

byte i contains the number of Huffman codes of length i (length in bits),
 where i ranges from 1 to 16

2) A table with the length (in bytes) = sum_{i=1}^{16} nr_codes_of_length_i

which contains at location [k][j]  (k in 1..16, j in 0..(nr_codes_of_length_k-1))
the BYTE value associated with the j-th Huffman code of length k.
(For a fixed length k, the values are stored sorted by the value of the Huffman
code)

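From this table you can find the actual Huffman code associated with a particular
byte.
The rule is: start with the code 0; within one length, each code is the previous
code + 1; when you go to the next length, the code is shifted left by 1 bit (a 0
is appended), even if that length has no codes.
Here is an example of how the actual code values are generated:

Ex:  (Note: the numbers of codes for each length are chosen just for this
      example; they can have any other values)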
SAY that,

         For length 1 we have nr_codes[1]=0, we skip this length
         For length 2 we have 2 codes  00
                                       01
         For length 3 we have 3 codes  100
                                       101
                                       110
         For length  4 we have 1 code  1110
         For length  5 we have 1 code  11110
         For length  6 we have 1 code  111110
         For length  7 we have 0 codes  -- skip
 (if we had 1 code for length 7,
          we would have                1111110)
         For length  8 we have 1 code  11111100 (You see that the code is still
                                                 shifted to left though we skipped
                                                 the code value for 7)
         .....
         For length 16, .... (the same thing)

I've told you that the Huffman table in the JPG file stores the BYTE values
for the codes.

For this particular example of Huffman codes:
Say that in the Huffman table in the JPG file on disk we have (after the 16 bytes
which contain the number of Huffman codes of each length):
    45 57 29 17 23 25 34 28
These values correspond, given the particular lengths I gave you before,
to the Huffman codes like this:

    there's no value for a code of length 1
    for codes of length 2 : 2 values (45, 57)
    for codes of length 3 : 3 values (29, 17, 23)
    for codes of length 4 : only 1 value (25)
    for codes of length 5 : 1 value (34)
    ..
    for codes of length 7 : again no value, skip to the code of length 8
    for codes of length 8 : 1 value (28)

IMPORTANT note:
  For codes of length 2:
      the value 45 corresponds to code 00
                57             to code 01
  For codes of length 3:
      the value 29 corresponds to code  100
                17       ----||---      101
                23       ----||---      110

  ETC...
(I've told you that for a given length the byte values are stored in increasing
order of the Huffman code value.)
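
The generation of the codes from the 16 counts and the list of byte values can
be sketched as below. The function name is made up. The counts are the ones
from the example; since the byte list above does not show the value for the
code of length 6, the value 99 is used here as a placeholder only.

```python
def build_codes(counts, values):
    """counts[i] = number of codes of length i+1; values are in file order."""
    codes = {}
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            codes[values[k]] = format(code, "0{}b".format(length))
            code += 1             # next code of the same length
            k += 1
        code <<= 1                # going to the next length appends a 0
    return codes


counts = [0, 2, 3, 1, 1, 1, 0, 1] + [0] * 8
values = [45, 57, 29, 17, 23, 25, 34, 99, 28]   # 99 = placeholder for length 6
codes = build_codes(counts, values)
for value in values:
    print(value, codes[value])
```

A decoder would normally store the reverse mapping (code to byte value), but
the construction is the same.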

Four Huffman tables corresponding to DC and AC tables of the luminance, and
DC and AC tables for the chrominance, are given in an annex of the JPEG
standard as a suggestion for the encoder.
 The standard says that these tables have been tested with good compression
results on a lot of images and recommends them, but the encoder can use any
other Huffman table. A lot of JPG encoders use these tables. Some of them offer
you an option: entropy optimization. If it's enabled, they'll use Huffman
tables optimized for that particular image.

The JFIF (JPEG File Interchange Format) file
---------------------------------------------
	The JPEG standard (the one in the itu-1150.ps file) is very general;
the JFIF implementation is a particular case of this standard (and it is, of course,
compatible with the standard).
	  The JPEG standard specifies some markers reserved for applications
(by applications I mean particular cases of implementing the standard).
 Those markers are called APPn, where n ranges from 0 to 0xF ; APPn = FFEn
 The JFIF specification uses the APP0 marker (FFE0) to identify a JPG file which
uses this specification.
 You'll see that the JPEG standard refers to "image components".
These image components can be (Y,Cb,Cr) or (Y,I,Q) or whatever.
 A JFIF implementation uses only (Y,Cb,Cr) for a truecolor JPG, or only Y for
a monochrome JPG.
 You can get the JFIF specification from www.jpeg.org

The sampling factors
--------------------