📄 jpgalgo.txt

📁 jpeg格式的详细说明,有做这些方面的研究的朋友可以看看!
💻 TXT
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
A note about the JPEG decoding algorithm.
Copyright 1999 Cristi Cuturicu.

DISCLAIMER
...........
You get this file for free, so you cannot have any legal requests from me.
If you don't agree, read no more.
No warranty is provided with this doc, there might be bugs or errors in it
(although I've tried to avoid them), so use the information contained in this
file at your own risk.
This is NOT an official documentation, for further information please refer
to the JPEG ISO standard.
All product names mentioned in this file are trademarks or registered trademarks
of their respective owners.
You are free to distribute it, as long as you do not modify it.

First, a word about this doc
............................
This doc tries to explain the JPEG compression algorithm. I'm not an expert in
this field, I just needed this info for my own JPEG decoder.
Long ago, I wanted to write my own JPEG decoder, so I've been looking on the net
a good doc which could have explained to me the JPEG compression, and particularly
the JPG file format. And except the standard I couldn't find one.
The ISO-ITU JPEG standard = ISO standard 10918-1 or CCITT standard recommendation
T.81:
"Information Technology - Digital compression and coding of continuous-tone
still images - Requirements and guidelines")
Though this standard is quite complete, it has a lot of not interesting parts
in its 186 pages, and I had to dig in it, and then write my own JPG viewer,
to get from this standard the main stuff I needed :
    The Baseline Sequential DCT JPG compression.
So I thought that a short (but with enough details) doc might be useful to others.

Mainly because of the fact that the majority of the JPG files are
Baseline Sequential JPGS, this doc concerns only the Baseline Sequential JPG
compression and particularly the JFIF implementation of it.
It DOES NOT cover the JPG Progresive or Hierarchical compression.
(For more details about these read the itu-1150 standard.
 It can be found at www.wotsit.org or somewhere at www.jpeg.org/jpeg)

I've thought that it would be easier for the reader to understand the JPG
compression if I'll explain the steps of the JPG encoder.
(The decoder steps will be the inverse of the encoder's steps, but in reverse
order, of course)


THE JPEG ENCODER STEPS
----------------------

1) The afine transformation in colour space :  [R G B] -> [Y Cb Cr]
---------------------------------------------------------------------

(It is defined in the CCIR Recommendation 601)

(R,G,B are 8-bit unsigned values)

	| Y  |     |  0.299       0.587       0.114 |   | R |     | 0 |
	| Cb |  =  |- 0.1687    - 0.3313      0.5   | * | G |   + |128|
	| Cr |     |  0.5       - 0.4187    - 0.0813|   | B |     |128|

The new value Y = 0.299*R + 0.587*G + 0.114*B  is called the luminance.
It is the value used by the monochrome monitors to represent an RGB colour.
Physiologically, it represents the intensity of an RGB colour perceived by
the eye.
You see that the formula for Y it's like a weighted-filter with different weights
for each spectral component: the eye is most sensitive to the Green component
then it follows the Red component and the last is the Blue component.

The values Cb =  - 0.1687*R - 0.3313*G + 0.5   *B + 128
	   Cr =    0.5   *R - 0.4187*G - 0.0813*B + 128
are called the chromimance values and represent 2 coordinates in a system
which measures the nuance and saturation of the colour ([Approximately], these
values indicate how much blue and how much red is in that colour).
These 2 coordinates are called shortly the chrominance.

[Y,Cb,Cr] to [R,G,B] Conversion (The inverse of the previous transform)
--------------------------------
RGB can be computed directly from YCbCr ( 8-bit unsigned values) as follows:

R = Y                    + 1.402  *(Cr-128)
G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)
B = Y + 1.772  *(Cb-128)


A note relating Y,Cb,Cr to the human visual system
---------------------------------------------------
The eye, particulary the retina, has as visual analyzers two kind of cells :
Cells for night view which perceive only nuances of gray ranging from intense
white to the darkest black and cells for the day view which perceive the color
nuance.
The first cells, given an RGB colour, detect a gray level similar to that given
by the luminance value.
The second cells, responsible for the perception of the colour nuance, are the
cells which detects a value related to that of the chrominance.


2) Sampling
------------

The JPEG standard takes into account the fact that the eye seems to be more
sensitive at the luminance of a colour than at the nuance of that colour.
(the white-black view cells have more influence than the day view cells)

So, on most JPGS, luminance is taken in every pixel while the chrominance is
taken as a medium value for a 2x2 block of pixels.
Note that it is not neccessarily that the chrominance to be taken as a medium
value for a 2x2 block , it could be taken in every pixel, but good compression
results are achieved this way, with almost no loss in visual perception of the
new sampled image.

A note : The JPEG standard specifies that for every image component (like, for
example Y) must be defined 2 sampling coefficients: one for the horizontal
sampling and one for vertical sampling.
These sampling coefficients are defined in the JPG file as relative to the
maximum sampling coefficient (more on this later).

3) Level shift
--------------
All 8-bit unsigned values (Y,Cb,Cr) in the image are "level shifted": they are
converted to an 8-bit signed representation, by subtracting 128 from their value.

4) The 8x8 Discrete Cosine Transform (DCT)
------------------------------------------

The image is break into 8x8 blocks of pixels, then for each 8x8 block is
applied the DCT transform. Note that if the X dimension of the original image
is not divisible by 8, the encoder should make it divisible, by completing the
remaining right columns (until X becomes a multiple of 8) with the right-most
column of the original image.
Similar, if the Y dimension is not divisible by 8, the encoder should complete
the remaining lines with the bottom-most line of the original image.
The 8x8 blocks are processed from left to right and from top to bottom.

A note: Since a pixel in the 8x8 block has 3 components (Y,Cb,Cr) the DCT
is applied separately to 3 blocks 8x8:
  The first 8x8 block is the block which contains the luminance of the pixels
   in the original 8x8 block
  The second 8x8 block is the block which contains the Cb value of the pixels
   in the original 8x8 block
  And, similar, the third 8x8 block contains the Cr values.

The purpose of the DCT transform is that instead of processing the original
samples, you work with the spatial frequencies present in the original image.
These spatial frequencies are very related to the level of detail present in an
image. High spatial frequencies corresponds to high levels of detail, while
lower frequencies corresponds to lower levels of detail.

The DCT transform is very similar to the 2D Fourier transform which shifts from
the time domain (the original 8x8 block) to the frequency domain (the new 8x8=
64 coefficients which represents the amplitudes of the spatial frequencies
analyzed)

The mathematical definition of Forward DCT (FDCT) and Inverse DCT (IDCT) is :

FDCT:
	   c(u,v)     7   7                 2*x+1                2*y+1
F(u,v) = --------- * sum sum f(x,y) * cos (------- *u*PI)* cos (------ *v*PI)
	     4       x=0 y=0                 16                   16

 u,v = 0,1,...,7

	  { 1/2 when u=v=0
 c(u,v) = { 1/sqrt(2) when  u=0, v!=0
          { 1/sqrt(2) when u!=0, v=0
	  {  1 otherwise


IDCT:
	   1     7   7                      2*x+1                2*y+1
f(x,y) =  --- * sum sum c(u,v)*F(u,v)*cos (------- *u*PI)* cos (------ *v*PI)
	   4    u=0 v=0                      16                   16

 x,y=0,1...7

Applying these formulas directly is computationally expensive, especially
when there have been developed faster algorithms for implementing forward or
inverse DCT. A notable one called AA&N leaves only 5 multiplies and 29 adds
to be done in the DCT itself. More info and an implementation of it can be 
found in the free software for JPEG encoders/decoders made by Independent JPEG 
Group (IJG), their C source can be found at www.ijg.org.

5) The zig-zag reordering of the 64 DCT coefficients
-----------------------------------------------------

So, after we performed the DCT transform over a block of 8x8 values, we have
a new 8x8 block.
Then, this 8x8 block is traversed in zig-zag like this :

(The numbers in the 8x8 block indicate the order in which we traverse the
bidimensional 8x8 matrix)
		  0, 1, 5, 6,14,15,27,28,
		  2, 4, 7,13,16,26,29,42,
		  3, 8,12,17,25,30,41,43,
		  9,11,18,24,31,40,44,53,
		 10,19,23,32,39,45,52,54,
		 20,22,33,38,46,51,55,60,
		 21,34,37,47,50,56,59,61,
		 35,36,48,49,57,58,62,63

As you see , first is the upper-left corner (0,0), then the value at (0,1),
then (1,0) then (2,0), (1,1), (0,2), (0,3), (1,2),  (2,1), (3,0) etc.

After we are done with traversing in zig-zag the 8x8 matrix we have now a vector
with 64 coefficients (0..63)
The reason for this zig-zag traversing is that we traverse the 8x8 DCT coefficients
in the order of increasing the spatial frequencies. So, we get a vector sorted
by the criteria of the spatial frequency:  The first value in the vector (at
index 0) corresponds to the lowest spatial frequency present in the image -
It's called the DC term. As we increase the index in the vector, we get values
corresponding to higher frequencies (The value at index 63 corresponds to the
amplitude of the highest spatial frequency present in the 8x8 block).
The rest of the DCT coefficients are called AC terms.


6) Quantization
----------------

At this stage, we have a sorted vector with 64 values corresponding to the
amplitudes of the 64 spatial frequencies present in the 8x8 block.

These 64 values are quantized: Each value is divided by a dividend specified
in a vector with 64 values --- the quantization table , then it's rounded to
the nearest integer.

 for (i = 0 ; i<=63; i++ )
   vector[i] = (int) (vector[i] / quantization_table[i] + 0.5)

Here is the example of the quantization table for luminance(Y) given in an
annex of the JPEG standard.(It is given in a form of an 8x8 block; in order to
obtain a 64 vector it should be zig-zag reordered)
 16 11 10 16 24  40  51  61
 12 12 14 19 26  58  60  55
 14 13 16 24 40  57  69  56
 14 17 22 29 51  87  80  62
 18 22 37 56 68  109 103 77
 24 35 55 64 81  104 113 92
 49 64 78 87 103 121 120 101
 72 92 95 98 112 100 103 99
This table is based upon "psychovisual thresholding" , it has "been used with
good results on 8-bit per sample luminance and chrominance images".
Most existing encoders use simple multiples of this example, but the values are
not claimed to be optimal (An encoder can use ANY OTHER quantization table)
The table is specified in the JPEG file with the DQT(Define Quantization Table)
marker.Most commonly there is one table for Y, and another one for the
chrominance (Cb and Cr).

The quantization process has the key role in the JPEG compression.
It is the process which removes the high frequencies present in the original
image -- in consequence the high detail.
We do this because of the fact that the eye is much more sensitive to lower
spatial frequencies than to higher frequencies, so we can remove, with very
little visual loss, higher frequencies.
This is done by dividing values at high indexes in the vector (the amplitudes
of higher frequencies) with larger values than the values by which are divided
the amplitudes of lower frequencies.
The bigger the values in the quantization table are, the bigger is the error
(in consequence the visual error) introduced by this lossy process, and the
smaller is the visual quality.

Another important fact is that in most images the colour varies slow from one
pixel to another, so most images will have a small quantity of high detail
-> a small amount (small amplitudes) of high spatial frequencies - but they have
a lot of image information contained in the low spatial frequencies.

In consequence in the new quantized vector, at high spatial frequencies, we'll
have a lot of consecutive zeroes.


7)  The Zero Run Length Coding (RLC)
-------------------------------

Now we have the quantized vector with a lot of consecutive zeroes. We can exploit
this by run length coding the consecutive zeroes.
IMPORTANT: You'll see later why, but here we skip the encoding of the first
 coefficient of the vector (the DC coefficient) which is coded a bit different.
(I'll present its coding later on this doc)
Let's consider the original 64 vector a 63 vector (it's the 64 vector without
the first coefficient)


Say that we have 57,45,0,0,0,0,23,0,-30,-16,0,0,1,0,0,0, 0 , 0 ,0 , only 0,..,0

Here it is how the RLC JPEG compression is done for this example :

(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; EOB

As you see, we encode for each value different by 0 the number of consecutive
zeroes PRECEDING that value, then we add the value.
Another note : EOB is the short form for End of Block, it's a special coded
value (a marker). If we've reached in a position in the vector from which 
we have till the end of the vector only zeroes, we'll mark that position
with EOB and finish the RLC compression of the quantized vector.


[Note that if the quantized vector doesn't finishes with zeroes (has the last
element not 0) we'll not have the EOB marker.]

ACTUALLY, EOB has as an equivalent (0,0) and it will be (later) Huffman coded
like (0,0), so we'll encode :
 (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; (0,0)


Another MAJOR thing: Say that somewhere in the quantized vector
 we have: 57, eighteeen zeroes, 3, 0,0 ,0,0 2, thirty-three zeroes, 895, EOB

The JPG Huffman coding makes the restriction (you'll see later why) that
the number of previous 0's to be coded as a 4-bit value, so it can't overpass
the value 15 (0xF).

So, the previous example would be coded as :
    (0,57) ; (15,0) (2,3) ; (4,2) ; (15,0) (15,0) (1,895) , (0,0)

(15,0) is a special coded value which indicates that there follows 16 consecutive
zeroes.Note : 16 zeroes not 15 zeroes.

8) The final step === Huffman coding
-------------------------------------

First an IMPORTANT note : Instead of storing the actual value , the JPEG standard
specifies that we store the minimum size in bits in which we can keep that value
(it's called the category of that value) and then a bit-coded representation
of that value like this:

	     Values             Category        Bits for the value
		0                   0                   -
	      -1,1                  1                  0,1
	   -3,-2,2,3                2              00,01,10,11
     -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
       -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
      -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
      -63,..,-32,32,..,63           6                   .
     -127,..,-64,64,..,127          7                   .
    -255,..,-128,128,..,255         8                   .
    -511,..,-256,256,..,511         9                   .
   -1023,..,-512,512,..,1023       10                   .
  -2047,..,-1024,1024,..,2047      11                   .
  -4095,..,-2048,2048,..,4095      12                   .
  -8191,..,-4096,4096,..,8191      13                   .
 -16383,..,-8192,8192,..,16383     14                   .
-32767,..,-16384,16384,..,32767    15                   .


In consequence for the previous example:
    (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; (0,0)

let's encode ONLY the right value of these pairs, except the pairs that are
special markers like (0,0) or (if we would have) (15,0)


    57 is in the category 6 and it is bit-coded 111001 , so we'll encode it
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -