Note: The following explanation covers the encoding of truecolor (3-component)
JPGs. For gray-scaled JPGs there is only one component (Y), which is usually
not down-sampled at all and does not require any inverse transformation like
(Y,Cb,Cr) -> (R,G,B). Consequently, gray-scaled JPGs are the simplest and
easiest to decode: for every 8x8 block in the image you Huffman-decode the
RLC-coded vector, reorder it from zig-zag order, dequantize the 64-element
vector, and finally apply the inverse DCT to it and add 128 (level shift) to
the resulting 8x8 values.
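The per-block steps above can be sketched in C as follows. This is a minimal sketch, not the author's decoder: it assumes the Huffman/RLC step has already produced the 64 coefficients (still in zig-zag order) and that the quantization table is in the same zig-zag order as it is stored in the DQT segment. The function name decode_gray_block is hypothetical, the inverse DCT is the direct (slow) formula, and the cosine values are hard-coded so that no math library is needed.

```c
#include <stdint.h>

/* Natural (row-major) index of the k-th coefficient of a zig-zag ordered
   vector: the usual JPEG zig-zag scan order. */
static const int zigzag_to_natural[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* cos(k*pi/16) for k = 0..8 (hard-coded so the sketch does not need libm). */
static const double cos_tab[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.19509032201612825, 0.0
};

/* cos(k*pi/16) for any k >= 0, using the symmetries of the cosine. */
static double cos_pi16(int k)
{
    k %= 32;                                       /* period 2*pi */
    if (k > 16) k = 32 - k;                        /* cos(2pi - t) = cos(t) */
    return k > 8 ? -cos_tab[16 - k] : cos_tab[k];  /* cos(pi - t) = -cos(t) */
}

/* Hypothetical helper: decode one grey-scale data unit.
   zz  - 64 Huffman/RLC-decoded coefficients, still in zig-zag order
   qt  - quantization table, in the same zig-zag order as in DQT
   out - 8x8 output samples, row-major */
void decode_gray_block(const int16_t zz[64], const uint16_t qt[64], uint8_t out[64])
{
    double coef[64];

    /* dequantize and undo the zig-zag reordering in one pass */
    for (int k = 0; k < 64; k++)
        coef[zigzag_to_natural[k]] = (double)zz[k] * qt[k];

    /* direct 2-D inverse DCT, then level shift by +128 */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++) {
                for (int u = 0; u < 8; u++) {
                    double cu = u ? 1.0 : cos_tab[4];  /* C(0) = 1/sqrt(2) */
                    double cv = v ? 1.0 : cos_tab[4];
                    sum += cu * cv * coef[v * 8 + u]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
                }
            }
            double s = sum / 4.0 + 128.0;
            int p = s <= 0.0 ? 0 : (int)(s + 0.5);     /* round and clamp */
            out[y * 8 + x] = (uint8_t)(p > 255 ? 255 : p);
        }
    }
}
```

For example, a block whose only non-zero value is a DC coefficient of 10 with a quantizer of 8 decodes to a flat block of 138 (80 / 8 + 128).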
I've told you that image components are sampled. Usually Y is taken for every
pixel, and Cb, Cr are taken once for each block of 2x2 pixels.
But there are some JPGs in which Cb, Cr are taken for every pixel, and some
JPGs where Cb, Cr are taken every 2 pixels horizontally and for every pixel
vertically.
The sampling factors for an image component in a JPG file are defined relative
to the highest sampling factor.
Here are the sampling factors for the most usual example, where Y is taken for
every pixel and Cb, Cr are taken once for each block of 2x2 pixels.
(The JFIF specification gives a formula for sampling factors which I think
works only when the maximum sampling factor in each dimension, X or Y, is <= 2.
The JPEG standard does not restrict the sampling factors this way; it's more
general.)
You see that Y has the highest sampling rate:
For Y:  horizontal sampling factor = 2 = HY
        vertical sampling factor   = 2 = VY
For Cb: horizontal sampling factor = 1 = HCb
        vertical sampling factor   = 1 = VCb
For Cr: horizontal sampling factor = 1 = HCr
        vertical sampling factor   = 1 = VCr
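A small sketch of what "relative to the highest sampling factor" means in practice: each component plane is ceil(width*H/Hmax) by ceil(height*V/Vmax) samples, which is how the JPEG standard sizes the components. The struct and function names are hypothetical, chosen for illustration.

```c
/* Sampling factors of one component, as read after SOF0 (hypothetical type). */
struct comp_sampling { int h, v; };

/* Hypothetical helper: size in samples of each component plane,
   w_i = ceil(width * H_i / Hmax), h_i = ceil(height * V_i / Vmax). */
void component_sizes(int width, int height, const struct comp_sampling *c,
                     int n, int *w_out, int *h_out)
{
    int hmax = 1, vmax = 1;
    for (int i = 0; i < n; i++) {          /* find the highest factors */
        if (c[i].h > hmax) hmax = c[i].h;
        if (c[i].v > vmax) vmax = c[i].v;
    }
    for (int i = 0; i < n; i++) {          /* integer ceiling division */
        w_out[i] = (width  * c[i].h + hmax - 1) / hmax;
        h_out[i] = (height * c[i].v + vmax - 1) / vmax;
    }
}
```

With the factors above, a 100x75 image has a 100x75 Y plane and 50x38 Cb and Cr planes.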
Actually this form of defining the sampling factors is quite useful.
The Huffman-encoded vector of 64 coefficients for an image component is called
a DU = Data Unit (JPEG standard terminology).
In the JPG file, the order of encoding Data Units is:
1) encode Data Units for the first image component:
   for (counter_y=1;counter_y<=VY;counter_y++)
     for (counter_x=1;counter_x<=HY;counter_x++)
       { encode Data Unit for Y }
2) encode Data Units for the second image component:
   for (counter_y=1;counter_y<=VCb;counter_y++)
     for (counter_x=1;counter_x<=HCb;counter_x++)
       { encode Data Unit for Cb }
3) finally, for the third component, similarly:
   for (counter_y=1;counter_y<=VCr;counter_y++)
     for (counter_x=1;counter_x<=HCr;counter_x++)
       { encode Data Unit for Cr }
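The three loops above generalize to any number of components. The following sketch (hypothetical struct and function names) only builds the Data Unit order of one MCU as text, which makes it easy to check against the figure for a given set of sampling factors.

```c
#include <stdio.h>
#include <string.h>

/* One image component with its sampling factors (hypothetical type). */
struct component { const char *name; int h, v; };

/* Hypothetical helper: writes the Data Unit order of one MCU into buf as
   space-separated names, component by component, V rows of H Data Units
   each. Returns the number of Data Units in the MCU. */
int mcu_order(const struct component *c, int ncomp, char *buf, size_t size)
{
    int count = 0;
    buf[0] = '\0';
    for (int i = 0; i < ncomp; i++)
        for (int y = 0; y < c[i].v; y++)
            for (int x = 0; x < c[i].h; x++) {
                size_t len = strlen(buf);
                snprintf(buf + len, size - len, "%s%sDU",
                         count ? " " : "", c[i].name);
                count++;
            }
    return count;
}
```

For HY=VY=2 and HCb=VCb=HCr=VCr=1 this yields "YDU YDU YDU YDU CbDU CrDU".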
For the example I gave you (HY=2, VY=2; HCb=VCb=1; HCr=VCr=1),
here is a figure (I think it will clear things up for you):
YDU YDU CbDU CrDU
YDU YDU
(YDU is a Data Unit for Y; similarly, CbDU is a DU for Cb and CrDU a DU for Cr.)
This usual combination of sampling factors is referred to as 2:1:1 for both
the vertical and the horizontal sampling factors.
And, of course, in the JPG file the encoding order will be:
YDU,YDU,YDU,YDU,CbDU,CrDU
You know that a DU (64 coefficients) defines a block of 8x8 values, so here
we specified the encoding order for a block of 16x16 image pixels
(an image pixel = a (Y,Cb,Cr) pixel [my notation]):
four 8x8 blocks of Y values (4 YDUs), one 8x8 block of Cb values (1 CbDU)
and one 8x8 block of Cr values (1 CrDU).
(Hmax = the maximum horizontal sampling factor, Vmax = the maximum vertical
sampling factor.)
Consequently, for this example of sampling factors (Hmax=2, Vmax=2), the
encoder should process SEPARATELY every 16x16 = (Hmax*8 x Vmax*8) block of
image pixels, in the order mentioned.
This block of image pixels with the dimensions (Hmax*8, Vmax*8) is called, in
the JPEG standard's terminology, an MCU = Minimum Coded Unit.
For the previous example: MCU = YDU,YDU,YDU,YDU,CbDU,CrDU
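The MCU dimensions also determine how many MCUs cover the image. A minimal sketch, assuming the image is covered by whole MCUs, so MCUs that are only partly inside the image at the right and bottom edges still count (the encoder pads them). The struct and function names are hypothetical.

```c
/* MCU size in pixels and number of MCU columns/rows (hypothetical type). */
struct mcu_grid { int mcu_w, mcu_h, cols, rows; };

/* Hypothetical helper: an MCU is (Hmax*8) x (Vmax*8) pixels; partial MCUs
   at the right and bottom edges are rounded up to whole MCUs. */
struct mcu_grid compute_mcu_grid(int width, int height, int hmax, int vmax)
{
    struct mcu_grid g;
    g.mcu_w = hmax * 8;
    g.mcu_h = vmax * 8;
    g.cols = (width  + g.mcu_w - 1) / g.mcu_w;   /* ceiling division */
    g.rows = (height + g.mcu_h - 1) / g.mcu_h;
    return g;
}
```

For a 100x75 image with Hmax=Vmax=2 this gives a grid of 7x5 = 35 MCUs of 16x16 pixels.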
Another example of sampling factors :
HY =1, VY=1
HCb=1, VCb=1
HCr=1, VCr=1
Figure/order : YDU CbDU CrDU
Here (Hmax=1, Vmax=1) the MCU is an 8x8 image pixel block made of three 8x8
blocks, one for Y, one for Cb and one for Cr (there's no down-sampling at all),
so MCU = YDU,CbDU,CrDU.
For gray-scaled JPGs you don't have to worry about the order of encoding
data units in an MCU. For these JPGs, an MCU = 1 Data Unit (MCU = YDU)
In the JPG file, the sampling factors for every image component are defined
after the marker SOF0 = Start Of Frame 0 = FFC0
A brief scheme of decoding a JPG file
--------------------------------------
The decoder reads the sampling factors from the JPG file and derives the
dimensions of an MCU (Hmax*8, Vmax*8), and from those, how many MCUs the whole
image contains. It then decodes every MCU of the image in a loop (the EOI
marker should be found when the loop finishes; if it turns up earlier, the
image is incomplete). Each MCU is decoded by decoding its Data Units in the
order described above, and the resulting (Hmax*8 x Vmax*8) block of truecolor
pixels is finally written into the (R,G,B) image buffer.
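The last step, writing a decoded MCU into the (R,G,B) buffer, could look like the sketch below. It assumes the Data Units of each component have already been assembled into one plane per component for the MCU, and it upsamples Cb and Cr by simply repeating samples (real decoders may interpolate instead). The conversion uses the JFIF (Y,Cb,Cr) -> (R,G,B) formulas. Struct and function names are hypothetical; pixels of an edge MCU that fall outside the image are skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Samples of one component for the current MCU: (h*8) x (v*8), row-major
   (hypothetical type). */
struct mcu_plane { const uint8_t *s; int h, v; };

static uint8_t clamp255(double x)
{
    int p = x <= 0.0 ? 0 : (int)(x + 0.5);   /* round, clamp to 0..255 */
    return (uint8_t)(p > 255 ? 255 : p);
}

/* Hypothetical helper: writes one decoded MCU, whose top-left pixel is
   (mcu_x, mcu_y), into an RGB24 buffer of img_w x img_h pixels. */
void put_mcu_rgb(const struct mcu_plane p[3], int hmax, int vmax,
                 int mcu_x, int mcu_y, uint8_t *rgb, int img_w, int img_h)
{
    for (int py = 0; py < vmax * 8; py++) {
        for (int px = 0; px < hmax * 8; px++) {
            int ix = mcu_x + px, iy = mcu_y + py;
            if (ix >= img_w || iy >= img_h) continue;  /* MCU padding */
            double c[3];
            for (int i = 0; i < 3; i++) {
                int sx = px * p[i].h / hmax;   /* nearest sample in plane i */
                int sy = py * p[i].v / vmax;
                c[i] = p[i].s[sy * p[i].h * 8 + sx];
            }
            double y = c[0], cb = c[1] - 128.0, cr = c[2] - 128.0;
            uint8_t *d = rgb + 3 * ((size_t)iy * img_w + ix);
            d[0] = clamp255(y + 1.402 * cr);                   /* R */
            d[1] = clamp255(y - 0.344136 * cb - 0.714136 * cr); /* G */
            d[2] = clamp255(y + 1.772 * cb);                   /* B */
        }
    }
}
```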
MPEG-1 video and JPEG
----------------------
The interesting part of the MPEG-1 specification (and probably MPEG-2) is that
it relies heavily on the JPEG specification and uses many of the concepts
presented here. The reason is that periodically (typically every 15 frames or
so) or whenever needed, there is an independent frame called an I-frame (Intra
frame), which is coded essentially like a JPEG image.
(By the way, the 16x16 image pixel block from my example is called, in the
MPEG standard's terminology, a macroblock.)
Apart from the motion compensation algorithms, MPEG-1 video relies a lot on
the JPEG techniques (the DCT, quantization, etc.).
Hope you're ready now to start coding your JPG viewer or encoder.
About the author of this doc
----------------------------
My name is Cristi Cuturicu.
I'm a student at University Politehnica in Bucharest (UPB), Department of
Computer Science.
I'm not an expert in compression; I made a JPEG encoder/decoder because I
needed it for a project.
You can contact me by e-mail:
cccrx@kermit.cs.pub.ro (school email)
or cristic22@yahoo.com (preferably)
A technical explanation of the JPEG/JFIF file format,
written by Oliver Fromme, the author of the QPEG viewer
-------------------------------------------------------
Legal NOTE: The legal rules mentioned in the Disclaimer at the top of this file
also apply to the following information, so neither Oliver Fromme nor I can be
held responsible for errors or bugs in it.
The author of the following information is:
Oliver Fromme
Leibnizstr. 18-61
38678 Clausthal
GERMANY
JPEG/JFIF file format:
~~~~~~~~~~~~~~~~~~~~~~
- header (2 bytes): $ff, $d8 (SOI) (these two identify a JPEG/JFIF file)
- for JFIF files, an APP0 segment immediately follows the SOI marker,
see below
- any number of "segments" (similar to IFF chunks), see below
- trailer (2 bytes): $ff, $d9 (EOI)
Segment format:
~~~~~~~~~~~~~~~
- header (4 bytes):
$ff identifies segment
n type of segment (one byte)
sh, sl size of the segment, including these two bytes, but not
including the $ff and the type byte. Note, not Intel order:
high byte first, low byte last!
- contents of the segment, max. 65533 bytes.
Notes:
- There are parameterless segments (denoted with a '*' below) that DON'T
have a size specification (and no contents), just $ff and the type byte.
- Any number of $ff bytes between segments is legal and must be skipped.
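A sketch of a segment walker that follows the rules above: a big-endian length that counts its own two bytes, parameterless markers, and skipped fill bytes. It stops at SOS, since the entropy-coded image data follows that segment. The function names and the visitor callback are hypothetical; the set of parameterless types (TEM, RST0-7, SOI, EOI) is taken from the segment table below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: called with each segment's type and contents. */
typedef void (*segment_visitor)(uint8_t type, const uint8_t *data,
                                size_t len, void *ctx);

/* TEM, RST0..RST7, SOI and EOI carry no length and no contents. */
static int is_parameterless(uint8_t t)
{
    return t == 0x01 || (t >= 0xd0 && t <= 0xd9);
}

/* Hypothetical helper: walks the segments of an in-memory JPEG file up to
   SOS. Returns the offset where the scan data starts, or -1 on error or if
   EOI comes before any SOS. */
long walk_segments(const uint8_t *buf, size_t size, segment_visitor visit,
                   void *ctx)
{
    if (size < 2 || buf[0] != 0xff || buf[1] != 0xd8) return -1; /* no SOI */
    size_t pos = 2;
    while (pos < size) {
        if (buf[pos] != 0xff) return -1;                 /* expected marker */
        while (pos < size && buf[pos] == 0xff) pos++;    /* skip fill bytes */
        if (pos >= size) return -1;
        uint8_t type = buf[pos++];
        if (is_parameterless(type)) {
            if (visit) visit(type, NULL, 0, ctx);
            if (type == 0xd9) return -1;                 /* EOI, no SOS */
            continue;
        }
        if (pos + 2 > size) return -1;
        size_t len = ((size_t)buf[pos] << 8) | buf[pos + 1]; /* high first */
        if (len < 2 || pos + len > size) return -1;
        if (visit) visit(type, buf + pos + 2, len - 2, ctx);
        pos += len;                        /* length includes its 2 bytes */
        if (type == 0xda) return (long)pos;  /* SOS: scan data follows */
    }
    return -1;
}
```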
Segment types:
~~~~~~~~~~~~~~
*TEM = $01 usually causes a decoding error, may be ignored
SOF0 = $c0 Start Of Frame (baseline JPEG), for details see below
SOF1 = $c1 ditto
SOF2 = $c2 usually unsupported
SOF3 = $c3 usually unsupported
SOF5 = $c5 usually unsupported
SOF6 = $c6 usually unsupported
SOF7 = $c7 usually unsupported
SOF9 = $c9 for arithmetic coding, usually unsupported
SOF10 = $ca usually unsupported
SOF11 = $cb usually unsupported
SOF13 = $cd usually unsupported
SOF14 = $ce usually unsupported
SOF15 = $cf usually unsupported
DHT = $c4 Define Huffman Table, for details see below
JPG = $c8 undefined/reserved (causes decoding error)
DAC = $cc Define Arithmetic Table, usually unsupported
*RST0 = $d0 RSTn are used for resync, may be ignored
*RST1 = $d1
*RST2 = $d2
*RST3 = $d3
*RST4 = $d4
*RST5 = $d5
*RST6 = $d6
*RST7 = $d7
SOI = $d8 Start Of Image
EOI = $d9 End Of Image
SOS = $da Start Of Scan, for details see below
DQT = $db Define Quantization Table, for details see below
DNL = $dc usually unsupported, ignore
DRI = $dd Define Restart Interval, for details see below
DHP = $de ignore (skip)
EXP = $df ignore (skip)
APP0 = $e0 JFIF APP0 segment marker, for details see below
APP15 = $ef ignore
JPG0 = $f0 ignore (skip)
JPG13 = $fd ignore (skip)
COM = $fe Comment, for details see below
All other segment types are reserved and should be ignored (skipped).
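Because the SOFn markers are interleaved with DHT ($c4), JPG ($c8) and DAC ($cc), a decoder has to test for them explicitly. A small sketch with hypothetical function names; treating only SOF0 and SOF1 as supported mirrors the table above and is a policy choice, not a rule of the format.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the type byte is one of the SOFn markers.
   $c4 (DHT), $c8 (JPG) and $cc (DAC) lie in the same range but are not SOFs. */
int is_sof_marker(uint8_t t)
{
    return t >= 0xc0 && t <= 0xcf && t != 0xc4 && t != 0xc8 && t != 0xcc;
}

/* Hypothetical helper: SOF types this kind of decoder would accept
   (SOF0 baseline, and SOF1 marked "ditto" in the table). */
int is_supported_sof(uint8_t t)
{
    return t == 0xc0 || t == 0xc1;
}
```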
SOF0: Start Of Frame 0:
~~~~~~~~~~~~~~~~~~~~~~~
- $ff, $c0 (SOF0)
- length (high byte, low byte), 8+components*3
- data precision (1 byte) in bits/sample, usually 8 (12 and 16 not
supported by most software)
- image height (2 bytes, Hi-Lo), must be >0 if DNL not supported
- image width (2 bytes, Hi-Lo), must be >0 if DNL not supported
- number of components (1 byte): usually 1 = grey scaled, 3 = color YCbCr
or YIQ, 4 = color CMYK
- for each component: 3 bytes
- component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q)
- sampling factors (bit 0-3 vert., 4-7 hor.)
- quantization table number
Remarks:
- JFIF uses either 1 component (Y, greyscaled) or 3 components (YCbCr,
sometimes called YUV, colour).
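Parsing SOF0 is mostly a matter of reading Hi-Lo words and splitting the sampling factor byte into its two nibbles. A sketch with hypothetical struct and function names; it receives the segment contents without the $ff,$c0 marker and the two length bytes, so a valid contents length is 6+components*3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types holding the SOF0 fields listed above. */
struct sof_component { uint8_t id, h, v, tq; };
struct sof_info {
    uint8_t precision;
    uint16_t height, width;
    uint8_t ncomp;
    struct sof_component comp[4];
};

/* Hypothetical helper: parses SOF0 contents (len = length - 2).
   Returns 0 on success, -1 on error. */
int parse_sof0(const uint8_t *d, size_t len, struct sof_info *s)
{
    if (len < 6) return -1;
    s->precision = d[0];
    s->height = (uint16_t)(d[1] << 8 | d[2]);   /* Hi-Lo */
    s->width  = (uint16_t)(d[3] << 8 | d[4]);
    s->ncomp  = d[5];
    if (s->ncomp < 1 || s->ncomp > 4 || len < 6 + 3u * s->ncomp) return -1;
    for (int i = 0; i < s->ncomp; i++) {
        const uint8_t *c = d + 6 + 3 * i;
        s->comp[i].id = c[0];
        s->comp[i].h  = c[1] >> 4;     /* bits 4-7: horizontal factor */
        s->comp[i].v  = c[1] & 0x0f;   /* bits 0-3: vertical factor */
        s->comp[i].tq = c[2];          /* quantization table number */
    }
    return 0;
}
```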
APP0: JFIF segment marker:
~~~~~~~~~~~~~~~~~~~~~~~~~~
- $ff, $e0 (APP0)
- length (high byte, low byte), must be >= 16
- 'JFIF'#0 ($4a, $46, $49, $46, $00), identifies JFIF
- major revision number, should be 1 (otherwise error)
- minor revision number, should be 0..2 (otherwise try to decode anyway)
- units for x/y densities:
0 = no units, x/y-density specify the aspect ratio instead
1 = x/y-density are dots/inch
2 = x/y-density are dots/cm
- x-density (high byte, low byte), should be <> 0
- y-density (high byte, low byte), should be <> 0
- thumbnail width (1 byte)
- thumbnail height (1 byte)
- n bytes for thumbnail (RGB 24 bit), n = width*height*3
Remarks:
- If there's no 'JFIF'#0, or the length is < 16, then it is probably not
a JFIF segment and should be ignored.
- Normally units=0, x-dens=1, y-dens=1, meaning that the aspect ratio is
1:1 (evenly scaled).
- JFIF files including thumbnails are very rare, the thumbnail can usually
be ignored. If there's no thumbnail, then width=0 and height=0.
- If the length doesn't match the thumbnail size, a warning may be
printed, then continue decoding.
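A sketch of the APP0 checks above, with hypothetical names. It works on the segment contents after the length field, so the "length >= 16" rule becomes "at least 14 content bytes". The return codes are my own convention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type holding the JFIF APP0 fields. */
struct jfif_info {
    uint8_t major, minor, units;
    uint16_t xdens, ydens;
    uint8_t thumb_w, thumb_h;
};

/* Hypothetical helper: returns 0 for a JFIF segment, -1 if this APP0 is not
   JFIF (ignore it), -2 if the major revision is not 1. A thumbnail size that
   does not match the segment length is only reported via *thumb_mismatch. */
int parse_jfif_app0(const uint8_t *d, size_t len, struct jfif_info *j,
                    int *thumb_mismatch)
{
    if (len < 14 || memcmp(d, "JFIF", 5) != 0) return -1; /* 5 bytes: incl. #0 */
    j->major = d[5];
    j->minor = d[6];
    j->units = d[7];
    j->xdens = (uint16_t)(d[8] << 8 | d[9]);
    j->ydens = (uint16_t)(d[10] << 8 | d[11]);
    j->thumb_w = d[12];
    j->thumb_h = d[13];
    if (j->major != 1) return -2;
    *thumb_mismatch = (len != 14 + 3u * j->thumb_w * j->thumb_h);
    return 0;
}
```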
DRI: Define Restart Interval:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- $ff, $dd (DRI)
- length (high byte, low byte), must be = 4
- restart interval n (high byte, low byte) in units of MCU blocks,
meaning that after every n MCU blocks a RSTm marker can be found.
The first marker is RST0, then RST1 etc.; after RST7 the sequence
repeats from RST0.
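A small sketch of the marker bookkeeping, with a hypothetical function name: with interval n, the k-th marker (counting from 0) is $d0 + (k mod 8). It assumes that no marker follows the final MCU, since restart markers separate intervals and EOI comes last, and that an interval of 0 means restarts are disabled. A decoder must also reset its DC predictions at each marker, which is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: the RST marker byte expected after mcus_done MCUs
   have been decoded, or 0 if no marker is expected at that point. */
uint8_t expected_rst(unsigned interval, unsigned mcus_done, unsigned total_mcus)
{
    if (interval == 0 || mcus_done == 0 || mcus_done % interval != 0
        || mcus_done >= total_mcus)
        return 0;
    unsigned k = mcus_done / interval - 1;   /* index of this marker */
    return (uint8_t)(0xd0 + k % 8);          /* RST0..RST7, then wrap */
}
```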
DQT: Define Quantization Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- $ff, $db (DQT)
- length (high byte, low byte)
- QT information (1 byte):
bit 0..3: number of QT (0..3, otherwise error)
bit 4..7: precision of QT, 0 = 8 bit, otherwise 16 bit
- n bytes QT, n = 64*(precision+1)
Remarks:
- A single DQT segment may contain multiple QTs, each with its own
information byte.
- For precision=1 (16 bit), the order is high-low for each of the 64 words.
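A sketch of reading a DQT segment that may hold several tables, with hypothetical names. In the JPEG standard the 64 values are stored in zig-zag order (the same order as the decoded coefficients), so they are kept in file order here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parses DQT contents (bytes after the length field).
   tables[n] receives QT number n in file (zig-zag) order.
   Returns the number of tables read, or -1 on error. */
int parse_dqt(const uint8_t *d, size_t len, uint16_t tables[4][64])
{
    size_t pos = 0;
    int count = 0;
    while (pos < len) {
        uint8_t info = d[pos++];
        int id = info & 0x0f;          /* bits 0..3: number of QT */
        int prec = info >> 4;          /* bits 4..7: 0 = 8 bit, else 16 bit */
        size_t n = prec ? 128 : 64;
        if (id > 3 || pos + n > len) return -1;
        for (int i = 0; i < 64; i++)
            tables[id][i] = prec
                ? (uint16_t)(d[pos + 2 * i] << 8 | d[pos + 2 * i + 1]) /* Hi-Lo */
                : d[pos + i];
        pos += n;
        count++;
    }
    return count;
}
```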
DAC: Define Arithmetic Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current software does not support arithmetic coding for legal reasons.
JPEG files using arithmetic coding can not be processed.
DHT: Define Huffman Table:
~~~~~~~~~~~~~~~~~~~~~~~~~~
- $ff, $c4 (DHT)
- length (high byte, low byte)
- HT information (1 byte):
bit 0..3: number of HT (0..3, otherwise error)
bit 4 : type of HT, 0 = DC table, 1 = AC table
bit 5..7: not used, must be 0
- 16 bytes: number of symbols with codes of length 1..16, the sum of these
bytes is the total number of codes, which must be <= 256
- n bytes: table containing the symbols in order of increasing code length
(n = total number of codes)
Remarks:
- A single DHT segment may contain multiple HTs, each with its own
information byte.
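The DHT segment only gives how many codes have each length; the code words themselves are assigned canonically, as described in the JPEG standard: codes of the same length are consecutive numbers, and moving on to the next length shifts the running code left by one bit. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: generates the code words of a DHT table.
   counts[i] = number of symbols whose code is i+1 bits long.
   codes[k] / lengths[k] receive the code of the k-th symbol in the table.
   Returns the number of codes, or -1 if the counts are invalid. */
int build_huffman_codes(const uint8_t counts[16], uint16_t codes[256],
                        uint8_t lengths[256])
{
    int k = 0;
    unsigned code = 0;
    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < counts[len - 1]; i++) {
            if (k >= 256 || code >= (1u << len))
                return -1;             /* too many codes for this length */
            codes[k] = (uint16_t)code;
            lengths[k] = (uint8_t)len;
            k++;
            code++;                    /* next code of the same length */
        }
        code <<= 1;                    /* next length: append a 0 bit */
    }
    return k;
}
```

With the counts {0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0} (the typical luminance DC table), the first symbol gets the 2-bit code 00, the next five get 010 to 110, and the last one gets 111111110.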
COM: Comment:
~~~~~~~~~~~~~
- $ff, $fe (COM)
- length (high byte, low byte) of the comment = L+2
- The comment = a stream of bytes with the length = L
SOS: Start Of Scan:
~~~~~~~~~~~~~~~~~~~
- $ff, $da (SOS)
- length (high byte, low byte), must be 6+2*(number of components in scan)
- number of components in scan (1 byte), must be >= 1 and <=4 (otherwise
error), usually 1 or 3
- for each component: 2 bytes
- component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q), see SOF0
- Huffman table to use:
- bit 0..3: AC table (0..3)
- bit 4..7: DC table (0..3)
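- 3 bytes: start of spectral selection, end of spectral selection and
successive approximation (0, 63 and 0 for baseline JPEG); a baseline
decoder can ignore them
Remarks:
- The image data (scans) immediately follows the SOS segment.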
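A sketch of parsing SOS, with hypothetical struct and function names, working on the contents after the length field (4+2*components bytes). Note that the DC table number is in the high nibble and the AC table number in the low nibble; the three trailing bytes are kept (for baseline JPEG they are 0, 63 and 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types holding the SOS fields listed above. */
struct sos_component { uint8_t id, dc_table, ac_table; };
struct sos_info {
    uint8_t ncomp;
    struct sos_component comp[4];
    uint8_t ss, se, ahal;   /* the 3 trailing bytes */
};

/* Hypothetical helper: parses SOS contents. Returns 0 or -1 on error. */
int parse_sos(const uint8_t *d, size_t len, struct sos_info *s)
{
    if (len < 1) return -1;
    s->ncomp = d[0];
    if (s->ncomp < 1 || s->ncomp > 4 || len != 4 + 2u * s->ncomp) return -1;
    for (int i = 0; i < s->ncomp; i++) {
        s->comp[i].id = d[1 + 2 * i];
        s->comp[i].dc_table = d[2 + 2 * i] >> 4;    /* bits 4..7 */
        s->comp[i].ac_table = d[2 + 2 * i] & 0x0f;  /* bits 0..3 */
        if (s->comp[i].dc_table > 3 || s->comp[i].ac_table > 3) return -1;
    }
    const uint8_t *t = d + 1 + 2 * s->ncomp;
    s->ss = t[0];
    s->se = t[1];
    s->ahal = t[2];
    return 0;
}
```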