
"Wave5" aka "WaveVideo"
by Charles Bloom
code September 1999
readme October 2004

Here it is, 5 years after I wrote WaveVideo, and I'm finally releasing it!

WaveVideo is a proof-of-concept for wavelet video. At the time I wrote
it, it was the only functional real-time (30 fps) wavelet video codec. I
was running it on a 300 MHz AMD K6 CPU, doing full-screen (320x240) wavelet
video in real time (30 fps). Since then, wavelet video has become
mainstream, in Bink, DivX, and other places.

WaveVideo is NOT intended as a real functional codec. There are many
ways it's just not optimal, mainly in the poor encoding of the LL band
with DPCM, and the complete lack of motion compensation or any
inter-frame delta coding at all (!!). That said, the quality and
compression are still surprisingly good.
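For reference, DPCM codes each LL value as its difference from a prediction built from values the decoder already has. A minimal sketch, assuming integer-quantized LL coefficients and a simple left-neighbor predictor; the predictor and the function names are hypothetical and not WaveVideo's actual LL coder:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DPCM encoder for one row of quantized LL coefficients.
 * Predictor (an assumption): the previous value in the row, 0 at the start. */
void dpcm_encode_row(const int *ll, int *resid, size_t n)
{
    int pred = 0;
    assert(ll != NULL && resid != NULL);
    for (size_t i = 0; i < n; i++) {
        resid[i] = ll[i] - pred; /* residual that would be entropy coded */
        pred = ll[i];            /* next prediction is this value */
    }
}

/* Inverse: rebuild the row by adding each residual to the running prediction. */
void dpcm_decode_row(const int *resid, int *ll, size_t n)
{
    int pred = 0;
    assert(ll != NULL && resid != NULL);
    for (size_t i = 0; i < n; i++) {
        ll[i] = pred + resid[i];
        pred = ll[i];
    }
}
```

A left-only predictor ignores the row above, which is one reason plain DPCM is a weak way to code the LL band.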

WaveVideo is intended for very high compression (e.g. modem/phone lines)
and very low CPU power. On modern processors (3 GHz+), it can easily do
640x480 video at 60 fps.

Here in 2004, WaveVideo still has some good nuggets of code. The Wavelet
Video core is still the state of the art in terms of fast wavelet
decoders. It uses an unusual data routing path that decodes and
transforms lines in sequential order, which is extremely good for cache.
Most wavelet decoders perform very badly on machines with small caches,
but WaveVideo does very well even on machines with only 32k of data
cache.

Obviously, being from '99, WaveVideo does not make any use of modern
instruction sets like SSE2 which could make it even faster.

WaveVideo makes use of a crappy legacy DirectDraw display layer, which
has some crash bugs and poor performance.  To do proper timing, run in
no-display mode (-n).

WaveVideo options :

Wave (Video) v0.6 by Charles Bloom, Sept-Dec of 1999
  compiled Oct  3 2004,10:54:38
to encode : wave [options] <from> <to> [w] [h]
to decode : wave <wave stream>
encode options :
  k# : set target comp K # (per frame) [4]
  l# : set target comp bytes # (per frame) [4096]
  e# : set target rmse # <use -e or -k or -l>
  f# : set frame rate (fps) [20]
  o# : offset in bytes into the raw stream
decode options :
  s  : use fullscreen mode [window]
  p  : start paused [play]
  n  : no display (for timing)
  rN : dump to raw file name N
  bN : dump to bmp sequence name N


The input raw files should be raw RGB frames, 3 bytes per pixel. E.g. a movie that's 384*288
pixels and 368 frames should be a RAW file of 384*288*3*368 bytes.
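The size rule is just width * height * 3 * frames. A small C sketch for sizing a raw input file and locating a frame in it; the function names are hypothetical, and treating the o# option as a plain byte skip before the first frame is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one raw RGB frame: 3 bytes (R, G, B) per pixel. */
uint64_t raw_frame_bytes(uint32_t w, uint32_t h)
{
    assert(w > 0 && h > 0);
    return (uint64_t)w * h * 3u;
}

/* Expected size of a raw input file holding `frames` frames. */
uint64_t raw_file_bytes(uint32_t w, uint32_t h, uint32_t frames)
{
    return raw_frame_bytes(w, h) * frames;
}

/* Byte position of frame `index`, after skipping `offset` bytes at the
 * start of the stream (assumed to mirror the encoder's o# option). */
uint64_t raw_frame_offset(uint32_t w, uint32_t h, uint64_t offset, uint32_t index)
{
    return offset + raw_frame_bytes(w, h) * index;
}
```

For the 384x288, 368-frame example this gives 331776 bytes per frame and 122093568 bytes for the whole file.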


This is an ancient article I wrote on ways I would improve WaveVideo :

http://www.cbloom.com/news/wave_video.html


I always thought WaveVideo or something similar could be used for very
cheap Video TeleConferencing (VTC). Since the CPU needs are so minimal
by today's standards, you can easily run WaveVideo bi-directionally on
something like the Sony PSP or Nokia N-Gage at something like 160x120 at
15 fps.

====================================================================================

Notes on the amount of memory used in cache :

			/****************
			*
			*	This row-flow method uses a minimum amount of "hot" memory -
			*		only 6k cache is used for a 256-wide image
			*
				// <>

				On larger cache machines, we may do better by adding some rows
					to the DCB and TCB so that we can do :
						1. decode several rows to the DCB
						2. transform to several rows in the TCB
						3. do vtransform several times on the TCB's
					this gives us better register coherence & lets us stay
						in one function longer (big deal on K7)

				---------------

				we do :
					on 3 bands :
						decode row
						dequant row
						(touches 3 DCB rows = 3*width)
					de-htransform up & down
						(touches the LL row = width)
						(touches 2 target TCB rows = 4*width)
					de-vtransform
						(touches 2 target rows = 2*width)

					total = 10*subb_width floats

					= 5*next_width*4 bytes

					<= 5k for a 256x256 image

				total touched *= subb_height

					= 10 * next_width * next_height bytes
					= 10 bytes per image pel

				at all times :
					the 3 DCB rows should be in cache	(3*4/2 * next_width)
					the 2*2 TCB rows should be in cache (4*4 * next_width)
					= 5.5 K (for 256 width)
				the only hot memory is :
					one LL row in						(4/2 * next_width)
					two LL rows out						(4*2 * next_width)
					= 2.5 K (for 256 width)
					or 1/4 of the image bytes in additional overhead, per level
				adding all levels :
			*		(1/4) + (1/16) + .. = 1/3 of image bytes (as 4-byte floats)
			*		= 87k total overhead touched (on 256x256)
			*
			******************/
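The arithmetic in these notes can be re-derived with a few lines of C. This is a sketch: the function names are hypothetical, coefficients are assumed to be 4-byte floats as the notes imply, and the all-levels sum follows the notes' own rule of a quarter of the (float) image bytes per level. For a 256-wide image it reproduces the 5k, 5.5K and 2.5K figures, and summing over the levels of a 256x256 image gives 87380 bytes, the ~87k total (about 1/3 of the 256K float image).

```c
#include <assert.h>
#include <stddef.h>

#define COEF_BYTES 4u /* assumption: coefficients are 4-byte floats */

/* Bytes touched per subband row: 10 floats per subband column,
 * where subb_width = next_width / 2, i.e. 20 * next_width bytes. */
size_t touched_per_row(size_t next_width)
{
    size_t subb_width = next_width / 2;
    return 10u * subb_width * COEF_BYTES;
}

/* Rows that should stay cached: 3 half-width DCB rows plus
 * 2*2 full-width TCB rows. */
size_t cached_rows_bytes(size_t next_width)
{
    return (3u * COEF_BYTES / 2u) * next_width + (4u * COEF_BYTES) * next_width;
}

/* Hot LL traffic per row: one half-width LL row in, two full-width rows out. */
size_t hot_ll_bytes(size_t next_width)
{
    return (COEF_BYTES / 2u) * next_width + (COEF_BYTES * 2u) * next_width;
}

/* Sum a quarter of each level's float image bytes over all levels,
 * halving both dimensions each time, down to a 2x2 level. */
size_t all_level_overhead(size_t width, size_t height)
{
    size_t total = 0;
    assert(width > 0 && height > 0);
    while (width >= 2 && height >= 2) {
        total += width * height * COEF_BYTES / 4u;
        width /= 2;
        height /= 2;
    }
    return total;
}
```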

====================================================================================

Some rough notes :

Basically WaveVideo works in-place on an image.  It is a full-image wavelet transform;
it is not block-based at all.  It is also a true wavelet transform (CDF22 and Daubechies 9/7
are supported), not just a trivial Haar wavelet.

The decode works with a few circular row buffers.  The size of the buffers needed depends on
the span of the transform; e.g. Haar needs 2 rows, CDF22 needs 3 rows, D97 needs 7 rows.
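A circular row buffer can be as simple as indexing slots by row number modulo the buffer height. In this sketch the RowRing type and its functions are hypothetical; the ring is sized by the transform's vertical span, using the row counts given above:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical circular row buffer: keeps the last `nrows` rows of
 * `width` floats; absolute row y lives in slot y % nrows.
 * nrows is the transform span (2 for Haar, 3 for CDF22, 7 for D97). */
typedef struct {
    float *data;
    int width;
    int nrows;
} RowRing;

/* Allocate a zeroed ring; returns 1 on success, 0 on allocation failure. */
int ring_init(RowRing *r, int width, int nrows)
{
    assert(width > 0 && nrows > 0);
    r->data = calloc((size_t)width * (size_t)nrows, sizeof(float));
    r->width = width;
    r->nrows = nrows;
    return r->data != NULL;
}

/* Storage for absolute row y. Writing row y + nrows reuses
 * (overwrites) the slot that held row y. */
float *ring_row(RowRing *r, int y)
{
    assert(y >= 0);
    return r->data + (size_t)(y % r->nrows) * (size_t)r->width;
}

void ring_free(RowRing *r)
{
    free(r->data);
    r->data = NULL;
}
```

Because only nrows rows are ever live, the buffer stays small enough to remain in cache no matter how tall the image is.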

Very good speed can be achieved with a horizontal CDF22 transform and a vertical Haar transform.
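A CDF22 (also known as LeGall 5/3) horizontal pass and a Haar vertical pass can both be written as plain lifting steps. This sketch assumes unnormalized lifting (predict with 1/2, update with 1/4), symmetric extension at the row ends, and even row lengths; WaveVideo's own scaling and edge handling may differ, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Forward CDF(2,2) lifting on one row of even length n:
 * low half to lo[0..n/2), high half to hi[0..n/2). */
void cdf22_fwd(const float *x, float *lo, float *hi, size_t n)
{
    size_t h = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < h; i++) {              /* predict odd samples */
        float right = (i + 1 < h) ? x[2 * i + 2] : x[2 * i]; /* mirror edge */
        hi[i] = x[2 * i + 1] - 0.5f * (x[2 * i] + right);
    }
    for (size_t i = 0; i < h; i++) {              /* update even samples */
        float left = (i > 0) ? hi[i - 1] : hi[0]; /* mirror edge */
        lo[i] = x[2 * i] + 0.25f * (left + hi[i]);
    }
}

/* Inverse CDF(2,2): undo the update first, then the predict. */
void cdf22_inv(const float *lo, const float *hi, float *x, size_t n)
{
    size_t h = n / 2;
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < h; i++) {
        float left = (i > 0) ? hi[i - 1] : hi[0];
        x[2 * i] = lo[i] - 0.25f * (left + hi[i]);
    }
    for (size_t i = 0; i < h; i++) {
        float right = (i + 1 < h) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = hi[i] + 0.5f * (x[2 * i] + right);
    }
}

/* Inverse vertical Haar (lifting form H = b - a, L = a + H/2):
 * rebuilds two output rows from one L row and one H row of width w. */
void haar_inv_rows(const float *L, const float *H, float *out0, float *out1, size_t w)
{
    for (size_t j = 0; j < w; j++) {
        out0[j] = L[j] - 0.5f * H[j];
        out1[j] = out0[j] + H[j];
    }
}
```

The vertical Haar step only ever needs one L row and one H row for the current pair of output rows, which keeps the vertical pass's buffering to the minimum.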

I've got some nice hand-written diagrams of the memory flow, but nothing digital.

Basically you first decode the wavelet coefficients for the needed rows into the temp LH, HL, HH
bands.  You already have the previous LL band.  You never actually decode the full image; you
decode rows as you need them.  Then you do the horizontal inverse transform on LL+LH and on
HL+HH to make temp L and H rows for the vertical pass; you build these up in temp circular row
buffers until you have enough to do the inverse vertical transform, and then you do that into the
output image.

Each level of the inverse wavelet transform works like this :

[coded stream] -> Coefficient decoder -> [wavelet coefficients for LH,HL,HH bands in temp buffers]
	[previously decoded LL band] + [wavelet coefficients for LH,HL,HH bands] ->
		horizontal inverse transform -> [temp L and H vertical rows] ->
		vertical inverse transform -> [decoded image rows]

The decoded rows are thus streamed out in order.  So, we are streaming in the coded bits and
the previous LL band, and streaming out a decoded image.
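The per-level data flow described above can be sketched in C. To keep it short, this sketch uses Haar lifting for both the horizontal and vertical passes (an assumption; WaveVideo also supports CDF22 and D97), stores each subband as a separate sw x sh array, and uses hypothetical function names. With Haar's 2-row vertical span, one L row and one H row are the only temporaries; a longer filter would keep circular buffers of L and H rows instead. forward_level is a non-streamed forward transform included only so the round trip can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* 1-D Haar lifting on one row: hi = odd - even, lo = even + hi/2. */
static void haar_fwd_1d(const float *in, float *lo, float *hi, size_t half)
{
    for (size_t i = 0; i < half; i++) {
        hi[i] = in[2 * i + 1] - in[2 * i];
        lo[i] = in[2 * i] + 0.5f * hi[i];
    }
}

/* Inverse of haar_fwd_1d: rebuild 2*half samples from lo/hi. */
static void haar_inv_1d(const float *lo, const float *hi, float *out, size_t half)
{
    for (size_t i = 0; i < half; i++) {
        out[2 * i] = lo[i] - 0.5f * hi[i];
        out[2 * i + 1] = out[2 * i] + hi[i];
    }
}

/* Non-streamed forward level (testing only): vertical Haar on each row
 * pair into Lrow/Hrow, then horizontal Haar into the four subbands. */
void forward_level(const float *img, size_t sw, size_t sh,
                   float *Lrow, float *Hrow,
                   float *LL, float *LH, float *HL, float *HH)
{
    size_t W = 2 * sw;
    for (size_t y = 0; y < sh; y++) {
        const float *r0 = img + 2 * y * W, *r1 = r0 + W;
        for (size_t j = 0; j < W; j++) {
            Hrow[j] = r1[j] - r0[j];
            Lrow[j] = r0[j] + 0.5f * Hrow[j];
        }
        haar_fwd_1d(Lrow, LL + y * sw, LH + y * sw, sw);
        haar_fwd_1d(Hrow, HL + y * sw, HH + y * sw, sw);
    }
}

/* Row-streaming inverse of one level. For each subband row y:
 * LL+LH -> L row and HL+HH -> H row (horizontal inverse), then
 * L+H -> output rows 2y and 2y+1 (vertical inverse). Output rows are
 * produced strictly top to bottom; only Lrow and Hrow are temporaries. */
void inverse_level_streamed(const float *LL, const float *LH,
                            const float *HL, const float *HH,
                            size_t sw, size_t sh,
                            float *Lrow, float *Hrow, float *out)
{
    size_t W = 2 * sw;
    assert(sw > 0 && sh > 0);
    for (size_t y = 0; y < sh; y++) {
        haar_inv_1d(LL + y * sw, LH + y * sw, Lrow, sw);
        haar_inv_1d(HL + y * sw, HH + y * sw, Hrow, sw);
        float *r0 = out + 2 * y * W, *r1 = r0 + W;
        for (size_t j = 0; j < W; j++) {
            r0[j] = Lrow[j] - 0.5f * Hrow[j];
            r1[j] = r0[j] + Hrow[j];
        }
    }
}
```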

On the final level, we also do a YUV->RGB transform (the coded stream is in YUV).
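The final-level color step can be done one row at a time as rows stream out. This readme does not say which YUV matrix WaveVideo uses, so this sketch assumes full-range BT.601 (JFIF-style) coefficients with U and V centered on 128; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Round a float to the nearest byte value, clamped to [0, 255]. */
static unsigned char clamp_u8(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return (unsigned char)(v + 0.5f);
}

/* Convert one decoded row of Y, U, V floats to packed 24-bit RGB.
 * Assumed matrix: full-range BT.601 (JFIF), chroma offset 128. */
void yuv_row_to_rgb(const float *Y, const float *U, const float *V,
                    unsigned char *rgb, size_t w)
{
    assert(Y && U && V && rgb);
    for (size_t j = 0; j < w; j++) {
        float u = U[j] - 128.0f, v = V[j] - 128.0f;
        rgb[3 * j + 0] = clamp_u8(Y[j] + 1.402f * v);
        rgb[3 * j + 1] = clamp_u8(Y[j] - 0.344136f * u - 0.714136f * v);
        rgb[3 * j + 2] = clamp_u8(Y[j] + 1.772f * u);
    }
}
```

Doing the conversion per row, right after the last vertical inverse, keeps the color step inside the same cache-friendly row stream instead of making a second pass over the whole image.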

====================================================================================