would not be nearly so amenable to implementation. The rounding policy
is that RND(x) is the closest integer to x, with exact half-integers
(i.e. x = k + 1/2) always rounded up to the next larger integer,
regardless of the sign of x.
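As an illustrative sketch only, the rounding policy can be written for a
rational argument num/den using integer arithmetic alone. The helper
names `floor_div' and `rnd_div' are hypothetical and are not part of the
VM. The sketch assumes den > 0 and that the policy amounts to
RND(x) = floor(x + 1/2).

```c
#include <assert.h>

// Floor division for a positive divisor. C's `/` truncates toward zero,
// so a negative dividend with a non-zero remainder needs a correction.
static long floor_div(long a, long b)
{
    assert(b > 0);
    long q = a / b;
    if (a % b < 0)
        q--;
    return q;
}

// RND(num/den) = floor(num/den + 1/2) = floor((2*num + den) / (2*den)).
// Exact halves therefore round toward +infinity, whatever the sign.
static long rnd_div(long num, long den)
{
    assert(den > 0);
    return floor_div(2 * num + den, 2 * den);
}
```

With these definitions, 3/2 rounds to 2 while -3/2 rounds to -1, so ties
always move toward the larger integer.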
To complete our description of lifting it is important, once again,
to define the boundary extension policy. Lifting implementations
offer tremendous flexibility in designing boundary extension policies,
since reversibility is guaranteed by the lifting procedure itself.
However, the most natural extension policies for lifting implementations
are not generally equivalent to the symmetric extension policy described
above for convolution kernels. To ensure that non-reversible filters
can be implemented interchangeably via convolution or lifting, the
boundary extension policy for lifting is defined to be that policy which
is equivalent to the symmetric extension policy for the equivalent
convolution kernel. When the lifting procedure is reversible, the
policy must be equivalent to the symmetric extension policy for the
equivalent linearized convolution kernel.
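The symmetric extension policy itself is defined elsewhere in this file.
Purely as a hedged illustration, and assuming the whole-sample form of
symmetric extension (the boundary sample is not repeated), out-of-range
sample indices could be mapped back into the signal as follows. The name
`sym_ext_index' is hypothetical.

```c
#include <assert.h>

// Map an arbitrary index i onto [0, n-1] by whole-sample symmetric
// reflection about samples 0 and n-1:
//     x[-k] = x[k]   and   x[n-1+k] = x[n-1-k].
static int sym_ext_index(int i, int n)
{
    assert(n > 0);
    if (n == 1)
        return 0;               // a single sample extends as a constant
    int period = 2 * (n - 1);   // the extended signal has this period
    int m = i % period;
    if (m < 0)
        m += period;            // C's % is negative for negative i
    return (m < n) ? m : period - m;
}
```

A lifting implementation which reads its neighbours through such a
mapping sees the same extended signal as the equivalent convolution
kernel would, which is the property required above.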
********************/
/* ========================================================================= */
/* ------------- Data Normalization and Dynamic Range Policies ------------- */
/* ========================================================================= */
/********************
An important system-wide issue which must be understood concerns the
dynamic range of the sample values which are passed from object to
object. In VM3A, the implementation precision is adjustable, depending
upon the value assigned to the IMPLEMENTATION_PRECISION macro at compile
time. Perhaps more importantly, all aspects of the system are designed
so as to ensure that particular instances which are compiled with
different IMPLEMENTATION_PRECISION values can inter-operate in the most
satisfactory way. Thus, a bit-stream which was generated with an
implementation of the compressor working with 16-bit data precision
can be correctly decompressed using a decompressor working with 32-bit
data precision. More interestingly, a bit-stream which was generated
with 32-bit precision can be correctly decompressed using a decompressor
working with 16-bit data precision, up until the point where this
is insufficient to represent the dynamic range of the data being
decompressed; even then, a decompressor compiled with lower
implementation precision will generally discard only least significant
bits from high dynamic range sample values.
In order to enable this behaviour and to fully support the
generality which might be expected from an image compression standard
expected to last well into the second millennium, we must be careful
to define how data values are to be interpreted in a manner which is
independent of the implementation precision.
Non-Reversible Systems
----------------------
We begin by discussing non-reversible decompositions; these may be
distinguished from reversible (i.e. lossless) decompositions by
the fact that the number of least significant bits which might be
kept by a particular implementation of the decomposition is unbounded.
By contrast, in reversible decompositions, the least significant bits
play an important, well-defined role.
For non-reversible transforms, the original I-bit image samples are
level shifted to a nominal range of -2^{I-1} to 2^{I-1} and then
shifted up by (P-(I+G)) bits to fit within the P-bit implementation
precision, where P is the value of the IMPLEMENTATION_PRECISION macro
and G is the number of guard bits, which is typically equal to 2, but
may be explicitly set during compression using the `-Fguard_bits'
argument. The Wavelet transform kernels are then normalized so that
the low-pass analysis filters always have a DC gain of 1.0 and the
high-pass analysis filters always have a Nyquist gain of 1.0. This
means that the nominal range of subband samples in every subband
should be -2^{P-G-1} to 2^{P-G-1}. Unlike the original image samples,
however, this is only a nominal range and small excursions beyond this
range can occur. It is, of course, not too difficult to calculate the
maximum excursions by direct evaluation of the BIBO (Bounded Input
Bounded Output) gain of the linear system; however, we prefer instead
to simply provide a single number of guard bits, G, which is assumed
to be sufficient to accommodate the largest reasonable excursions
without overflow. There are various reasons for selecting the guard
bit approach rather than direct BIBO gain computation. The BIBO gain
computed at the compressor and decompressor might be different, since
there is no guarantee that they would perform the arithmetic
computations with the same precision; it is certainly cheaper to send
a single parameter, G, than to send the results of the BIBO gain
computation for each subband. Also, in non-reversible systems it
might be desirable to select a less conservative number of guard bits,
so long as the risk of overflow is minimal in practice.
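The following is a minimal sketch of the level shift and up-shift
described above, together with a direct BIBO gain evaluation for an FIR
kernel. The function names `normalize_sample' and `bibo_gain' are
hypothetical; the sketch assumes P <= 31 and P >= I+G.

```c
#include <assert.h>

// Level shift an unsigned I-bit sample to the signed nominal range and
// scale it up by P-(I+G) bits. Multiplication is used instead of `<<`
// because left-shifting a negative value is undefined behaviour in C.
static long normalize_sample(unsigned long sample, int I, int P, int G)
{
    int shift = P - (I + G);
    assert(I > 0 && G >= 0 && shift >= 0 && P <= 31);
    long shifted = (long)sample - (1L << (I - 1));
    return shifted * (1L << shift);
}

// BIBO gain of an FIR kernel: the sum of the absolute tap values. It
// bounds the growth of the peak sample magnitude through the kernel.
static double bibo_gain(const double *taps, int num_taps)
{
    double gain = 0.0;
    for (int n = 0; n < num_taps; n++)
        gain += (taps[n] < 0.0) ? -taps[n] : taps[n];
    return gain;
}
```

For example, with I=8, P=16 and G=2 the input samples 0 and 255 map to
-8192 and 8128, which lie within the nominal range -2^{13} to 2^{13}.
A 5-tap low-pass kernel with taps (-1/8, 1/4, 3/4, 1/4, -1/8), chosen
here only for illustration, has a DC gain of 1.0 and a BIBO gain of 1.5,
so a single pass through it needs less than one guard bit.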
In the same way, quantization indices within VM3A are also
adjusted to have a nominal dynamic range of P-G bits. Specifically,
we express relative quantization step sizes in terms of an integer
exponent, E, and a mantissa, M, as S = 2^{-E}*(1+2^{-M_bits}*M),
where M_bits is the number of bits used to represent the mantissa.
Relative quantization step sizes are the absolute step sizes which
should be used if the input samples were normalized to a nominal
dynamic range of 1.0 (i.e. -0.5 to 0.5). Consequently, one might
expect the quantization indices to have a nominal dynamic range of
1/S, which yields E bits, since 1 <= (1+2^{-M_bits}*M) < 2.
However, for a variety of reasons which are carefully explained with
the definitions of `forward_info__get_quant_info' and
`reverse_info__get_quant_info', the actual quantization indices
which are pushed to the encoder (or pulled from the decoder), are
shifted up by L = P-G-E bits, so that the quantized sample indices
also have a nominal dynamic range of P-G bits and the G bits should
be sufficient to avoid overflow. L is identified as the number of
`extra_lsbs' in the call to `forward_info__get_quant_info' or
`reverse_info__get_quant_info', as appropriate. These L extra LSBs
of the quantized sample indices are never coded, but they do carry
useful information to the encoder or from the decoder, as appropriate.
The nature and purpose of this extra information is clearly explained
in the comments appearing with the definitions of the relevant
interface functions.
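As a hedged sketch, the relative step size and the number of extra LSBs
can be computed from E, M, M_bits, P and G as follows. The helper names
`relative_step' and `extra_lsbs' are hypothetical (they are not the
`forward_info__get_quant_info' interface), and the requirement that
L >= 0 is an assumption of the sketch.

```c
#include <assert.h>

// Relative step size S = 2^{-E} * (1 + 2^{-M_bits} * M), where the
// mantissa M occupies M_bits bits, so that 1 <= S * 2^E < 2.
static double relative_step(int E, unsigned M, int M_bits)
{
    assert(E >= 0 && E < 31 && M_bits >= 0 && M_bits < 31);
    assert(M < (1u << M_bits));
    double mantissa = 1.0 + (double)M / (double)(1u << M_bits);
    return mantissa / (double)(1u << E);
}

// Number of extra LSBs, L = P - G - E, by which quantization indices
// are shifted up so that they nominally occupy P-G bits.
static int extra_lsbs(int P, int G, int E)
{
    int L = P - G - E;
    assert(L >= 0);   // assumed here: the indices must fit in P-G bits
    return L;
}
```

For instance, E=3 with M=0 gives S = 0.125, and with P=16, G=2 and E=8
the indices are shifted up by L = 6 bits.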
From the above discussion, it should be evident that the
value of G must be included in the header of the bit-stream. There is
no reason why the forward or inverse transform need use G guard bits
in their implementation. In fact, the concept is only strictly relevant
for fixed-point implementations of non-reversible decompositions.
However, G is an excellent choice for the number of guard bits to use
in a fixed-point implementation of the decomposition. Most hardware
and many software implementations of the transform should use
fixed-point computations for efficiency.
Reversible Systems
------------------
In reversible systems, many of the considerations which formed our
policy for non-reversible systems are different. Most particularly,
the least significant bits are critical and we do not have any
control over normalization of the filtering kernels. Nevertheless,
we do our best to ensure that the coding and quantization stages, at
least, may remain ignorant of these distinctions.
In reversible systems, we do not pre-shift the image sample values
to conform to some particular concept of a nominal dynamic range.
Instead, we map the least significant bit of the input image samples to the
least significant bit of the P-bit sample values used by the
reversible transform. The nominal range of sample values generally
grows as they move through the decomposition, but the absolute
quantization step size which must be used to ensure lossless
compression is always exactly 1.
Exactly as in the non-reversible case, the samples which are pushed
to the encoder or pulled from the decoder are shifted up by L bits,
where L = P-G-E and E is a value which is sent in the global header of
the bit-stream for each subband. In many ways it plays an analogous
role to the relative step size exponent in the non-reversible case, but
it is best understood simply as a constant which is assigned during
compression in order to ensure a good match between the available
dynamic range, P, and the up-shifted sample values. The compressor
is free to perform a BIBO gain calculation to determine a value for
E in each subband which will ensure that overflow cannot occur; however,
it is also free to arrive at a value in any way it likes.
It should be apparent that the value of G plays no important role
in the reversible case, so long as the compressor and decompressor can
agree on a fixed value (e.g. 0). However, for the moment, we send
the value of G in the bit-stream for consistency with the non-reversible
case.
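The up-shift in the reversible case might be sketched as below. The
names `to_encoder' and `from_decoder' are hypothetical, and the sketch
assumes that the full bit-stream is decoded, so that the L low-order
bits of each recovered value are zero.

```c
#include <assert.h>

// Up-shift a reversible subband sample by L = P-G-E bits on its way to
// the encoder. The sample keeps its own LSB, so the step size stays 1.
static long to_encoder(long sample, int P, int G, int E)
{
    int L = P - G - E;
    assert(L >= 0 && L < 31);
    return sample * (1L << L);   // multiply: `<<` on negatives is undefined
}

// Undo the up-shift on the way back from the decoder. This assumes a
// complete (lossless) decode, in which the L extra LSBs are all zero.
static long from_decoder(long value, int P, int G, int E)
{
    int L = P - G - E;
    assert(L >= 0 && L < 31);
    long scale = 1L << L;
    assert(value % scale == 0);
    return value / scale;
}
```

Under these assumptions the round trip is exact for every sample value
which fits within the available dynamic range, whatever value of E the
compressor has chosen for the subband.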
Mixed Systems and Colour Transforms
-----------------------------------
The VM supports sufficient flexibility for some components to have
a reversible decomposition while others have a non-reversible
decomposition; moreover, this can change from tile to tile. This poses
no difficulty, except in the case where one of the colour transforms
defined as intrinsic to the standard is in use. These two transforms
are the RCT (reversible colour transform) and the YCbCr colour transform.
We will refer to the latter as the NRCT as the principles are the same
for any non-reversible transform. Currently we consider colour
transforms ill defined unless image components have the same bit-depth
and are all transformed reversibly (in the case of an RCT) or
non-reversibly (in the case of an NRCT).
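The condition above can be expressed as a simple check. This is only a
sketch: the `component_info' structure, the `colour_transform'
enumeration and the function name are hypothetical and do not correspond
to actual VM data structures.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical per-component description; not a VM data structure.
typedef struct {
    int  bit_depth;    // bit-depth of the original image component
    bool reversible;   // true if the component is transformed reversibly
} component_info;

typedef enum { COLOUR_RCT, COLOUR_NRCT } colour_transform;

// Returns true if the colour transform is well defined: all components
// share one bit-depth and are all reversible (RCT) or all
// non-reversible (NRCT).
static bool colour_transform_is_valid(colour_transform xform,
                                      const component_info *comps,
                                      int num_comps)
{
    if (num_comps <= 0)
        return false;
    bool want_reversible = (xform == COLOUR_RCT);
    for (int c = 0; c < num_comps; c++) {
        if (comps[c].bit_depth != comps[0].bit_depth)
            return false;
        if (comps[c].reversible != want_reversible)
            return false;
    }
    return true;
}
```

Since the decomposition type can change from tile to tile, a check of
this kind would need to be applied separately within each tile.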
********************/
/* ========================================================================= */
/* ------------------------- Blocks, Tiles and Frames ---------------------- */
/* ========================================================================= */
/********************
The terms "block", "tile" and "frame" all suggest sub-division into
rectangular partitions of some sort within the image. These
terms each have well-defined and distinct interpretations which are
described here. We encourage the use of these terms only within the
confines of these very specific interpretations so as to avoid
confusion.
Tiles
-----
The term "tile" is used to identify a rectangular region of
the image which is compressed independently of any other tile. This is
the strongest concept of independence. In fact, in principle, there
is no reason why each tile cannot be compressed using a completely
different algorithm. To ensure independence, the following operations
must all be performed independently on tiles, as though they were
separate images:
1) colour transforms in the `component_mix' and `component_demix'
objects;
2) the Wavelet decomposition and reconstruction operations performed
by the `analysis' and `synthesis' objects;
3) all quantization operations performed by the `quantizer' and
`dequantizer' objects;
4) all entropy coding operations performed by the `encoder'
and `decoder' objects; and
5) all ROI computations performed by the `forward_roi' and
`reverse_roi' objects.
Tile boundaries must be carefully observed in all the objects mentioned
above. In particular, it is clear that code-blocks and frames must
not span tile boundaries.
All tiles except those which abut the image boundaries must have
the same size, which is identified as the nominal tile size. The
nominal tile size is arbitrary (i.e. it need not be a power of 2
as in earlier versions of the VM). However, it should be noted that
tile sizes which are not multiples of 2^L where L is the number of
resolution levels in the Wavelet transform will cause the Wavelet
transform conventions (HP first vs. LP first) to alternate from tile