CHANGES IN VM5.2 FROM VM5.1
=============================
_____________________________________________________________________________
More Flexible Coordinate Reference System
_____________________________________________________________________________
* The current version of the VM supports image representations that include
-- tiling with arbitrary-size tiles
-- both the high-pass first and low-pass first WT conventions
-- cropping of images in the compressed domain with minimal transcoding
-- geometric manipulation (flipping and transpose) of images in the
compressed domain with minimal transcoding
All of these capabilities are supported by means of a more general set of
conventions for generating tile, block and frame partitions and for
identifying the locations of boundaries for the Wavelet transform. The
original conventions remain the default, in which case full compatibility
with earlier bit-streams is preserved. The modifications were required to
support the most efficient SSO-DWT implementation, which was accepted in
Seoul but never fully implemented. The more general conventions also
implicitly improve compressed-domain editing capabilities.
* The implementation is best understood in terms of two concepts. The first
of these concepts is that of the canvas. The canvas is defined on the high
resolution grid of Ricoh's syntax and its extent is identical to that
indicated by the existing syntax markers which identify image dimensions.
In fact, the canvas is identical to the image unless a new CME marker is
present (it is included only if the user opts into the new flexibility);
this marker indicates the non-negative displacement of the upper left-hand
corner of the image relative to that of the canvas. The DWT is only
periodically shift-invariant, so it is important to define how it is
aligned with respect to the image. With the generalized conventions, the
DWT is aligned with respect to the canvas, not to the image itself. Of
course, symmetric extension is applied at the image boundaries, which need
not be identical to the canvas boundaries. The lower and right-hand sides
of the canvas coincide with those of the image, while the upper and
left-hand sides of the canvas need not. It is best to think of the difference as
representing a "missing" image, whose dimensions are the difference between
the canvas and image dimensions in the high resolution grid. The canvas
and "missing" image dimensions may be propagated to any image component at
any resolution level by dividing by F*2^k where F is the relevant component
sub-sampling factor in the Ricoh syntax and k is the number of discarded
resolution levels; fractions are rounded up to the next integer for both
the canvas and the "missing" image. Thus, when the "missing" image
dimensions are zero, we obtain the previous set of conventions. When
non-zero, we obtain a new set of Wavelet transform conventions. Also, when
cropping a compressed image, one need only transcode those code-blocks
which lie near the cropped image boundaries, in which case the "missing"
image dimensions should be interpreted as the amount by which the image has
been cropped from the top and left.
* The second concept required to fully support all included technology as it
was originally proposed is that of an arbitrary partitioning reference
point. The partitioning reference point is defined on the high resolution
grid again and represents the anchor point for the tile partitioning. The
partitioning reference point must lie on the canvas and may not lie to the
right of or below the upper left-hand corner of the image on the canvas.
Moreover, the partitioning reference point may not lie so far to the left
of or above the image that any image tile becomes empty. The partitioning
reference point plays a useful role during compressed domain cropping, but
it is motivated primarily by the need to anchor the tile partition at the
top left-hand corner of the image, even when this point is not identical
to the origin of the canvas. In particular, by placing the top left-hand
corner of the image at the point (1,1) on the canvas, the high-pass first
convention will be applied for the DWT instead of the low-pass first
convention in all resolution levels, since the absolute coordinate of the
first pixel in the image is then odd at all resolution levels (the
"missing" image measures 1x1 at full resolution, and dividing by F*2^k and
rounding up keeps it at 1x1 in every component and resolution level). By
also setting the partitioning reference point to (1,1), the tile partition
will be aligned with the image, rather than the canvas.
* The code-block and frame partitions are anchored on the tile partition
induced by the partitioning reference point mentioned above. This allows
arbitrary tile sizes to be fully supported on the high resolution grid. In
fact, the tile dimensions on the high resolution grid need not even be
divisible by any of the component sub-sampling factors. Meaningful
behaviour is obtained in every case, even under resolution scalability,
since the tile coordinates on the high resolution grid are traced down into
each resolution level of each image component by treating the tile in
exactly the same way as the canvas with a "missing" image. Specifically,
let A denote the coordinate of the first pixel in the tile (in some
dimension) and let B denote the coordinate of the first pixel in the next
tile (in the same dimension). Then B may be understood as the dimension of
a "mini-canvas" and A as the dimension of the corresponding "missing" image
component. To obtain the corresponding coordinates in any resolution level
of any image component, we simply divide both A and B by F*2^k, rounding up
to the next integer, where F is the component sub-sampling factor and k
is the number of discarded resolution levels.
* For a more detailed description of the above conventions, see the special
topic entitled "Blocks, Tiles and Frames" in "ifc.h". Also see the usage
statement for `-Frev' and `-Fpref'.
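The conventions above all reduce to ceiling division by F*2^k. The C sketch
below works in one dimension and uses hypothetical helper names (it is not
the VM's code): it traces canvas, "missing" image and tile coordinates from
the high resolution grid into a given component and resolution level, and
tests the parity of the first image sample to decide whether the high-pass
first convention applies.

```c
#include <assert.h>

/* ceil(x / (f * 2^k)): maps a high-resolution-grid coordinate into a
   component with sub-sampling factor f after discarding k resolution
   levels. Both canvas and "missing" image extents are rounded up. */
static long reduce_coord(long x, long f, int k)
{
    long d;
    assert(x >= 0 && f > 0 && k >= 0);
    d = f << k;
    return (x + d - 1) / d;
}

/* Image extent in one dimension: reduced canvas extent minus reduced
   "missing" image extent. */
static long reduced_image_extent(long canvas, long missing, long f, int k)
{
    return reduce_coord(canvas, f, k) - reduce_coord(missing, f, k);
}

/* The first image sample sits at the reduced "missing" extent; an odd
   absolute coordinate means the high-pass first convention applies. */
static int starts_high_pass(long missing, long f, int k)
{
    return (int) (reduce_coord(missing, f, k) & 1L);
}

/* Extent [a, b) of tile n on the high resolution grid, for a partition
   anchored at reference point p with tile size t, clipped to the image
   extent [img0, img1). Here a is "A" and b is "B" from the text. */
static void tile_extent(long p, long t, long n, long img0, long img1,
                        long *a, long *b)
{
    long lo = p + n * t;
    long hi = lo + t;
    *a = lo > img0 ? lo : img0;
    *b = hi < img1 ? hi : img1;
}
```

For example, with the image placed at 1 on a canvas of extent 250, a
partitioning reference point of 1 and a tile size of 100, tile 1 spans
[101, 201) on the high resolution grid and [26, 51) in a component with F=2
after discarding one resolution level (reduce_coord applied to both ends);
the three reduced tiles exactly cover the 62-sample reduced image.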
_____________________________________________________________________________
Changes in Object Switching
_____________________________________________________________________________
* The quantizer and dequantizer objects previously switched between deadzone,
mask and TCQ implementations in a generic object initialize function. Now
this switching is handled within the compress.c and decompress.c files.
This method of switching is quite simple, lets experimenters change the
size of these objects transparently, and should serve as an example of how
to integrate new implementations of any object easily.
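A minimal sketch of this style of switching, assuming a common header struct
that each implementation embeds as its first member. All type, field and
function names here are hypothetical and do not reflect the VM's actual
object definitions; the point is that the caller selects an implementation
with a plain switch and each implementation allocates its own, possibly
larger, object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical common header shared by all quantizer implementations. */
typedef struct quantizer {
    const char *name;
    void (*destroy)(struct quantizer *self);
} quantizer;

/* Each implementation embeds the header first and adds private state; the
   private part can grow without changing any caller. Fields are illustrative. */
typedef struct { quantizer base; double step; } deadzone_quantizer;
typedef struct { quantizer base; double step; double exponent; } mask_quantizer;
typedef struct { quantizer base; double step; int num_states; } tcq_quantizer;

static void quantizer_free(quantizer *self) { free(self); }

/* Allocates an object of the given implementation-specific size and fills
   in the shared header. */
static quantizer *alloc_quantizer(size_t size, const char *name)
{
    quantizer *q = calloc(1, size);
    if (q != NULL) {
        q->name = name;
        q->destroy = quantizer_free;
    }
    return q;
}

typedef enum { QUANT_DEADZONE, QUANT_MASK, QUANT_TCQ } quant_type;

/* The selection lives in the application file (compress.c or decompress.c
   in the VM) rather than in a generic initializer. */
static quantizer *create_quantizer(quant_type type)
{
    quantizer *q = NULL;
    switch (type) {
    case QUANT_DEADZONE:
        q = alloc_quantizer(sizeof(deadzone_quantizer), "deadzone");
        break;
    case QUANT_MASK:
        q = alloc_quantizer(sizeof(mask_quantizer), "mask");
        break;
    case QUANT_TCQ:
        q = alloc_quantizer(sizeof(tcq_quantizer), "tcq");
        if (q != NULL)
            ((tcq_quantizer *) q)->num_states = 8; /* illustrative value */
        break;
    default:
        assert(0 && "unknown quantizer type");
    }
    return q;
}
```

Because each case passes its own sizeof, enlarging one implementation's
private state requires no change to code that only uses the base pointer.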
_____________________________________________________________________________
Changes in Bit-Stream Syntax
_____________________________________________________________________________
* A new CWT implementation for passing user-defined wavelet kernels was
integrated. This change passes kernel information directly through the
encoded bit-stream instead of indirectly through the path name of a user
kernel file.
* Several memory initializations were added for ROI to avoid potential
problems in debugging mode. These problems prevented ROI from working at
all on PC platforms.
* Minor changes for both the COD marker and multi-resolution definitions were
made to resolve discrepancies between the VM document and code.
* Added the -Bprint_part2 option to see which Part II features are included
in the bit-stream.
_____________________________________________________________________________
Changes in Rate Allocation
_____________________________________________________________________________
* The Lagrangian Rate Allocator (LRA) in VM5.1 was not allowed to work
with the -Ftiles option due to a bug in the TCQ code. This combination is
now allowed; however, the current implementation uses the same quantization
step size for the same subband in each tile. Thus, the number of step
sizes is equal to the number of subbands in any one tile.
_____________________________________________________________________________
Bug Fixes
_____________________________________________________________________________
* A minor bug in the `-Fweight' implementation with `-Flra' was fixed. In
VM5.1, the energy weights were multiplied just once by each of the factors
in the `-Fweight' file. Now the square of these factors is used.
* A major bug for TCQ's 16 bit implementation mode was fixed. This bug led
to segmentation faults in the decoder for the vm5_expand_16 program.
Several "long" type variables were changed to type "ifc_int" in order to
fix this bug, so the TCQ implementation should now better conform to the
rules laid out in ifc.h. Along with this bug fix, several other problems
were addressed
which relate to the compatibility between implementations of differing
precision. All of the changes for these problems were isolated to the
tcq_*.c files in the quantization directory.
* The TCQ implementation passes trellis start states and trained codewords
to the decoder by shifting this info into the LSB's of quantized indices
passed to the entropy encoder. The entropy encoder determines the number
of bit-planes which should be encoded using the step-sizes used for
quantization. In VM5.1, the step-sizes used to determine the number of
bit-planes were twice those actually used for TCQ. This was a potential bug.
The original intent was to base the number of bit-planes on half the step
size used by TCQ, thereby telling the entropy encoder that an extra
bit-plane should be coded to include the required TCQ information.
However, the entropy coder determines the number of bit-planes assuming
use of a scalar quantizer, and the dynamic range of TCQ quantized indices
is roughly one bit less than that for scalar quantizers for a given step
size. Therefore, the entropy coder in VM5.2 now uses the same step-sizes
as those used for TCQ in determining the number of bit-planes.
* A minor LRA bug in compress.c was fixed. This bug allowed the bit-stream
from the first rate iteration to be written to file if the base step size
of 1.0 led to an achieved bit-rate which met the desired rate, bypassing
the benefits of LRA entirely. The LRA code in compress.c was therefore
modified to force further iterations even if the first iteration met the
desired rate tolerance.
* Minor changes were put into the estimate_layer_threshold() function of
ebcot_send_bits.c to avoid potential divide by zero problems. Such
problems occasionally occurred on non-PC platforms and resulted in a
failed assertion near the end of that routine. These changes should not
affect the behaviour of generic scalability on most imagery, but
progressive decodes of small images may be slightly different.
* Changes were put in place to correct several error resilience problems.
The current code should handle errors in both packet heads and packet
bodies much better and such errors should not cause the decoder to crash.
All of the error resilience related changes are isolated to
ebcot_receive_bits.c and ebcot_decoder.h.
* A bug regarding the scaling factor of wavelet coefficients before the
non-linear transducer function was found and fixed for the visual masking
quantizer (`-Qmask'). Performance was improved with these changes,
especially when the CSF weighting table is flat. See the mask_quant.c and
mask_dequant.c files for these changes.
* In VM4.0 through VM5.1, it was possible to get an encoded file which was
too large for the rate specified with `-r'. The bit-stream object should
now correctly truncate the output stream to meet the desired rate. A
warning message is also printed when the output stream object must truncate
the data passed through the VM rate control mechanism.
* Several bugs affecting multi-component imagery were fixed. These bugs
included:
1. Tiling with multi-components of different resolutions
2. Frame encoding with multi-components of different resolutions
3. Visual progressive encoding with multi-components
4. Visual masking with multi-components
* Random access to tiled bit-streams without decoding packet headers was
fixed. The fix writes the IET marker in tile headers, but only when the
encoder's `-Cwriteiet' flag is set.
* The CHANGES_FROM_VM5_0.TXT file failed to state that Point Symmetric
Extension was added back to VM5.1. Its implementation, however, had a
rather significant bug in passing the PSE flag to the decoder. This bug is
now fixed and PSE should now be fully operable.
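To illustrate the TCQ bit-plane fix described above: assuming a scalar
quantizer produces indices floor(|x|/step), the number of magnitude
bit-planes is the bit length of the largest index. The helper below is
hypothetical and is not the VM's entropy coder logic. For non-zero indices,
halving the step size adds exactly one bit-plane and doubling it removes
one, which is why the step size handed to the entropy coder matters.

```c
#include <assert.h>

/* Number of magnitude bit-planes needed to code the largest index
   floor(max_abs / step) produced by a scalar quantizer (hypothetical). */
static int num_bitplanes(double max_abs, double step)
{
    unsigned long q;
    int bits = 0;
    assert(step > 0.0 && max_abs >= 0.0);
    q = (unsigned long) (max_abs / step); /* truncation == floor for x >= 0 */
    while (q > 0) {
        bits++;
        q >>= 1;
    }
    return bits;
}
```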
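For the `-r' truncation fix, the following is a sketch of the byte budget
and the truncation check, assuming `-r' gives the target rate in bits per
pixel of the full-resolution image and that the output is simply cut at the
budget. Both helpers are hypothetical and are not the VM's bit-stream object
interface.

```c
#include <assert.h>
#include <stdio.h>

/* Maximum number of bytes allowed for a width x height image at bpp bits
   per pixel (assumed interpretation of the target rate). */
static unsigned long byte_budget(double bpp, unsigned long width,
                                 unsigned long height)
{
    assert(bpp >= 0.0);
    return (unsigned long) (bpp * (double) width * (double) height / 8.0);
}

/* Returns the number of bytes to keep, printing a warning whenever the
   produced stream has to be truncated to fit the budget. */
static unsigned long truncate_to_budget(unsigned long produced,
                                        unsigned long budget)
{
    if (produced > budget) {
        fprintf(stderr, "Warning: output truncated from %lu to %lu bytes\n",
                produced, budget);
        return budget;
    }
    return produced;
}
```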