JPEG 2000 (JP2) is an image compression standard and coding system. It was created by the Joint Photographic Experts Group committee in 2000 with the intention of superseding their original discrete cosine transform-based JPEG standard (created in 1992) with a newly designed, wavelet-based method.
JPEG 2000 code streams offer several mechanisms to support spatial random access or region-of-interest access at varying degrees of granularity. It is possible to store different parts of the same picture at different quality levels.
The code stream obtained after compressing an image with JPEG 2000 is scalable, meaning that it can be decoded in a number of ways. For instance, by truncating the code stream at any point, one may obtain a representation of the image at a lower resolution or a lower signal-to-noise ratio.
The simplified structures of the JPEG 2000 encoder and decoder are shown in the figure below. Assume that we have a multiple-component image. The major processing steps of the encoder are: component transformation, tiling, wavelet transformation, quantization, coefficient bit modeling, arithmetic coding, and rate-distortion optimization. The decoder reverses the steps performed by the encoder, except for the rate-distortion optimization step.
Color components transformation
First, the image is transformed from the RGB color space to another color space, leading to three components that are handled separately. There are two possible choices:
Irreversible Color Transform (ICT) uses the well-known YCbCr color space. It is called "irreversible" because it has to be implemented in floating-point or fixed-point arithmetic and therefore causes round-off errors.
Reversible Color Transform (RCT) uses a modified YUV color space that does not introduce quantization errors, so it is fully reversible. A proper implementation of the RCT requires that numbers be rounded as specified, so the transform cannot be expressed exactly in matrix form. The transformation is: Y = floor((R + 2G + B) / 4), Cb = B - G, Cr = R - G, with inverse G = Y - floor((Cb + Cr) / 4), R = Cr + G, B = Cb + G.
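As a minimal sketch of why the RCT is exactly invertible, the Python functions below apply the forward and inverse transform to a single pixel of integer samples. The function names are illustrative and not taken from any library; the rounding is a floor division by 4, written here as a right shift.

```python
def rct_forward(r, g, b):
    """Forward reversible color transform on integer samples."""
    y = (r + 2 * g + b) >> 2      # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Inverse transform; recovers the original integers exactly."""
    # Python's >> floors toward negative infinity, which is what the
    # inverse needs when cb + cr is negative.
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b


pixel = (200, 30, 77)
assert rct_inverse(*rct_forward(*pixel)) == pixel
```

The round trip is lossless because the same floored term is added in one direction and subtracted in the other, so no rounding error survives.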
Tiling
After color transformation, the image is split into so-called tiles, rectangular regions of the image that are transformed and encoded separately. Tiles can be any size, and it is also possible to consider the whole image as one single tile. Once the size is chosen, all the tiles will have the same size (except optionally those on the right and bottom borders).
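The tiling rule can be sketched in a few lines of Python. This is a simplified illustration that assumes the tile grid starts at the top-left corner of the image; the helper name `tile_grid` is hypothetical.

```python
def tile_grid(width, height, tile_w, tile_h):
    """Return tile bounds (x0, y0, x1, y1) in raster order.

    x1 and y1 are exclusive. Tiles on the right and bottom borders are
    clipped to the image, so they may be smaller than tile_w x tile_h.
    """
    tiles = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            x1 = min(x0 + tile_w, width)
            y1 = min(y0 + tile_h, height)
            tiles.append((x0, y0, x1, y1))
    return tiles


# A 5x3 image with 2x2 tiles gives a 3x2 grid; the last column and
# the last row hold the smaller border tiles.
tiles = tile_grid(5, 3, 2, 2)
```

Passing a tile size equal to the image size yields a single tile, which corresponds to treating the whole image as one tile.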
Wavelet transform
These tiles are then wavelet transformed to an arbitrary depth, in contrast to JPEG 1992, which uses an 8×8 block-size discrete cosine transform. JPEG 2000 uses two different wavelet transforms:
Irreversible: the CDF 9/7 wavelet transform. It is said to be "irreversible" because it introduces quantization noise that depends on the precision of the decoder.
Reversible: a rounded version of the biorthogonal CDF 5/3 wavelet transform. It uses only integer coefficients, so the output does not require rounding (quantization) and so it does not introduce any quantization noise. It is used in lossless coding.
The wavelet transforms are implemented by the lifting scheme or by convolution.
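To illustrate the lifting scheme, here is a minimal one-dimensional sketch of one level of the reversible 5/3 transform in Python. It assumes a signal of at least two integer samples and mirrors samples at the borders; the function names are illustrative, and a real codec applies this along rows and columns and repeats it on the low-pass band.

```python
def dwt53_forward(x):
    """One level of the integer 5/3 lifting transform: returns (low, high)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    n_low, n_high = (n + 1) // 2, n // 2
    # Predict step: each odd sample minus the floored mean of its even neighbors.
    high = []
    for i in range(n_high):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else left  # mirror at the end
        high.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: each even sample plus a rounded quarter of neighboring details.
    low = []
    for i in range(n_low):
        d_left = high[i - 1] if i > 0 else high[0]        # mirror at the start
        d_right = high[i] if i < n_high else high[i - 1]  # mirror at the end
        low.append(x[2 * i] + ((d_left + d_right + 2) >> 2))
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward by running the lifting steps in reverse order."""
    n_low, n_high = len(low), len(high)
    n = n_low + n_high
    x = [0] * n
    for i in range(n_low):
        d_left = high[i - 1] if i > 0 else high[0]
        d_right = high[i] if i < n_high else high[i - 1]
        x[2 * i] = low[i] - ((d_left + d_right + 2) >> 2)
    for i in range(n_high):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else left
        x[2 * i + 1] = high[i] + ((left + right) >> 1)
    return x


signal = [10, 12, 14, 200, 13, 11, 9]
low, high = dwt53_forward(signal)
assert dwt53_inverse(low, high) == signal
```

Because every lifting step adds an integer that the inverse can recompute and subtract, the round trip is exact even though each step floors its intermediate result.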
Quantization
After the wavelet transform, the coefficients are scalar-quantized to reduce the number of bits needed to represent them, at the expense of quality. The output is a set of integers that have to be encoded bit by bit. The parameter that sets the final quality is the quantization step: the greater the step, the greater the compression and the loss of quality. With a quantization step equal to 1, no quantization is performed (this is used in lossless compression).
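The effect of the quantization step can be sketched with a dead-zone scalar quantizer, the kind used by JPEG 2000. The function names and the reconstruction offset of 0.5 are assumptions made for illustration; the offset is a decoder-side choice.

```python
import math


def quantize(coeff, step):
    """Dead-zone scalar quantizer: sign(c) * floor(|c| / step)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(math.floor(abs(coeff) / step))


def dequantize(index, step, offset=0.5):
    """Reconstruct a coefficient from its index (offset is an assumed default)."""
    if index == 0:
        return 0.0
    sign = -1 if index < 0 else 1
    return sign * (abs(index) + offset) * step


# A larger step maps more coefficients to the same index (more loss).
assert quantize(-7.3, 2.0) == -3
assert quantize(0.9, 2.0) == 0
# With step 1 and integer coefficients the index equals the coefficient.
assert quantize(5, 1) == 5
```

Coefficients smaller in magnitude than the step fall into the zero bin, which is where most of the bit savings on smooth image regions come from.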
Coding
The quantized sub-bands are split further into precincts, rectangular regions in the wavelet domain. They are typically selected so that the coefficients within them across the sub-bands form approximately spatial blocks in the (reconstructed) image domain, though this is not a requirement. Precincts are split further into code blocks. Each code block lies within a single sub-band, and all code blocks have equal sizes, except those located at the edges of the image.
The encoder has to encode the bits of all quantized coefficients of a code block, starting with the most significant bits and progressing to less significant bits, by a process called the EBCOT scheme. EBCOT stands for Embedded Block Coding with Optimal Truncation. In this encoding process, each bit plane of the code block is encoded in three so-called coding passes: the first (significance propagation) encodes bits and signs of insignificant coefficients with significant neighbors (i.e., with 1-bits in higher bit planes), the second (magnitude refinement) encodes refinement bits of significant coefficients, and the third (cleanup) encodes the remaining coefficients without significant neighbors.
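The most-significant-bit-first ordering can be illustrated with a small Python sketch that splits a code block into sign bits and magnitude bit planes. This is not EBCOT itself: it omits the three coding passes, context modeling, and arithmetic coding, treats the code block as a flat list, and uses hypothetical function names. It only shows why dropping the trailing bit planes gives a coarser version of the same coefficients.

```python
def bit_planes(block):
    """Return (signs, planes); planes[0] is the most significant bit plane."""
    mags = [abs(v) for v in block]
    signs = [v < 0 for v in block]
    n_planes = max(max(mags).bit_length(), 1)
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def reconstruct(signs, planes, keep):
    """Rebuild coefficients from the first `keep` planes; missing bits read as 0."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for idx in range(min(keep, n_planes)):
        shift = n_planes - 1 - idx
        for i, bit in enumerate(planes[idx]):
            mags[i] |= bit << shift
    return [-m if neg else m for m, neg in zip(mags, signs)]


block = [5, -3, 0, 12]
signs, planes = bit_planes(block)
assert reconstruct(signs, planes, keep=len(planes)) == block  # all planes: exact
assert reconstruct(signs, planes, keep=2) == [4, 0, 0, 12]    # top two planes only
```

Keeping fewer planes is the basic idea behind truncating an embedded code stream: the decoder still gets the most significant information first.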
Compression ratio
Compared to the previous JPEG standard, JPEG 2000 delivers a typical compression gain in the range of 20%, depending on the image characteristics. Higher-resolution images tend to benefit more, because JPEG 2000's spatial-redundancy prediction can contribute more to the compression process. In very low-bitrate applications, studies have shown JPEG 2000 to be outperformed by the intra-frame coding mode of H.264. Good applications for JPEG 2000 are large images and images with low-contrast edges, such as medical images.
Computational complexity and performance
JPEG 2000 is computationally much more complex than the original JPEG standard. Tiling, the color component transform, the discrete wavelet transform, and quantization can be done quite fast, but the entropy codec is complicated and time consuming. EBCOT context modeling and the arithmetic MQ-coder take most of the running time of a JPEG 2000 codec.
On a CPU, fast JPEG 2000 encoding and decoding relies mainly on SIMD instructions (AVX/SSE) and on multithreading, with each tile processed in a separate thread. The fastest JPEG 2000 solutions use both the CPU and the GPU to reach high performance.