i. H.264 is an advanced video coding standard developed jointly by ISO/IEC and the ITU-T as a replacement for the earlier video compression standards H.261, H.262, and H.263.
ii. H.264 has the main components of its predecessors but they have been extended and improved.
iii. A notable new component in H.264 is an in-loop deblocking filter, developed specifically to reduce the blocking artefacts caused by the fact that individual macroblocks are compressed separately.
iv. The input to a video encoder is a set of video frames (where a frame may consist of progressive or interlaced video).
v. Each frame is encoded separately, and the encoded frame is referred to as a coded picture.
vi. There are several types of frames, chiefly I, P, and B, and the order in which the frames are encoded may differ from the order in which they are displayed.
vii. A frame is broken up into slices, and each slice is further partitioned into macroblocks.
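The difference between display order and coding order can be illustrated with a minimal Python sketch. It assumes that every B frame refers only to the nearest I or P frame on either side, which is a simplification, and the function name `coding_order` is hypothetical rather than part of any H.264 tool.

```python
def coding_order(display_types):
    """Reorder frames from display order into a plausible coding order.

    Each B frame is held back until the next I or P frame (its future
    reference) has been emitted. Assumption: B frames reference only the
    nearest I/P frames on either side.
    """
    order = []       # (display_index, frame_type) pairs in coding order
    pending_b = []   # B frames still waiting for their future reference
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # Code the I or P frame first, then the B frames that
            # precede it in display order.
            order.append((index, frame_type))
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames with no future reference are coded last here.
    order.extend(pending_b)
    return order


display = ["I", "B", "B", "P", "B", "B", "P"]
coded = coding_order(display)
print(coded)
```

For the display sequence I B B P B B P, the sketch codes each P frame before the two B frames that are displayed ahead of it, because those B frames need the P frame as a reference.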
i. In H.264, slices and macroblocks also have types.
ii. An I slice may have only I-type macroblocks, a P slice may have P and I macroblocks, and a B slice may have B and I macroblocks (there are also slices of types SI and SP).
iii. The figure shows the main steps of the H.264 encoder.
iv. A frame Fn is predicted (each macroblock in the frame is predicted by other macroblocks in the same frame or by macroblocks from other frames).
v. The predicted frame P is subtracted from the original Fn to produce a difference Dn, which is transformed (in the box labeled T), quantized (in Q), reordered, and entropy encoded.
vi. What is new in the H.264 encoder is the reconstruction path.
vii. The main part of the encoder is its forward path. The next video frame to be compressed is denoted by Fn.
viii. The frame is partitioned into macroblocks of 16×16 pixels each and each macroblock is encoded in intra or inter mode. In either mode, a prediction macroblock P is constructed based on a reconstructed video frame.
ix. In the intra mode, P is constructed from previously-encoded samples in the current frame n. These samples are decoded and reconstructed, becoming uF’n in the figure.
x. In the inter mode, P is constructed by motion-compensated prediction from one or several reference frames.
xi. The prediction for each macroblock may be based on one or two frames that have already been encoded and reconstructed (these may be past or future frames). The prediction macroblock P is then subtracted from the current macroblock to produce a residual or difference macroblock Dn.
xii. This macroblock is transformed and quantized to produce a set X of quantized transform coefficients.
xiii. The coefficients are reordered in zigzag order and entropy encoded into a short bit string. This string, together with side information for the decoder, becomes the compressed stream, which is passed to a network abstraction layer (NAL) for transmission or storage.
xiv. The NAL stream consists of units, each with a header and a raw byte sequence payload (RBSP), that can be sent as packets over a network or stored as records that constitute the compressed file.
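The forward path described above (residual, transform, quantization, zigzag reordering) can be sketched in Python for a single 4×4 block. This is an illustration under stated assumptions, not the encoder itself: the matrix `CF` is the 4×4 integer core transform associated with H.264, but the uniform quantizer with a fixed step is a simplification of the standard's parameter-dependent scaling, entropy coding is omitted, and all function names are hypothetical.

```python
# 4x4 integer core transform matrix (an integer approximation of the DCT).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Zigzag scan order for a 4x4 block, as (row, column) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def residual(current, prediction):
    """Dn: the current block minus the prediction block P."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


def forward_transform(block):
    """Apply the core transform: CF * block * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer (assumption): truncate toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]


def zigzag(block):
    """Read the block in zigzag order, low frequencies first."""
    return [block[r][c] for r, c in ZIGZAG_4X4]


# A flat block predicted by a slightly darker flat block.
current = [[110] * 4 for _ in range(4)]
prediction = [[100] * 4 for _ in range(4)]

scanned = zigzag(quantize(forward_transform(residual(current, prediction)), 8))
print(scanned)
```

With a flat residual, all the energy lands in the first (DC) coefficient, so the zigzag scan yields one nonzero value followed by a run of zeros, which is the kind of sequence an entropy coder represents compactly.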
i. The decoder inputs a compressed bit stream from the NAL.
ii. The first two steps (entropy decoding and reordering) produce a set of quantized coefficients X.
iii. Once these are rescaled and inverse transformed, they result in a D’n identical to the D’n of the encoder.
iv. Using the header side information from the bit stream, the decoder constructs a prediction macroblock P, identical to the original prediction P created by the encoder.
v. In the next step, P is added to D’n to produce uF’n. In the final step, uF’n is filtered to create the decoded macroblock F’n.
vi. The reconstruction path in the encoder has an important task. It ensures that both encoder and decoder use identical reference frames to create the prediction P.
vii. It is important for the predictions P in the encoder and decoder to be identical, because any differences between them tend to accumulate and lead to an increasing error or “drift” between the encoder and decoder.
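The drift described above can be shown with a toy one-dimensional Python sketch. It assumes each "frame" is a single sample predicted from the previous frame, and that the only loss is a uniform quantizer on the residual. The names `encode`, `decode`, and `use_reconstruction` are hypothetical. The flag switches between predicting from the reconstructed value (as the reconstruction path does) and predicting from the original value, which the decoder never sees.

```python
def quantize(value, step):
    """Quantize and rescale a residual (round half away from zero)."""
    magnitude = (abs(value) + step // 2) // step
    return (magnitude if value >= 0 else -magnitude) * step


def encode(samples, step, use_reconstruction):
    """Return the quantized residuals for a sequence of samples."""
    residuals = []
    reference = 0  # the value the encoder predicts from
    for sample in samples:
        r = quantize(sample - reference, step)
        residuals.append(r)
        if use_reconstruction:
            # Same value the decoder will hold after adding r.
            reference = reference + r
        else:
            # Original sample: not available to the decoder.
            reference = sample
    return residuals


def decode(residuals):
    """Rebuild samples by adding each residual to the previous output."""
    reconstructed, output = 0, []
    for r in residuals:
        reconstructed = reconstructed + r
        output.append(reconstructed)
    return output


samples = list(range(4, 44, 4))  # 4, 8, ..., 40
closed_loop = decode(encode(samples, 10, use_reconstruction=True))
open_loop = decode(encode(samples, 10, use_reconstruction=False))

closed_error = max(abs(s - d) for s, d in zip(samples, closed_loop))
open_error = max(abs(s - d) for s, d in zip(samples, open_loop))
print(closed_error, open_error)
```

In this sketch the error stays within half a quantizer step when the encoder predicts from reconstructed values, whereas predicting from the originals lets every small quantization loss add up from frame to frame.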