Mumbai university > Electronics and telecommunication Engineering > Sem 7 > Data compression and Encryption
Marks: 10
Years: May 2016
The principle of MPEG audio compression is quantization. The values being quantized, however, are not the audio samples but numbers (called signals) taken from the frequency domain of the sound.
The fact that the compression ratio (or, equivalently, the bit rate) is known to the encoder means that the encoder knows at any time how many bits it can allocate to the quantized signals.
Thus the (adaptive) bit allocation algorithm is an important part of the encoder. This algorithm uses the known bit rate and the frequency spectrum of the most recent audio samples to determine the size of the quantized signals such that the quantization noise (the difference between an original signal and a quantized one) will be inaudible.
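The following is a minimal Python sketch of such a greedy allocation, purely for illustration; the function name, the 6 dB-per-bit noise rule, and the 15-bit cap are assumptions, not MPEG's actual allocation tables.

```python
# Hypothetical greedy bit allocation: keep giving one more bit to the
# sub-band whose quantization noise is currently the most audible,
# until the bit budget for this frame runs out.
def allocate_bits(smr_db, total_bits, max_bits=15):
    """smr_db: signal-to-mask ratio of each sub-band, in dB."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Each extra bit of quantizer resolution lowers the noise by roughly
        # 6 dB, so the noise-to-mask ratio is approximately SMR - 6 * bits.
        nmr = [s - 6.0 * b for s, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # quantization noise is already inaudible everywhere
        bits[worst] += 1
    return bits
```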
The psychoacoustic models use the frequencies of the sound that is being compressed, but the input stream consists of audio samples, not sound frequencies.
The frequencies therefore have to be computed from the samples. This is why the first step in MPEG audio encoding is a discrete Fourier transform, where a set of 512 consecutive audio samples is transformed to the frequency domain.
Since the number of frequencies can be huge, they are grouped into 32 equal-width frequency sub-bands (Layer III uses different numbers, but the same principle).
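A rough NumPy sketch of this transform-and-group step is given below, assuming a window of 512 samples and a plain FFT; the real encoder uses a polyphase filter bank alongside the FFT, so this only illustrates the idea of grouping spectral bins into 32 sub-bands.

```python
import numpy as np

def subband_signals(samples):
    """samples: 512 consecutive PCM samples (one analysis window)."""
    spectrum = np.fft.rfft(samples * np.hanning(len(samples)))
    magnitudes = np.abs(spectrum)[:256]        # keep 256 frequency bins
    # Group the bins into 32 equal-width sub-bands of 8 bins each and take
    # the peak magnitude of each group as that sub-band's intensity.
    return magnitudes.reshape(32, 8).max(axis=1)
```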
For each sub-band, a number is obtained that indicates the intensity of the sound in the sub-band's frequency range. These numbers (called signals) are then quantized. The coarseness of the quantization in each sub-band is determined by the masking threshold in that sub-band and by the number of bits still available to the encoder.
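As an illustration only, a uniform quantizer whose step size follows directly from the number of bits granted to the sub-band might look like this; the scale-factor handling and names are assumptions, not the standard's exact scheme.

```python
def quantize(signal, scale_factor, bits):
    """Quantize one sub-band signal; fewer bits means coarser steps."""
    levels = (1 << bits) - 1                         # e.g. 2 bits -> 3 levels
    x = max(-1.0, min(1.0, signal / scale_factor))   # normalize to [-1, 1]
    return round((x + 1.0) / 2.0 * levels)           # integer code in 0..levels
```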
The masking threshold is computed for each sub-band using a psychoacoustic model. MPEG uses psychoacoustic models to implement frequency masking and temporal masking.
Each model describes how a loud sound masks other sounds that happen to be close to it in frequency or in time. The model partitions the frequency range into 24 critical bands and specifies how masking effects apply within each band.
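A toy illustration of frequency masking within critical bands follows, assuming a fixed 16 dB masking offset and a precomputed critical-band index for every spectral component; real psychoacoustic models instead use frequency-dependent spreading functions.

```python
def masked_components(levels_db, band_of, offset_db=16.0):
    """levels_db[i]: level of component i in dB; band_of[i]: its critical band."""
    peak = {}
    for lvl, band in zip(levels_db, band_of):
        peak[band] = max(peak.get(band, float("-inf")), lvl)
    # A component is considered masked (inaudible) if it lies more than
    # offset_db below the strongest component in its own critical band.
    return [lvl < peak[band] - offset_db
            for lvl, band in zip(levels_db, band_of)]
```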
The masking effects depend, of course, on the frequency and amplitude of the tones. When the sound is decompressed and played, the user (listener) may select any playback amplitude, which is why the psychoacoustic model has to be designed for the worst case.
The masking effects also depend on the nature of the source of the sound being compressed. The two psychoacoustic models employed by MPEG are based on experimental work done by researchers over many years.
The decoder must be fast, since it may have to decode the entire movie (video and audio) in real time, so it must be simple. As a result it does not use any psychoacoustic model or bit allocation algorithm.
The compressed stream must therefore contain all the information that the decoder needs for de-quantizing the signals.
This information must be written by the encoder to the compressed stream, and it constitutes overhead that must be subtracted from the number of remaining available bits.
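Below is a sketch of what the decoder does with that side information, assuming it has already parsed the per-sub-band bit counts and scale factors from the stream (the field names are hypothetical); it simply inverts the quantizer, with no psychoacoustic computation.

```python
def decode_subbands(side_info, codes):
    """side_info: (bits, scale_factor) per sub-band, read from the stream;
    codes: the quantized integer signals for one frame."""
    out = []
    for (bits, scale), code in zip(side_info, codes):
        if bits == 0:                        # sub-band was not transmitted
            out.append(0.0)
            continue
        levels = (1 << bits) - 1
        out.append((code / levels * 2.0 - 1.0) * scale)   # inverse quantization
    return out
```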
The auxiliary data is user-definable and would normally consist of information related to specific applications. This data is optional.