Extended Huffman Coding:
In applications where the alphabet size is large, the probability of the most likely symbol, pmax, is generally quite small, and the deviation of the Huffman rate from the entropy, especially as a percentage of the rate, is quite small.
However, in cases where the alphabet is small and the probabilities of the different letters are heavily skewed, pmax can be quite large and the Huffman code becomes rather inefficient compared to the entropy.
To overcome this inefficiency we use extended Huffman coding, in which codewords are assigned to blocks of symbols rather than to single symbols. This can be illustrated with the following example:
Consider a source that puts out iid letters from the alphabet A = {a1, a2, a3} with the probability model P(a1) = 0.8, P(a2) = 0.02, and P(a3) = 0.18. The entropy of this source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 1 below.
TABLE 1: The Huffman code.
| Letter | Codeword |
|---|---|
| a1 | 0 |
| a2 | 11 |
| a3 | 10 |
The average length for this code is 1.2 bits/symbol. The difference between the average code length and the entropy, called the redundancy, is 0.384 bits/symbol, which is 47% of the entropy. This means that to code a sequence from this source we would need, on average, 47% more bits than the minimum required.
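As a sanity check, these figures can be reproduced numerically. Below is a minimal Python sketch (not part of the original example; the variable names are illustrative) that recomputes the entropy, average code length, and redundancy from the probability model and the codeword lengths in Table 1:

```python
from math import log2

# Probability model from the example (iid source)
p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

# Codeword lengths from Table 1 (a1 -> 0, a2 -> 11, a3 -> 10)
code_len = {"a1": 1, "a2": 2, "a3": 2}

entropy = -sum(q * log2(q) for q in p.values())   # H = -sum p * log2(p)
avg_len = sum(p[s] * code_len[s] for s in p)      # expected codeword length
redundancy = avg_len - entropy

print(f"entropy    = {entropy:.3f} bits/symbol")  # ~0.816
print(f"avg length = {avg_len:.3f} bits/symbol")  # 1.200
print(f"redundancy = {redundancy:.3f} bits/symbol "
      f"({100 * redundancy / entropy:.0f}% of entropy)")  # 0.384, ~47%
```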
Now, for the source described above, instead of generating a codeword for every symbol we will generate a codeword for every two symbols. If we look at the source sequence two symbols at a time, the number of possible symbol pairs, i.e. the size of the extended alphabet, is 3^2 = 9. The extended alphabet, probability model, and Huffman code for this example are shown in Table 2 below.
TABLE 2: The extended alphabet and corresponding Huffman code.
| Letter | Probability | Code |
|---|---|---|
| a1a1 | 0.64 | 0 |
| a1a2 | 0.016 | 10101 |
| a1a3 | 0.144 | 11 |
| a2a1 | 0.016 | 101000 |
| a2a2 | 0.0004 | 10100101 |
| a2a3 | 0.0036 | 1010011 |
| a3a1 | 0.144 | 100 |
| a3a2 | 0.0036 | 10100100 |
| a3a3 | 0.0324 | 1011 |
The average codeword length for this extended code is 1.7228 bits per extended symbol. However, each symbol in the extended alphabet corresponds to two symbols from the original alphabet.
Therefore, in terms of the original alphabet, the average codeword length is 1.7228/2 = 0.8614 bits/symbol.
The redundancy is now only about 0.045 bits/symbol (0.8614 − 0.816), which is about 5.5% of the entropy.
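The numbers for the extended code can be verified the same way. The sketch below builds the pair probabilities (for an iid source, P(aiaj) = P(ai)P(aj)) and runs a standard heap-based Huffman construction; this is one common way to build the code, and tie-breaking may yield different codewords than Table 2, but the average length comes out the same:

```python
import heapq
from itertools import product
from math import log2

p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

# Extended alphabet: for an iid source, P(ai aj) = P(ai) * P(aj)
p2 = {x + y: p[x] * p[y] for x, y in product(p, repeat=2)}

# Heap-based Huffman construction; each heap entry carries a partial
# code table, and the integer counter breaks probability ties.
heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(p2.items())]
heapq.heapify(heap)
count = len(heap)
while len(heap) > 1:
    p0, _, codes0 = heapq.heappop(heap)   # two least probable nodes
    p1, _, codes1 = heapq.heappop(heap)
    merged = {s: "0" + c for s, c in codes0.items()}
    merged.update({s: "1" + c for s, c in codes1.items()})
    heapq.heappush(heap, (p0 + p1, count, merged))
    count += 1
codes = heap[0][2]

avg_pair = sum(p2[s] * len(codes[s]) for s in p2)
entropy = -sum(q * log2(q) for q in p.values())
print(f"avg length per pair   = {avg_pair:.4f} bits")      # 1.7228
print(f"avg length per symbol = {avg_pair / 2:.4f} bits")  # 0.8614
print(f"redundancy            = {avg_pair / 2 - entropy:.4f} bits/symbol")  # ~0.046
```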
Advantage of Extended Huffman Coding:
We see that by coding blocks of symbols together we can reduce the redundancy of Huffman codes.