written 7.7 years ago by |
In adaptive Huffman coding both the compression and the decompression start with an empty Huffman tree.
No symbols are assigned codes and every new symbol is treated as a leaf node with the same weight.
As new symbols are added, the tree is also updated such that the updated tree is also a Huffman tree.
The first symbol is written on the output stream as it is. This symbol is then added to the tree and a code is assigned to it.
The next time this symbol occurs, its current code is written on the output stream and its frequency is incremented by
Each time the symbols are processed, it has to be checked whether the tree satisfies the Huffman properties.
The Huffman properly is that if we scan the tree, the frequency of occurrence of symbols should decrease from right to left and from top to bottom i.e. the symbol on top right position will have the highest frequency and the one at the bottom left will have the lowest frequency.
This property is called as sibling property of Huffman tree.
Updating the Huffman tree:
The process of updating the tree starts always at the current node which is a leaf S with f as it frequency of occurrence.
Every iteration has three steps:
• Compare S to its successors in the tree from left to right and bottom to top. If the immediate successor has frequency (f + 1) or more then the nodes are still in sorted order and swapping is not required. Otherwise some successors of S have identical frequency f or smaller frequency. In such a case S should be swapped with the last node in this group.
• Increment the frequency from f to f + 1. Also increase the frequency of all its parents.
• If S becomes the root then the process stops otherwise the process repeats with the parent of node S.
Drawbacks of Adaptive Huffman coding:
Count overflow:
• The frequency counts are accumulated and this field can overflow. Normally the width of this field is 16 bits and can store a count up to 65535.
• The count of the root is monitored every time it is incremented.
• When the maximum count limit is reached, all the weights are revealed with an integer division by 2.
• This is actually done by performing an integer division only on the leaf nodes and updating the tree again.
• Sometimes it leads to violation of Huffman property and the tree needs to be updated again.
Code overflow:
• Code overflow when many symbols are added to the tree and the tree grows longer.
• The compressor has to find out the code for an input symbol S in the tree by linear search method.
• If S is found in the tree, the compressor moves from node S back to root thus building the code bit by bit.
• These bits have to be accumulated as they are transmitted in the reverse order.
• When the tree gets longer, the codes get longer and if the field size is exceeded, the program malfunctions.
• Another drawback of the Huffman coding is that the codes generated contain integer no. of bits which adds redundancy to the data.