written 2.7 years ago by |
Huffman encoding is an entropy encoding algorithm developed by David A Huffman that is widely used as lossless data comparison technique.
The Huffman coding algorithm uses a variable-length code table to encode a source character, where the variable-length code table is derived in a particular way based on the estimated probability of occurrence for each possible value of the source character.
The algorithm works by creating a binary tree of nodes that are stored in a regular array.
The size of this array depends on the number of nodes in the tree.
A node can either be a leaf node or an internal node. initially, all the nodes in the tree are at the leaf level and store the source character and its frequency of occurrence.
While the internal nodes are used to store the weight and contains links to ots child nodes, the external node contains the actual character.
Conventionally , a '0' represents following the left child and a '1' represents following the right child.
A finished tree has n leaf nodes will have n - 1 internal nodes.
Algorithm:
Step 1: create a leaf node for each character, add the character and its weight or frequency of occurrence to the priority queue.
Step 2 : Repeat step 3 and 5 while the total number of nodes in the queue is greater than 1.
Step 3: Remove 2 nodes that have the lowest weight (or highest priority) and with weight equal to the sum of the 2 nodes weight.
step 4: Add the newly created node to the queue.
Example: Creating Huffman tree for "Maharashtra"
Frequency.
m = 01
a = 04
h = 02
r = 01
s = 01
t = 01
Huffman code for "Maharashtra" = 01011101110011011101000011