written 6.2 years ago by | modified 6.2 years ago by |
Huffman algorithm or Huffman coding is an entropy encoding algorithm.
It is used widely for data compression (like Winzip Compression-Winzip doesn’t use it but!)
Huffman coding is used in JPEG compression.
The key idea behind Huffman coding is to encode the most common characters using shorter strings of bits than those used for less common source characters.
It works by creating a binary tree stored in an array.
We also need to know the external path length (sum of all paths from root to external node) and internal path length (sum of all paths from root to internal node).
The Huffman Algorithm
Step 1: Create a leaf node for each character. Add the character and its weight or frequency of occurrence to the priority queue.
Step 2: Repeat Steps 3 to 5 while the total number of nodes in the queue is greater than 1
Step 3: Remove two nodes that have the lowest weight (or highest priority)
Step 4: Create a new internal node by merging these two nodes as children and with weight equal to the sum of the two nodes' weights.
Step 5: Add the newly created node to the queue.
Example