- Huffman algorithm or Huffman coding is an entropy encoding algorithm.
- It is used widely for data compression (like Winzip Compression-Winzip doesn’t use it but!)
- Huffman coding is used in JPEG compression.
- The key idea behind Huffman coding is to encode the most common characters using shorter strings of bits than those used for less common source characters.
- It works by creating a binary tree stored in an array.
- We also need to know the external path length (sum of all paths from root to external node) and internal path length (sum of all paths from root to internal node).
- The Huffman Algorithm
- Step 1: Create a leaf node for each character. Add the character and its weight or frequency of occurrence to the priority queue.
- Step 2: Repeat Steps 3 to 5 while the total number of nodes in the queue is greater than 1
- Step 3: Remove two nodes that have the lowest weight (or highest priority)
- Step 4: Create a new internal node by merging these two nodes as children and with weight equal to the sum of the two nodes' weights.
- Step 5: Add the newly created node to the queue.
So coding for C=00, D=01, A=100, B=101, C=11