written 2.9 years ago by | • modified 2.9 years ago |
Huffman Coding
- Huffman Coding is a Greedy Algorithm that uses the frequency or probability of characters for generating codes.
- Here, we use the frequency of characters to create a Huffman tree and find out the Huffman code for each character in the given string CONNECTION.
- It allows a Lossless Compression of data.
- It generates variable-length codes for all the different characters.
- To do this it uses variable-length encoding.
- This process is called as Huffman Encoding.
- The Time Complexity of Huffman Coding becomes $O(N log N)$, where N is the number of unique characters in the given string.
Prefix Rule:
- To prevent any ambiguities while decoding, Huffman Coding creates a rule called as a Prefix Rule.
- It ensures that the code assigned to any character is not a prefix of the code assigned to any other character.
Phases of Huffman Encoding:
Phase 1 - Build a Huffman Tree from the input characters of the string.
Phase 2 - Allocate the Huffman Code to each unique character by traversing the Huffman Tree.
Working of the Huffman Coding
The given String is CONNECTION
- The given string contains a total of 10 characters.
- As Huffman Coding technique is used to compress the string into a smaller size.
- To do this it first creates a Huffman Tree using the frequencies of each character and then generates code for each character.
Phase 1 – Huffman Tree Generation
Step 1 –
- Calculate the frequency of each character in the given string CONNECTION.
- To do this make each unique character of the given string as a leaf node.
- Leaf node of a character shows the frequency occurrence of that unique character. This is shown in the below figure.
Characters | C | O | N | E | T | I |
---|---|---|---|---|---|---|
Frequency | 2 | 2 | 3 | 1 | 1 | 1 |
Step 2 –
- Arrange the characters in Increasing order or Ascending order of their frequencies.
Therefore,
E | T | I | C | O | N |
---|---|---|---|---|---|
1 | 1 | 1 | 2 | 2 | 3 |
Step 3 –
Consider the first two nodes of the characters having minimum frequencies,
- Create a new internal node.
- The frequency of this new node is the sum of the frequencies of those two character nodes.
Step 4 –
- Keep repeating Step – 3 until all the nodes form a single tree.
- The tree obtained is the required Huffman Tree for the given string CONNECTION.
The figure shown below shows the step-by-step generation of the Huffman Tree for the given string CONNECTION:
Step 5 –
- Allocate weights to all the edges of the generated Huffman Tree based on the below rules.
Rule 1 - If assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right edges.
Rule 2 - If assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right edges.
- Any of the above two rules can be adopted.
- But must ensure that the same rule is also adopted at the time of decoding too, that is adopted at the time of encoding.
- Here, we use Rule 1 which means allocate weight ‘0’ to the left edges and weight ‘1’ to the right edges.
Now, the final Huffman Tree with allocated weights to the edges look as follows for the given String CONNECTION:
Phase 2 – Huffman Code Calculation
- To find the Huffman Code for each unique character of the given string, traverse the Huffman Tree from the root node to the leaf node of that character.
Therefore, the Huffman code for each unique character is as follows:
Character | Huffman Code |
---|---|
E | 1010 |
T | 1011 |
I | 100 |
C | 00 |
O | 01 |
N | 11 |