Shannon-Fano Algorithm
The Shannon-Fano algorithm was independently developed by Shannon at Bell Labs and Robert Fano at MIT. To illustrate the algorithm, let's suppose the symbols to be coded are the characters in the word HELLO. The frequency count of the symbols is H:1, E:1, L:2, O:1.
The encoding steps of the Shannon-Fano algorithm can be presented in the following top-down manner:
Sort the symbols according to the frequency count of their occurrences.
Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.
A natural way of implementing the above procedure is to build a binary tree. As a convention, let's assign bit 0 to the left branches and 1 to the right branches.
Initially, the symbols are sorted as L, H, E, O. As Figure 7.3 shows, the first division yields two parts: (a) L with a count of 2, denoted as L:(2); and (b) H, E and O with a total count of 3, denoted as H,E,O:(3). The second division yields H:(1) and E,O:(2). The last division is E:(1) and O:(1).
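To make the procedure concrete, here is a minimal Python sketch of the top-down splitting just described. The function name shannon_fano and the tie-breaking rule at each split are my own choices, not part of the original description.

```python
# A minimal Shannon-Fano sketch (illustrative names and tie-breaking rule).
from collections import Counter

def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by decreasing count."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}                 # a single symbol ends the recursion
    total = sum(count for _, count in symbols)
    # Choose the split point that makes the two parts' counts as equal as
    # possible (earlier split wins ties, as in the HELLO example).
    best_split, best_diff, left = 1, float("inf"), 0
    for i in range(1, len(symbols)):
        left += symbols[i - 1][1]
        diff = abs(2 * left - total)
        if diff < best_diff:
            best_split, best_diff = i, diff
    codes = {}
    for sym, code in shannon_fano(symbols[:best_split]).items():
        codes[sym] = "0" + code                    # left part gets bit 0
    for sym, code in shannon_fano(symbols[best_split:]).items():
        codes[sym] = "1" + code                    # right part gets bit 1
    return codes

freq = Counter("HELLO")                            # L:2, H:1, E:1, O:1
symbols = sorted(freq.items(), key=lambda kv: -kv[1])
print(shannon_fano(symbols))                       # {'L': '0', 'H': '10', 'E': '110', 'O': '111'}
```

For HELLO this reproduces the divisions described above (L: 0, H: 10, E: 110, O: 111), for a total of 10 bits.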
Huffman Coding
In contradistinction to Shannon-Fano, which is top-down, the encoding steps of the Huffman algorithm are described in the following bottom-up manner. Let's use the same example word, HELLO. A binary coding tree similar to the one above will be used, in which the left branches are coded 0 and the right branches 1. A simple list data structure is also used.
Algorithm:
Initialization: put all symbols on the list sorted according to their frequency counts.
Repeat until the list has only one symbol left:
From the list, pick two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node for them.
Assign the sum of the children's frequency counts to the parent and insert it into the list, such that the order is maintained.
Delete the children from the list.
Assign a codeword for each leaf based on the path from the root.
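The bottom-up merging can be sketched in a few lines of Python as well; a min-heap plays the role of the sorted list, and huffman_codes is an illustrative name rather than anything from the text. Ties between equal counts may be broken differently than in the textbook figure, so the individual codewords can differ while the total code length stays the same.

```python
import heapq
from collections import Counter

def huffman_codes(freq):
    """freq: mapping symbol -> count; returns a dict symbol -> codeword."""
    # Heap entries are (count, tiebreaker, tree); a tree is either a bare
    # symbol (leaf) or a [left, right] pair (internal node).
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Pick the two lowest-count nodes and merge them under a parent
        # whose count is the sum of the children's counts.
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next_id, [left, right]))
        next_id += 1
    codes = {}
    def walk(node, prefix):                        # root-to-leaf paths give codewords
        if isinstance(node, list):
            walk(node[0], prefix + "0")            # left branch: bit 0
            walk(node[1], prefix + "1")            # right branch: bit 1
        else:
            codes[node] = prefix or "0"            # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes(Counter("HELLO")))             # four codewords, 10 bits in total for HELLO
```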
Adaptive Huffman Coding
The Huffman algorithm requires prior statistical knowledge about the information source, and such information is often not available. This is particularly true in multimedia applications, where future data is unknown before its arrival, as for example in live (or streaming) audio and video.
Procedures for Adaptive Huffman Coding
initial_code assigns symbols with some initially agreed-upon codes, without any prior knowledge of the frequency counts for them. For example, some conventional code such as ASCII may be used for coding character symbols.
update_tree is a procedure for constructing an adaptive Huffman tree. It basically does two things: it increments the frequency counts for the symbols (including any new ones), and updates the configuration of the tree.
The Huffman tree must always maintain its sibling property; that is, all nodes (internal and leaf) are arranged in the order of increasing counts. Nodes are numbered in order from left to right, bottom to top.
If the sibling property is about to be violated, a swap procedure is invoked to update the tree by rearranging the nodes. When a swap is necessary, the farthest node with count N is swapped with the node whose count has just been increased to N + 1. Note that if the node with count N is not a leaf node (it is the root of a subtree), the entire subtree will go with it during the swap.
The encoder and decoder must use exactly the same initial_code and update_tree routines.
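To illustrate why this synchronization matters, here is a deliberately simplified, runnable sketch (it reuses huffman_codes from the sketch above; NEW, adaptive_encode and adaptive_decode are made-up names). Instead of the incremental update_tree with sibling-property swaps described above, this stand-in simply rebuilds the code table from the running counts after every symbol, which is much slower but adapts in the same spirit; a special NEW code plays the role of the escape that precedes the fixed initial_code of a first-time symbol.

```python
# Simplified adaptive sketch: rebuilds the code table after every symbol
# instead of performing the incremental sibling-property update.
NEW = object()                                     # sentinel: "unseen symbol follows"

def initial_code(ch):
    return format(ord(ch), "08b")                  # agreed-upon fixed code: 8-bit ASCII

def update_tree(counts, sym):
    counts[sym] = counts.get(sym, 0) + 1
    return huffman_codes(counts)                   # rebuild codes from current counts

def adaptive_encode(text):
    counts = {NEW: 0}
    codes = huffman_codes(counts)
    bits = []
    for ch in text:
        if ch in counts:
            bits.append(codes[ch])                 # known symbol: current adaptive code
        else:
            bits.append(codes[NEW] + initial_code(ch))  # escape + raw first occurrence
        codes = update_tree(counts, ch)
    return "".join(bits)

def adaptive_decode(bits):
    counts = {NEW: 0}
    codes = huffman_codes(counts)
    inverse = {code: sym for sym, code in codes.items()}
    out, buf, i = [], "", 0
    while i < len(bits):
        buf += bits[i]; i += 1
        if buf in inverse:
            sym = inverse[buf]
            if sym is NEW:                         # escape: next 8 bits are raw ASCII
                sym = chr(int(bits[i:i + 8], 2)); i += 8
            out.append(sym)
            codes = update_tree(counts, sym)       # same update as the encoder
            inverse = {code: s for s, code in codes.items()}
            buf = ""
    return "".join(out)

assert adaptive_decode(adaptive_encode("HELLO")) == "HELLO"
```

The decoder recovers the text only because it runs the identical initial_code and update_tree steps after each symbol; the real adaptive Huffman algorithm achieves the same synchronization without the full rebuild by using the count increments and node swaps described above.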