## Monday, March 2, 2015

### Fano: The Transmission of Information (1949)

There are three different coding methods associated with Claude Shannon's name: Shannon coding, Shannon-Fano coding, and Shannon-Fano-Elias coding. These are not the same. Even more confusingly, Shannon-Fano-Elias coding is in fact much closer in kind to Shannon coding than it is to Shannon-Fano coding, making the names all the more misleading. Moreover, neither Fano nor Elias published any conventional research articles on the topic, and Elias may not even have invented Elias coding. I'll try to sort out these issues now.

### Downwards Interval Approximations

Shannon coding is the name for the data compression algorithm that Shannon exhibited as his second and more constructive proof of the source coding theorem (his Theorem 9, page 17 in the Bell Labs transcription). It derives the code words from binary approximations to cumulative probabilities.

The process is the following:
1. Sort the event probabilities so that $p_1 \geq p_2 \geq \ldots \geq p_n$;
2. Let $F_m = \sum_{i=1}^{m-1}p_i$ be the cumulative probability of the most probable events (up to, but not including, the $m$th).
3. Let $K_m=\lceil{-\log_2{p_m}}\rceil$.
4. Let the code for event number $m$ be the first $K_m$ binary digits of $F_m$ (i.e., round $F_m$ down to the nearest multiple of $2^{-K_m}$).
As an example, with the probabilities
$$p = (.4, .3, .2, .1),$$Shannon coding produces the codewords
$$(00, 01, 101, 1110).$$Shannon coding only produces a prefix-free code when the probabilities are in fact sorted. If we think of the probabilities $p_1,p_2,\ldots,p_n$ as a stack of intervals filling out the space between 0 and 1, then the rounding-off corresponds to approximating $[F_m,F_{m+1}]$ by an interval whose left-hand boundary is a binary number smaller than $F_m$, and whose right-hand boundary is a binary number of the same precision which lies below $F_{m+1}$. If wider intervals follow narrower ones, the corresponding increase in coarseness can cause overlaps between the approximations.
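The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a robust implementation; in particular, it uses floating-point arithmetic for the cumulative sums, which is adequate for this small example but would call for exact rational arithmetic (e.g. `fractions.Fraction`) in general:

```python
from math import ceil, log2

def shannon_code(probs):
    """Shannon coding: the codeword for event m is the first
    K_m = ceil(-log2 p_m) binary digits of the cumulative
    probability F_m. Assumes probs is sorted in decreasing order."""
    codes = []
    F = 0.0
    for p in probs:
        K = ceil(-log2(p))
        # Round F down to the nearest multiple of 2^-K
        # and read off its K binary digits.
        bits = int(F * 2**K)
        codes.append(format(bits, f"0{K}b"))
        F += p
    return codes

print(shannon_code([.4, .3, .2, .1]))  # ['00', '01', '101', '1110']
```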

### Inner Interval Approximations

Shannon-Fano-Elias coding is a variation on this theme which does not require the probabilities to be sorted. According to Cover and Thomas' textbook (Chapter 5), it was developed by Peter Elias but never published. However, another textbook claims that Elias denied this, and that the idea perhaps came from a talk Shannon gave at MIT.

At any rate, the idea behind the method is the following:
1. Let $F^*_m = \sum_{i=1}^{m-1}p_i+\frac{1}{2}p_m$ be the point halfway between $F_m$ and $F_{m+1}$.
2. Let $K_m=\lceil{-\log_2{p_m}}\rceil + 1$.
3. Let the code for event number $m$ be the first $K_m$ binary digits of $F^*_m$.
Since these codewords are one bit longer than the ones used in Shannon coding, we can dispense with the requirement of using sorted probabilities. Instead, we use a binary fraction that is guaranteed to lie completely inside the interval between $F_m$ and $F_{m+1}$, even if the binary expansion continues with $1111\ldots$ after our selected truncation point.

With the example above, $(.4, .3, .2, .1),$ this gives the codewords
$$(001, 100, 1100, 11110).$$These are exactly one bit longer than the Shannon codewords.
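The midpoint construction can be sketched in the same style as before. Again this is a bare illustration using floating-point cumulative sums, which suffices for this example:

```python
from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias coding: the codeword for event m is the
    first K_m = ceil(-log2 p_m) + 1 binary digits of the midpoint
    F*_m of the interval [F_m, F_{m+1}]. No sorting required."""
    codes = []
    F = 0.0
    for p in probs:
        K = ceil(-log2(p)) + 1
        mid = F + p / 2          # F*_m, halfway between F_m and F_{m+1}
        bits = int(mid * 2**K)   # truncate to K binary digits
        codes.append(format(bits, f"0{K}b"))
        F += p
    return codes

print(sfe_code([.4, .3, .2, .1]))  # ['001', '100', '1100', '11110']
```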

### Recursive Batch Splitting

Shannon-Fano coding is a different approach to the problem. Instead of using binary approximation, it uses a recursive, top-down method to construct a binary tree:
1. If there is only one event on the list, return the empty codeword;
2. Otherwise, sort the event probabilities so that $p_1 \geq p_2 \geq \ldots \geq p_n$;
3. Of the $n-1$ possible ways of splitting this list up into a head and a tail, select the one that makes the two lists most equal in probability.
4. Recursively use Shannon-Fano coding to assign codewords to each of those two lists;
5. Prepend a 0 to all the codewords in the first list and a 1 to the codewords in the second.
For the example above, this gives
$$(0, 10, 110, 111).$$Notice that the codeword lengths here differ from the ones produced by the Shannon code.
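The recursive splitting can be sketched as follows. This naive version recomputes partial sums at each split, which is fine for small alphabets but quadratic in general:

```python
def shannon_fano(probs):
    """Shannon-Fano coding: recursively split the sorted probability
    list at the point where the head and tail totals are most nearly
    equal, prepending 0 to the head's codewords and 1 to the tail's."""
    probs = sorted(probs, reverse=True)

    def build(ps):
        if len(ps) == 1:
            return [""]
        total = sum(ps)
        # Choose the split point that minimises the imbalance
        # between the head's and the tail's total probability.
        best = min(range(1, len(ps)),
                   key=lambda i: abs(2 * sum(ps[:i]) - total))
        return (["0" + c for c in build(ps[:best])] +
                ["1" + c for c in build(ps[best:])])

    return build(probs)

print(shannon_fano([.4, .3, .2, .1]))  # ['0', '10', '110', '111']
```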

Shannon-Fano coding was invented by Robert Fano at some point in the 1940s and described in his 1949 technical report "Transmission of Information" (published by the MIT). This report is hard to come by, but an awkwardly transcribed PDF without pictures is available here.

A number of excerpts and scanned figures from the report are available in these teaching materials. I'll just paste the three relevant figures here so that they don't get lost in the desert of the internet: