The paper is largely concerned with the question of sufficient statistics. An interesting illustration in the last paragraph (§6) concerns the problem of how to choose the best non-sufficient statistic when, for some extrinsic reason, we must use one.
More specifically, the setup is the following: you observe samples from a categorical distribution with four categories, and you want to distinguish between the following two hypotheses:
| x | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Pr(x \| A) | .25 | .25 | .25 | .25 |
| Pr(x \| B) | .50 | .30 | .10 | .10 |
However, you are not actually allowed to see the original data set; instead, you have to lump the four categories into two groups, and you will only be told which group each observation fell into. (This corresponds to using an indicator function as your statistic.)
To solve this exercise, Kullback and Leibler quantify the informativeness of the statistic in terms of the symmetrized divergence between the two hypotheses:
J(A, B) = D(A || B) + D(B || A).
Using base-10 logarithms, the divergence from the uniform to the skewed distribution is D(A || B) = .1039, and the divergence from the skewed to the uniform is D(B || A) = .0947. The symmetrized divergence is thus J(A, B) = .1986, and the minimum of the two directed divergences is .0947.
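These base-10 figures are easy to check with a few lines of Python (a quick sketch; the helper name `D` is mine, not from the paper):

```python
from math import log10

# The two hypotheses over the four categories.
A = [0.25, 0.25, 0.25, 0.25]
B = [0.50, 0.30, 0.10, 0.10]

def D(p, q):
    """Kullback-Leibler divergence D(p || q), using base-10 logarithms."""
    return sum(pi * log10(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(round(D(A, B), 4))            # 0.1039
print(round(D(B, A), 4))            # 0.0947
print(round(D(A, B) + D(B, A), 4))  # 0.1986
```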
Grouping the categories in various ways, we can also find the following divergences:
| Grouping | D(A \|\| B) | D(B \|\| A) | J(A, B) | min |
|---|---|---|---|---|
| {}, {1, 2, 3, 4} | 0 | 0 | 0 | 0 |
| {1}, {2, 3, 4} | .0568 | .0625 | .1193 | .0568 |
| {2}, {1, 3, 4} | .0027 | .0028 | .0055 | .0027 |
| {3}, {1, 2, 4} | .0401 | .0315 | .0716 | .0315 |
| {4}, {1, 2, 3} | .0401 | .0315 | .0716 | .0315 |
| {1, 2}, {3, 4} | .0969 | .0837 | .1806 | .0837 |
| {1, 3}, {2, 4} | .0089 | .0087 | .0176 | .0087 |
| {1, 4}, {2, 3} | .0089 | .0087 | .0176 | .0087 |
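The whole table can be reproduced by enumerating the seven nontrivial splits of the four categories into two groups; here is a sketch (the helpers `D` and `lump` are my own names):

```python
from itertools import combinations
from math import log10

A = [0.25, 0.25, 0.25, 0.25]
B = [0.50, 0.30, 0.10, 0.10]

def D(p, q):
    """Kullback-Leibler divergence D(p || q), using base-10 logarithms."""
    return sum(pi * log10(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lump(p, group):
    """Distribution of the indicator statistic of `group` under p."""
    s = sum(p[i] for i in group)
    return [s, 1.0 - s]

# Each split into two non-empty groups is listed exactly once
# if we only keep the group that contains category 1 (index 0).
table = {}
for size in (1, 2, 3):
    for group in combinations(range(4), size):
        if 0 not in group:
            continue
        a, b = lump(A, group), lump(B, group)
        dab, dba = D(a, b), D(b, a)
        table[group] = (round(dab, 4), round(dba, 4), round(dab + dba, 4))

for group, row in table.items():
    print({i + 1 for i in group}, row)
```

Note that lumping before computing the divergence is essential: the divergence of the lumped distributions is what the indicator statistic can actually deliver, and it is never larger than the divergence of the original four-category distributions.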
In the paper, Kullback and Leibler report that the best grouping is {{1}, {2, 3, 4}}. This is a bit strange, since everything they say in the text indicates that they are thinking exclusively in terms of the symmetrized divergence J, and J is instead maximized by the grouping {{1, 2}, {3, 4}}. I don't know what to conclude from that.