## Friday, May 4, 2012

### Topsøe: "Game Theoretical optimization inspired by information theory" (2009)

I finally got around to reading Flemming's recent paper on information theory. It's a short introduction to his and Peter Harremoës' approach to the topic, which involves a game-theoretical interpretation of coding.

### The Nature-Observer Game

The basic theoretical framework is the following: Nature and Observer play a zero-sum game. Nature picks a world from the set X, and Observer picks an action from the set Y. For each strategy profile (x,y), three quantities are then defined:
- H(x), the entropy of x
- D(x,y), the (Kullback-Leibler) divergence of x from y
- Φ(x,y), the complexity of the pair (x,y)
The three quantities are related through the identity
Φ(x,y) = H(x) + D(x,y), i.e., Complexity = Entropy + Divergence,
which also implies that the smallest complexity Observer can achieve in a world x is the entropy H(x).
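As I read it, the coding interpretation behind the paper takes both worlds and actions to be probability distributions, with the complexity Φ(x,y) being the expected code length (cross-entropy). A minimal numerical sketch of the linking identity, with made-up distributions:

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / q[i]) for i, pi in enumerate(p) if pi > 0)

def complexity(p, q):
    """Expected code length when the world is p but the code is adapted to q
    (the cross-entropy -sum_i p_i log2 q_i)."""
    return -sum(pi * math.log2(q[i]) for i, pi in enumerate(p) if pi > 0)

x = [0.5, 0.25, 0.25]  # Nature's world: a distribution over three outcomes
y = [0.4, 0.4, 0.2]    # Observer's action: a coding distribution

# Complexity = Entropy + Divergence holds term by term:
# -sum p log q  =  -sum p log p  +  sum p log(p/q)
```

Note that complexity(x, x) equals entropy(x), which is the "smallest achievable complexity" claim above in numerical form.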

The point of the game is for Nature to produce maximal complexity, and for Observer to produce minimal complexity. By von Neumann and Morgenstern's minimax theorem, a strategy profile (x*,y*) is then an equilibrium of the game if the following three quantities coincide (p. 559):
sup_x inf_y Φ(x,y) = Φ(x*,y*) = inf_y sup_x Φ(x,y)
The leftmost term here designates the optimal outcome for Nature (the highest complexity given adversarial responses), while the rightmost term designates the optimal outcome for Observer (the lowest complexity given adversarial responses).

Notice that inf_y Φ(x,y) = H(x), since D(x,y) is nonnegative and vanishes when y is a best response to x. This means that Nature can in effect be seen as an entropy-maximizer when playing optimally. Further, Topsøe defines R(y) = sup_x Φ(x,y) to be the risk associated with a given action y, so Observer can then be described as a risk-minimizer.
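Both characterizations can be checked in the coding interpretation on a toy example (the state space and the grid of candidate actions below are invented): Nature's optimal world is the entropy-maximizer, and Observer's risk-minimizing action turns out to equalize complexity across worlds.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Complexity Φ(p,q) in the coding interpretation."""
    return -sum(pi * math.log2(q[i]) for i, pi in enumerate(p) if pi > 0)

# A toy state space: three possible worlds, each a distribution over two outcomes.
worlds = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]

# Nature as entropy-maximizer: inf_y Φ(x,y) = H(x), so against an optimally
# responding Observer, Nature's best world is the one with maximal entropy.
best_world = max(worlds, key=entropy)

# Observer as risk-minimizer: R(y) = sup_x Φ(x,y), minimized over a grid of actions.
def risk(y):
    return max(cross_entropy(x, y) for x in worlds)

candidates = [[t / 100, 1 - t / 100] for t in range(1, 100)]
best_action = min(candidates, key=risk)
# Here both optima land on the uniform distribution, so the three quantities
# in the equilibrium condition coincide (all equal 1 bit).
```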

### Information Transmission

The paper also defines a notion of information transmission rate (p. 556), though I am not quite sure about its applications. The idea behind the concept, in bullet-point form:
- Assume that α is a (prior) probability distribution on the set of worlds.
- Construct an expected-case scenario by summing all worlds with the probabilities of α as weights.
- Let y be a best response to this expected case.
- For each world x, let the surprisal of x be the divergence D(x,y). This can be seen as a motivated quantitative measure of how different x is from the expected case.
- Find the weighted average of the surprisals, using the probabilities of α as weights.
This average surprise level is the information transmission rate.
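In the coding interpretation the best response to a world is that world itself (it makes the divergence vanish), so the steps above reduce to a few lines. The prior and worlds below are invented; if I am not mistaken, the resulting rate is exactly the mutual information between the world index and the outcome.

```python
import math

def divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / q[i]) for i, pi in enumerate(p) if pi > 0)

# An invented prior over two worlds, each a distribution over three outcomes.
alpha = [0.5, 0.5]
worlds = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

# Expected-case scenario: mix the worlds with the weights of alpha.
mixture = [sum(a * x[i] for a, x in zip(alpha, worlds)) for i in range(3)]

# Observer's best response to the expected case is the mixture itself.
y = mixture

# Average surprisal, weighted by alpha: the information transmission rate.
rate = sum(a * divergence(x, y) for a, x in zip(alpha, worlds))
```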

Notice that if there is a single action which is an optimal response to all worlds, then the information transmission rate is 0. This reflects the fact that the state of the world would not inform the actions of Observer at all in such a situation. He would simply use the same strategy whatever happened.
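The degenerate case can be checked directly in the same style of toy setup (numbers invented): when all worlds coincide, the common world itself is a best response to every world, and the rate vanishes.

```python
import math

def divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / q[i]) for i, pi in enumerate(p) if pi > 0)

# All worlds coincide, so a single action (the common world itself)
# is an optimal response to every world.
alpha = [0.3, 0.7]
worlds = [[0.6, 0.4], [0.6, 0.4]]
mixture = [sum(a * x[i] for a, x in zip(alpha, worlds)) for i in range(2)]
rate = sum(a * divergence(x, mixture) for a, x in zip(alpha, worlds))
# The rate is (numerically) zero: the state of the world would not
# inform Observer's action at all.
```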

Conversely, if the information transmission rate is very high, an insensitive, on-average response pattern will be costly for Observer (in terms of complexity). If we reinterpret the act of choosing a best response to a certain world as an act of inferring the state of the world (describing it correctly), then the surprisal can further be seen as a measure of Observer's ability to distinguish states of the world. On this interpretation, the best-response function can be seen as a signaling function in a Bayesian game.