
Monday, December 17, 2012

Wrighton: Elementary Principles of Probability and Information (1973)

This is a strange little book. It is essentially an introduction to information theory, but with a bunch of digressions into all sorts of philosophical and methodological issues. It's not always completely clear to me where Wrighton is going with his arguments, but some of them are quite thought-provoking.

The Reshuffled Hierarchy of Science

The book begins with a discussion of what probability is, using the philosophy of Giambattista Vico as a starting point.

A core claim of Vico's philosophy is, according to Wrighton, that the sciences should not be sorted on a scale with mathematics and theoretical physics at one end and the social and human sciences at the other. Rather, one should categorize them according to the artificiality of their objects:
Mathematics retains a special position, since in mathematics Man creates the object of his study, which therefore he wholly understands. Likewise, Man may hope to acquire an understanding of comparable depth within the humanities; for he has created his own history, his own language and his own social environment. The Natural Sciences suffer a demotion, relative to the ideal of certainty, and revert to the status which Bacon accorded them; experiment and observation provide, and must always provide the true foundation of physics, since, as Vico puts it, Man can never wholly understand what God has created; physical theory primarily reflects, or epitomises, man's increasing control over his physical environment, rather than an independent reality. (p. 3)
A crude way of putting the same point would be that math is a human science. Whatever is conceptual, mental, or cultural falls at one end of the scale, and the study of natural phenomena falls at the other.

Probability as Deliberately Created Uncertainty

Once we have reshuffled the sciences like this, we have to decide whether we categorize probability theory as a study of "Man's creation" or of "God's creation." Here, Wrighton comes down squarely on the side of the former:
It is sometimes said to be a mere empirical fact that when a coin is tossed it comes down heads with probability one half; and that a suitably-devised machine could toss coins so that they came down heads every time. The suggestion is based on a total misconception. A coin comes down heads with probability one half because we arrange that it should do so: the empirically-verifiable consequences cannot be other than they are. If an operator, within the terms of our instructions to him, were to train himself to toss a coin so that it always came down heads, we should have to regard our instructions as misconceived, and would either have to raise the minimum angular momentum assigned or supply him with a smaller or lighter coin: it is a matter of making the task implicitly assigned to the operator sufficiently difficult. Thus we cannot think of a random event without somehow involving a human being in its generation. (p. 3; his emphasis)
So "real" probability is "artificial" probability; a random experiment is an experiment that allows us to say that something went wrong if it is predictable. Only metaphorically can we transfer this artificially created complexity to natural systems.

Points of Contact

I find this idea interesting for two reasons:

First, it turns information theory upside-down so that system complexity becomes more fundamental than probability. This is an interesting idea, also championed by Edwin Jaynes, which I have been exposed to through Flemming Topsøe.

And second, it relates the philosophical problems of probability theory to the thorny issues surrounding the notions of repetition, identity, rule-following, and induction. It is probably fair to say that one can't solve any of these problems without solving the others as well.

Friday, May 4, 2012

Topsøe: "Game Theoretical optimization inspired by information theory" (2009)

I finally got around to reading Flemming's recent paper on information theory. It's a short introduction to his and Peter Harremoës' approach to the topic, which involves a game-theoretical interpretation of coding.

The Nature-Observer Game

The basic theoretical framework is the following: Nature and Observer play a zero-sum game. Nature picks a world from the set X, and Observer picks an action from the set Y. For each strategy profile (x,y), three quantities are then defined:
  • H(x), the entropy of x
  • D(x,y), the (Kullback-Leibler) divergence of x from y
  • Φ(x,y), the complexity of the pair (x,y)
The three quantities are related through the equality
Complexity = Entropy + Divergence
which also implies that the smallest complexity that Observer can achieve in a world x is the entropy H(x).
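If we take both worlds and actions to be probability distributions over a finite alphabet (the standard coding interpretation of this setup), then Φ(x,y) works out to be the cross-entropy, i.e. the expected code length when x is described with a code fitted to y. Here is a minimal Python sketch under that assumption, with helper names (entropy, divergence, complexity) of my own choosing, just to check the identity numerically:

```python
import numpy as np

def entropy(x):
    """H(x) = -sum_i x_i log x_i, with 0 log 0 = 0 (natural logarithm)."""
    x = np.asarray(x, dtype=float)
    nz = x > 0
    return -np.sum(x[nz] * np.log(x[nz]))

def divergence(x, y):
    """Kullback-Leibler divergence D(x || y) of the world x from the action y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nz = x > 0
    return np.sum(x[nz] * np.log(x[nz] / y[nz]))

def complexity(x, y):
    """Phi(x, y) = H(x) + D(x || y), which works out to the cross-entropy."""
    return entropy(x) + divergence(x, y)

x = [0.5, 0.3, 0.2]   # a world chosen by Nature
y = [0.4, 0.4, 0.2]   # an action chosen by Observer

# Complexity = Entropy + Divergence, and equals -sum_i x_i log y_i:
print(complexity(x, y), -np.sum(np.array(x) * np.log(y)))
```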

The point of the game is for Nature to produce maximal complexity, and for Observer to produce minimal complexity. By von Neumann and Morgenstern's minimax theorem, this means that a strategy profile (x*,y*) is an equilibrium for the game if the following three quantities coincide (p. 559):
sup_x inf_y Φ(x,y) = Φ(x*,y*) = inf_y sup_x Φ(x,y)
The leftmost term here designates the optimal outcome for Nature (highest complexity given adversarial responses), while the rightmost term designates the optimal outcome for Observer (lowest complexity given adversarial responses).

Notice that inf_y Φ(x,y) = H(x), since D(x,y) is nonnegative and vanishes when y matches x. This means that Nature can in effect be seen as an entropy-maximizer when playing optimally. Further, Topsøe defines R(y) = sup_x Φ(x,y) to be the risk associated with a given action y, so Observer can then be described as a risk-minimizer.
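As a quick sanity check on this picture, here is a toy example of my own (not from the paper): let the worlds and actions both range over the distributions on a two-letter alphabet, discretised to a grid so that sup and inf become max and min, and compare Nature's maximal entropy with Observer's minimal risk. Both come out as log 2, attained at the uniform distribution:

```python
import numpy as np

def phi(x, y):
    """Complexity Phi(x, y) = H(x) + D(x || y), i.e. the cross-entropy -sum_i x_i log y_i."""
    nz = x > 0
    return -np.sum(x[nz] * np.log(y[nz]))

def entropy(x):
    return phi(x, x)

# Toy setup: distributions on a 2-letter alphabet, on a grid.
dists = [np.array([p, 1.0 - p]) for p in np.linspace(0.01, 0.99, 99)]

# Nature as entropy-maximizer: sup_x inf_y Phi(x, y) = sup_x H(x).
nature_value = max(entropy(x) for x in dists)

# Observer as risk-minimizer: R(y) = sup_x Phi(x, y), minimised over y.
observer_value = min(max(phi(x, y) for x in dists) for y in dists)

print(nature_value, observer_value, np.log(2))  # all approximately log 2
```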

Information Transmission

The paper also defines a notion of information transmission rate (p. 556), but I am not quite sure about its applications. The idea behind the concept, in bullet-point form, is this:
  • Assume that α is a (prior) probability distribution on the set of worlds.
  • Construct an expected case scenario by summing all worlds with the probabilities of α as weights.
  • Let y be a best response to this expected case.
  • For each world x, let the surprisal of x be the divergence between x and y. This can be seen as a motivated quantitative measure of how different x is from the expected case.
  • Find the weighted average of the surprisals, using the probabilities of α as weights.
This average surprise level is the information transmission rate.
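Here is how that recipe might look in code, again under the coding interpretation, where Observer's best response to the expected case is the mixture distribution itself (a standard property of Kullback-Leibler divergence). The worlds and the prior α are made up for illustration:

```python
import numpy as np

def divergence(x, y):
    """Kullback-Leibler divergence D(x || y)."""
    nz = x > 0
    return np.sum(x[nz] * np.log(x[nz] / y[nz]))

# Made-up prior alpha over two made-up worlds.
worlds = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
alpha = [0.6, 0.4]

# Expected-case scenario: the alpha-weighted mixture of the worlds.
mixture = sum(a * x for a, x in zip(alpha, worlds))

# Observer's best response to the expected case is the mixture itself
# (it minimises the alpha-averaged complexity), so the surprisal of a
# world x is D(x || mixture), and the transmission rate is their average.
rate = sum(a * divergence(x, mixture) for a, x in zip(alpha, worlds))
print(rate)  # equals 0 exactly when all worlds coincide (one action fits them all)
```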

Notice that if there is a single action which is an optimal response to all worlds, then the information transmission rate is 0. This reflects the fact that the state of the world would not inform the actions of Observer at all in such a situation. He would simply use the same strategy whatever happened.

Conversely, if the information transmission rate is very high, this means that an insensitive, on-average response pattern will be costly for the Observer (in terms of complexity). If we reinterpret the act of choosing a best response to a certain world as an act of inferring the state of the world (describing it correctly), then the surprisal can further be seen as a measure of Observer's ability to distinguish states of the world. On such an interpretation, the best-response function can be seen as a signaling function in a Bayesian game.