## Wednesday, March 20, 2013

### Bernardo: "Expected Information as Expected Utility" (1979)

Following a suggestion by Dennis Lindley (1956), this paper proposes that the appropriate measure of the expected value of an experiment X with respect to a target parameter Θ is the mutual information between the two, that is,
I(X;Θ)  =  H(Θ) – H(Θ | X).
Bernardo calls this the "expected useful information" contained in the experiment (§2).
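As a toy illustration, mutual information can be computed as the entropy of Θ minus the conditional entropy of Θ given X. Here is a small sketch with a made-up joint distribution (the numbers are mine, not from the paper):

```python
import math

# A hypothetical joint distribution over (x, theta); rows are x, columns are theta.
joint = [[0.3, 0.1],
         [0.1, 0.5]]

p_theta = [sum(row[j] for row in joint) for j in range(2)]  # marginal of theta
p_x     = [sum(row) for row in joint]                       # marginal of x

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Conditional entropy H(theta | X) = sum_x p(x) * H(theta | X = x)
h_theta_given_x = sum(
    p_x[i] * entropy([joint[i][j] / p_x[i] for j in range(2)])
    for i in range(2)
)

mi = entropy(p_theta) - h_theta_given_x  # I(X; theta)
print(round(mi, 4))
```

Observing X here resolves about a quarter of a bit of uncertainty about Θ; an experiment whose outcome were independent of Θ would give exactly zero.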

### Proper Scoring Rules

The paper also contains a uniqueness theorem about so-called proper scoring rules (§3–4).

A "scoring rule" is a scheme for rewarding an agent (the "scientist") who reports probability distributions to you. It may depend on the reported distribution and on the actual observed outcome. For instance, one feasible rule is to pay the scientist p(x) dollars in the event that x occurs, where p is the reported density function.

That rule, however, would under many common rationality assumptions give the scientist an incentive to misreport his or her actual probability estimates. We consequently define a "proper" scoring rule as one that is hack-proof in the sense that the best course of action under it is to report your actual probability estimates.
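To see why the linear rule is not proper, here is a quick sketch (with made-up beliefs): a scientist paid q(x) dollars maximizes expected reward by piling all the reported mass onto the most likely outcome rather than reporting honestly.

```python
# Under the "pay q(x) dollars when x occurs" rule, honesty does not pay:
# a point mass on the most likely outcome beats the truthful report.
# (The belief vector below is made up for illustration.)

belief = [0.6, 0.3, 0.1]  # true subjective probabilities over three outcomes

def expected_payoff(report, belief):
    # Expected reward of submitting `report` when outcomes follow `belief`.
    return sum(b * r for b, r in zip(belief, report))

honest = expected_payoff(belief, belief)              # 0.6^2 + 0.3^2 + 0.1^2 = 0.46
dishonest = expected_payoff([1.0, 0.0, 0.0], belief)  # all mass on the mode: 0.6
print(honest, dishonest)  # the exaggerated report pays strictly more
```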

An example of a proper scoring rule is log p(x), but apparently there are others. Bernardo refers to Robert Buehler and I. J. Good's papers in Foundations of Statistical Inference (1971) for further examples. Unfortunately, that book seems to be a bit difficult to get hold of.

### Nice and Proper Scoring Rules

The theorem that Bernardo proves is the following: The only proper scoring rules which are both smooth and local (as defined below) are functions of the logarithmic form
u(p,x)  =  a log p(x) + b(x)
where a is a constant, and b is a real-valued function on the sample space.

As a corollary, the scientist's optimal expected payoff is B – aH(X), where B is the average value of the function b under the scientist's subjective probabilities (this follows because E[a log p(x) + b(x)] = a Σ p(x) log p(x) + B). It also follows that the optimal course of action for the scientist under this scheme will be to provide the maximum amount of information that is consistent with his or her beliefs.
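That the logarithmic form is indeed proper can be sanity-checked numerically: with arbitrary choices of the constant a, the function b, and the scientist's beliefs p (all made up below), the expected score of any alternative report never beats the truthful one.

```python
import math
import random

# Numerical check (with hypothetical a, b, p) that the logarithmic rule
# u(q, x) = a log q(x) + b(x) is proper: E_p[u(q, x)] is maximized at q = p.

random.seed(0)
a, b = 2.0, [0.5, -1.0, 0.2]  # arbitrary positive constant and function b(x)
p = [0.5, 0.3, 0.2]           # the scientist's true beliefs

def expected_score(q):
    return sum(pi * (a * math.log(qi) + bi) for pi, qi, bi in zip(p, q, b))

best = expected_score(p)
for _ in range(1000):         # random alternative reports, normalized to sum to 1
    raw = [random.random() for _ in p]
    q = [r / sum(raw) for r in raw]
    assert expected_score(q) <= best + 1e-12
print("truthful report is optimal")
```

This is Gibbs' inequality in disguise: E_p[log q(x)] ≤ E_p[log p(x)] for any distribution q, with equality only at q = p.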

So what does "smooth" and "local" mean?

Bernardo doesn't define "smooth," but usually in real analysis, a smooth function is one that can be differentiated indefinitely often. However, Bernardo refers to the physics textbook by Harold and Bertha Jeffreys (1972) for a definition. I don't know if they use the word the same way.

A scoring rule u is "local" if the reward that the scientist receives in the event of x depends only on x and on the probability that he or she reported for x. In other words, a local scoring rule u can be rewritten in terms of a function v whose first argument is a single probability rather than a whole probability distribution:
u(q,x)  =  v(w,x),
where w = q(x) is the reported probability of x (which does not necessarily equal the actual subjective probability p(x)).
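One way to see what locality rules out: the quadratic (Brier) score is also proper, but it is not local, because the reward for outcome x involves the entire reported vector. A minimal sketch:

```python
import math

# The log score for outcome x depends only on q(x) (local); the quadratic
# (Brier) score, though also proper, depends on the whole vector q (not local).

def log_score(q, x):
    return math.log(q[x])                     # only q[x] appears

def brier_score(q, x):
    # 2 q(x) - sum_j q(j)^2: the whole reported distribution enters
    return 2 * q[x] - sum(qj * qj for qj in q)

q1 = [0.5, 0.3, 0.2]
q2 = [0.5, 0.25, 0.25]                        # same probability for outcome 0
print(log_score(q1, 0) == log_score(q2, 0))      # True
print(brier_score(q1, 0) == brier_score(q2, 0))  # False
```

Two reports that agree on q(0) earn identical log scores when outcome 0 occurs, but different Brier scores, so the Brier score cannot be written in the v(w,x) form.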

### How To Prove This Theorem

I haven't thought too hard about the proof, but here's the gist that I got out of it: First, you use the method of Lagrange multipliers to show that when the reported distribution q is optimal, the derivative with respect to q(x) of the functional
∫ v(q(x), x) p(x) dx  –  λ ( ∫ q(x) dx – 1 )
must vanish for every x. You then conclude that q = p fulfills this condition, since u was assumed to be a proper scoring rule. You then have a differential equation on your hands, and you go on to discover that its only solutions are of the postulated form.
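The stationarity condition at q = p can be illustrated numerically for the log score: perturbing p in any direction that preserves total probability changes the expected score only at second order in the step size. A discrete sketch (with a made-up p and perturbation direction):

```python
import math

# Stationarity of E_p[log q(x)] at q = p, checked numerically: move p along
# a direction summing to zero (so q stays normalized) and watch the
# first-order change vanish.  (p and the direction are hypothetical.)

p = [0.5, 0.3, 0.2]
direction = [1.0, -0.4, -0.6]  # components sum to zero

def expected_log_score(q):
    return sum(pi * math.log(qi) for pi, qi in zip(p, q))

base = expected_log_score(p)
for eps in (1e-2, 1e-3, 1e-4):
    q = [pi + eps * di for pi, di in zip(p, direction)]
    change = expected_log_score(q) - base
    print(f"eps={eps:g}  change={change:.2e}")  # shrinks like eps^2, not eps
```

The first-order term is eps · Σ p(x) · d(x)/p(x) = eps · Σ d(x) = 0, which is exactly the Lagrange condition at q = p; what remains is the (negative) second-order term coming from the concavity of the logarithm.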