Showing posts with label reference class problem. Show all posts

Wednesday, April 2, 2014

Carnap and Jeffrey: Studies in Inductive Logic and Probability, Vol. 1 (1971)

Carnap at his desk; from carnap.org.
This is an anthology edited by Rudolf Carnap and philosopher Richard C. Jeffrey (not to be confused with physicist Harold Jeffreys).

The majority of the book is dedicated to two essays on probability which Carnap intended to be a substitute for the (never realized) second volume of the Logical Foundations of Probability (1950). Carnap's idea is that rational belief should be understood as the result of probabilistic conditioning on a special kind of "nice" prior.

An Inconsistent Axiomatization of Rationality

In order to demarcate the realm of rational belief, Carnap has to specify the set of permitted starting states of the system and its update rules. He does so by means of the following four "rationality assumptions":
  1. Coherence — You must conform to the axioms of probability; or in terms of gambling, you may not assign positive utility to any gamble that guarantees a strict loss.
  2. Strict Coherence — You may not assign an a priori probability of 0 to any event; or equivalently, you may not assign positive utility to a gamble that renders a strict loss possible and a weak loss necessary.
  3. Belief revision depends only on the evidence — Your beliefs at any time must be determined completely by your prior beliefs and your evidence (nothing else). Assuming axiom 1 is met, this comes down to producing new beliefs by conditioning.
  4. Symmetry — You must assign the same probability to propositions of the same logical form, i.e., F(x) and F(y).
These axioms are inconsistent in a number of cases, and Carnap does not seem to realize this. The problems are that
  • Many infinite sets cannot be equipped with a proper, regular, and symmetric distribution. For instance, there is no "uniform distribution on the integers";
  • There may be interdependent propositional functions in the language, and a prior that renders one symmetric might render another asymmetric. Consider for instance F(x) = "the box has a side-length between x and x + 1" and G(x) = "the box has a volume between x and x + 1".
Maybe Carnap had a vague idea about the first problem — at least he seems to assume that the sample space is finite throughout the first essay ("Inductive Logic and Rational Decisions," cf. pp. 7 and 14).

In the second essay, however, he explicitly says that there are countably many individuals in the language, so it would seem that he owes us a proper, coherent, and regular distribution on the integers ("A Basic System of Inductive Logic, Part I," ch. 9, p. 117).
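The obstacle in the countable case can be spelled out in one line. This is only a sketch: it assumes countable additivity, and it writes c for the common probability that symmetry would force on every individual integer (both the symbol and the framing are mine, not Carnap's):

```latex
\[
1 = P(\mathbb{Z})
  = \sum_{n \in \mathbb{Z}} P(\{n\})
  = \sum_{n \in \mathbb{Z}} c
  = \begin{cases}
      0      & \text{if } c = 0,\\
      \infty & \text{if } c > 0.
    \end{cases}
\]
```

Neither value equals 1, so no such distribution exists; and the case c = 0 would in addition violate strict coherence.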

Both Jaynes and Jeffreys made attempts at tackling the second problem by choosing priors that would decrease the tension between two descriptions. Jeffreys, for instance, showed that a probability density function of the form f(t) = 1/t (restricted to some positive interval) makes it irrelevant whether a normal distribution is described in terms of its variance or its precision parameter. Jaynes, by an essentially identical argument, "solved" Bertrand's paradox by choosing a prior that minimizes the discrepancy between a side-length description and a volume description.
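The invariance property of the 1/t density is easy to check numerically. The following is a minimal sketch, not anything taken from Jeffreys or Jaynes: the intervals [1, 2], [1, 8], and [0.5, 4] and the function names are my own illustrative choices. It compares a uniform prior with a 1/t prior on the box example (side length versus volume) and then on the variance/precision pair.

```python
import math

def uniform_cdf(x, lo, hi):
    """P(X <= x) for X uniform on [lo, hi]."""
    return (x - lo) / (hi - lo)

def log_uniform_cdf(x, lo, hi):
    """P(X <= x) for a density proportional to 1/x on [lo, hi], 0 < lo < hi."""
    return math.log(x / lo) / math.log(hi / lo)

# One event, two descriptions: "side <= 1.5" is the same as "volume <= 1.5**3".
side = 1.5
volume = side ** 3

# Uniform priors: side on [1, 2] versus volume on [1, 8]. These disagree.
p_side_uniform = uniform_cdf(side, 1.0, 2.0)
p_volume_uniform = uniform_cdf(volume, 1.0, 8.0)

# 1/t priors on the same two ranges. These agree.
p_side_log = log_uniform_cdf(side, 1.0, 2.0)
p_volume_log = log_uniform_cdf(volume, 1.0, 8.0)

# Variance versus precision (precision = 1 / variance), variance in [0.5, 4].
# The event "variance <= 2" is the same as "precision >= 1/2".
v, v_lo, v_hi = 2.0, 0.5, 4.0
p_variance_log = log_uniform_cdf(v, v_lo, v_hi)
p_precision_log = 1.0 - log_uniform_cdf(1.0 / v, 1.0 / v_hi, 1.0 / v_lo)

print(p_side_uniform, p_volume_uniform)   # differ
print(p_side_log, p_volume_log)           # equal up to rounding
print(p_variance_log, p_precision_log)    # equal up to rounding
```

The agreement in the last two lines is no accident: taking a power or a reciprocal only rescales or flips the logarithm, and a density proportional to 1/t is uniform on the log scale.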

What is a Rationality Assumption?

Carnap knows that probability theory has to be founded on something other than probability theory to make sense and explains that "the reasons for our choice of the axioms are not purely logical." (p. 26; his emphasis).

Rather, they are game-theoretic: In order to argue against the use of some a priori probability measure (or "M-function"), Carnap must show why somebody starting from this prior
…, in a certain possible knowledge situation, would be led to an unreasonable decision. Thus, in order to give my reasons for the axiom, I move from pure logic to the context of decision theory and speak about beliefs, actions, possible losses, and the like. (p. 26)
That sounds circular, but the rest of his discussion seems to indicate that he is thinking about worst-case (or minimax) decision theory, which makes sense.

"Reduced to one"

What does not make sense, however, is his unfounded faith that there are always reasons to prefer one M-function over another:
Even on the basis of all axioms that I would accept at the present time, the number of admissible M-functions, i.e., those that satisfy all accepted axioms, is still infinite; but their class is immensely smaller than that of all coherent M-functions [i.e., all probability measures]. There will presumably be further axioms, justified in the same way by considerations of rationality. We do not know today whether in this future development the number of admissible M-functions will always remain infinite or will become finite and possibly even be reduced to one. Therefore, at the present time I do not assert that there is only one rational Cr0-function [= initial credence = credence at time 0]. (p. 27)
But clearly, he hopes so.

Carnap the Moralist

Interestingly, Carnap makes a very direct connection between moral character and epistemic habits. This comes out most clearly in a passage in which he explains that rationality is a matter of belief revision rather than belief:
When we wish to judge the morality of a person, we do not simply look at some of his acts; we study rather his character, the system of his moral values, which is part of his utility function. Observations of single acts without knowledge of motives give little basis for judgment. Similarly if we wish to judge the rationality of a person's beliefs, we should not look simply at his present beliefs. Information on his beliefs without knowledge of the evidence out of which they arose tells us little. We must rather study the way in which the person forms his beliefs on the basis of evidence. In other words, we should study his credibility function, not simply his present credence function. (p. 22)
The "Reasonable Man" (to use the 18th century terminology) is thus the man who updates his beliefs in a responsible, careful, and modest fashion. Lack of reason is the stubborn rejection of norms of evidence, a refusal to surrender to the "truth cure."

As an illustration of what he has in mind, Carnap considers an urn example in which a person X observes a majority of black balls being drawn (E), and Y observes a majority of white balls (E'). He continues:
Let H be the prediction that the next ball drawn will be white. Suppose that for both X and Y the credence of H is 2/3. Then we would judge this same credence value 2/3 of the proposition H as unreasonable for X, but reasonable for Y. We would condemn a credibility function Cred as nonrational if Cred(H | E) = 2/3; while the result Cred(H | E') = 2/3 would be no ground for condemnation. (p. 22)
So although he elsewhere argues that rationality is a matter of risk minimization, he nevertheless falls right into the moralistic language of "grounds for condemnation."
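Carnap's verdict in the urn example can be reproduced with almost any symmetric prior. Here is a minimal sketch that assumes Laplace's rule of succession (a uniform prior over the unknown proportion of white balls). That rule is my choice for illustration, not something Carnap commits to in this passage, and the draw counts are hypothetical.

```python
from fractions import Fraction

def predictive_white(white, black):
    """Laplace's rule of succession: P(next ball is white | observed counts),
    assuming a uniform prior over the unknown proportion of white balls."""
    return Fraction(white + 1, white + black + 2)

# Hypothetical evidence: X saw mostly black balls (E), Y saw mostly white (E').
cred_x = predictive_white(white=1, black=3)
cred_y = predictive_white(white=3, black=1)

print(cred_x)  # 1/3
print(cred_y)  # 2/3
```

Under this credibility function, a credence of 2/3 in H is exactly what Y's evidence supports, while X's evidence supports only 1/3. The same number is thus reasonable for one observer and unreasonable for the other.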

Do the Robot

A similar formulation appears earlier, as he discusses the axiom that belief revision is based on evidence only. For a person satisfying this criterion, Carnap explains,
… changes in his credence function are influenced only by his observational results, but not by any other factors, e.g., feelings like his hopes or fears concerning a possible future event H, feelings that in fact often influence the beliefs of all actual human beings. (pp. 15–16)
Like Jaynes, he defends this idealization by reference to a hypothetical design problem:
Thinking about the design of a robot might help us in finding rules of rationality. Once found, these rules can be applied not only in the construction of a robot but also in advising human beings in their effort to make their decisions as rational as their limited abilities permit. (p. 17)
Another way of saying the same thing is that we should first describe the machine that we would want to do the job, and then tell people how to become more like that machine.

Wednesday, March 26, 2014

von Mises: Probability, Statistics and Truth (1951), ch. 1

Richard von Mises; from Wikipedia.
Richard von Mises was an important proponent of the frequentist philosophy of probability.

In his book Probability, Statistics and Truth, he militates against the use of the word "probability" for anything other than indefinitely repeatable experiments with converging relative frequencies (pp. 10–12). He also compares probabilities to physical constants like the velocity of a molecule (p. 21) and asserts that the law of large numbers is an empirical generalization comparable to physical laws like the conservation of energy (pp. 16, 22, and 26).

Reference Class Relativism

A consequence of this frequentist notion of probability is that specific events do not have probabilities. Only infinite classes of comparable events can have probabilities.

For instance, your coin coming up heads at 10 o'clock differs in infinitely many ways from the same coin coming up heads at 11 o'clock. You can identify the two events as equivalent only because you choose which properties of the situation to attend to.

As a kind of argument for this reference class relativism, von Mises asserts that a specific person has a different probability of dying depending on the reference class (e.g., people over 40, men over 40, male smokers over 40, etc). We thus have to explicitly select the reference class before we can talk about "the" probability.

He comments:
One might suggest that a correct value of the probability of death for Mr. X may be obtained by restricting the collective to which he belongs as far as possible, by taking into consideration more and more of his individual characteristics. There is, however, no end to this process, and if we go further and further into the selection of the members of the collective, we shall be left finally with this individual alone. (p. 18)
Even from a frequentist perspective, I'm not sure this makes sense. The fact that we have narrowed down our reference class so much that there is only a single real person left in it should not change the fact that we still have an intensional definition of the class. Insofar as we do, we should be able to apply that definition to the outcome of any sequence of candidates, like an infinite sequence of people or experiments. In reality, it is only data sparsity that keeps us from going "further and further."

So I think von Mises has a theoretical choice to make: either he must require that reference classes be actually infinite, or he must merely require that they be potentially infinite.

"Randomness" and Insensitivity to Subsequence Selection

Von Mises spends a large part of the lecture elaborating a notion of "randomness" which is intended to capture the difference between asymptotically i.i.d. sequences and sequences that are not asymptotically i.i.d. but have the same limiting frequencies. He does so by adding the requirement that the limiting frequencies are independent of subsequence selection.

A possibly more intuitive way of stating that definition would be in terms of a Topsøe-style game: A structure-finding player is tasked with picking infinitely many places in a sequence based on past data and is rewarded when the empirical frequencies fail to converge to a given distribution; a structure-hiding player is tasked with selecting the sequence and is rewarded when the frequencies do converge to the given distribution.

If the structure-hider then introduces any systematic dependence between the experiments, the structure-finder can exploit these regularities to outgamble the structure-hider. Thus, only asymptotically i.i.d. sequences are part of an equilibrium.
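A finite simulation cannot establish anything about limiting frequencies, but it can illustrate how a selection rule exposes dependence. The sketch below is my own toy setup, not von Mises's: the rule `after_zero`, the sequence length, and the use of Python's pseudo-random generator as a stand-in for a fair coin are all assumptions made for illustration.

```python
import random

def selected_frequency(seq, select):
    """Frequency of 1s among the places picked by a place-selection rule.
    `select(history)` sees only the outcomes before the current place."""
    picked = []
    for i, outcome in enumerate(seq):
        if select(seq[:i]):
            picked.append(outcome)
    return sum(picked) / len(picked)

def after_zero(history):
    """Structure-finder's rule: pick a place whenever the previous outcome was 0."""
    return bool(history) and history[-1] == 0

n = 4000

# Structure-hider A: strictly alternating 0, 1, 0, 1, ... (frequency of 1s is 1/2).
alternating = [i % 2 for i in range(n)]

# Structure-hider B: pseudo-random fair coin flips with a fixed seed.
rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(n)]

overall_alt = sum(alternating) / n
selected_alt = selected_frequency(alternating, after_zero)

overall_coin = sum(coin) / n
selected_coin = selected_frequency(coin, after_zero)

print(overall_alt, selected_alt)    # 0.5 overall, but 1.0 on the selected places
print(overall_coin, selected_coin)  # both close to 0.5
```

The alternating sequence has the right overall frequency, yet the rule moves the frequency on the selected places all the way to 1. Against the coin flips, the same rule gains nothing.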

I haven't checked the details, but this game seems to be the same as that suggested by Shafer and Vovk, although (if I remember correctly), they only consider fair (that is, maximum-entropy) i.i.d. coins, not arbitrary biases. But at any rate, coin flipping is, like distributions on a finite set, one of the cases in which there is a maximum entropy distribution even in the absence of an externally given mean.