Monday, July 30, 2012

Judea Pearl: Causality (2000)

This book serves two purposes: it is a textbook in Bayesian statistics, and it launches a theory of causality that contrasts with Pearl's own earlier position on the topic.

The Meaning of Causation

His new theory introduces a meaningful distinction between causality and correlation by internalizing the concept of an intervention. The idea is that information only propagates downstream after an intervention, while it propagates both upstream and downstream after an observation.

For example, if I observe that the street is wet, I consider both rain and wet shoes more likely. On the other hand, if I make the street wet myself (say, by emptying a bucket of water), my subjective probability of rain remains unchanged, while I still consider wet shoes more likely.
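To make the asymmetry concrete, here is a minimal sketch in Python, using a toy chain Rain → Street → Shoes; all of the numbers are invented for illustration and are not Pearl's:

```python
from itertools import product

# Toy chain: Rain -> Street -> Shoes (all probabilities made up).
p_rain = {1: 0.2, 0: 0.8}                              # P(rain)
p_street = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.1, 0: 0.9}}  # P(street wet | rain)
p_shoes = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P(shoes wet | street wet)

def joint(do_street=None):
    """Joint over (rain, street, shoes); do_street overrides street's equation."""
    table = {}
    for r, st, sh in product((0, 1), repeat=3):
        if do_street is None:
            p_st = p_street[r][st]
        else:
            p_st = 1.0 if st == do_street else 0.0     # incoming arrow deleted
        table[(r, st, sh)] = p_rain[r] * p_st * p_shoes[st][sh]
    return table

def cond(table, i, v, j=None, w=None):
    """P(variable i = v | variable j = w) in the given joint table."""
    num = sum(p for world, p in table.items()
              if world[i] == v and (j is None or world[j] == w))
    den = sum(p for world, p in table.items() if j is None or world[j] == w)
    return num / den

obs = joint()
print(cond(obs, 0, 1))         # P(rain)                   = 0.2
print(cond(obs, 0, 1, 1, 1))   # P(rain | street wet)      ~ 0.69 (upstream update)
print(cond(obs, 2, 1, 1, 1))   # P(shoes | street wet)     = 0.8  (downstream update)

ivn = joint(do_street=1)
print(cond(ivn, 0, 1, 1, 1))   # P(rain | do(street wet))  = 0.2  (no upstream update)
print(cond(ivn, 2, 1, 1, 1))   # P(shoes | do(street wet)) = 0.8  (downstream intact)
```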

In section 7.2.1, he applies this idea to the example of price regulation. Price and demand are mutually dependent, but observing the price at a specific level and setting the price at that level do not have the same effect on demand.
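A small simulation makes the same point in the market setting; the linear supply and demand equations and all coefficients below are my own invention, not Pearl's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A made-up linear market:
#   demand: q = 10 - p + u_d
#   supply: q =  2 + p + u_s
u_d = rng.normal(0, 1, n)
u_s = rng.normal(0, 1, n)

# Observational regime: the price clears the market (solve the two equations).
p_eq = (8 + u_d - u_s) / 2
q_eq = 10 - p_eq + u_d

# Observing a price near p0 is evidence about u_d, so demand shifts with it...
p0 = 5.0
near = np.abs(p_eq - p0) < 0.05
print(q_eq[near].mean())   # approx. 6.0, not 10 - p0

# ...whereas *setting* the price overrides the market mechanism entirely:
q_do = 10 - p0 + u_d       # demand equation with p fixed by decree
print(q_do.mean())         # approx. 10 - p0 = 5.0
```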

The Stuff of Semantics

The methods that Pearl discusses have a surprisingly logical flavor, given that the book sells itself as a kind of applied statistics. He is quite keen on presenting his probability calculus as a semantics defined on a set of logical expressions (or queries).

The structures that are used to evaluate sentences in this logic are causal models (arrow-drawings expressing a hypothesis about what affects what) and, more specifically, settings of specific variables in such models. This corresponds roughly to worlds and propositional truth values in modal logic.

Within a causal model (a bunch of arrows connecting the variables in various ways), one can thus be right or wrong about specific probability assignments. But above that, a causal model as a whole can also be consistent or inconsistent with a specific probability distribution.

Consider for instance this Bayes net:

[Figure: a Bayes net over the three binary variables x, y, and z]
Given the probability table below, the net is consistent with the distribution P, but not with the distribution Q:
  x  y  z   P(x,y,z)  Q(x,y,z)
  0  0  0     1/4       1/3
  0  0  1      0         0
  0  1  0      0         0
  0  1  1     1/4        0
  1  0  0      0         0
  1  0  1     1/4       1/3
  1  1  0      0         0
  1  1  1     1/4       1/3
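The figure itself hasn't survived here, but suppose the net is the collider x → z ← y (an assumption on my part, though it fits the table): consistency with that net then requires x and y to be marginally independent, which P satisfies and Q violates. A quick sketch of the check:

```python
from fractions import Fraction as F
from itertools import product

# The two joint distributions from the table above (omitted entries are 0).
P = {(0, 0, 0): F(1, 4), (0, 1, 1): F(1, 4), (1, 0, 1): F(1, 4), (1, 1, 1): F(1, 4)}
Q = {(0, 0, 0): F(1, 3), (1, 0, 1): F(1, 3), (1, 1, 1): F(1, 3)}

def marginal(dist, idxs):
    """Sum out everything except the variables at positions idxs."""
    out = {}
    for world, p in dist.items():
        key = tuple(world[i] for i in idxs)
        out[key] = out.get(key, F(0)) + p
    return out

def consistent_with_collider(dist):
    """A joint fits x -> z <- y iff x and y are marginally independent."""
    mx, my, mxy = marginal(dist, (0,)), marginal(dist, (1,)), marginal(dist, (0, 1))
    return all(mxy.get((x, y), F(0)) == mx[(x,)] * my[(y,)]
               for x, y in product((0, 1), repeat=2))

print(consistent_with_collider(P))  # True
print(consistent_with_collider(Q))  # False
```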

In practice, we are very frequently interested in answering questions within a fixed causal model, rather than in making claims that hold for any probability distribution.

The Meaning of Counterfactuals

As I said above, Pearl's logical language allows both for regular conditioning, X | Y, and for intervention, do(Y).

While the former is defined as usual, the latter has an original meaning: the result of updating with do(Y = y) is that we delete all arrows coming into the node Y and replace its equation by the constant value y. This contrasts with simply conditioning on Y = y, in which case the causal skeleton remains intact.
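A minimal sketch of that surgery, representing a causal model as a dictionary of structural equations (an encoding of my own, not Pearl's notation):

```python
# Structural equations, each a function of the values computed so far.
model = {
    "x": lambda v: v["u"],  # x copies its exogenous cause u
    "y": lambda v: v["x"],  # y = x
    "z": lambda v: v["y"],  # z = y
}
order = ["x", "y", "z"]     # a topological order of the net

def evaluate(eqs, u):
    v = {"u": u}
    for name in order:
        v[name] = eqs[name](v)
    return v

def do(eqs, node, value):
    """Graph surgery: replace the equation for `node` by a constant,
    thereby deleting all of its incoming arrows."""
    mutilated = dict(eqs)
    mutilated[node] = lambda v: value
    return mutilated

print(evaluate(model, u=0))              # {'u': 0, 'x': 0, 'y': 0, 'z': 0}
print(evaluate(do(model, "y", 1), u=0))  # y forced to 1; z follows, x does not
```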

Interestingly, Pearl uses his new operator to define the meaning of counterfactual statements: "If X were the case, then Y" is defined in terms of the intervention do(X), while "Y given X" is defined in terms of conventional conditioning.

For some reason, he requires conditioning to take place before intervention (cf., e.g., p. 206). This makes a mathematical difference in some cases, but I'm not sure whether that difference has any interesting philosophical or linguistic counterpart. An example where there is a difference is the Bayes net below, supposing that X is a coin flip, Y = X, and Z = Y:

[Figure: the chain X → Y → Z]
In this causal model, we have
  • P( "Given Z, X if Y were true" ) = 1.
However,
  • P( "If Y were true, X given Z" ) = ½.
The reason is that conditioning on Z first reveals the value of X (since Z = Y = X in the unmutilated model), and the subsequent intervention on Y cannot erase that knowledge; intervening on Y first cuts the link back to X, so that observing Z afterwards tells us nothing about X. Maybe that's a desirable quality of a probabilistic logic. I don't know.
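Here is a sketch that reproduces both numbers by brute-force enumeration over the exogenous coin flip (the encoding is mine, not Pearl's):

```python
# Exogenous U is a fair coin; structural equations: X = U, Y = X, Z = Y.
def run(u, do_y=None):
    x = u
    y = x if do_y is None else do_y  # do(Y = y) overrides the equation for Y
    z = y
    return x, y, z

# "Given Z, X if Y were true": condition first, intervene second.
# Observe Z = 1 in the unmutilated model, update over U, then apply do(Y = 1).
posterior = [u for u in (0, 1) if run(u)[2] == 1]       # only U = 1 yields Z = 1
print(sum(run(u, do_y=1)[0] for u in posterior) / len(posterior))  # 1.0

# "If Y were true, X given Z": intervene first, condition second.
# Mutilate with do(Y = 1), then observe Z = 1 in the mutilated model.
worlds = [u for u in (0, 1) if run(u, do_y=1)[2] == 1]  # every U yields Z = 1 now
print(sum(run(u, do_y=1)[0] for u in worlds) / len(worlds))        # 0.5
```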

Inside and Outside the Model

Counterfactuals have often been criticized in the literature as empirically meaningless and hence a pollutant in science. The semantics introduced by Pearl, however, gives them a specific meaning and allows us, in principle, to evaluate them. Interestingly, he notes that they convey information more about the underlying causal model than about the empirical values of the variables (see p. 219).

A speaker uttering a counterfactual will thus not communicate something about (irrelevant, non-existent) states of affairs, but about how the world works. In a sense, a sentence like "If you had ... " is then a statement about everything that is not mentioned in the sentence.

This is quite important, not least because of a framing problem whose existence Pearl hardly even acknowledges: any particular evaluation of a counterfactual statement presupposes a causal model, including a split between endogenous variables (whose values are determined by other variables in the model) and exogenous variables (whose values are stochastic).

But it's still quite ambiguous what variables we change and what variable we put in the model in the first place when we understand a sentence like "If I were you, I wouldn't ... " It makes sense that it should convey a model of reality as it does appear a little more cautious than a direct recommendation ("You shouldn't ... "). It's like showing someone your watch rather than telling them what time it is.