
Tuesday, April 15, 2014

Fisher: "Statistical Methods and Scientific Induction" (1955)

Ronald Fisher; image from Wikimedia Commons.
In this brief paper, Sir Ronald Fisher militates against what he sees as wrong and absurd interpretations of the notion of a statistical test.

The Ideology of Statistics

The core of his argument is that a test only gives positive information when it yields a significant difference and thus warrants the rejection of a hypothesis — an absence of a significant difference does not mean "accept." He contends that
… this difference in point of view originated when Neyman, thinking that he was correcting and improving my own early work on tests of significance, … in fact reinterpreted them in terms of that technological and commercial apparatus which is known as an acceptance procedure. (p. 69)
And although acceptance procedures might be good enough for commerce, they have no place in science:
I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means. But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful, and the identification of the two sorts of operation is decidedly misleading. (pp. 69–70)
Then comes the juicy part:
I shall hope to bring out some of the logical differences more distinctly, but there is also, I fancy, in the background an ideological difference. Russians are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. How far, within such a system, personal and individual inferences from observed facts are permissible we do not know, but it may be safer, and even, in such a political atmosphere, more agreeable, to regard one's scientific work simply as a contributory element in a great machine, and to conceal rather than to advertise the selfish and perhaps heretical aim of understanding for oneself the scientific situation. In the U.S. also the great importance of organized technology has I think made it easy to confuse the processes appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. There is therefore something to be gained by at least being able to think of our scientific problems in a language distinct from that of technological efficiency. (p. 70)
So there you have it: In the technological regime of either of the two Cold War superpowers, "learning," "inference," and private, inner thought are taboo, according to Fisher. Presumably we are to contrast this with the aims of British science going back to Newton.

The Three Issues

Fisher singles out three phrases that he finds particularly offensive in scientific statistics:
  1. "Repeated sampling from the same distribution"
  2. Errors of the "second kind"
  3. "Inductive behaviour"
I'll discuss these one by one.

1. "Repeated sampling from the same distribution"

The issue with the first one is not completely clear to me, but here is what I make of his discussion (pp. 71–72): Suppose you are performing a test to see whether the mean of some population has a specific value; suppose further that the standard deviation of that population is unknown, but that you have estimated it based on the available sample.

The problem then is, if I understand Fisher correctly, that the test depends on the standard deviation being constant and known, but in reality, it is an unknown quantity that you have estimated by a maximum likelihood method. This is, strictly speaking, illegitimate, since any estimate should be based on a numerous and representative sample; but since the standard deviation here is a property of samples of size N, you would really need M independent samples, each of size N, in order to have some data to estimate it from. But clearly, this sets a far too high standard for the amount of data required.
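
To make the worry concrete, here is a minimal sketch in Python (my own illustration, not anything from Fisher's paper; all data and numbers are invented) of a one-sample t-test in which the unknown standard deviation is estimated from the very same sample whose mean is being tested:

    # Minimal sketch: a one-sample t-test where the unknown population standard
    # deviation is replaced by an estimate computed from the very same sample --
    # the move whose frequentist legitimacy Fisher questions. Numbers are made up.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=10.3, scale=2.0, size=20)  # hypothetical data
    mu_0 = 10.0                                        # hypothesised mean

    s = sample.std(ddof=1)                             # SD estimated from the sample itself
    t = (sample.mean() - mu_0) / (s / np.sqrt(len(sample)))
    p = 2 * stats.t.sf(abs(t), df=len(sample) - 1)     # two-sided p-value
    print(f"t = {t:.3f}, p = {p:.3f}")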

It's a convoluted argument, but I think it makes sense from a rigorously frequentist standpoint: If parameters are consistently interpreted as frequencies, then the only legitimate statistical procedure for learning about an unknown quantity t is to obtain a large number of samples dependent on t and then wait for the law of large numbers to kick in.

Strictly speaking, this means that the number of data points you need in order to estimate all the parameters in a model will grow exponentially in the number of parameters. That sounds sort of crazy, but if you do not allow yourself to have any model in the absence of data, you really have to wait for the data to overwhelm your initial ignorance before you can say that you have a model of the situation. That takes time.
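
As a rough illustration of that picture (my own construction, with arbitrary numbers), the following sketch repeats the whole sampling procedure M times and shows the long-run average of the per-sample standard deviation estimates settling down only as M grows:

    # Rough illustration of the "M samples of size N" picture: each repetition of
    # the procedure yields one estimate of the standard deviation, and only the
    # long-run average over many repetitions stabilises. Numbers are arbitrary;
    # the sample SD is slightly biased, so the limit sits a little below sigma.
    import numpy as np

    rng = np.random.default_rng(1)
    N, true_sigma = 20, 2.0

    for M in (5, 50, 5000):                        # number of repeated samples
        samples = rng.normal(0.0, true_sigma, size=(M, N))
        s_hat = samples.std(axis=1, ddof=1)        # one estimate per sample of size N
        print(f"M = {M:5d}: average estimate = {s_hat.mean():.3f}")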

2. Errors of the "Second Kind"

Errors of the first kind are false positives: Cases in which, for instance, a population in fact has mean m, but nevertheless exhibits a sample average so far away from m that the hypothesis is rejected. Such errors have a frequentist interpretation, because the likelihoods given m are well-defined even in the absence of a prior distribution over m.

Errors of the second kind are false negatives: Some other mean m' different from m produces a sample average so close to m that the false hypothesis of a mean of m is confirmed. This kind of error has no frequentist interpretation, because it requires the alternative hypotheses m' to have prior probabilities, and because it requires that there be a loss function associated with accepting the hypothesis of m when the true mean m' is close to m.
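
For orientation, here is a rough simulation (my own, not from the paper; all numbers invented) of how the textbook Neyman–Pearson picture attaches long-run frequencies to the two kinds of error for a t-test at the 5% level. Note that the second rate only becomes a frequency once a specific alternative m' has been singled out, which is precisely the extra ingredient the paragraph above says cannot be had without priors or loss functions:

    # Rough simulation of the two error rates for a two-sided one-sample t-test.
    # The Type I rate needs only the null mean m; the Type II rate is defined only
    # relative to the specific alternative m' chosen here. All numbers are invented.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    m, m_prime, sigma, N, alpha, runs = 10.0, 10.5, 2.0, 20, 0.05, 20_000

    def rejection_rate(true_mean):
        data = rng.normal(true_mean, sigma, size=(runs, N))
        t = (data.mean(axis=1) - m) / (data.std(axis=1, ddof=1) / np.sqrt(N))
        return (np.abs(t) > stats.t.ppf(1 - alpha / 2, df=N - 1)).mean()

    print("Type I rate  (true mean = m): ", rejection_rate(m))          # close to 0.05
    print("Type II rate (true mean = m'):", 1 - rejection_rate(m_prime))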


Jerzy Neyman in the classroom, 1973; image from Wikimedia Commons.

Fisher is not willing to assume either of those two instruments. He writes:
It was only when the relation between a test of significance and its corresponding null hypothesis was confused with an acceptance procedure that it seemed suitable to distinguish errors in which the hypothesis is rejected wrongly, from errors in which it is "accepted wrongly" as the phrase goes. (p. 73)
Such language is not just scientifically irresponsible, he thinks — it also misunderstands the private states of mind present in the head of a scientist:
The fashion of speaking of a null hypothesis as "accepted when false", whenever a test of significance gives us no strong reason for rejecting it, and when in fact it is in some way imperfect, shows real ignorance of the research worker's attitude, by suggesting that in such a case he has come to an irreversible decision. (p. 73; Fisher's emphasis)
Of course, neither positive nor negative decisions are immune to revision as more data comes in (cf. p. 76), so Fisher prefers to depict the scientist's attitude as one of cautious learning in the face of data. This contrasts with the forced-choice nature of acceptance procedures:
In an acceptance procedure, on the other hand, acceptance is irreversible, whether the evidence for it was strong or weak. It is the result of applying mechanically rules laid down in advance; no thought is given to the particular case, and the tester's state of mind, or his capacity for learning, is inoperative.
By contrast, conclusions drawn by a scientific worker from a test of significance are provisional, and involve an intelligent attempt to understand the experimental situation. (pp. 73–74; Fisher's emphasis).
Note again the insistence on private states of mind as the hallmark of scientific rationality.

3. "Inductive Behaviour"

The last issue Fisher has with Neyman's brand of statistics is shelved under the heading above, but it is really an issue of language: Neyman contends (according to Fisher's summary — there is no direct reference) that statements like
There is 5% probability that the sample average deviates strongly from the mean
have a meaningful and well-defined interpretation (in terms of likelihood). On the other hand,
There is 5% probability that the mean deviates strongly from the sample average
is meaningless, because the mean is not a random variable.

Fisher disagrees, not because he is a fan of prior probability distributions on the parameters, but because he thinks that such statements could only ever refer to likelihoods. To make this point vivid, he considers (I am changing the example a bit here) a statement of the form
Pr(m < x) = 5%,
where m is a parameter and x is an observation, and he contrasts this with
Pr(m < 17) = 5%.
If one of these statements has a meaning, he says, clearly the other one must have a meaning too, unless we want to "deny the syllogistic process of making a substitution" (p. 75). But Neyman contends that the probability of a statement of the second kind should be "necessarily either 0 or 1" (p. 75), so that only the former probability (the likelihood given the mean) is well-defined.
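
A small sketch (my own construction; the numbers are invented) may help separate the two readings: as a statement about the procedure, "m < X" happens in roughly 5% of repeated samples when X is a suitable data-dependent bound, whereas any single realised bound, once written down as a number, yields a statement that is simply true or false for a fixed m, which is Neyman's "0 or 1" point:

    # Sketch of the substitution dispute. X is a bound computed from the data such
    # that, over repeated samples, the event "m < X" occurs about 5% of the time;
    # but for any one realised value of X, the statement is just true or false.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    m, sigma, N, runs = 20.0, 4.0, 25, 100_000
    c = stats.t.ppf(0.95, df=N - 1)

    data = rng.normal(m, sigma, size=(runs, N))
    X = data.mean(axis=1) - c * data.std(axis=1, ddof=1) / np.sqrt(N)

    print("Long-run frequency of m < X:", (m < X).mean())   # close to 0.05
    x0 = round(float(X[0]), 2)
    print(f"One realised bound x = {x0}; the statement m < {x0} is {m < x0}")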

Fisher comments:
The paradox is rather childish, for it requires that we should wilfully misinterpret the probability statement so as to pretend that the population to which it refers is not defined by our observations and their precision, but is absolutely independent of them. (p. 75)
By this he means that the reference class (the "population") is defined arbitrarily by our experimental set-up. And as he says about populations earlier in the paper, "no one of them has objective reality, all being products of the statistician's imagination" (p. 71).

An Englishman's Duty

In the conclusion, Fisher comes back to the ethical standards of statistics:
As an act of construction the hypothesis is not altogether impersonal, for the scientist's personal capacity for theorizing comes into it; moreover, the criteria by which it is approved require a certain honesty, or integrity, in their application. (p. 75)
Again, he explains that decision-theoretic methods (such as Bayesian statistics) have no business in scientific inference, since the goal is not optimal decisions, but the attainment of truth:
Finally, in inductive inference we introduce no cost functions for faulty judgments … In fact, scientific research is not geared to maximize the profits of any particular organization, but is rather an attempt to improve public knowledge undertaken as an act of faith to the effect that, as more becomes known, or more surely known, the intelligent pursuit of a great variety of aims, by a great variety of men, and groups of men, will be facilitated. We make no attempt to evaluate these consequences, and do not assume that they are capable of evaluation in any sort of currency.
… We aim, in fact, at methods of inference which should be equally convincing to all rational minds, irrespective of any intentions they may have in utilizing the knowledge inferred.
We have the duty of formulating, of summarizing, and of communicating our conclusions, in intelligible form, in recognition of the right of other free minds to utilize them in making their own decisions. (p. 77)
We could hardly have it more explicit: The difference in statistical paradigm is one of ethics.

Wednesday, February 29, 2012

Bruno Latour: What Is the Style of Matters of Concern? (2008)

This little booklet has been lying around the office here at Oude Turfmarkt since I moved in here, and since I love Bruno Latour, I thought I should have a look at it some time. Today was as good a day as any.

The booklet consists of two lectures that Latour delivered in Amsterdam as the Spinoza lectures of 2005. I have so far only read the first one. The upshot of the lecture is that he wants science and science studies to create knowledge without splitting the world up into a cold realm of nature-in-itself and a comfy but scientifically irrelevant world of nature-for-us.

Two Heroes Going With the Flow
Latour's argument is driven by a central metaphor: Instead of coping with the gap between language and world by building a bridge from one bank to the other, he wants us to "go with the flow" down the river that separates the two realms (pp. 14-15). In this way, he hopes that we will be able to overcome the "bifurcation of nature" (a phrase he borrows from Whitehead) into primary and secondary qualities.

In order to meet this challenge, he invokes another one of his heroes, the French 19th-century sociologist Gabriel Tarde. Tarde was very explicit about employing a methodology "almost the exact opposite of [...] Monsieur Durkheim's" (quoted on page 19). That is, instead of privileging the bird's-eye, wholesale view of a society or aggregate, he privileges the view from the inside, the meaningful detail.

Fleshy Signals in Biology and Computer Science
I wonder whether all of this might have an application to the philosophy of information. Latour, with Tarde, hints at some consequences that this new thinking might have for evolutionary thought (pp. 16-17): The "information" reproduced in a particular shift of generations is under a constant pressure to produce difference, that is, to decide on a content/form distinction.

This might be elaborated into a more general theory in which the metaphysical conception of pure, abstract information yields to some kind of contextual or naturalized conception of "content" or "meaning," perhaps akin to that of biosemiotics.