## Saturday, May 16, 2015

### Neyman and Pearson: "On the Problem of the Most Efficient Tests of Statistical Hypotheses" (1933)

The Wikipedia page on the Neyman-Pearson lemma provides a statement and proof of the theorem which deviates quite a lot from the corresponding formulations in the article in which it originally appeared (pp. 300–301). I've been thinking a bit about how best to state and prove this theorem, and here's what I've come up with.

### Dichotomies and Tests

Suppose two probability measures $P_0$ and $P_1$ on a set $\Omega$ are given, and that we are given a data set $X=x$ drawn from $\Omega$ according to one of these two distributions. Our goal is now to define a function $T: \Omega\rightarrow \{0,1\}$ which will return, for any data set, a guess at which of the two hypotheses the data came from.

Such a test will thus itself be a random variable $T=T(X)$. The quality of this test can be measured in terms of two error probabilities,
$$P_0(T=1) \qquad \textrm{and} \qquad P_1(T=0).$$Minimizing these two sources of error will generally be conflicting goals, as can be seen by considering the constant tests $T=0$ and $T=1$: each drives one of the error probabilities to zero while pushing the other to one.

### Likelihood Ratio Tests and the Neyman-Pearson Lemma

One family of tests we always have at our disposal is the family of likelihood ratio tests. A likelihood ratio test at threshold $r$ is the indicator variable that returns the value 1 when the evidence favors hypothesis 1 by a factor of at least $r$:
$$R(x) \; = \; \mathbb{I}\left( \frac{P_1(x)}{P_0(x)} \geq r \right).$$The content of the Neyman-Pearson lemma is that these likelihood ratio tests are optimal, in a certain sense.
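As a concrete sketch, here is a likelihood ratio test on a small finite sample space; the point masses and the threshold are made-up numbers chosen only for illustration:

```python
from fractions import Fraction as F

# Two toy distributions on a four-point sample space (made-up numbers).
P0 = {"a": F(4, 10), "b": F(3, 10), "c": F(2, 10), "d": F(1, 10)}
P1 = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(4, 10)}

def lr_test(r):
    """Likelihood ratio test at threshold r: R(x) = 1 iff P1(x)/P0(x) >= r."""
    return {x: int(P1[x] / P0[x] >= r) for x in P0}

R = lr_test(F(3, 2))                           # threshold r = 3/2
type_I  = sum(P0[x] for x in R if R[x] == 1)   # P0(R = 1)
type_II = sum(P1[x] for x in R if R[x] == 0)   # P1(R = 0)
```

Raising the threshold $r$ shrinks the region $R=1$, trading a lower $P_0(R=1)$ for a higher $P_1(R=0)$.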

Specifically, suppose any other test $T$ is given. The lemma then states that
$$P_0(T=1) \; \leq \; P_0(R=1) \qquad \Longrightarrow \qquad P_1(T=0) \; \geq \; P_1(R=0).$$In other words, when the probability of an error of the first kind is held below a fixed level, we achieve the lowest rate of errors of the other kind by choosing a likelihood ratio test. You cannot achieve a lower error rate under $P_0$ without increasing your error rate under $P_1$.
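Since the lemma quantifies over all competing tests, it can be checked by brute force on a finite sample space. A sketch, again with made-up probabilities, that enumerates every deterministic test and verifies the implication:

```python
from fractions import Fraction as F
from itertools import product

# Made-up toy distributions; likelihood ratio test at threshold r = 3/2.
P0 = {"a": F(4, 10), "b": F(3, 10), "c": F(2, 10), "d": F(1, 10)}
P1 = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(4, 10)}
points = sorted(P0)
r = F(3, 2)
R = {x: int(P1[x] / P0[x] >= r) for x in points}

def errors(T):
    """Return (P0(T = 1), P1(T = 0)) for a test T given as a dict x -> {0, 1}."""
    return (sum(P0[x] for x in points if T[x] == 1),
            sum(P1[x] for x in points if T[x] == 0))

e0_R, e1_R = errors(R)

# Every deterministic test whose type-I error is no larger than the
# ratio test's must have a type-II error at least as large as its.
for bits in product([0, 1], repeat=len(points)):
    T = dict(zip(points, bits))
    e0_T, e1_T = errors(T)
    if e0_T <= e0_R:
        assert e1_T >= e1_R
```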

### From $1\times 2$ to $2 \times 2$

The best way of proving this is to directly compare the regions of $\Omega$ on which the two tests disagree. There are two such regions, $T=1 \wedge R=0$ and $T=0 \wedge R=1$. The difference in the measure of these two regions under the two hypotheses determines the difference in error rates between the two tests.

A paraphrase of the lemma could then be that if
$$P_0(T=1 \wedge R=0) \; \leq \; P_0(T=0 \wedge R=1),$$then we also have
$$P_1(T=0 \wedge R=1) \; \geq \; P_1(T=1 \wedge R=0).$$This alternative formulation can be translated back into the original form by adding back in the corner regions $T=1 \wedge R=1$ and $T=0 \wedge R=0$ on which the two tests give the same result.

### Ratio Conversions

Using this formulation, a proof strategy suggests itself: splitting into cases according to the value of $R$. Specifically, if an event $A\subseteq \Omega$ satisfies the condition
$$A \; \subseteq \; \{x:\;R(x)=0\} \; = \; \left\{ x:\; \frac{P_1(x)}{P_0(x)} \; < \; r\right\},$$then $\frac{1}{r}P_1 < P_0$ pointwise on $A$. We can therefore obtain a lower bound on the $P_0$-measure of $A$ by performing the integration using the smaller measure $\frac{1}{r}P_1$ instead:
$$\frac{1}{r} P_1(A) \; \leq \; P_0(A).$$Similarly, if
$$A \; \subseteq \; \{ R=1 \} \; = \; \left\{ \frac{P_1}{P_0} \; \geq \; r\right\},$$then $\frac{1}{r}P_1 \geq P_0$ on $A$, and
$$P_0(A) \; \leq \; \frac{1}{r} P_1(A).$$Since the two sets $R=0$ and $R=1$ form a partition of $\Omega$, we can split any set $A$ up according to its overlap with these cells and thus translate bounds on $P_0$ into bounds on $P_1$.
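Both conversions can likewise be checked by brute force, over every subset of each of the two cells (a sketch with the same kind of made-up toy distributions as above):

```python
from fractions import Fraction as F
from itertools import chain, combinations

# Made-up toy distributions; threshold r = 3/2.
P0 = {"a": F(4, 10), "b": F(3, 10), "c": F(2, 10), "d": F(1, 10)}
P1 = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(4, 10)}
r = F(3, 2)

R0 = [x for x in P0 if P1[x] / P0[x] < r]    # the cell {R = 0}
R1 = [x for x in P0 if P1[x] / P0[x] >= r]   # the cell {R = 1}

def measure(P, A):
    return sum((P[x] for x in A), F(0))

def subsets(cell):
    return chain.from_iterable(combinations(cell, k) for k in range(len(cell) + 1))

# On {R = 0}: (1/r) P1(A) <= P0(A); on {R = 1}: P0(A) <= (1/r) P1(A).
for A in subsets(R0):
    assert measure(P1, A) / r <= measure(P0, A)
for A in subsets(R1):
    assert measure(P0, A) <= measure(P1, A) / r
```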

### The Extended Sandwich

Now, by applying these conversions to the two sides of the condition
$$P_0(T=1 \wedge R=0) \; \leq \; P_0(T=0 \wedge R=1),$$we obtain the chain of inequalities
$$\frac{1}{r} P_1(T=1 \wedge R=0) \; \leq \; P_0(T=1 \wedge R=0) \; \leq \; P_0(T=0 \wedge R=1) \; \leq \; \frac{1}{r} P_1(T=0 \wedge R=1),$$since the first region is contained in $\{R=0\}$ and the second in $\{R=1\}$. Multiplying through by $r$, we thus get the result.
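For a concrete instance of this sandwich, here is one competing test on the same kind of made-up toy distributions as above, chosen so that it satisfies the hypothesis of the lemma:

```python
from fractions import Fraction as F

# Made-up toy distributions; threshold r = 3/2.
P0 = {"a": F(4, 10), "b": F(3, 10), "c": F(2, 10), "d": F(1, 10)}
P1 = {"a": F(1, 10), "b": F(2, 10), "c": F(3, 10), "d": F(4, 10)}
r = F(3, 2)
R = {x: int(P1[x] / P0[x] >= r) for x in P0}
T = {"a": 0, "b": 1, "c": 0, "d": 0}    # a competing test satisfying the hypothesis

def measure(P, pred):
    return sum((P[x] for x in P if pred(x)), F(0))

lo   = measure(P1, lambda x: T[x] == 1 and R[x] == 0) / r   # (1/r) P1(T=1, R=0)
mid1 = measure(P0, lambda x: T[x] == 1 and R[x] == 0)       # P0(T=1, R=0)
mid2 = measure(P0, lambda x: T[x] == 0 and R[x] == 1)       # P0(T=0, R=1)
hi   = measure(P1, lambda x: T[x] == 0 and R[x] == 1) / r   # (1/r) P1(T=0, R=1)
assert lo <= mid1 <= mid2 <= hi         # the sandwich of inequalities
```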

I like this formulation of the lemma because it makes vivid what is going on: when somebody splits $\Omega$ up into $\{T=0\}$ and $\{T=1\}$, we use our ratio test $R$ to split each of these further into two subregions. This allows us to use the translation between the two measures, and thus to compare the $P_0$ error rates with the $P_1$ error rates.

It also gives better intuition about the potential differences between the tests $R$ and $T$: the more the set $T=1$ overlaps with the set $R=1$, the more alike the two tests are, and the tighter the sandwich of inequalities becomes.