Definitions
More precisely, a time shift $T$ is said to be mixing (p. 12) if
$$P(A \cap T^{-n}B) \;\to\; P(A)\,P(B)$$
for all measurable sets $A$ and $B$. The process is thus mixing if information about the present is almost entirely independent of the far future. (A sample path $\omega$ belongs to the set $T^{-n}B$ if $\omega$ will visit $B$ after $n$ time shifts forward.) Billingsley doesn't comment on the choice of the term "mixing," but I assume it should be read as an ill-chosen synonym for "shuffling" in this context.

We will now be interested in the behavior of the visiting frequencies
$$P_n(A) \;:=\; \frac{1}{n}\sum_{k=0}^{n-1} I_A(T^k\omega),$$
where $I_A$ is the indicator function of some bundle $A$ of sample paths. This quantity is a random variable whose distribution is determined by the distribution of the sample paths $\omega$.
Note that the traditional notion of visiting frequency (i.e., how often is the ace of spades on top of the deck?) is a special case of this concept of visiting frequency, in which the bundle $A$ is defined entirely in terms of the $0$th coordinate of the sample path $\omega = (\ldots, \omega_{-2}, \omega_{-1}, \omega_0, \omega_1, \omega_2, \ldots)$. In the general case, the question of whether $\omega \in A$ could involve information stored at arbitrary coordinates of $\omega$ (today, tomorrow, yesterday, or any other day).
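To make the definition concrete, here is a minimal sketch in Python (NumPy assumed; the coin-flip process and the two-coordinate bundle are illustrative choices of mine, not Billingsley's):

```python
import numpy as np

rng = np.random.default_rng(0)

# One long sample path of fair coin flips; index k plays the role of omega_k.
omega = rng.integers(0, 2, size=100_000)

# A bundle that depends on TWO coordinates of the path:
# A = {omega : omega_0 = 1 and omega_1 = 1}, so P(A) = 1/4 for fair flips.
def indicator_A(path, k):
    # I_A(T^k omega): shift k steps forward, then look at coordinates 0 and 1.
    return path[k] == 1 and path[k + 1] == 1

n = len(omega) - 1  # leave room for the k + 1 lookahead
P_n = sum(indicator_A(omega, k) for k in range(n)) / n
print(P_n)  # close to 0.25
```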
Theorem and Proof Strategy
The mixing theorem now says the following: Suppose that $P$ is a probability measure on the set of sample paths, and suppose that the time shift $T$ is mixing and measure-preserving with respect to this measure. Then
$$P_n(A) \;\to\; P(A)$$
in probability. The proof of this claim involves two things:
- Showing that $E[P_n(A)] = P(A)$;
- Showing that $\mathrm{Var}[P_n(A)] \to 0$ for $n \to \infty$ (both points are visible in the simulation sketch below).
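A quick simulation hints at both points at once (again with an i.i.d. coin-flip shift as a hypothetical stand-in for a general mixing process):

```python
import numpy as np

rng = np.random.default_rng(1)

# A = {omega : omega_0 = 1}, so P(A) = 0.5 for fair coin flips.
for n in [10, 100, 1_000, 10_000]:
    # 2000 independent sample paths, each observed for n time steps.
    paths = rng.integers(0, 2, size=(2_000, n))
    P_n = paths.mean(axis=1)  # visiting frequency of A along each path
    print(n, P_n.mean().round(3), P_n.var().round(6))
# The mean of P_n(A) stays at 0.5 while its variance decays like 1/n.
```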
Identity in Mean
The time shift $T$ is assumed to preserve the measure $P$. We thus have
$$E[f(\omega)] = E[f(T\omega)] = E[f(T^2\omega)] = \cdots$$
for any measurable function $f$. It follows that
$$E[I_A(\omega)] = E[I_A(T\omega)] = E[I_A(T^2\omega)] = \cdots$$
and that these are all equal to $P(A)$. By the linearity of expectation, we therefore get that $E[P_n(A)] = P(A)$.

This proves that $P_n(A)$ at least has the right mean. Convergence to any value other than $P(A)$ is therefore out of the question, and it only remains to be seen that the variance of $P_n(A)$ also goes to $0$ as $n$ goes to infinity.
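Spelled out, the linearity step is the one-line computation
$$E[P_n(A)] \;=\; \frac{1}{n}\sum_{k=0}^{n-1} E\!\left[I_A(T^k\omega)\right] \;=\; \frac{1}{n}\sum_{k=0}^{n-1} P(A) \;=\; P(A).$$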
Vanishing Variance: The Idea
Consider therefore the variance
$$\mathrm{Var}[P_n(A)] \;=\; E\!\left[\left(P_n(A) - P(A)\right)^2\right].$$
In order to expand the square under this expectation, it will be helpful to break the term $P(A)$ up into $n$ equal chunks and then write the whole thing as a sum:
$$P_n(A) - P(A) \;=\; \frac{1}{n}\sum_{k=0}^{n-1}\left(I_A(T^k\omega) - P(A)\right).$$
If we square this sum of $n$ terms and take the expectation of the expansion, we will get a sum of $n^2$ cross-terms, each of which is an expected value of the form
$$\mathrm{Cov}(i,j) \;:=\; E\!\left[\left(I_A(T^i\omega) - P(A)\right)\left(I_A(T^j\omega) - P(A)\right)\right].$$
By expanding this product and using that $E[I_A(T^k\omega)] = P(A)$, we can find that
$$\mathrm{Cov}(i,j) \;=\; E\!\left[I_A(T^i\omega)\,I_A(T^j\omega)\right] - P(A)^2.$$
The random variable $I_A(T^i\omega)\,I_A(T^j\omega)$ takes the value $1$ on $\omega$ if and only if $\omega$ will visit $A$ after both $i$ and $j$ time steps. Hence,
$$\mathrm{Cov}(i,j) \;=\; P\!\left(T^{-i}A \cap T^{-j}A\right) - P(A)^2,$$
and we can write
$$\mathrm{Var}[P_n(A)] \;=\; \frac{1}{n^2}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\mathrm{Cov}(i,j).$$
The goal will now be to show that the sum of these $n \times n$ terms converges to $0$ as we increase $n$.
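As a sanity check on this decomposition, the following sketch (i.i.d. fair flips and $A = \{\omega : \omega_0 = 1\}$, my illustrative choices) estimates the $n \times n$ covariance table from simulated paths and confirms that its normalized sum matches a direct estimate of the variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 100_000
paths = rng.integers(0, 2, size=(trials, n))  # A = {omega : omega_0 = 1}

# Direct Monte Carlo estimate of Var[P_n(A)].
direct = paths.mean(axis=1).var()

# The same quantity via the covariance table: entry (i, j) estimates
# Cov(i, j) = E[(I_A(T^i w) - P(A)) (I_A(T^j w) - P(A))].
deviations = paths - 0.5
cov_table = deviations.T @ deviations / trials
via_table = cov_table.sum() / n**2

print(direct, via_table, 0.25 / n)  # all three agree up to sampling noise
```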
Vanishing Variance: The Steps

Since $T$ is assumed to be measure-preserving, the value of $\mathrm{Cov}(i,j)$ is completely determined by the distance between $i$ and $j$, not their absolute values. We thus have
$$\mathrm{Cov}(i,j) \;=\; \mathrm{Cov}(n+i,\, n+j)$$
for all $i$, $j$, and $n$.

By the mixing assumption (applied with $B = A$), $\mathrm{Cov}(0,j) \to 0$ for $j \to \infty$. Together with the observation about $|i-j|$, this implies that $\mathrm{Cov}(i,j) \to 0$ whenever $|i-j| \to \infty$. If we imagine the values of $\mathrm{Cov}(i,j)$ tabulated in a huge, square grid, the numbers far away from the diagonal are thus tiny. We will now sum up these $n^2$ numbers $\mathrm{Cov}(i,j)$ by breaking them into $n$ classes according to the value of $d = |i-j|$.
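For i.i.d. flips the covariances vanish immediately, so a slightly stickier process shows the decay better. The following sketch (a two-state Markov chain with stay probability 0.9, my illustrative choice; its shift is mixing) estimates $\mathrm{Cov}(0,d)$ along one long path:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-state Markov chain that keeps its state with probability 0.9;
# started from the uniform stationary law, its time shift is mixing.
steps, stay = 1_000_000, 0.9
switches = rng.random(steps) < (1 - stay)
omega = (rng.integers(0, 2) + np.cumsum(switches)) % 2

# Estimate Cov(0, d) = P(A ∩ T^{-d} A) - P(A)^2 for A = {omega : omega_0 = 1}.
p_A = omega.mean()
for d in [0, 1, 2, 4, 8]:
    joint = (omega[: len(omega) - d] * omega[d:]).mean()
    print(d, round(joint - p_A**2, 4), round(0.25 * 0.8**d, 4))
# Up to sampling noise, the estimates track the exact covariance
# 0.25 * 0.8^d of this chain, which decays geometrically to 0.
```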
To do so, we upper-bound the number of members in each class by $2n$: there are, for instance, $n$ terms with $|i-j| = 0$, and $2(n-1)$ terms with $|i-j| = 1$. (This upper bound corresponds graphically to covering the $n \times n$ square with a tilted $n\sqrt{2} \times n\sqrt{2}$ square, except for some complications surrounding $d = 0$.)
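A quick check of this count, for one small illustrative $n$:

```python
n = 6
for d in range(n):
    count = sum(1 for i in range(n) for j in range(n) if abs(i - j) == d)
    print(d, count)  # n terms when d = 0, otherwise 2*(n - d); always <= 2*n
```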
We thus have
$$\mathrm{Var}[P_n(A)] \;=\; \frac{1}{n^2}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\mathrm{Cov}(i,j) \;\le\; \frac{2}{n}\sum_{d=0}^{n-1}\left|\mathrm{Cov}(0,d)\right|.$$
This last quantity is twice the average of a list of numbers that converge to $0$; this average must therefore itself converge to $0$ as $n \to \infty$. This proves that
$$\mathrm{Var}[P_n(A)] \;\to\; 0$$
for $n \to \infty$, and we can conclude that $P_n(A) \to P(A)$ in probability.
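To close the loop, the same sticky chain as above can be used to check the final bound numerically; here the bound $\frac{2}{n}\sum_d |\mathrm{Cov}(0,d)|$ is computed from that chain's exact covariances $0.25 \cdot 0.8^d$ (a property of my illustrative example, not part of the general argument):

```python
import numpy as np

rng = np.random.default_rng(4)

def sticky_paths(trials, n, stay=0.9):
    # Independent stationary paths of the two-state chain from above.
    switches = rng.random((trials, n)) < (1 - stay)
    switches[:, 0] = False  # the first coordinate is the initial state
    start = rng.integers(0, 2, size=(trials, 1))
    return (start + np.cumsum(switches, axis=1)) % 2

for n in [10, 100, 1_000]:
    P_n = sticky_paths(20_000, n).mean(axis=1)  # A = {omega : omega_0 = 1}
    bound = 2 / n * (0.25 * 0.8 ** np.arange(n)).sum()
    print(n, round(P_n.var(), 5), "<=", round(bound, 5))
# Both the variance and its upper bound shrink toward 0 as n grows.
```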