Convergence of sequences of random variables, the law of large numbers, and the central limit theorem
There are several senses in which a sequence of random variables \(X_n\) can converge to a limit \(X\):
- Almost surely (a.s.): \(\mathbb{P}\bigl(\lim_{n\to\infty} X_n = X\bigr) = 1\).
- In probability: \(\mathbb{P}(|X_n - X| > \varepsilon) \to 0\) for every \(\varepsilon > 0\).
- In distribution: \(F_{X_n}(x) \to F_X(x)\) at every continuity point \(x\) of \(F_X\).

Hierarchy: a.s. convergence \(\Rightarrow\) convergence in probability \(\Rightarrow\) convergence in distribution; neither implication is reversible in general.
Weak law of large numbers (WLLN): let \(X_1, X_2, \ldots\) be i.i.d. with \(\mathbb{E}|X_i| < \infty\) and \(\mu = \mathbb{E}[X_i]\). Then the sample mean converges in probability to \(\mu\):
\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{\,p\,} \mu \quad \text{as } n \to \infty\]
Proof sketch via Chebyshev's inequality (assuming additionally that \(\text{Var}(X_i) = \sigma^2 < \infty\)):
\[\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0.\]

Strong law of large numbers (SLLN): under the same conditions, in fact \(\bar{X}_n \xrightarrow{\text{a.s.}} \mu\). The strong law requires a more careful proof (e.g., via the Borel–Cantelli lemma) but delivers the stronger almost-sure statement.
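The law of large numbers is easy to check empirically. A minimal simulation sketch (the Exponential(1) choice, the seed, and NumPy usage are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Exponential(1) draws: mu = 1, Var = 1
x = rng.exponential(scale=1.0, size=n)

# Running sample means bar{X}_k for k = 1..n
running_mean = np.cumsum(x) / np.arange(1, n + 1)

# The running mean wanders early on but settles near mu = 1 as n grows
print(running_mean[99], running_mean[-1])
```

The final value should sit within a few multiples of \(\sigma/\sqrt{n} \approx 0.003\) of \(\mu = 1\).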
Central limit theorem (CLT): let \(X_1, X_2, \ldots\) be i.i.d. with \(\mathbb{E}[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2 < \infty\). Then, writing \(S_n = \sum_{i=1}^n X_i\):
\[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)\]
The CLT says the standardised sum converges in distribution to a standard normal — regardless of the underlying distribution of \(X_i\). This is why the normal distribution appears everywhere.
For large \(n\):
\[\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad S_n = \sum_{i=1}^n X_i \approx \mathcal{N}(n\mu, n\sigma^2)\]

Rule of thumb: the approximation is reasonable when \(n \geq 30\) for roughly symmetric distributions; larger \(n\) is needed for skewed distributions.
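A quick Monte Carlo check of the normal approximation for a skewed distribution (Exponential(1), the sample size, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0  # Exponential(1): mean 1, variance 1, right-skewed
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardise each sample mean: sqrt(n) * (xbar - mu) / sigma
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# For an exact N(0,1), P(Z > 1.96) = 0.025; the empirical fraction should
# be in that ballpark, with a slight excess because the X_i are skewed
print(np.mean(z > 1.96))
```

Increasing \(n\) pulls the empirical tail fraction closer to the normal value 0.025, illustrating the rule of thumb.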
If \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)\) and \(g\) is differentiable at \(\mu\) with \(g'(\mu) \neq 0\), then:
\[\sqrt{n}\bigl(g(\bar{X}_n) - g(\mu)\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\, [g'(\mu)]^2 \sigma^2\bigr)\]
The delta method propagates asymptotic normality through smooth transformations. It is used extensively to find approximate distributions of functions of estimators.
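The variance prediction of the delta method can be sanity-checked by simulation; here \(g(x) = x^2\) and normal data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 10_000
mu, sigma = 2.0, 1.0
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Delta method with g(x) = x^2: g'(mu) = 2*mu, so the asymptotic
# standard deviation of g(xbar) is |g'(mu)| * sigma / sqrt(n)
pred_sd = abs(2 * mu) * sigma / np.sqrt(n)
emp_sd = np.std(xbar**2)  # empirical sd of g(xbar) across replications
print(pred_sd, emp_sd)
```

The two numbers should agree to within a few percent for \(n\) this large.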
The moment generating function (MGF) of \(X\) is \(M_X(s) = \mathbb{E}[e^{sX}]\) (when it exists in a neighbourhood of \(s = 0\)). Key properties:
- Moments: \(M_X^{(k)}(0) = \mathbb{E}[X^k]\), so the derivatives at zero generate the moments.
- Independence: if \(X\) and \(Y\) are independent, then \(M_{X+Y}(s) = M_X(s)\,M_Y(s)\).
- Uniqueness/continuity: an MGF finite near \(0\) determines the distribution, and pointwise convergence of MGFs implies convergence in distribution.
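The product rule for independent sums can be verified by Monte Carlo; Exponential(1), where \(M_X(s) = 1/(1-s)\) for \(s < 1\), is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
s = 0.25  # kept small so e^{s(X+Y)} has finite variance for Exp(1)
x = rng.exponential(1.0, size=400_000)
y = rng.exponential(1.0, size=400_000)  # independent of x

# For Exp(1), M(s) = 1/(1-s), so M_X(0.25) * M_Y(0.25) = (4/3)^2 = 16/9
lhs = np.mean(np.exp(s * (x + y)))                    # estimate of M_{X+Y}(s)
rhs = np.mean(np.exp(s * x)) * np.mean(np.exp(s * y)) # estimate of M_X(s)*M_Y(s)
print(lhs, rhs)
```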
MGFs provide an elegant proof of the CLT via Taylor expansion of the log-MGF.
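The expansion alluded to can be sketched as follows (assuming the MGF of the standardised variables exists in a neighbourhood of zero):

```latex
% Let Z_i = (X_i - \mu)/\sigma, so E[Z_i] = 0 and E[Z_i^2] = 1, and write
% T_n = (S_n - n\mu)/(\sigma\sqrt{n}) = n^{-1/2}(Z_1 + \cdots + Z_n).
% Independence gives M_{T_n}(s) = [M_Z(s/\sqrt{n})]^n, and a Taylor
% expansion of M_Z about 0 (using the first two moments of Z) yields
\log M_{T_n}(s)
  = n \log M_Z\!\left(\frac{s}{\sqrt{n}}\right)
  = n \log\!\left(1 + \frac{s^2}{2n} + o\!\left(\frac{1}{n}\right)\right)
  \longrightarrow \frac{s^2}{2} \quad \text{as } n \to \infty.
% The limit is the log-MGF of N(0,1); a continuity theorem for MGFs
% (convergence of MGFs implies convergence in distribution) finishes the proof.
```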
Given i.i.d. \(X_1, \ldots, X_n\) with CDF \(F\) and PDF \(f\), let \(X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}\) be the sorted values. The PDF of the \(k\)-th order statistic is:
\[f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!}\,[F(x)]^{k-1}\,[1-F(x)]^{n-k}\,f(x)\]

Special cases:
- Minimum (\(k = 1\)): \(f_{X_{(1)}}(x) = n\,[1 - F(x)]^{n-1}\,f(x)\).
- Maximum (\(k = n\)): \(f_{X_{(n)}}(x) = n\,[F(x)]^{n-1}\,f(x)\).
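As a sanity check of the formula: for Uniform(0,1) the maximum has density \(n x^{n-1}\), hence mean \(n/(n+1)\). A short simulation (NumPy, seed and sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
u = rng.uniform(size=(reps, n))
u_max = u.max(axis=1)  # the n-th order statistic of each sample

# For Uniform(0,1), f_max(x) = n x^{n-1} on (0,1), so E[max] = n/(n+1)
print(u_max.mean())  # ≈ 5/6 ≈ 0.8333
```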