Convergence Theory

Three modes of convergence, the hierarchy between them, Slutsky's theorem, and the Continuous Mapping Theorem

1. Introduction

When we say an estimator "converges to the truth", we need to be precise about what convergence means. A sequence of random variables \(T_n\) can converge to a limit \(T\) in several distinct senses — and these are not equivalent. Understanding the hierarchy is essential for knowing when asymptotic arguments are valid.

2. Three Types of Convergence

Almost Sure (a.s.) Convergence

Definition

\(T_n \xrightarrow{a.s.} T \iff \mathbb{P}\!\left(\left\{\omega : T_n(\omega) \xrightarrow{n\to\infty} T(\omega)\right\}\right) = 1\)

The sequence converges for almost every realisation of the randomness. This is the strongest mode. The Strong Law of Large Numbers states \(\bar{X}_n \xrightarrow{a.s.} \mu\).
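As a quick sanity check, here is a minimal numpy sketch of the SLLN along a single realisation; the exponential distribution, its mean, and the path length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0

# One realisation omega: a single long path of i.i.d. draws with mean mu.
x = rng.exponential(scale=mu, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

# Along this fixed path, the running mean settles onto mu (SLLN):
for n in (100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])
```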

Convergence in Probability

Definition

\(T_n \xrightarrow{\mathbb{P}} T \iff \mathbb{P}(|T_n - T| > \epsilon) \xrightarrow{n\to\infty} 0 \quad \forall\, \epsilon > 0\)

For any fixed tolerance \(\epsilon\), the probability that \(T_n\) misses \(T\) by more than \(\epsilon\) goes to zero. The Weak Law of Large Numbers gives convergence in probability. Weaker than a.s. convergence — the bad events can occur, just with vanishing probability.
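The definition suggests a direct Monte Carlo check: estimate \(\mathbb{P}(|\bar{X}_n - \mu| > \epsilon)\) for growing \(n\) and watch it shrink. A sketch, with the normal distribution, \(\epsilon\), and the replication count as arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 0.0, 0.1, 2_000

# Monte Carlo estimate of P(|sample mean - mu| > eps) for growing n:
for n in (10, 100, 1_000, 10_000):
    means = rng.normal(loc=mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - mu) > eps))
```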

Convergence in Distribution

Definition

\(T_n \xrightarrow{(d)} T \iff \mathbb{E}[f(T_n)] \xrightarrow{n\to\infty} \mathbb{E}[f(T)]\) for all bounded continuous \(f\)

Equivalently, the CDFs converge: \(F_{T_n}(t) \to F_T(t)\) at all continuity points of \(F_T\). This is the weakest mode — it only requires that the distribution of \(T_n\) approaches that of \(T\), not that \(T_n\) and \(T\) are close as random variables. The CLT gives convergence in distribution.
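A Monte Carlo sketch of this: standardise the mean of Exp(1) draws (so \(\mu = \sigma = 1\); the distribution and sample sizes are arbitrary choices, and scipy is assumed available) and compare the empirical CDF with \(\Phi\) at a few continuity points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 50, 50_000

# X_i ~ Exp(1), so mu = sigma = 1; standardise the sample mean.
samples = rng.exponential(size=(reps, n))
t = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

# Empirical CDF of T_n vs the N(0,1) CDF at a few continuity points:
for q in (-1.0, 0.0, 1.0):
    print(q, np.mean(t <= q), stats.norm.cdf(q))
```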

3. The Hierarchy

\(T_n \xrightarrow{a.s.} T \implies T_n \xrightarrow{\mathbb{P}} T \implies T_n \xrightarrow{(d)} T\)

Figure 1 — Convergence hierarchy. The converse implications do not hold in general.

Additional properties:

- If \(T_n \xrightarrow{(d)} c\) for a constant \(c\), then \(T_n \xrightarrow{\mathbb{P}} c\) (the one partial converse in the hierarchy).
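To see concretely that convergence in probability does not imply almost sure convergence, consider the classic "typewriter" sequence on \(\omega \sim \text{Uniform}[0,1]\): block \(k\) consists of the indicators of the \(k\) intervals \([j/k, (j+1)/k)\). Then \(\mathbb{P}(T_n = 1) \to 0\), yet every \(\omega\) is hit once per block, so \(T_n(\omega)\) never settles. A simulation sketch (the number of blocks is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform()  # one fixed realisation

hit_indices, p_one, n = [], [], 0
for k in range(1, 200):            # block k: the k intervals [j/k, (j+1)/k)
    for j in range(k):
        n += 1
        if j / k <= omega < (j + 1) / k:
            hit_indices.append(n)  # T_n(omega) = 1 at this index
        p_one.append(1.0 / k)      # P(T_n = 1) = interval length

print("P(T_n = 1) for the last few n:", p_one[-3:])      # -> 0: in probability
print("latest indices with T_n = 1:", hit_indices[-3:])  # never stops: no a.s. limit
```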

4. Slutsky's Theorem

Slutsky's theorem lets us combine convergent sequences — essential whenever we replace unknown parameters with consistent estimates.

Slutsky's Theorem

Let \(T_n \xrightarrow{(d)} T\) and \(U_n \xrightarrow{\mathbb{P}} u\) (a constant). Then:

- \(T_n + U_n \xrightarrow{(d)} T + u\)
- \(T_n U_n \xrightarrow{(d)} uT\)
- \(T_n / U_n \xrightarrow{(d)} T / u\), provided \(u \neq 0\)

Classic use: in the CLT, \(\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{(d)} \mathcal{N}(0,1)\). We don't know \(\sigma\), but if \(\hat{\sigma}_n \xrightarrow{\mathbb{P}} \sigma\), then by Slutsky: \(\sqrt{n}(\bar{X}_n - \mu)/\hat{\sigma}_n \xrightarrow{(d)} \mathcal{N}(0,1)\) still holds.
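A sketch of exactly this plug-in step: studentise the mean of Exp(1) draws (so \(\mu = \sigma = 1\), an arbitrary choice) with the sample standard deviation and compare against \(\Phi\); scipy is assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 200, 20_000

# X_i ~ Exp(1): mu = 1 and sigma = 1, but pretend sigma is unknown.
x = rng.exponential(size=(reps, n))
sigma_hat = x.std(axis=1, ddof=1)                    # consistent for sigma

t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / sigma_hat  # Slutsky plug-in
for q in (-1.96, 0.0, 1.96):
    print(q, np.mean(t <= q), stats.norm.cdf(q))
```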

5. Continuous Mapping Theorem

Continuous Mapping Theorem (CMT)

If \(g : \mathbb{R} \to \mathbb{R}\) is continuous and \(T_n \xrightarrow{a.s./\mathbb{P}/(d)} T\), then:

\(g(T_n) \xrightarrow{a.s./\mathbb{P}/(d)} g(T)\)

The same mode of convergence is preserved under continuous transformations. This is why we can say things like "if \(\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta^*\), then \(\hat{\theta}_n^2 \xrightarrow{\mathbb{P}} (\theta^*)^2\)" without any extra work.
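A quick check of that exact claim, taking \(\hat{\theta}_n = \bar{X}_n\) of \(\mathcal{N}(\theta^*, 1)\) draws (the distribution, \(\theta^*\), and \(\epsilon\) are arbitrary choices) and estimating \(\mathbb{P}(|\hat{\theta}_n^2 - (\theta^*)^2| > \epsilon)\):

```python
import numpy as np

rng = np.random.default_rng(5)
theta_star, eps, reps = 3.0, 0.5, 2_000

# theta_hat_n = sample mean of N(theta*, 1); g(x) = x^2 is continuous.
for n in (100, 1_000, 10_000):
    theta_hat = rng.normal(loc=theta_star, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(theta_hat**2 - theta_star**2) > eps))
```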

Example

From the CLT: \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{(d)} \mathcal{N}(0,\sigma^2)\). Applying \(g(x) = x^2/\sigma^2\) (continuous) via CMT gives:

\[\frac{n(\bar{X}_n - \mu)^2}{\sigma^2} \xrightarrow{(d)} \chi^2_1\]

This limit is the basis of Wald-type chi-squared test statistics.
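A sketch of this limit with Exp(1) draws (so \(\mu = \sigma = 1\), an arbitrary choice), comparing the empirical CDF of \(n(\bar{X}_n - \mu)^2/\sigma^2\) with \(\chi^2_1\); scipy is assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 100, 20_000

x = rng.exponential(size=(reps, n))       # mu = sigma = 1
w = n * (x.mean(axis=1) - 1.0) ** 2       # n (sample mean - mu)^2 / sigma^2

# Empirical CDF vs the chi-squared(1) CDF; 3.84 is its 95% quantile:
for q in (0.5, 1.0, 3.84):
    print(q, np.mean(w <= q), stats.chi2.cdf(q, df=1))
```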

6. Arithmetic of Convergent Sequences

When \(T_n \xrightarrow{a.s./\mathbb{P}} T\) and \(U_n \xrightarrow{a.s./\mathbb{P}} U\):

- \(T_n + U_n \xrightarrow{a.s./\mathbb{P}} T + U\)
- \(T_n \cdot U_n \xrightarrow{a.s./\mathbb{P}} T \cdot U\)
- \(T_n / U_n \xrightarrow{a.s./\mathbb{P}} T / U\) (if \(U \neq 0\))

Note: for convergence in distribution, these arithmetic rules do not apply unless one sequence converges to a constant — that is where Slutsky's theorem specifically helps.
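A two-line illustration of why the constant matters: take \(T_n = Z\) and \(U_n = -Z\) for a single standard normal \(Z\). Both converge (trivially) in distribution to \(\mathcal{N}(0,1)\), but their sum is identically 0, not \(\mathcal{N}(0,2)\). A sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.normal(size=100_000)

# T_n = Z and U_n = -Z each "converge" in distribution to N(0,1),
# but T_n + U_n = 0 for every n, nothing like N(0, 2). Arithmetic
# under (d) needs one constant limit; that is Slutsky's hypothesis.
t_n, u_n = z, -z
print("std T_n:", t_n.std(), " std U_n:", u_n.std())  # both close to 1
print("std T_n + U_n:", (t_n + u_n).std())            # exactly 0
```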

7. Why It Matters

Every asymptotic result in statistics — CLT-based confidence intervals, consistency of the MLE, asymptotic normality of test statistics — rests on one of these three modes. Knowing the hierarchy prevents errors like applying Slutsky's theorem when the plug-in sequence converges only in distribution to a non-degenerate limit, or assuming that convergence in distribution means \(T_n\) and \(T\) are close as random variables.