Three modes of convergence, the hierarchy between them, Slutsky's theorem, and the Continuous Mapping Theorem
When we say an estimator "converges to the truth", we need to be precise about what convergence means. A sequence of random variables \(T_n\) can converge to a limit \(T\) in several distinct senses — and these are not equivalent. Understanding the hierarchy is essential for knowing when asymptotic arguments are valid.
\(T_n \xrightarrow{a.s.} T \iff \mathbb{P}\!\left(\left\{\omega : T_n(\omega) \xrightarrow{n\to\infty} T(\omega)\right\}\right) = 1\)
The sequence converges for almost every realisation of the randomness. This is the strongest mode. The Strong Law of Large Numbers states that for i.i.d. \(X_i\) with \(\mathbb{E}|X_1| < \infty\), \(\bar{X}_n \xrightarrow{a.s.} \mu\).
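A minimal NumPy sketch of this pathwise statement: each individual running-mean path settles at \(\mu\). The Exponential distribution, sample size, and seed are arbitrary illustrative choices.

```python
import numpy as np

# Three independent paths of the running mean X_bar_n: each individual
# path settles near mu, which is the almost-sure statement.
rng = np.random.default_rng(0)
mu, n = 2.0, 100_000
for path in range(3):
    x = rng.exponential(scale=mu, size=n)             # i.i.d. draws, mean mu
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    print(f"path {path}: X_bar at n=1e3: {running_mean[999]:.4f}, "
          f"at n=1e5: {running_mean[-1]:.4f}")
```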
\(T_n \xrightarrow{\mathbb{P}} T \iff \mathbb{P}(|T_n - T| > \epsilon) \xrightarrow{n\to\infty} 0 \quad \forall\, \epsilon > 0\)
For any fixed tolerance \(\epsilon\), the probability that \(T_n\) misses \(T\) by more than \(\epsilon\) goes to zero. The Weak Law of Large Numbers gives convergence in probability. This is weaker than a.s. convergence: it controls each \(n\) separately, so a given realisation may still stray beyond \(\epsilon\), just on events whose probability vanishes.
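To see the definition in action, here is a quick Monte Carlo sketch estimating the probability of the bad event for growing \(n\). Standard normal draws; \(\epsilon\) and the replication count are arbitrary choices.

```python
import numpy as np

# Monte Carlo estimate of P(|X_bar_n - mu| > eps) for growing n: the
# probability of the "bad event" shrinks toward zero, which is exactly
# convergence in probability.
rng = np.random.default_rng(1)
mu, eps, reps = 0.0, 0.1, 5_000
for n in (10, 100, 1_000):
    xbar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    bad = np.abs(xbar - mu) > eps
    print(f"n={n:>5}: P(|X_bar - mu| > {eps}) ≈ {bad.mean():.4f}")
```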
\(T_n \xrightarrow{(d)} T \iff \mathbb{E}[f(T_n)] \xrightarrow{n\to\infty} \mathbb{E}[f(T)]\) for all bounded continuous \(f\)
Equivalently, the CDFs converge: \(F_{T_n}(t) \to F_T(t)\) at all continuity points of \(F_T\). This is the weakest mode — it only requires that the distribution of \(T_n\) approaches that of \(T\), not that \(T_n\) and \(T\) are close as random variables. The CLT gives convergence in distribution.
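A small sketch of this CDF convergence under the CLT, assuming Exponential(1) draws so that \(\mu = \sigma = 1\); the sample size, replication count, and test points are arbitrary.

```python
import numpy as np
from math import erf, sqrt

# Empirical CDF of sqrt(n)(X_bar_n - mu)/sigma versus the N(0,1) CDF
# at a few points: convergence in distribution says these should agree
# for large n. Exponential(1) draws, so mu = sigma = 1.
rng = np.random.default_rng(2)
n, reps = 200, 20_000
z = (rng.exponential(size=(reps, n)).mean(axis=1) - 1.0) * np.sqrt(n)
for t in (-1.0, 0.0, 1.0):
    phi = 0.5 * (1.0 + erf(t / sqrt(2.0)))            # standard normal CDF
    print(f"t={t:+.1f}: empirical {np.mean(z <= t):.4f} vs Phi(t)={phi:.4f}")
```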
The hierarchy: almost sure convergence implies convergence in probability, which in turn implies convergence in distribution,
\[T_n \xrightarrow{a.s.} T \;\Longrightarrow\; T_n \xrightarrow{\mathbb{P}} T \;\Longrightarrow\; T_n \xrightarrow{(d)} T,\]
and neither implication reverses in general. One useful partial converse: if \(T_n \xrightarrow{(d)} c\) for a constant \(c\), then \(T_n \xrightarrow{\mathbb{P}} c\). The sketch below shows why the second implication cannot be reversed.
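A classic counterexample (a minimal NumPy sketch; the standard normal choice, seed, and sample size are arbitrary): take \(Z \sim \mathcal{N}(0,1)\) and \(T_n = -Z\) for every \(n\). By symmetry each \(T_n\) has the same law as \(Z\), so \(T_n \xrightarrow{(d)} Z\) trivially, yet \(|T_n - Z| = 2|Z|\) never shrinks.

```python
import numpy as np

# Convergence in distribution without convergence in probability:
# T_n = -Z has exactly the N(0,1) law (symmetry), so T_n -> Z in
# distribution, but |T_n - Z| = 2|Z| stays large with high probability.
rng = np.random.default_rng(6)
z = rng.normal(size=100_000)
t_n = -z                                              # same law as z
print(f"P(|T_n - Z| > 0.1) ≈ {np.mean(np.abs(t_n - z) > 0.1):.4f}")
```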
Slutsky's theorem lets us combine convergent sequences — essential whenever we replace unknown parameters with consistent estimates.
Let \(T_n \xrightarrow{(d)} T\) and \(U_n \xrightarrow{\mathbb{P}} u\) (a constant). Then:
\(T_n + U_n \xrightarrow{(d)} T + u\), \(\quad U_n\,T_n \xrightarrow{(d)} u\,T\), \(\quad T_n / U_n \xrightarrow{(d)} T / u\) (if \(u \neq 0\)).
Classic use: in the CLT, \(\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{(d)} \mathcal{N}(0,1)\). We don't know \(\sigma\), but if \(\hat{\sigma}_n \xrightarrow{\mathbb{P}} \sigma\), then by Slutsky: \(\sqrt{n}(\bar{X}_n - \mu)/\hat{\sigma}_n \xrightarrow{(d)} \mathcal{N}(0,1)\) still holds.
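A quick check of this studentisation step, as a sketch: Exponential draws (mean and standard deviation both \(\mu\)), with \(\mu\), \(n\), and the replication count as illustrative choices, verifying that the usual normal-quantile interval still covers at roughly 95%.

```python
import numpy as np

# Slutsky in practice: replacing sigma with the consistent estimate
# sigma_hat leaves the N(0,1) limit intact, so the 1.96 interval
# built from sigma_hat still covers mu about 95% of the time.
rng = np.random.default_rng(3)
mu, n, reps = 5.0, 500, 20_000
x = rng.exponential(scale=mu, size=(reps, n))         # mean mu, sd mu
xbar = x.mean(axis=1)
sigma_hat = x.std(axis=1, ddof=1)                     # consistent for sigma
covered = np.abs(np.sqrt(n) * (xbar - mu) / sigma_hat) <= 1.96
print(f"coverage of studentised 95% interval ≈ {covered.mean():.3f}")
```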
The Continuous Mapping Theorem (CMT): if \(g : \mathbb{R} \to \mathbb{R}\) is continuous and \(T_n \xrightarrow{a.s./\mathbb{P}/(d)} T\), then:
\(g(T_n) \xrightarrow{a.s./\mathbb{P}/(d)} g(T)\)
The same mode of convergence is preserved under continuous transformations. This is why we can say things like "if \(\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta^*\), then \(\hat{\theta}_n^2 \xrightarrow{\mathbb{P}} (\theta^*)^2\)" without any extra work.
From the CLT: \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{(d)} \mathcal{N}(0,\sigma^2)\). Applying \(g(x) = x^2/\sigma^2\) (continuous) via CMT gives:
\[\frac{n(\bar{X}_n - \mu)^2}{\sigma^2} \xrightarrow{(d)} \chi^2_1\]
This is the foundation of the chi-squared test.
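A short simulation of this CMT consequence (normal draws; \(n\), the replication count, and the seed are arbitrary), comparing the statistic against the known 0.95 quantile of \(\chi^2_1\), which is \(1.96^2 \approx 3.841\).

```python
import numpy as np

# CMT check: n*(X_bar_n - mu)^2 / sigma^2 should behave like chi^2_1,
# so about 5% of simulated statistics should exceed 3.841.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 1.0, 400, 20_000
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
stat = n * (xbar - mu) ** 2 / sigma**2
print(f"empirical 95th percentile: {np.percentile(stat, 95):.3f} (chi2_1: 3.841)")
print(f"P(stat > 3.841) ≈ {np.mean(stat > 3.841):.4f} (nominal 0.05)")
```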
When \(T_n \xrightarrow{a.s./\mathbb{P}} T\) and \(U_n \xrightarrow{a.s./\mathbb{P}} U\) (both in the same mode):
| Operation | Limit |
|---|---|
| \(T_n + U_n\) | \(T + U\) |
| \(T_n \cdot U_n\) | \(T \cdot U\) |
| \(T_n / U_n\) | \(T / U\) (if \(U \neq 0\)) |
Note: for convergence in distribution, these arithmetic rules do not apply in general, because the marginal limits of \(T_n\) and \(U_n\) do not determine the distribution of their sum or product. The exception is when one sequence converges to a constant, which is exactly where Slutsky's theorem helps. The sketch below makes the failure concrete.
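A two-line counterexample, as a sketch (standard normal draws; the seed and sample size are arbitrary): with \(Z \sim \mathcal{N}(0,1)\), both \(T_n = Z\) and \(U_n = -Z\) converge in distribution to \(\mathcal{N}(0,1)\), but \(T_n + U_n = 0\) exactly, not the \(\mathcal{N}(0,2)\) that naive addition of limits would suggest.

```python
import numpy as np

# Both T_n and U_n are marginally N(0,1), yet their sum is identically
# zero: marginal limits alone do not pin down the joint behaviour.
rng = np.random.default_rng(5)
z = rng.normal(size=100_000)
t_n, u_n = z, -z                                      # same N(0,1) marginals
print(f"sd of T_n + U_n: {np.std(t_n + u_n):.4f} (naive guess sqrt(2) ≈ 1.414)")
```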
Every asymptotic result in statistics (CLT-based confidence intervals, consistency of the MLE, asymptotic normality of test statistics) rests on one of these three modes. Knowing the hierarchy prevents errors like adding or multiplying limits when both sequences converge only in distribution, or assuming that convergence in distribution means \(T_n\) and \(T\) are close as random variables.