Law of Large Numbers, Central Limit Theorem, Hoeffding's Inequality, and the Gaussian Distribution
Statistics and probability are deeply intertwined but answer different questions. Probability starts from a known data-generating process and asks: what outcomes should we expect? Statistics starts from observed data and asks: what process could have generated this?
This notebook covers the foundational probability results that underpin all of statistical inference — how sample averages behave, how they concentrate around their mean, and why the Gaussian distribution appears everywhere.
Let \(X_1, X_2, \ldots, X_n\) be i.i.d. random variables with \(\mathbb{E}[X] = \mu\) and \(\text{Var}(X) = \sigma^2\). The sample mean is:
\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\]
The (weak) Law of Large Numbers states:
\[\bar{X}_n \xrightarrow{\mathbb{P}} \mu \quad \text{as } n \to \infty\]
The sample average converges in probability to the true mean. No matter how noisy individual observations are, averaging enough of them drives the error to zero. This is why we can estimate population parameters from samples at all.
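A quick simulation makes this concrete. The sketch below (standard library only; the uniform distribution and seed are arbitrary choices for illustration) draws i.i.d. samples and watches the running error of the sample mean shrink as \(n\) grows.

```python
import random
import statistics

random.seed(0)

# Draw i.i.d. samples from Uniform[0, 1], whose true mean is mu = 0.5,
# and track how far the sample mean sits from mu as n increases.
mu = 0.5
for n in [10, 1_000, 100_000]:
    draws = [random.random() for _ in range(n)]
    xbar = statistics.fmean(draws)
    print(f"n = {n:>6}:  |sample mean - mu| = {abs(xbar - mu):.5f}")
```

The error does not shrink monotonically for any single run, but it reliably tightens with larger \(n\), which is exactly what convergence in probability promises.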
The LLN tells us the sample mean converges to \(\mu\). The CLT tells us how fast and in what shape.
\(\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{(d)} \mathcal{N}(0,1) \quad \text{as } n \to \infty\)
Equivalently, \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{(d)} \mathcal{N}(0, \sigma^2)\). The normalised sample mean is approximately Gaussian regardless of the original distribution of \(X_i\). This is why Gaussian assumptions are so pervasive — they emerge naturally from averaging.
Practical implication: for large \(n\), we can treat \(\bar{X}_n\) as approximately \(\mathcal{N}(\mu, \sigma^2/n)\), enabling confidence intervals and hypothesis tests without knowing the true distribution.
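To see the CLT emerge from a decidedly non-Gaussian distribution, the sketch below (sample size and trial count are illustrative choices) draws many sample means from a skewed Exponential(1) distribution, where \(\mu = \sigma = 1\), standardizes them, and checks that roughly 95% fall within \(\pm 1.96\), as a standard normal would predict.

```python
import random
import statistics

random.seed(0)

# For Exponential(1): mu = 1 and sigma = 1, so the standardized sample mean
# is sqrt(n) * (xbar - 1) / 1. Despite the heavy right skew of the
# individual draws, the standardized means should look standard normal.
n, trials = 200, 5_000
mu, sigma = 1.0, 1.0
z = []
for _ in range(trials):
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    z.append((n ** 0.5) * (xbar - mu) / sigma)

inside = sum(abs(v) < 1.96 for v in z) / trials
print(f"fraction of standardized means in [-1.96, 1.96]: {inside:.3f}")
```

The printed fraction should land close to 0.95, matching the standard normal interval probability even though no individual draw is remotely Gaussian.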
The CLT is asymptotic. Hoeffding gives a finite-sample guarantee on how far \(\bar{X}_n\) can deviate from \(\mu\).
Let \(X_1, \ldots, X_n\) be i.i.d. with \(\mathbb{E}[X]=\mu\) and \(X \in [a,b]\) almost surely. Then for all \(\epsilon > 0\):
\[\mathbb{P}\!\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq 2\exp\!\left(\frac{-2n\epsilon^2}{(b-a)^2}\right)\]
Key observations:
- The bound is non-asymptotic: it holds for every finite \(n\), unlike the CLT.
- The deviation probability decays exponentially in \(n\) and in \(\epsilon^2\).
- Only boundedness is required: the bound depends on the range \(b-a\), not on the variance or the shape of the distribution.
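A small experiment shows the bound in action. The sketch below (the choice of Bernoulli(0.5), \(n = 100\), and \(\epsilon = 0.1\) is arbitrary) compares the empirical deviation probability against the Hoeffding bound; for bounded variables on \([0, 1]\) the factor \((b-a)^2\) is 1.

```python
import math
import random

random.seed(0)

# Bernoulli(0.5) draws lie in [a, b] = [0, 1] with mu = 0.5. Estimate
# P(|xbar - mu| >= eps) by Monte Carlo and compare with Hoeffding's bound.
n, eps, trials = 100, 0.1, 20_000
mu = 0.5
deviations = 0
for _ in range(trials):
    xbar = sum(random.random() < mu for _ in range(n)) / n
    deviations += abs(xbar - mu) >= eps

empirical = deviations / trials
bound = 2 * math.exp(-2 * n * eps**2 / (1 - 0) ** 2)
print(f"empirical P(|xbar - mu| >= {eps}) = {empirical:.4f}")
print(f"Hoeffding bound                  = {bound:.4f}")
```

The bound here evaluates to \(2e^{-2} \approx 0.271\), comfortably above the empirical frequency: Hoeffding is a worst-case guarantee over all bounded distributions, so it is often loose for any particular one.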
A random variable \(X \sim \mathcal{N}(\mu, \sigma^2)\) has probability density:
\[f_{\mu,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]with \(\mathbb{E}[X] = \mu\) and \(\text{Var}(X) = \sigma^2\).
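The density formula can be coded and sanity-checked directly. The sketch below (the integration range and step size are arbitrary numerical choices) implements \(f_{\mu,\sigma^2}\) and verifies that it integrates to 1 and takes the value \(1/\sqrt{2\pi}\) at \(x = 0\) for the standard normal.

```python
import math

# The Gaussian density, transcribed directly from the formula above.
def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Riemann sum over [-8, 8]; the tail mass beyond +/-8 is negligible (~1e-15).
dx = 0.001
total = sum(gaussian_pdf(-8 + i * dx) * dx for i in range(int(16 / dx)))
print(f"integral of f over [-8, 8] ~ {total:.6f}")   # ~ 1.000000
print(f"f(0) = {gaussian_pdf(0.0):.6f}")             # 1/sqrt(2*pi) ~ 0.398942
```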
The quantile of order \(1-\alpha\) of a random variable \(X\) is the number \(q_\alpha\) such that \(\mathbb{P}(X \leq q_\alpha) = 1-\alpha\). For \(Z \sim \mathcal{N}(0,1)\): \(\mathbb{P}(|Z| > q_{\alpha/2}) = \alpha\). Common values: \(q_{0.025} \approx 1.96\), \(q_{0.05} \approx 1.645\).
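These quantiles are available in the Python standard library via `statistics.NormalDist`. Under the convention above, \(q_\alpha\) satisfies \(\mathbb{P}(Z \leq q_\alpha) = 1 - \alpha\), so it is the inverse CDF evaluated at \(1 - \alpha\):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# q_alpha is the point with P(Z <= q_alpha) = 1 - alpha, i.e. inv_cdf(1 - alpha).
for alpha in (0.025, 0.05):
    q = Z.inv_cdf(1 - alpha)
    print(f"q_{alpha} = {q:.4f}")
# q_0.025 = 1.9600, q_0.05 = 1.6449
```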
Together, the LLN, the CLT, and Hoeffding's inequality form the backbone of frequentist statistics: the LLN guarantees that sample-based estimates converge, the CLT supplies their asymptotic distribution for confidence intervals and hypothesis tests, and Hoeffding provides finite-sample guarantees when asymptotics are not enough.