Probability Foundations

Law of Large Numbers, Central Limit Theorem, Hoeffding's Inequality, and the Gaussian Distribution

1. Introduction

Statistics and probability are deeply intertwined but answer different questions. Probability starts from a known data-generating process and asks: what outcomes should we expect? Statistics starts from observed data and asks: what process could have generated this?

This notebook covers the foundational probability results that underpin all of statistical inference — how sample averages behave, how they concentrate around their mean, and why the Gaussian distribution appears everywhere.

2. The Law of Large Numbers

Let \(X_1, X_2, \ldots, X_n\) be i.i.d. random variables with \(\mathbb{E}[X] = \mu\) and \(\text{Var}(X) = \sigma^2\). The sample mean is:

\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\]
Weak Law of Large Numbers

\(\bar{X}_n \xrightarrow{\mathbb{P}} \mu \quad \text{as } n \to \infty\)

The sample average converges in probability to the true mean. No matter how noisy individual observations are, averaging enough of them drives the error to zero. This is why we can estimate population parameters from samples at all.
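The convergence can be seen in a quick simulation. The sketch below (standard library only; the Bernoulli parameter p = 0.3 and the sample sizes are illustrative choices, not from the text) prints the absolute error of the sample mean as n grows:

```python
import random

random.seed(42)
p = 0.3  # true mean of a Bernoulli(0.3) variable; an illustrative choice

def sample_mean(n):
    # average of n i.i.d. Bernoulli(p) indicator draws
    return sum(random.random() < p for _ in range(n)) / n

# |X̄_n − μ| shrinks as n grows, per the weak LLN
for n in (10, 1_000, 100_000):
    print(n, abs(sample_mean(n) - p))
```

The error is not monotone run-to-run (individual samples are random), but it reliably shrinks on the order of \(1/\sqrt{n}\), which is exactly the rate the CLT makes precise in the next section.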

Figure 1 — The sample mean concentrates around μ as n grows. The band of variability collapses.

3. The Central Limit Theorem

The LLN tells us the sample mean converges to μ. The CLT tells us how fast and in what shape.

Central Limit Theorem

\(\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{(d)} \mathcal{N}(0,1) \quad \text{as } n \to \infty\)

Equivalently, \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{(d)} \mathcal{N}(0, \sigma^2)\). The normalised sample mean is approximately Gaussian regardless of the original distribution of \(X_i\). This is why Gaussian assumptions are so pervasive — they emerge naturally from averaging.

Practical implication: for large \(n\), we can treat \(\bar{X}_n\) as approximately \(\mathcal{N}(\mu, \sigma^2/n)\), enabling confidence intervals and hypothesis tests without knowing the true distribution.
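This approximation can be checked empirically. The sketch below (standard library only; the choice of Uniform(0, 1) draws and the sizes n = 200, reps = 2000 are illustrative assumptions) standardises many independent sample means and counts how often they land inside ±1.96, which should be close to 0.95 if the Gaussian approximation holds:

```python
import math
import random

random.seed(0)
n, reps = 200, 2000                 # illustrative sizes
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

# fraction of standardised sample means falling inside ±1.96
inside = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = math.sqrt(n) * (xbar - mu) / sigma
    inside += abs(z) <= 1.96
print(inside / reps)  # close to 0.95
```

Note that the underlying draws are uniform, not Gaussian; it is the averaging that produces the Gaussian shape.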

Figure 2 — As n increases (panels for n = 1, 5, 30, 100), the distribution of the sample mean approaches a Gaussian regardless of the original distribution.

4. Hoeffding's Inequality

The CLT is asymptotic. Hoeffding gives a finite-sample guarantee on how far \(\bar{X}_n\) can deviate from \(\mu\).

Hoeffding's Inequality

Let \(X_1, \ldots, X_n\) be i.i.d. with \(\mathbb{E}[X]=\mu\) and \(X \in [a,b]\) almost surely. Then for all \(\epsilon > 0\):

\[\mathbb{P}\!\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq 2\exp\!\left(\frac{-2n\epsilon^2}{(b-a)^2}\right)\]

Key observations:

- The bound is non-asymptotic: it holds for every finite \(n\), not just in the limit.
- The deviation probability decays exponentially in \(n\) and in \(\epsilon^2\).
- Only boundedness of \(X\) is required; no variance or other distributional assumptions are needed.
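The bound can be compared against the true deviation frequency in a simulation. In the sketch below (standard library only; Bernoulli(0.5) draws with n = 100, ε = 0.1, and 5000 repetitions are illustrative assumptions), the empirical frequency of large deviations should sit below the Hoeffding bound:

```python
import math
import random

random.seed(1)
n, eps, reps = 100, 0.1, 5000   # illustrative sizes
a, b, p = 0.0, 1.0, 0.5         # Bernoulli(0.5) is bounded in [0, 1]

# Hoeffding's upper bound on P(|X̄_n − μ| ≥ eps)
bound = 2 * math.exp(-2 * n * eps**2 / (b - a) ** 2)

# empirical frequency of deviations of at least eps
exceed = sum(
    abs(sum(random.random() < p for _ in range(n)) / n - p) >= eps
    for _ in range(reps)
) / reps
print(exceed, "<=", bound)
```

The gap is typically large: Hoeffding holds for every bounded distribution simultaneously, so for any particular one it is usually conservative.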

5. The Gaussian Distribution

A random variable \(X \sim \mathcal{N}(\mu, \sigma^2)\) has probability density:

\[f_{\mu,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]

with \(\mathbb{E}[X] = \mu\) and \(\text{Var}(X) = \sigma^2\).
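The density formula above can be implemented and sanity-checked directly. This sketch (standard library only; the grid over [−8, 8] with step 0.001 is an illustrative numerical choice) evaluates the peak height of the standard normal and verifies numerically that the density integrates to 1:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # density of N(mu, sigma^2) evaluated at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# peak height of the standard normal is 1/sqrt(2*pi) ≈ 0.3989
print(gaussian_pdf(0.0))

# Riemann-sum sanity check: the density integrates to ~1 over [-8, 8]
step = 0.001
total = sum(gaussian_pdf(-8 + i * step) * step for i in range(16_000))
print(total)
```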

Key Properties

- Affine invariance: if \(X \sim \mathcal{N}(\mu, \sigma^2)\), then \(aX + b \sim \mathcal{N}(a\mu + b, a^2\sigma^2)\).
- Standardisation: \(Z = (X - \mu)/\sigma \sim \mathcal{N}(0,1)\), so every Gaussian probability reduces to a standard normal one.
- Symmetry: the density is symmetric about \(\mu\), so \(\mathbb{P}(X \leq \mu) = 1/2\).

Quantiles

The quantile of order \(1-\alpha\) of a random variable \(X\) is the number \(q_\alpha\) such that \(\mathbb{P}(X \leq q_\alpha) = 1-\alpha\). For \(Z \sim \mathcal{N}(0,1)\): \(\mathbb{P}(|Z| > q_{\alpha/2}) = \alpha\). Common values: \(q_{0.025} \approx 1.96\), \(q_{0.05} \approx 1.645\).
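These quantiles are available from Python's standard library via `statistics.NormalDist`, so no lookup table or external package is needed:

```python
from statistics import NormalDist

z = NormalDist()               # standard normal, from the Python stdlib

q_025 = z.inv_cdf(1 - 0.025)   # q_{0.025}: P(Z <= q) = 0.975
q_05 = z.inv_cdf(1 - 0.05)     # q_{0.05}:  P(Z <= q) = 0.95
print(round(q_025, 3), round(q_05, 3))  # 1.96 1.645
```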

Figure 3 — The central 1−α probability mass lies between −qα/2 and qα/2, with α/2 in each tail.

6. Why It Matters

These three results form the backbone of all frequentist statistics:

- The LLN guarantees that estimators built from sample averages are consistent.
- The CLT supplies their approximate sampling distribution, justifying confidence intervals and hypothesis tests.
- Hoeffding's inequality converts this asymptotic picture into explicit finite-sample guarantees.