The standard families — their stories, parameters, properties, and relationships
Named distributions are recurring patterns that appear across many applications. Each has a story — a natural process that generates it. Learning the story is more important than memorising formulas: it tells you when to use each distribution.
\(X \sim \mathcal{N}(\mu, \sigma^2)\). Ubiquitous because of the CLT: sums of many small independent contributions are approximately Normal. PDF:
\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]
Key: any linear combination of independent Normals is Normal. Standardise: \(Z = (X-\mu)/\sigma \sim \mathcal{N}(0,1)\).
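A quick numerical check of standardisation in Python (the function name `normal_pdf` is illustrative, not from any library):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """N(mu, sigma^2) density, matching the PDF formula above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Standardising: the N(mu, sigma^2) density at x equals the standard-normal
# density at z = (x - mu)/sigma, rescaled by 1/sigma (change of variables).
x, mu, sigma = 3.0, 1.0, 2.0
z = (x - mu) / sigma
assert abs(normal_pdf(x, mu, sigma) - normal_pdf(z) / sigma) < 1e-12
```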
\(X \sim \text{Expo}(\lambda)\). Waiting time until first event in a Poisson process with rate \(\lambda\).
\[f(x) = \lambda e^{-\lambda x},\; x \geq 0 \qquad F(x) = 1 - e^{-\lambda x}\]
\[\mathbb{E}[X] = 1/\lambda, \quad \text{Var}(X) = 1/\lambda^2\]
Memorylessness: \(\mathbb{P}(X > s+t \mid X > s) = \mathbb{P}(X > t)\). The only continuous memoryless distribution.
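Memorylessness follows directly from the survival function \(S(x) = e^{-\lambda x}\), since \(S(s+t)/S(s) = S(t)\). A minimal sketch (the helper name is illustrative):

```python
import math

def expo_survival(x, lam):
    """P(X > x) for X ~ Expo(lam), i.e. 1 - F(x) = exp(-lam * x)."""
    return math.exp(-lam * x)

# Memorylessness: P(X > s+t | X > s) = S(s+t)/S(s) = S(t).
lam, s, t = 0.5, 2.0, 3.0
conditional = expo_survival(s + t, lam) / expo_survival(s, lam)
assert abs(conditional - expo_survival(t, lam)) < 1e-12
```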
\(X \sim \text{Gamma}(a, \lambda)\). Waiting time until the \(a\)-th event in a Poisson process.
\[\mathbb{E}[X] = a/\lambda, \quad \text{Var}(X) = a/\lambda^2\]
\(\text{Expo}(\lambda) = \text{Gamma}(1, \lambda)\). If \(X_1, \ldots, X_n\) are i.i.d. \(\text{Expo}(\lambda)\), then \(X_1 + \cdots + X_n \sim \text{Gamma}(n, \lambda)\).
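A simulation sketch of the sum-of-exponentials story, checking the sample mean and variance against \(a/\lambda\) and \(a/\lambda^2\) (parameter values chosen arbitrarily for illustration):

```python
import random

rng = random.Random(0)
a, lam = 3, 2.0
# Sum of a i.i.d. Expo(lam) draws is one Gamma(a, lam) draw.
samples = [sum(rng.expovariate(lam) for _ in range(a)) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
assert abs(mean - a / lam) < 0.03       # E[X] = a/lam = 1.5
assert abs(var - a / lam**2) < 0.05     # Var(X) = a/lam^2 = 0.75
```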
\(X \sim \text{Beta}(a, b)\). Supported on \([0,1]\); natural model for probabilities.
\[\mathbb{E}[X] = \frac{a}{a+b}, \quad \text{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}\]
Conjugate prior for the Binomial: if \(p \sim \text{Beta}(a,b)\) and we observe \(x\) successes in \(n\) trials, then \(p \mid X=x \sim \text{Beta}(a+x, b+n-x)\).
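The conjugate update is just bookkeeping on the two parameters: add the successes to \(a\) and the failures to \(b\). A sketch (the function name is illustrative):

```python
def beta_posterior(a, b, successes, n):
    """Beta(a, b) prior + `successes` out of n Binomial trials
    -> Beta(a + successes, b + n - successes) posterior."""
    return a + successes, b + (n - successes)

# Uniform prior Beta(1, 1), then 7 successes in 10 trials.
a_post, b_post = beta_posterior(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)   # posterior mean (a+x)/(a+b+n)
assert (a_post, b_post) == (8, 4)
assert abs(post_mean - 8 / 12) < 1e-12
```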
\(X \sim \text{Unif}(a,b)\). Equal density across \([a,b]\): \(f(x) = 1/(b-a)\).
\[\mathbb{E}[X] = (a+b)/2, \quad \text{Var}(X) = (b-a)^2/12\]
\(X \sim \text{Pois}(\lambda)\). Number of events in a fixed interval when events occur at a constant rate \(\lambda\).
\[p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!}, \; k=0,1,2,\ldots \qquad \mathbb{E}[X] = \text{Var}(X) = \lambda\]
Sum: if \(X \sim \text{Pois}(\lambda_1)\) and \(Y \sim \text{Pois}(\lambda_2)\) are independent, then \(X+Y \sim \text{Pois}(\lambda_1+\lambda_2)\).
Binomial-Poisson approximation: \(\text{Bin}(n,p) \approx \text{Pois}(np)\) when \(n\) is large and \(p\) is small.
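The quality of the approximation can be checked numerically; Le Cam's inequality bounds the total pmf discrepancy by \(2np^2\). A sketch with illustrative values of \(n\) and \(p\):

```python
import math

def binom_pmf(k, n, p):
    """Bin(n, p) pmf."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def pois_pmf(k, lam):
    """Pois(lam) pmf."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# n large, p small: Bin(n, p) is close to Pois(np); here np = 3.
n, p = 1000, 0.003
total_gap = sum(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(30))
assert total_gap < 2 * n * p**2   # Le Cam's bound: 2np^2 = 0.018
```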
\(X \sim \text{NBin}(r,p)\). Number of failures before the \(r\)-th success. \(\text{Geo}(p) = \text{NBin}(1,p)\).
\[p_X(k) = \binom{k+r-1}{k}(1-p)^k p^r, \quad \mathbb{E}[X] = \frac{r(1-p)}{p}\]
\(X \sim \text{HGeom}(w, b, n)\). Drawing \(n\) objects without replacement from \(w\) white and \(b\) black objects. \(X\) = number of white drawn.
\[\mathbb{E}[X] = \frac{nw}{w+b}\]
Like Binomial but sampling without replacement — \(\text{HGeom}\) accounts for dependence between draws.
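Despite the dependence between draws, the mean matches the Binomial's \(np\) with \(p = w/(w+b)\). A direct check from the pmf (helper name illustrative; `math.comb` returns 0 when \(k > n\), which handles impossible counts):

```python
import math

def hgeom_pmf(k, w, b, n):
    """P(X = k): k white in n draws without replacement
    from w white and b black objects."""
    return math.comb(w, k) * math.comb(b, n - k) / math.comb(w + b, n)

w, b, n = 6, 4, 5
mean = sum(k * hgeom_pmf(k, w, b, n) for k in range(n + 1))
assert abs(mean - n * w / (w + b)) < 1e-12   # E[X] = nw/(w+b) = 3.0
```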
\(\vec{X} \sim \text{Mult}_k(n, \vec{p})\). Generalisation of Binomial to \(k\) categories. \(n\) items each independently fall into category \(j\) with probability \(p_j\).
\[\mathbb{P}(\vec{X} = \vec{n}) = \frac{n!}{n_1!\cdots n_k!}\,p_1^{n_1}\cdots p_k^{n_k}, \quad \sum n_j = n\]
Marginals: \(X_i \sim \text{Bin}(n, p_i)\). Any subset of categories lumped together is still Multinomial.
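The Binomial marginal can be verified by summing the joint pmf over the other categories. A sketch for \(k = 3\) categories (function name and parameter values are illustrative):

```python
import math

def mult_pmf(counts, probs):
    """Multinomial pmf: n!/(n_1!...n_k!) * p_1^{n_1} ... p_k^{n_k}."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coef * prob

n, probs = 5, (0.2, 0.3, 0.5)
# Marginal of category 0: summing over the other counts gives Bin(n, p_0).
for k in range(n + 1):
    marginal = sum(mult_pmf((k, j, n - k - j), probs) for j in range(n - k + 1))
    binom = math.comb(n, k) * 0.2**k * 0.8 ** (n - k)
    assert abs(marginal - binom) < 1e-12
```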
\(\vec{X} \sim \mathcal{N}_d(\vec{\mu}, \Sigma)\). Any linear combination of components is Normal. Any subvector is MVN. Uncorrelated components of an MVN are independent (unique to Normal!).
Joint PDF: \(f(\vec{x}) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(\vec{x}-\vec{\mu})^\top\Sigma^{-1}(\vec{x}-\vec{\mu})\right)\).
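A standard way to sample an MVN is \(\vec{X} = \vec{\mu} + L\vec{Z}\), where \(L\) is the Cholesky factor of \(\Sigma\) (\(\Sigma = LL^\top\)) and \(\vec{Z}\) is standard Normal — this works because linear combinations of Normals are Normal. A 2-d sketch with stdlib only (parameter values illustrative):

```python
import math
import random

def sample_mvn_2d(mu, Sigma, rng):
    """Draw one sample from N_2(mu, Sigma) via the 2x2 Cholesky factor."""
    l11 = math.sqrt(Sigma[0][0])
    l21 = Sigma[1][0] / l11
    l22 = math.sqrt(Sigma[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

rng = random.Random(42)
mu, Sigma = (1.0, -2.0), [[2.0, 0.6], [0.6, 1.0]]
xs = [sample_mvn_2d(mu, Sigma, rng) for _ in range(200_000)]
m1 = sum(x for x, _ in xs) / len(xs)
# Sample covariance, using the known means for simplicity.
cov = sum((x - 1.0) * (y + 2.0) for x, y in xs) / len(xs)
assert abs(m1 - 1.0) < 0.02
assert abs(cov - 0.6) < 0.05
```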
| Relationship | Details |
|---|---|
| Bernoulli → Binomial | Sum of \(n\) i.i.d. Ber(\(p\)) = Bin(\(n,p\)) |
| Geometric → Neg. Binomial | Sum of \(r\) i.i.d. Geo(\(p\)) = NBin(\(r,p\)) |
| Expo → Gamma | Sum of \(n\) i.i.d. Expo(\(\lambda\)) = Gamma(\(n,\lambda\)) |
| Binomial → Poisson | Bin(\(n,p\)) → Pois(\(\lambda\)) as \(n\to\infty\), \(p\to0\) with \(np \to \lambda\) |
| Beta-Gamma | If \(X \sim \text{Gamma}(a,\lambda)\), \(Y \sim \text{Gamma}(b,\lambda)\) indep., then \(X/(X+Y) \sim \text{Beta}(a,b)\) |
| Normal → Chi-Square | \(Z_1^2+\cdots+Z_n^2 \sim \chi^2_n\) for i.i.d. \(Z_i \sim \mathcal{N}(0,1)\) |
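The Beta-Gamma relationship from the table can be checked by simulation; note that Python's `random.gammavariate(alpha, beta)` takes a *scale* parameter, so \(\text{Gamma}(a, \lambda)\) corresponds to `gammavariate(a, 1/lam)` (parameter values illustrative):

```python
import random

rng = random.Random(0)
a, b, lam = 2.0, 3.0, 1.5
samples = []
for _ in range(100_000):
    x = rng.gammavariate(a, 1 / lam)   # X ~ Gamma(a, lam)
    y = rng.gammavariate(b, 1 / lam)   # Y ~ Gamma(b, lam), independent of X
    samples.append(x / (x + y))        # should be ~ Beta(a, b)
mean = sum(samples) / len(samples)
assert abs(mean - a / (a + b)) < 0.01   # Beta(a, b) mean = a/(a+b) = 0.4
```

A useful feature of the relationship: the rate \(\lambda\) cancels, so \(X/(X+Y)\) has the same Beta distribution whatever the common rate.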