Parametric tests, t-tests, Wald's test, and the Likelihood Ratio test
Hypothesis testing provides a principled way to make binary decisions from data. We formulate two competing hypotheses about the parameter \(\theta^*\) and use the data to decide between them.
\(H_0: \theta^* \in \Theta_0\) — the null hypothesis (status quo)
\(H_1: \theta^* \in \Theta_1\) — the alternative hypothesis
A test is a statistic \(\psi: E^n \to \{0,1\}\), where \(\psi = 1\) means reject \(H_0\) and \(\psi = 0\) means fail to reject \(H_0\). Tests typically take the form \(\psi = \mathbf{1}[T_n > c]\) for a test statistic \(T_n\) and threshold \(c\). The rejection region is \(R_\psi = \{x : \psi(x) = 1\}\).
| | H₀ true | H₁ true |
|---|---|---|
| Reject H₀ | Type I error (false positive) | Correct |
| Fail to reject H₀ | Correct | Type II error (false negative) |
Let \(X_1,\ldots,X_n \stackrel{iid}{\sim} \mathcal{N}(\mu, \sigma^2)\) with both \(\mu\) and \(\sigma^2\) unknown.
\(H_0: \mu = \mu_0\) vs \(H_1: \mu \neq \mu_0\). The test statistic is:
\[T_n = \sqrt{n}\,\frac{\bar{X}_n - \mu_0}{\sqrt{S_n}} \sim t_{n-1} \quad \text{under } H_0\]where \(S_n = \frac{1}{n-1}\sum(X_i - \bar{X}_n)^2\) is the unbiased sample variance. The Student's \(t\) distribution \(t_{n-1}\) has heavier tails than the Gaussian, accounting for the extra uncertainty from estimating \(\sigma\). Reject at level \(\alpha\) if \(|T_n| > q_{\alpha/2}(t_{n-1})\).
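A minimal sketch of the one-sample t-test, checking the statistic computed from the formula above against `scipy.stats.ttest_1samp` (the sample size, true mean, and seed here are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu0 = 5.0
x = rng.normal(loc=5.3, scale=2.0, size=30)  # illustrative data; true mean 5.3

n = len(x)
S_n = x.var(ddof=1)                          # unbiased sample variance
t_manual = np.sqrt(n) * (x.mean() - mu0) / np.sqrt(S_n)

t_scipy, p_value = stats.ttest_1samp(x, popmean=mu0)

# Reject H0 at level alpha if |T_n| exceeds the t_{n-1} quantile
alpha = 0.05
reject = abs(t_manual) > stats.t.ppf(1 - alpha / 2, df=n - 1)
```

The manual statistic and scipy's should agree to floating-point precision.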
Compare means of two independent samples \(X_1,\ldots,X_n \sim \mathcal{N}(\mu_x,\sigma_x^2)\) and \(Y_1,\ldots,Y_m \sim \mathcal{N}(\mu_y,\sigma_y^2)\):
To test \(H_0: \mu_x - \mu_y = \Delta_0\) (often \(\Delta_0 = 0\)), use
\[T = \frac{\bar{X}_n - \bar{Y}_m - \Delta_0}{\sqrt{\hat{\sigma}_x^2/n + \hat{\sigma}_y^2/m}} \approx t_N \quad \text{under } H_0,\]where \(N\) is given by the Welch-Satterthwaite approximation: \(N = \frac{(\hat{\sigma}_x^2/n + \hat{\sigma}_y^2/m)^2}{\hat{\sigma}_x^4/[n^2(n-1)] + \hat{\sigma}_y^4/[m^2(m-1)]}\).
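A sketch of the two-sample case: `scipy.stats.ttest_ind` with `equal_var=False` implements Welch's test, so the manual statistic should match it (the sample sizes, variances, and seed are illustrative, with \(\Delta_0 = 0\)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=25)   # illustrative samples with unequal
y = rng.normal(0.5, 3.0, size=40)   # variances and unequal sizes

n, m = len(x), len(y)
vx, vy = x.var(ddof=1), y.var(ddof=1)

# Welch-Satterthwaite degrees of freedom
N = (vx / n + vy / m) ** 2 / (
    vx**2 / (n**2 * (n - 1)) + vy**2 / (m**2 * (m - 1))
)

# Welch's t statistic for H0: mu_x - mu_y = 0
t_manual = (x.mean() - y.mean()) / np.sqrt(vx / n + vy / m)
t_scipy, p_value = stats.ttest_ind(x, y, equal_var=False)
```

Note that \(N\) is generally non-integer and satisfies \(\min(n,m)-1 \le N \le n+m-2\).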
If \(Z_1,\ldots,Z_d \stackrel{iid}{\sim} \mathcal{N}(0,1)\), then \(V = Z_1^2 + \cdots + Z_d^2 \sim \chi^2_d\).
\(\mathbb{E}[V] = d\), \(\text{Var}(V) = 2d\). Key fact: \((n-1)S_n/\sigma^2 \sim \chi^2_{n-1}\) for Gaussian data, where \(S_n\) is the unbiased sample variance defined above.
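The chi-squared law of the scaled sample variance can be checked by simulation: for Gaussian data \((n-1)S_n/\sigma^2 \sim \chi^2_{n-1}\), so with \(n = 10\) the empirical mean and variance should land near \(d = 9\) and \(2d = 18\) (the sample count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 10, 2.0

# 100k Gaussian samples of size n; S_n is the unbiased sample variance
samples = rng.normal(0.0, sigma, size=(100_000, n))
S_n = samples.var(axis=1, ddof=1)
v = (n - 1) * S_n / sigma**2

# Should be close to E[V] = n-1 = 9 and Var(V) = 2(n-1) = 18
print(v.mean(), v.var())
```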
If \(Z \sim \mathcal{N}(0,1)\), \(V \sim \chi^2_d\), independent: \(T = Z/\sqrt{V/d} \sim t_d\).
As \(d \to \infty\), \(t_d \to \mathcal{N}(0,1)\). For small \(d\), \(t_d\) has heavier tails.
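The convergence \(t_d \to \mathcal{N}(0,1)\) shows up directly in the critical values a level-0.05 two-sided test would use; a quick check with `scipy.stats`:

```python
from scipy import stats

# 97.5% quantiles: heavier tails mean larger critical values for small d
for d in (2, 10, 30, 1000):
    print(d, stats.t.ppf(0.975, df=d))   # decreases toward the normal value
print("normal", stats.norm.ppf(0.975))   # roughly 1.96
```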
Wald's test is based on the MLE and the Fisher information, making it applicable beyond Gaussians.
Test \(H_0: \theta^* = \theta_0\) vs \(H_1: \theta^* \neq \theta_0\). Test statistic:
\[T_n = n\,(\hat{\theta}_n^{\text{MLE}} - \theta_0)^\top I(\hat{\theta}_n^{\text{MLE}})\,(\hat{\theta}_n^{\text{MLE}} - \theta_0) \xrightarrow{(d)} \chi^2_d \quad \text{under } H_0\]
Reject at level \(\alpha\) if \(T_n > q_\alpha(\chi^2_d)\), where \(d\) is the dimension of the parameter \(\theta\).
In one dimension this reduces to the familiar \(Z\)-test: \(\psi = \mathbf{1}\{\sqrt{n\,I(\hat{\theta}_n)}\,|\hat{\theta}_n - \theta_0| > q_{\alpha/2}\}\).
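As a sketch, Wald's test for a Bernoulli parameter, where the Fisher information is \(I(\theta) = 1/(\theta(1-\theta))\) (the data-generating value 0.6, sample size, and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta0 = 0.5
x = rng.binomial(1, 0.6, size=200)            # illustrative Bernoulli(0.6) data

n = len(x)
theta_hat = x.mean()                          # MLE for Bernoulli
I_hat = 1.0 / (theta_hat * (1 - theta_hat))   # plug-in Fisher information

# Wald statistic; chi2 with d = 1 degree of freedom under H0
T = n * (theta_hat - theta0) ** 2 * I_hat
reject = T > stats.chi2.ppf(0.95, df=1)
```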
The LRT compares the likelihood at the MLE against the likelihood constrained to \(H_0\).
Let \(\hat{\theta}_n\) be the unconstrained MLE and \(\tilde{\theta}_n = \arg\max_{\theta \in \Theta_0} \ell_n(\theta)\) the constrained MLE. Then:
\[T_n = 2\!\left(\ell_n(\hat{\theta}_n) - \ell_n(\tilde{\theta}_n)\right)\]
If \(H_0\) fixes \(r\) free parameters to specific values (the constraint has rank \(r\)):
\[T_n \xrightarrow{(d)} \chi^2_r \quad \text{under } H_0\]
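A sketch of the LRT for a Bernoulli parameter: \(H_0: \theta = \theta_0\) fixes one parameter, so the statistic is compared to \(\chi^2_1\) (the data-generating value, sample size, and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
theta0 = 0.5
x = rng.binomial(1, 0.6, size=200)    # illustrative Bernoulli(0.6) data
n, k = len(x), x.sum()

def loglik(theta):
    """Bernoulli log-likelihood with k successes in n trials."""
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

theta_hat = k / n                              # unconstrained MLE
T = 2 * (loglik(theta_hat) - loglik(theta0))   # LRT statistic, always >= 0
reject = T > stats.chi2.ppf(0.95, df=1)
```

The statistic is nonnegative by construction, since the unconstrained MLE maximises the log-likelihood.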
The LRT is often preferred over Wald's test because it uses the likelihood directly at both points, is invariant to reparametrisation, and can have better finite-sample performance.
A crucial point: \(H_0\) and \(H_1\) play asymmetric roles. We design tests to control Type I error (false rejection of \(H_0\)) at level \(\alpha\). Failing to reject \(H_0\) does not mean \(H_0\) is true — it only means the data did not provide sufficient evidence against it. The burden of proof lies with \(H_1\).