Hypothesis Testing

Parametric tests, t-tests, Wald's test, and the Likelihood Ratio test

1. The Framework

Hypothesis testing provides a principled way to make binary decisions from data. We formulate two competing hypotheses about the parameter \(\theta^*\) and use the data to decide between them.

Setup

\(H_0: \theta^* \in \Theta_0\) — the null hypothesis (status quo)

\(H_1: \theta^* \in \Theta_1\) — the alternative hypothesis

A test is a statistic \(\psi: E^n \to \{0,1\}\) where \(\psi = 1\) means reject \(H_0\) and \(\psi = 0\) means fail to reject \(H_0\). A typical test takes the form \(\psi = \mathbf{1}[T_n > c]\) for a test statistic \(T_n\) and threshold \(c\). The rejection region is \(R_\psi = \{x : \psi(x) = 1\}\).

2. Errors and Power

|                    | H₀ true                        | H₁ true                         |
|--------------------|--------------------------------|---------------------------------|
| Reject H₀          | Type I error (false positive)  | Correct                         |
| Fail to reject H₀  | Correct                        | Type II error (false negative)  |

The level \(\alpha\) bounds the probability of a Type I error; the power \(1 - \beta\) is the probability of correctly rejecting \(H_0\) when \(H_1\) holds, where \(\beta\) is the Type II error rate.

Figure 1 — The critical value c sets the rejection region. The shaded area (right tail of H₀) is the Type I error rate α.

3. Student's t-test

Let \(X_1,\ldots,X_n \stackrel{iid}{\sim} \mathcal{N}(\mu, \sigma^2)\) with both \(\mu\) and \(\sigma^2\) unknown.

One-Sample (Two-Sided)

\(H_0: \mu = \mu_0\) vs \(H_1: \mu \neq \mu_0\). The test statistic is:

\[T_n = \sqrt{n}\,\frac{\bar{X}_n - \mu_0}{\sqrt{S_n}} \sim t_{n-1} \quad \text{under } H_0\]

where \(S_n = \frac{1}{n-1}\sum(X_i - \bar{X}_n)^2\) is the unbiased sample variance. The Student's \(t\) distribution \(t_{n-1}\) has heavier tails than the Gaussian, accounting for the extra uncertainty from estimating \(\sigma\). Reject if \(|T_n| > q_{\alpha/2}\), the \((1-\alpha/2)\)-quantile of \(t_{n-1}\).
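As a sketch, the statistic \(T_n\) can be computed directly and cross-checked against scipy's built-in test (the data here are simulated; the seed, mean, and sample size are arbitrary illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=25)  # simulated Gaussian sample
mu0 = 0.0                                    # H0: mu = mu0

# Manual computation of T_n = sqrt(n) (Xbar - mu0) / sqrt(S_n)
n = len(x)
s2 = x.var(ddof=1)                           # S_n, unbiased sample variance
t_manual = np.sqrt(n) * (x.mean() - mu0) / np.sqrt(s2)
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

# Cross-check against scipy's implementation
t_scipy, p_scipy = stats.ttest_1samp(x, popmean=mu0)
print(t_manual, p_manual)
```

The two computations should agree to floating-point precision, which is a useful sanity check that \(S_n\) was taken with the \(1/(n-1)\) normalisation.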

Two-Sample t-test

Compare means of two independent samples \(X_1,\ldots,X_n \sim \mathcal{N}(\mu_x,\sigma_x^2)\) and \(Y_1,\ldots,Y_m \sim \mathcal{N}(\mu_y,\sigma_y^2)\), testing \(H_0: \mu_x - \mu_y = \Delta_0\) vs \(H_1: \mu_x - \mu_y \neq \Delta_0\):

\[T = \frac{\bar{X}_n - \bar{Y}_m - (\Delta_0)}{\sqrt{\hat{\sigma}_x^2/n + \hat{\sigma}_y^2/m}} \approx t_N\]

where \(N\) is given by the Welch-Satterthwaite approximation: \(N = \frac{(\hat{\sigma}_x^2/n + \hat{\sigma}_y^2/m)^2}{\hat{\sigma}_x^4/[n^2(n-1)] + \hat{\sigma}_y^4/[m^2(m-1)]}\).
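A hedged sketch of the Welch test, computing \(N\) from the formula above and comparing against `scipy.stats.ttest_ind` with `equal_var=False` (simulated data with deliberately unequal variances; all numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=40)
y = rng.normal(0.5, 2.0, size=30)   # unequal variance and sample size

# Welch-Satterthwaite degrees of freedom N, term by term as above
vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
N = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))

# Test statistic with Delta_0 = 0, and its two-sided p-value
t_stat = (x.mean() - y.mean()) / np.sqrt(vx + vy)
p_val = 2 * stats.t.sf(abs(t_stat), df=N)

# scipy's Welch test (equal_var=False) uses the same approximation
res = stats.ttest_ind(x, y, equal_var=False)
print(t_stat, p_val, N)
```

Note that `equal_var=False` is what selects Welch's version; the default `equal_var=True` pools the variances and uses \(n + m - 2\) degrees of freedom instead.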

4. Chi-Squared and Student's T Distributions

Chi-Squared Distribution

If \(Z_1,\ldots,Z_d \stackrel{iid}{\sim} \mathcal{N}(0,1)\), then \(V = Z_1^2 + \cdots + Z_d^2 \sim \chi^2_d\).

\(\mathbb{E}[V] = d\), \(\text{Var}(V) = 2d\). Key fact: for Gaussian data, \((n-1)S_n/\sigma^2 \sim \chi^2_{n-1}\), where \(S_n\) is the unbiased sample variance defined in Section 3.
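The moment formulas can be checked by a quick Monte Carlo simulation, building \(V\) exactly as in the definition (a sketch; the seed, \(d\), and number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps = 5, 200_000

# V = Z_1^2 + ... + Z_d^2 for iid standard normals Z_i
V = (rng.standard_normal((reps, d)) ** 2).sum(axis=1)

# Sample mean and variance should be close to d and 2d respectively
print(V.mean(), V.var())
```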

Student's T Distribution

If \(Z \sim \mathcal{N}(0,1)\) and \(V \sim \chi^2_d\) are independent, then \(T = Z/\sqrt{V/d} \sim t_d\).

As \(d \to \infty\), \(t_d \to \mathcal{N}(0,1)\). For small \(d\), \(t_d\) has heavier tails.
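Both facts are visible in the quantiles: the 0.975 quantile of \(t_d\) is largest for small \(d\) (heavier tails) and shrinks toward the Gaussian value as \(d\) grows. A short sketch using scipy:

```python
from scipy import stats

z = stats.norm.ppf(0.975)           # standard Gaussian quantile, approx 1.96

# t quantiles decrease toward z as the degrees of freedom grow
for d in (2, 10, 100):
    q = stats.t.ppf(0.975, df=d)
    print(d, round(q, 3))           # heavier tails => larger quantile
```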

5. Wald's Test

Wald's test is based on the MLE and the Fisher information, making it applicable beyond Gaussians.

Wald's Test

Test \(H_0: \theta^* = \theta_0\) vs \(H_1: \theta^* \neq \theta_0\). Test statistic:

\[T_n = n\,(\hat{\theta}_n^{\text{MLE}} - \theta_0)^\top I(\hat{\theta}_n^{\text{MLE}})\,(\hat{\theta}_n^{\text{MLE}} - \theta_0) \xrightarrow{(d)} \chi^2_d \quad \text{under } H_0\]

Reject at level \(\alpha\) if \(T_n > q_\alpha(\chi^2_d)\).

In one dimension, this simplifies to the familiar \(Z\)-test: \(\psi = \mathbf{1}\{\sqrt{n\,I(\hat{\theta}_n)}\,|\hat{\theta}_n - \theta_0| > q_{\alpha/2}\}\), where \(q_{\alpha/2}\) is the standard Gaussian quantile.
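As a concrete one-dimensional sketch, consider a Bernoulli parameter, where the MLE is the sample mean and the Fisher information is \(I(p) = 1/(p(1-p))\) (simulated data; the seed, true parameter, and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.6, size=200)   # Bernoulli data, true p = 0.6
p0 = 0.5                             # H0: p = p0

p_hat = x.mean()                      # Bernoulli MLE
fisher = 1.0 / (p_hat * (1 - p_hat))  # I(p) = 1/(p(1-p)), evaluated at MLE

# Wald statistic T_n = n I(p_hat) (p_hat - p0)^2, chi^2_1 under H0
T = len(x) * fisher * (p_hat - p0) ** 2
p_value = stats.chi2.sf(T, df=1)

# Equivalent Z-test form: Z = sqrt(n I(p_hat)) (p_hat - p0), so Z^2 = T
Z = np.sqrt(len(x) * fisher) * (p_hat - p0)
print(T, p_value)
```

The equivalence \(Z^2 = T_n\) is exactly why the \(\chi^2_1\) threshold \(q_\alpha(\chi^2_1)\) and the two-sided Gaussian threshold \(q_{\alpha/2}\) give the same test.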

6. Likelihood Ratio Test

The LRT compares the likelihood at the MLE against the likelihood constrained to \(H_0\).

Likelihood Ratio Test Statistic

Let \(\hat{\theta}_n\) be the unconstrained MLE and \(\tilde{\theta}_n = \arg\max_{\theta \in \Theta_0} \ell_n(\theta)\) the constrained MLE. Then:

\[T_n = 2\!\left(\ell_n(\hat{\theta}_n) - \ell_n(\tilde{\theta}_n)\right)\]

Wilks' Theorem

If \(H_0\) constrains \(d\) free parameters to specific values (rank of constraint = \(d\)):

\[T_n \xrightarrow{(d)} \chi^2_d \quad \text{under } H_0\]

The LRT is often preferred over Wald's test because it uses the likelihood directly at both points, is invariant to reparametrisation, and can have better finite-sample performance.
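A minimal sketch of the LRT for a Poisson mean, \(H_0: \lambda = \lambda_0\), where the unconstrained MLE is \(\hat{\lambda}_n = \bar{X}_n\) and the constrained MLE is just \(\lambda_0\) (simulated data; all numeric choices are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.poisson(lam=3.5, size=100)
lam0 = 3.0                           # H0: lambda = lam0

def loglik(lam, x):
    # Poisson log-likelihood up to the constant -sum(log(x_i!)),
    # which cancels in the difference ell(hat) - ell(tilde)
    return np.sum(x * np.log(lam) - lam)

lam_hat = x.mean()                   # unconstrained MLE maximises loglik

# T_n = 2 (ell(lam_hat) - ell(lam0)); nonnegative by construction
T = 2 * (loglik(lam_hat, x) - loglik(lam0, x))

# Wilks: one parameter constrained, so T ~ chi^2_1 under H0
p_value = stats.chi2.sf(T, df=1)
print(T, p_value)
```

Note that additive constants in the log-likelihood drop out of \(T_n\), so they can be omitted; this is a common implementation convenience.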

7. The Asymmetry of Hypothesis Testing

A crucial point: \(H_0\) and \(H_1\) play asymmetric roles. We design tests to control Type I error (false rejection of \(H_0\)) at level \(\alpha\). Failing to reject \(H_0\) does not mean \(H_0\) is true — it only means the data did not provide sufficient evidence against it. The burden of proof lies with \(H_1\).