Confidence Intervals & Delta Method

How to quantify estimation uncertainty and propagate it through transformations

1. Introduction

A point estimate gives one number; a confidence interval gives a range that communicates uncertainty. The Delta method lets us derive confidence intervals for transformed parameters when we know the asymptotic distribution of the original estimator.

2. Confidence Intervals

Confidence Interval (Level 1−α)

A confidence interval \(\mathcal{I}\) of level \(1-\alpha\) for \(\theta\) is a random interval such that:

\[\mathbb{P}_\theta(\theta \in \mathcal{I}) \geq 1-\alpha \quad \forall\,\theta \in \Theta\]

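Common misconception: "There is a 95% probability that θ lies in this interval." This is wrong for a computed interval: θ is fixed (not random), so once the data are observed the interval either contains θ or it does not. The randomness lies in the interval's endpoints. The correct interpretation: if we repeated the experiment many times and computed a CI each time, then in the long run at least 95% of those intervals would contain θ.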

Figure 1 — Many CIs computed from repeated experiments. Most contain θ* (5/6 shown = 83%). With enough repetitions, the fraction approaches 1−α.
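The repeated-experiment interpretation can be checked by simulation. The sketch below is illustrative only: the Gaussian data, the sample size, the number of repetitions, and the helper names `mean_ci` and `coverage` are all assumptions, and the interval used is the usual "estimate plus or minus quantile times standard error" interval for a mean.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def mean_ci(sample, alpha=0.05):
    """Interval for the mean: x_bar +/- q * s / sqrt(n) (normal approximation)."""
    n = len(sample)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0, 1)
    half = q * stdev(sample) / math.sqrt(n)
    center = fmean(sample)
    return center - half, center + half

def coverage(true_mean=2.0, sd=1.0, n=200, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated experiments whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Assumption: each experiment draws n i.i.d. Gaussian observations.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = mean_ci(sample, alpha)
        hits += lo <= true_mean <= hi
    return hits / reps

cov = coverage()
print(f"empirical coverage: {cov:.3f}")
```

Each simulated interval either contains the true mean or not; only the long-run fraction is expected to be close to 0.95.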

3. Constructing Asymptotic CIs

An asymptotic CI of level \(1-\alpha\) satisfies \(\lim_{n\to\infty}\mathbb{P}_\theta(\theta \in \mathcal{I}) \geq 1-\alpha\).

Standard Construction (CLT-based)

If \(\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{(d)} \mathcal{N}(0,\sigma^2)\), then an asymptotic \(1-\alpha\) CI is:

\[\mathcal{I} = \left[\hat{\theta}_n - q_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}},\;\; \hat{\theta}_n + q_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right]\]

where \(q_{\alpha/2}\) is the \((1-\alpha/2)\)-quantile of \(\mathcal{N}(0,1)\) and \(\hat{\sigma}\) is a consistent estimator of \(\sigma\).
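As a minimal sketch of this construction, the function below takes a point estimate, a plug-in value for \(\hat{\sigma}\), and the sample size. The function name `asymptotic_ci` and the numbers in the usage line are hypothetical; the quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def asymptotic_ci(theta_hat, sigma_hat, n, alpha=0.05):
    """Return (lower, upper) = theta_hat -/+ q * sigma_hat / sqrt(n)."""
    # q is the (1 - alpha/2)-quantile of N(0, 1), about 1.96 for alpha = 0.05.
    q = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = q * sigma_hat / math.sqrt(n)
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical inputs, chosen only to illustrate the call.
lo, hi = asymptotic_ci(theta_hat=1.3, sigma_hat=2.0, n=400)
print(f"[{lo:.3f}, {hi:.3f}]")
```

The half-width shrinks at rate \(1/\sqrt{n}\), so halving it requires roughly four times as many observations.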

Example: CI for a Proportion

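Observe \(R_1,\ldots,R_n \stackrel{iid}{\sim} \text{Ber}(p)\). By the CLT: \(\sqrt{n}(\bar{R}_n - p)/\sqrt{p(1-p)} \xrightarrow{(d)} \mathcal{N}(0,1)\). The unknown \(p\) in the standard deviation can be replaced by the consistent estimator \(\hat{p} = \bar{R}_n\) (justified by Slutsky's theorem). With \(q_{0.025} \approx 1.96\), the asymptotic 95% CI for \(p\) is: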

\[\mathcal{I} = \left[\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\;\; \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]\]
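As a quick numerical illustration with hypothetical data (\(n = 100\) trials and 60 successes, so \(\hat{p} = 0.6\)):

\[\sqrt{\frac{0.6 \times 0.4}{100}} \approx 0.0490, \qquad 1.96 \times 0.0490 \approx 0.096, \qquad \mathcal{I} \approx [0.504,\; 0.696]\]

With the same \(\hat{p}\) and \(n = 400\), the half-width would halve to about 0.048, reflecting the \(1/\sqrt{n}\) scaling.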

4. The Delta Method

The Delta method propagates asymptotic normality through smooth transformations. If we know the asymptotic distribution of \(\hat{\theta}_n\), it gives the asymptotic distribution of \(g(\hat{\theta}_n)\) for a smooth function \(g\).

Delta Method

Let \((Z_n)_{n\geq1}\) be a sequence such that \(\sqrt{n}(Z_n - \theta) \xrightarrow{(d)} \mathcal{N}(0,\sigma^2)\). Let \(g: \mathbb{R}\to\mathbb{R}\) be continuously differentiable at \(\theta\). Then:

\[\sqrt{n}(g(Z_n) - g(\theta)) \xrightarrow{(d)} \mathcal{N}(0,\, (g'(\theta))^2\sigma^2)\]

Intuition: Taylor-expand \(g\) around \(\theta\): \(g(Z_n) \approx g(\theta) + g'(\theta)(Z_n - \theta)\). Since the linear term dominates, the asymptotic variance scales by \((g'(\theta))^2\).
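The intuition can be made slightly more explicit. Writing \(R_n\) for the Taylor remainder (notation introduced here for illustration):

\[\sqrt{n}\,(g(Z_n) - g(\theta)) = g'(\theta)\,\sqrt{n}\,(Z_n - \theta) + \sqrt{n}\,R_n, \qquad R_n = o_P(|Z_n - \theta|)\]

The first term converges in distribution to \(\mathcal{N}(0, (g'(\theta))^2\sigma^2)\). Since \(\sqrt{n}\,|Z_n - \theta|\) is bounded in probability, \(\sqrt{n}\,R_n \to 0\) in probability, and Slutsky's theorem gives the stated limit.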

Example

From \(\bar{X}_n \approx \mathcal{N}(\mu, \sigma^2/n)\), what is the approximate distribution of \(\bar{X}_n^2\)?

Apply \(g(x) = x^2\), so \(g'(x) = 2x\), \(g'(\mu) = 2\mu\). By the Delta method:

\[\sqrt{n}(\bar{X}_n^2 - \mu^2) \xrightarrow{(d)} \mathcal{N}(0, 4\mu^2\sigma^2)\]

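So for large \(n\), \(\bar{X}_n^2\) is approximately \(\mathcal{N}(\mu^2,\, 4\mu^2\sigma^2/n)\). This is informative only when \(\mu \neq 0\); at \(\mu = 0\) the limiting variance is zero.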
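A small Monte Carlo check of this example is sketched below. The Gaussian data, the values \(\mu = 2\) and \(\sigma = 1\), the sample size, and the helper name `scaled_errors` are assumptions made for illustration. The sketch compares the empirical variance of \(\sqrt{n}(\bar{X}_n^2 - \mu^2)\) with the Delta-method value \(4\mu^2\sigma^2\).

```python
import math
import random
from statistics import fmean, variance

def scaled_errors(g, mu, sigma, n, reps, seed=0):
    """Simulate sqrt(n) * (g(x_bar) - g(mu)) over `reps` independent samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        # Assumption: i.i.d. Gaussian observations with mean mu and sd sigma.
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        out.append(math.sqrt(n) * (g(x_bar) - g(mu)))
    return out

mu, sigma = 2.0, 1.0
errors = scaled_errors(lambda x: x * x, mu, sigma, n=200, reps=2000)

empirical_var = variance(errors)         # Monte Carlo estimate
delta_var = (2 * mu) ** 2 * sigma ** 2   # (g'(mu))^2 * sigma^2 = 4 mu^2 sigma^2

print(f"empirical: {empirical_var:.2f}, delta method: {delta_var:.2f}")
```

The two numbers should agree up to Monte Carlo noise; the agreement would be expected to degrade as \(\mu\) approaches 0, where the first-order term vanishes.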

5. Multivariate Delta Method

Multivariate Delta Method

Let \((T_n)\) be a sequence of random vectors in \(\mathbb{R}^d\) such that \(\sqrt{n}(T_n - \theta) \xrightarrow{(d)} \mathcal{N}_d(0,\Sigma)\). Let \(g: \mathbb{R}^d \to \mathbb{R}^k\) be continuously differentiable at \(\theta\). Then:

\[\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{(d)} \mathcal{N}_k\!\left(0,\, \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)\right)\]

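Here \(\nabla g(\theta)\) is the \(d \times k\) matrix whose columns are the gradients of the components of \(g\) (the transpose of the Jacobian), so the limiting covariance is \(k \times k\). It plays the role of \(g'(\theta)\) in the univariate case. This result is used to derive CIs for functions of several parameters, such as ratios or products.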
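For a scalar-valued function (\(k = 1\)), the asymptotic variance reduces to the quadratic form \(\nabla g(\theta)^\top \Sigma\, \nabla g(\theta)\). The sketch below applies this to a ratio \(g(a, b) = a/b\), whose gradient is \((1/b,\; -a/b^2)\). The function names and all numerical inputs are hypothetical, and the covariance matrix is assumed to be estimated elsewhere.

```python
import math
from statistics import NormalDist

def delta_variance(grad, cov):
    """Quadratic form grad^T cov grad for a scalar-valued g (k = 1)."""
    d = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(d) for j in range(d))

def ratio_ci(a_hat, b_hat, cov_hat, n, alpha=0.05):
    """Asymptotic CI for a/b via the multivariate Delta method (requires b_hat != 0)."""
    grad = (1.0 / b_hat, -a_hat / b_hat ** 2)  # gradient of g(a, b) = a / b
    var = delta_variance(grad, cov_hat)        # plug-in asymptotic variance
    q = NormalDist().inv_cdf(1 - alpha / 2)
    half = q * math.sqrt(var / n)
    est = a_hat / b_hat
    return est - half, est + half, var

# Hypothetical estimates and covariance matrix, for illustration only.
lo, hi, var = ratio_ci(2.0, 4.0, [[1.0, 0.5], [0.5, 2.0]], n=400)
print(f"variance: {var}, CI: [{lo:.4f}, {hi:.4f}]")
```

The gradient is evaluated at the estimates rather than at the unknown \(\theta\), which is the same plug-in step used for \(\hat{\sigma}\) in the univariate construction.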

6. Limitations

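The Delta method is informative only when \(g'(\theta) \neq 0\). When \(g'(\theta) = 0\), the limit \(\mathcal{N}(0,0)\) is degenerate and a second-order expansion is needed: if \(g\) is twice continuously differentiable at \(\theta\), then \(n(g(Z_n) - g(\theta)) \xrightarrow{(d)} \tfrac{1}{2}g''(\theta)\,\sigma^2\chi^2_1\).
Asymptotic CIs are only approximate for finite \(n\): for small \(n\) or heavy-tailed data, the CLT approximation may be poor and the actual coverage can fall below \(1-\alpha\).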
Bootstrap methods provide an alternative that can be more accurate in finite samples.
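To make the bootstrap alternative concrete, here is a minimal sketch of a percentile bootstrap interval, applied to the statistic \(\bar{X}_n^2\) from the earlier example. The percentile variant is only one of several bootstrap intervals and is chosen here for brevity; the function name, the simulated data, and the number of resamples are assumptions.

```python
import random
from statistics import fmean

def percentile_bootstrap_ci(sample, stat, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on n_boot resamples of the same size as the data.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

def squared_mean(xs):
    """The statistic g(x_bar) = x_bar^2."""
    return fmean(xs) ** 2

# Hypothetical data: 50 Gaussian draws, for illustration only.
data_rng = random.Random(1)
sample = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

boot_lo, boot_hi = percentile_bootstrap_ci(sample, squared_mean)
point = squared_mean(sample)
print(f"estimate: {point:.3f}, bootstrap CI: [{boot_lo:.3f}, {boot_hi:.3f}]")
```

Unlike the Delta-method interval, this one needs no derivative of \(g\) and is not forced to be symmetric around the point estimate, at the cost of extra computation.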