Bayesian Inference

Posterior inference, MAP, LMS estimation, and the linear least-squares estimator

1. The Bayesian Framework

In the Bayesian framework, the unknown parameter \(\Theta\) is treated as a random variable with a prior distribution \(f_\Theta(\theta)\). After observing data \(X = x\), we update to the posterior distribution via Bayes' rule:

Bayes' Rule (Continuous)

\[f_{\Theta \mid X}(\theta \mid x) = \frac{f_\Theta(\theta)\,f_{X \mid \Theta}(x \mid \theta)}{f_X(x)} \propto f_\Theta(\theta)\,f_{X \mid \Theta}(x \mid \theta)\]

The posterior is proportional to prior times likelihood. The denominator \(f_X(x) = \int f_\Theta(\theta) f_{X|\Theta}(x|\theta)\,d\theta\) is a normalising constant that does not depend on \(\theta\).
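As a concrete illustration, the update can be carried out numerically on a grid. The sketch below assumes (hypothetically) a uniform prior on \(\Theta \in (0,1)\) and a single Bernoulli observation \(x = 1\):

```python
import numpy as np

# Hypothetical setup: Theta ~ Uniform(0, 1), X | Theta = theta ~ Bernoulli(theta),
# and we observe x = 1. Discretise theta and apply Bayes' rule numerically.
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)               # f_Theta(theta): uniform prior
likelihood = theta                        # f_{X|Theta}(1 | theta) = theta for a Bernoulli
unnormalised = prior * likelihood         # prior times likelihood
evidence = np.trapz(unnormalised, theta)  # f_X(x): the normalising constant
posterior = unnormalised / evidence       # f_{Theta|X}(theta | 1), integrates to 1
```

The normalising step is where the denominator \(f_X(x)\) enters; everything \(\theta\)-dependent lives in the numerator.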

Key Quantities

- Prior \(f_\Theta(\theta)\): beliefs about \(\Theta\) before seeing data.
- Likelihood \(f_{X \mid \Theta}(x \mid \theta)\): how the data are generated given \(\Theta = \theta\).
- Posterior \(f_{\Theta \mid X}(\theta \mid x)\): updated beliefs after observing \(X = x\).
- Evidence \(f_X(x)\): the normalising constant, independent of \(\theta\).

2. MAP Estimation

Maximum A Posteriori (MAP) Estimate

\[\hat{\theta}_{\text{MAP}} = \arg\max_\theta f_{\Theta \mid X}(\theta \mid x) = \arg\max_\theta \bigl[f_\Theta(\theta)\,f_{X \mid \Theta}(x \mid \theta)\bigr]\]

MAP finds the mode of the posterior, i.e. the value at which the posterior density (or PMF) is largest. Taking logs (which doesn't change the argmax):

\[\hat{\theta}_{\text{MAP}} = \arg\max_\theta \bigl[\log f_\Theta(\theta) + \log f_{X \mid \Theta}(x \mid \theta)\bigr]\]

Compared to MLE: MAP adds the log-prior as a regularisation term. With a uniform prior, MAP reduces to MLE.
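A minimal numerical sketch of this log-posterior view, assuming a Beta(\(a, b\)) prior on \(\Theta\) and \(k\) successes in \(n\) Bernoulli trials (a conjugate pair chosen because the MAP has the closed form \((a+k-1)/(a+b+n-2)\); the specific values are illustrative):

```python
import numpy as np

# Assumed example: Beta(2, 2) prior, k = 7 successes in n = 10 trials.
a, b, n, k = 2.0, 2.0, 10, 7

theta = np.linspace(0.001, 0.999, 99_999)
log_prior = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)  # log f_Theta, up to a constant
log_lik = k * np.log(theta) + (n - k) * np.log(1 - theta)          # log f_{X|Theta}
theta_map = theta[np.argmax(log_prior + log_lik)]                  # maximise log-prior + log-likelihood

closed_form = (a + k - 1) / (a + b + n - 2)                        # conjugate closed form
```

Dropping the `log_prior` term recovers the MLE \(k/n\), which is exactly the uniform-prior special case.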

MAP as Regularised MLE

The log-prior acts as a penalty added to the log-likelihood: a Gaussian prior yields a quadratic (L2/ridge) penalty, while a Laplace prior yields an absolute-value (L1/lasso) penalty.

3. LMS Estimation (Conditional Expectation)

Least Mean Squares (LMS) Estimator

\[\hat{\theta}_{\text{LMS}} = \mathbb{E}[\Theta \mid X = x]\]

The LMS estimator minimises the mean squared error \(\mathbb{E}[(\Theta - \hat{\theta})^2 \mid X = x]\). It is the posterior mean — the expected value of \(\Theta\) under the posterior distribution.

Properties of the LMS Estimator

- Unbiased over the joint distribution: \(\mathbb{E}[\hat{\Theta}_{\text{LMS}}] = \mathbb{E}[\Theta]\), by the law of iterated expectations.
- The estimation error \(\tilde{\Theta} = \Theta - \hat{\Theta}_{\text{LMS}}\) has zero mean and is uncorrelated with \(\hat{\Theta}_{\text{LMS}}\) (indeed, with any function of \(X\)).
- Variance decomposition: \(\text{Var}(\Theta) = \text{Var}(\hat{\Theta}_{\text{LMS}}) + \mathbb{E}[\text{Var}(\Theta \mid X)]\).

Comparison of Estimators

| Estimator | Definition | Optimality |
| --- | --- | --- |
| MAP | mode of \(f_{\Theta \mid X}\) | Most probable value |
| LMS | mean of \(f_{\Theta \mid X}\) | Minimises mean squared error |
| Posterior median | median of \(f_{\Theta \mid X}\) | Minimises mean absolute error |

For symmetric unimodal posteriors (e.g. Gaussian), all three coincide.
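For an asymmetric posterior the three estimates separate, which is easy to check on a grid. The sketch below uses a skewed Beta(2, 5) density as a stand-in posterior (a hypothetical choice for illustration):

```python
import numpy as np

# Stand-in posterior: Beta(2, 5), evaluated on a grid and normalised numerically.
theta = np.linspace(0.0005, 0.9995, 9999)
post = theta ** (2 - 1) * (1 - theta) ** (5 - 1)
post /= np.trapz(post, theta)

map_est = theta[np.argmax(post)]              # posterior mode: (a-1)/(a+b-2) = 0.2
mean_est = np.trapz(theta * post, theta)      # posterior mean: a/(a+b) = 2/7
cdf = np.cumsum(post) * (theta[1] - theta[0])
median_est = theta[np.searchsorted(cdf, 0.5)] # minimises mean absolute error
```

For this right-skewed density the ordering is mode < median < mean; for a Gaussian posterior all three would land on the same point.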

4. LLMS Estimation

The LMS estimator requires knowledge of the full joint distribution. The Linear Least Mean Squares (LLMS) estimator restricts to linear estimators \(\hat{\Theta} = aX + b\):

LLMS Estimator

\[\hat{\Theta}_{\text{LLMS}} = \mathbb{E}[\Theta] + \frac{\text{Cov}(\Theta, X)}{\text{Var}(X)}\bigl(X - \mathbb{E}[X]\bigr)\]

Equivalently: \(\hat{\Theta}_{\text{LLMS}} = \mathbb{E}[\Theta] + \rho\,\frac{\sigma_\Theta}{\sigma_X}\bigl(X - \mathbb{E}[X]\bigr)\), where \(\rho = \text{Corr}(\Theta, X)\).

The coefficients are:

\[a = \frac{\text{Cov}(\Theta, X)}{\text{Var}(X)}, \qquad b = \mathbb{E}[\Theta] - a\,\mathbb{E}[X]\]

MSE of LLMS

\[\mathbb{E}\bigl[(\Theta - \hat{\Theta}_{\text{LLMS}})^2\bigr] = (1 - \rho^2)\,\text{Var}(\Theta)\]

When \(\rho = \pm 1\), the relationship is perfectly linear and the error is zero. When \(\rho = 0\), \(X\) carries no linear information about \(\Theta\) and the estimate falls back to the prior mean \(\mathbb{E}[\Theta]\).
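To make the formulas concrete, here is a Monte Carlo sketch under an assumed model \(\Theta \sim \mathcal{N}(0,1)\), \(X = \Theta + W\) with independent standard normal noise \(W\), so that \(a = 1/2\) and the MSE approaches \((1-\rho^2)\,\text{Var}(\Theta) = 1/2\):

```python
import numpy as np

# Assumed model: Theta ~ N(0, 1), X = Theta + W, W ~ N(0, 1) independent.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 200_000)
x = theta + rng.normal(0.0, 1.0, 200_000)

a = np.cov(theta, x)[0, 1] / np.var(x)   # Cov(Theta, X) / Var(X), about 1/2 here
b = theta.mean() - a * x.mean()          # E[Theta] - a E[X]
theta_hat = a * x + b                    # the LLMS estimate

mse = np.mean((theta - theta_hat) ** 2)  # approaches (1 - rho^2) Var(Theta)
rho = np.corrcoef(theta, x)[0, 1]
```

The empirical `mse` matches \((1-\rho^2)\,\text{Var}(\Theta)\) up to sampling noise, as the formula above predicts.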

LLMS with Multiple Observations

Given observations \(\mathbf{X} = (X_1, \ldots, X_n)\), the LLMS estimator is:

\[\hat{\Theta}_{\text{LLMS}} = \mathbb{E}[\Theta] + \mathbf{c}^\top(\mathbf{X} - \mathbb{E}[\mathbf{X}])\]

where \(\mathbf{c} = \text{Cov}(\mathbf{X}, \mathbf{X})^{-1}\,\text{Cov}(\mathbf{X}, \Theta)\). These are the normal equations of linear regression: the LLMS estimator depends only on first and second moments, never on the full joint distribution.
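A quick numerical check of the vector formula, under an assumed model with two noisy observations of the same \(\Theta\) (here \(\Theta \sim \mathcal{N}(0,1)\), \(X_i = \Theta + W_i\) with independent standard normal noise, for which \(\mathbf{c} = (1/3,\ 1/3)\)):

```python
import numpy as np

# Assumed model: Theta ~ N(0, 1), X_i = Theta + W_i, W_i ~ N(0, 1) independent, i = 1, 2.
rng = np.random.default_rng(1)
N = 200_000
theta = rng.normal(0.0, 1.0, N)
X = theta[:, None] + rng.normal(0.0, 1.0, (N, 2))  # rows are (X1, X2) samples

cov_X = np.cov(X, rowvar=False)                    # 2x2 covariance of the observations
cov_X_theta = np.array([np.cov(X[:, i], theta)[0, 1] for i in range(2)])
c = np.linalg.solve(cov_X, cov_X_theta)            # Cov(X, X)^{-1} Cov(X, Theta)

theta_hat = theta.mean() + (X - X.mean(axis=0)) @ c
```

With equal noise variances the two observations get equal weight, and more observations shrink the weight on any single one.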

5. Conjugate Priors

A prior is conjugate to a likelihood if the posterior is in the same family as the prior. This makes Bayesian updating analytically tractable.

| Likelihood | Conjugate Prior | Posterior |
| --- | --- | --- |
| Binomial(\(n, p\)) | Beta(\(a, b\)) | Beta(\(a+k,\ b+n-k\)) |
| Poisson(\(\lambda\)) | Gamma(\(a, b\)) | Gamma(\(a+\sum x_i,\ b+n\)) |
| Normal(\(\mu, \sigma^2\)), known \(\sigma^2\) | Normal(\(\mu_0, \sigma_0^2\)) | Normal (precision-weighted average) |

With a Beta(\(a,b\)) prior and observing \(k\) successes in \(n\) trials, the posterior mean is \((a+k)/(a+b+n)\) — a weighted combination of the prior mean and the observed fraction \(k/n\).
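The weighted-combination identity can be verified directly; the sketch below assumes a Beta(2, 2) prior and \(k = 8\) successes in \(n = 10\) trials (illustrative values):

```python
# Beta-Binomial conjugate update with assumed numbers: Beta(2, 2) prior, k = 8, n = 10.
a, b = 2, 2
n, k = 10, 8

post_a, post_b = a + k, b + n - k       # posterior is Beta(a + k, b + n - k)
post_mean = post_a / (post_a + post_b)  # (a + k) / (a + b + n)

# The same value, written as a weighted combination:
prior_mean = a / (a + b)
w = (a + b) / (a + b + n)               # weight on the prior shrinks as n grows
blend = w * prior_mean + (1 - w) * (k / n)
```

As \(n \to \infty\) the weight \(w\) vanishes and the posterior mean approaches the observed fraction \(k/n\), so the prior matters most when data are scarce.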