Conditional probability, the multiplication rule, total probability, Bayes' rule, and independence
Conditioning is the single most important operation in probability. When we learn that some event \(B\) has occurred, we update our beliefs about other events. Bayes' rule tells us how to reverse the direction of conditioning — inferring causes from effects.
\[\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}, \quad \mathbb{P}(B) > 0\]
Read: "the probability of \(A\) given that \(B\) occurred." Conditioning restricts the sample space to \(B\) and renormalises. Conditional probabilities satisfy all the standard probability axioms within the reduced space.
Rearranging the definition gives the multiplication rule \(\mathbb{P}(A \cap B) = \mathbb{P}(B)\,\mathbb{P}(A \mid B)\). Generalised to \(n\) events:
\[\mathbb{P}(A_1 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\prod_{i=2}^n \mathbb{P}(A_i \mid A_1 \cap \cdots \cap A_{i-1})\]
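For instance (a standard illustration), with \(A_i\) = "the \(i\)-th card drawn is an ace", the chance of drawing three aces in a row from a shuffled 52-card deck without replacement is
\[\mathbb{P}(A_1 \cap A_2 \cap A_3) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \approx 0.00018.\]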
Partition \(\Omega\) into disjoint events \(A_1, A_2, A_3, \ldots\) with \(\mathbb{P}(A_i) > 0\). Then for any event \(B\):
\[\mathbb{P}(B) = \sum_i \mathbb{P}(A_i)\,\mathbb{P}(B \mid A_i)\]
This decomposes a hard probability into simpler conditional ones. Think of \(A_i\) as scenarios: the total probability of \(B\) is the weighted average of its conditional probability under each scenario.
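A small illustration (the numbers are our own): an urn is chosen, \(U_1\) with probability \(\frac{1}{3}\) (two red balls, one blue) or \(U_2\) with probability \(\frac{2}{3}\) (one red, two blue), and one ball is drawn. Then
\[\mathbb{P}(\text{red}) = \mathbb{P}(U_1)\,\mathbb{P}(\text{red} \mid U_1) + \mathbb{P}(U_2)\,\mathbb{P}(\text{red} \mid U_2) = \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{4}{9}.\]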
Bayes' rule reverses the direction of conditioning: given a cause \(A_i\) and an effect \(B\), it computes the probability of the cause given the effect.
\[\mathbb{P}(A_i \mid B) = \frac{\mathbb{P}(A_i)\,\mathbb{P}(B \mid A_i)}{\sum_j \mathbb{P}(A_j)\,\mathbb{P}(B \mid A_j)}\]
The key quantities: the prior \(\mathbb{P}(A_i)\), the likelihood \(\mathbb{P}(B \mid A_i)\), the posterior \(\mathbb{P}(A_i \mid B)\), and the evidence \(\sum_j \mathbb{P}(A_j)\,\mathbb{P}(B \mid A_j)\), which by total probability equals \(\mathbb{P}(B)\).
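Continuing the urn illustration above: given that the drawn ball is red, Bayes' rule recovers the probability that it came from \(U_1\):
\[\mathbb{P}(U_1 \mid \text{red}) = \frac{\mathbb{P}(U_1)\,\mathbb{P}(\text{red} \mid U_1)}{\mathbb{P}(\text{red})} = \frac{\frac{1}{3} \cdot \frac{2}{3}}{\frac{4}{9}} = \frac{1}{2},\]
so observing red raises the probability of \(U_1\) from \(\frac{1}{3}\) to \(\frac{1}{2}\).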
\(A\) and \(B\) are independent iff any of these equivalent conditions holds: \(\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)\); \(\mathbb{P}(A \mid B) = \mathbb{P}(A)\) (when \(\mathbb{P}(B) > 0\)); \(\mathbb{P}(B \mid A) = \mathbb{P}(B)\) (when \(\mathbb{P}(A) > 0\)).
Independence means knowing \(B\) gives no information about \(A\). Key fact: if \(A\) and \(B\) are independent, then so are \(A\) and \(B^c\), \(A^c\) and \(B\), and \(A^c\) and \(B^c\).
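A quick check with a fair die (an illustration of ours): let \(A = \{2, 4, 6\}\) and \(B = \{1, 2, 3, 4\}\). Then \(\mathbb{P}(A \cap B) = \mathbb{P}(\{2, 4\}) = \frac{1}{3} = \frac{1}{2} \cdot \frac{2}{3} = \mathbb{P}(A)\,\mathbb{P}(B)\), so \(A\) and \(B\) are independent.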
Caution: Mutually exclusive events (with positive probability) are not independent — if \(A\) occurs you know \(B\) did not.
\(A\) and \(B\) are conditionally independent given \(C\) if:
\[\mathbb{P}(A \cap B \mid C) = \mathbb{P}(A \mid C)\,\mathbb{P}(B \mid C)\]
Conditional independence does not imply (unconditional) independence, and vice versa.
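A minimal simulation sketch of the "vice versa" direction, under an assumed two-coin setup of our own (biases 0.5 and 0.9): a coin is picked at random and flipped twice. Given the coin, the flips are independent; unconditionally, they are not, because the first flip leaks information about which coin was picked.

```python
import random

def two_flips():
    # Pick a coin at random (hypothetical biases of our choosing), flip it twice.
    p = random.choice([0.5, 0.9])
    return random.random() < p, random.random() < p

N = 1_000_000
flips = [two_flips() for _ in range(N)]
p1 = sum(a for a, _ in flips) / N           # P(first flip heads),  ~0.70
p2 = sum(b for _, b in flips) / N           # P(second flip heads), ~0.70
p12 = sum(a and b for a, b in flips) / N    # P(both heads),        ~0.53
print(f"P(H1)P(H2) = {p1 * p2:.3f}  vs  P(H1 and H2) = {p12:.3f}")
# ~0.490 vs ~0.530: the flips are conditionally independent given the coin,
# but not unconditionally independent.
```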
Events \(A_1, \ldots, A_n\) are (mutually) independent if for every subset \(S \subseteq \{1,\ldots,n\}\):
\[\mathbb{P}\!\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} \mathbb{P}(A_i)\]
Pairwise independence does not imply mutual independence.
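The standard counterexample, built from two independent fair coin flips: let \(A\) = "first flip is heads", \(B\) = "second flip is heads", \(C\) = "the two flips agree". Each pair is independent (every pairwise intersection is the single outcome \(HH\), with probability \(\frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}\)), yet
\[\mathbb{P}(A \cap B \cap C) = \mathbb{P}(\{HH\}) = \frac{1}{4} \neq \frac{1}{8} = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C).\]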
Bayes' rule is the foundation of Bayesian inference, medical diagnostics, spam filtering, and machine learning. A classic example: a disease test with 99% sensitivity and 99% specificity sounds reliable, but if the disease affects 1 in 1000 people, a positive test result still has only a ~9% chance of being correct (most positives are false alarms from the healthy majority). Bayes' rule makes this precise.
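A minimal sketch of that computation (the function name and parameterisation are our own):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule over a two-scenario partition."""
    true_pos = prevalence * sensitivity               # P(disease) * P(+ | disease)
    false_pos = (1 - prevalence) * (1 - specificity)  # P(healthy) * P(+ | healthy)
    return true_pos / (true_pos + false_pos)          # total probability in the denominator

# Disease affecting 1 in 1000; test with 99% sensitivity and 99% specificity:
print(posterior_given_positive(0.001, 0.99, 0.99))    # ~0.090: only ~9% of positives are real
```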