Conditional probability, the multiplication rule, total probability, Bayes' rule, and independence
Conditioning is the single most important operation in probability. When we learn that some event \(B\) has occurred, we update our beliefs about other events. Bayes' rule tells us how to reverse the direction of conditioning — inferring causes from effects.
\[\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}, \quad \mathbb{P}(B) > 0\]
Read: "the probability of \(A\) given that \(B\) occurred." Conditioning restricts the sample space to \(B\) and renormalises. Conditional probabilities satisfy all the standard probability axioms within the reduced space.
Rearranging the definition gives the multiplication rule \(\mathbb{P}(A \cap B) = \mathbb{P}(B)\,\mathbb{P}(A \mid B)\). Generalised to \(n\) events:
\[\mathbb{P}(A_1 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\prod_{i=2}^n \mathbb{P}(A_i \mid A_1 \cap \cdots \cap A_{i-1})\]
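For instance (a standard illustration), with \(A_i\) = "the \(i\)-th card drawn is an ace", the chance of drawing three aces in a row from a shuffled 52-card deck without replacement is
\[\mathbb{P}(A_1 \cap A_2 \cap A_3) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \approx 0.00018.\]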
Partition \(\Omega\) into disjoint events \(A_1, A_2, A_3, \ldots\) with \(\mathbb{P}(A_i) > 0\). Then for any event \(B\):
\[\mathbb{P}(B) = \sum_i \mathbb{P}(A_i)\,\mathbb{P}(B \mid A_i)\]
This decomposes a hard probability into simpler conditional ones. Think of \(A_i\) as scenarios: the total probability of \(B\) is the weighted average of its conditional probability under each scenario.
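A small illustration (the numbers are our own): an urn is chosen, \(U_1\) with probability \(\frac{1}{3}\) (two red balls, one blue) or \(U_2\) with probability \(\frac{2}{3}\) (one red, two blue), and one ball is drawn. Then
\[\mathbb{P}(\text{red}) = \mathbb{P}(U_1)\,\mathbb{P}(\text{red} \mid U_1) + \mathbb{P}(U_2)\,\mathbb{P}(\text{red} \mid U_2) = \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot \frac{1}{3} = \frac{4}{9}.\]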
Bayes' rule reverses the direction of conditioning: given a cause \(A_i\) and an effect \(B\), it computes the probability of the cause given the effect.
\[\mathbb{P}(A_i \mid B) = \frac{\mathbb{P}(A_i)\,\mathbb{P}(B \mid A_i)}{\sum_j \mathbb{P}(A_j)\,\mathbb{P}(B \mid A_j)}\]
The key quantities: the prior \(\mathbb{P}(A_i)\), the likelihood \(\mathbb{P}(B \mid A_i)\), the posterior \(\mathbb{P}(A_i \mid B)\), and the evidence \(\sum_j \mathbb{P}(A_j)\,\mathbb{P}(B \mid A_j)\), which by total probability equals \(\mathbb{P}(B)\).
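Continuing the urn illustration above: given that the drawn ball is red, Bayes' rule recovers the probability that it came from \(U_1\):
\[\mathbb{P}(U_1 \mid \text{red}) = \frac{\mathbb{P}(U_1)\,\mathbb{P}(\text{red} \mid U_1)}{\mathbb{P}(\text{red})} = \frac{\frac{1}{3} \cdot \frac{2}{3}}{\frac{4}{9}} = \frac{1}{2},\]
so observing red raises the probability of \(U_1\) from \(\frac{1}{3}\) to \(\frac{1}{2}\).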
\(A\) and \(B\) are independent iff any of these equivalent conditions holds: \(\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)\); \(\mathbb{P}(A \mid B) = \mathbb{P}(A)\) (when \(\mathbb{P}(B) > 0\)); \(\mathbb{P}(B \mid A) = \mathbb{P}(B)\) (when \(\mathbb{P}(A) > 0\)).
Independence means knowing \(B\) gives no information about \(A\). Key fact: if \(A\) and \(B\) are independent, then so are \(A\) and \(B^c\), \(A^c\) and \(B\), and \(A^c\) and \(B^c\).
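A quick check with a fair die (an illustration of ours): let \(A = \{2, 4, 6\}\) and \(B = \{1, 2, 3, 4\}\). Then \(\mathbb{P}(A \cap B) = \mathbb{P}(\{2, 4\}) = \frac{1}{3} = \frac{1}{2} \cdot \frac{2}{3} = \mathbb{P}(A)\,\mathbb{P}(B)\), so \(A\) and \(B\) are independent.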
Caution: Mutually exclusive events (with positive probability) are not independent — if \(A\) occurs you know \(B\) did not.
\(A\) and \(B\) are conditionally independent given \(C\) if:
\[\mathbb{P}(A \cap B \mid C) = \mathbb{P}(A \mid C)\,\mathbb{P}(B \mid C)\]
Conditional independence does not imply (unconditional) independence, and vice versa.
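A minimal simulation sketch of the "vice versa" direction, under an assumed two-coin setup of our own (biases 0.5 and 0.9): a coin is picked at random and flipped twice. Given the coin, the flips are independent; unconditionally, they are not, because the first flip leaks information about which coin was picked.

```python
import random

def two_flips():
    # Pick a coin at random (hypothetical biases of our choosing), flip it twice.
    p = random.choice([0.5, 0.9])
    return random.random() < p, random.random() < p

N = 1_000_000
flips = [two_flips() for _ in range(N)]
p1 = sum(a for a, _ in flips) / N           # P(first flip heads),  ~0.70
p2 = sum(b for _, b in flips) / N           # P(second flip heads), ~0.70
p12 = sum(a and b for a, b in flips) / N    # P(both heads),        ~0.53
print(f"P(H1)P(H2) = {p1 * p2:.3f}  vs  P(H1 and H2) = {p12:.3f}")
# ~0.490 vs ~0.530: the flips are conditionally independent given the coin,
# but not unconditionally independent.
```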
Events \(A_1, \ldots, A_n\) are (mutually) independent if for every subset \(S \subseteq \{1,\ldots,n\}\):
\[\mathbb{P}\!\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} \mathbb{P}(A_i)\]
Pairwise independence does not imply mutual independence.
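The standard counterexample, built from two independent fair coin flips: let \(A\) = "first flip is heads", \(B\) = "second flip is heads", \(C\) = "the two flips agree". Each pair is independent (every pairwise intersection is the single outcome \(HH\), with probability \(\frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}\)), yet
\[\mathbb{P}(A \cap B \cap C) = \mathbb{P}(\{HH\}) = \frac{1}{4} \neq \frac{1}{8} = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C).\]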
Bayes' rule is the foundation of Bayesian inference, medical diagnostics, spam filtering, and machine learning. A classic example: a disease test with 99% sensitivity and 99% specificity sounds reliable, but if the disease affects 1 in 1000 people, a positive test result still has only a ~9% chance of being correct (most positives are false alarms from the healthy majority). Bayes' rule makes this precise.
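A minimal sketch of that computation (the function name and parameterisation are our own):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule over a two-scenario partition."""
    true_pos = prevalence * sensitivity               # P(disease) * P(+ | disease)
    false_pos = (1 - prevalence) * (1 - specificity)  # P(healthy) * P(+ | healthy)
    return true_pos / (true_pos + false_pos)          # total probability in the denominator

# Disease affecting 1 in 1000; test with 99% sensitivity and 99% specificity:
print(posterior_given_positive(0.001, 0.99, 0.99))    # ~0.090: only ~9% of positives are real
```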