Sample space, the axioms of probability, set theory tools, and combinatorics
Probability theory begins by making precise what we mean by "an experiment" and "how likely an outcome is." Everything follows from three simple axioms and a rigorous definition of a sample space. Counting techniques then let us compute probabilities when outcomes are equally likely.
An experiment is any process with an uncertain outcome. The sample space \(\Omega\) is the set of all possible outcomes. A good sample space must be collectively exhaustive (it includes every possible outcome) and its elements mutually exclusive (no two distinct outcomes can occur together).
An event \(A\) is any subset of \(\Omega\). Probability is assigned to events rather than to individual outcomes; in a discrete space an individual outcome \(\omega\) is handled via the singleton event \(\{\omega\}\).
Probability is a function \(\mathbb{P}\) on events satisfying three axioms: nonnegativity (\(\mathbb{P}(A) \geq 0\)), normalisation (\(\mathbb{P}(\Omega) = 1\)), and countable additivity (\(\mathbb{P}\left(\bigcup_n A_n\right) = \sum_n \mathbb{P}(A_n)\) for disjoint events \(A_1, A_2, \ldots\)). Everything else in probability theory follows from these three axioms. Key consequences include \(\mathbb{P}(\varnothing) = 0\), \(\mathbb{P}(A^c) = 1 - \mathbb{P}(A)\), monotonicity (\(A \subseteq B\) implies \(\mathbb{P}(A) \leq \mathbb{P}(B)\)), and the union bound \(\mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B)\).
Bonferroni's Inequality: \(\mathbb{P}(A_1 \cap \cdots \cap A_n) \geq \mathbb{P}(A_1) + \cdots + \mathbb{P}(A_n) - (n-1)\).
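The inequality follows in one line from De Morgan's law and the union bound \(\mathbb{P}\left(\bigcup_i A_i\right) \leq \sum_i \mathbb{P}(A_i)\):

```latex
\mathbb{P}\left(\bigcap_{i=1}^{n} A_i\right)
  = 1 - \mathbb{P}\left(\bigcup_{i=1}^{n} A_i^c\right)
  \geq 1 - \sum_{i=1}^{n} \mathbb{P}(A_i^c)
  = 1 - \sum_{i=1}^{n} \left(1 - \mathbb{P}(A_i)\right)
  = \sum_{i=1}^{n} \mathbb{P}(A_i) - (n-1).
```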
If \(\Omega\) is finite with \(n\) equally likely outcomes, and event \(A\) contains \(k\) outcomes:
\[\mathbb{P}(A) = \frac{k}{n} = \frac{\text{number of elements in } A}{\text{total number of outcomes}}\]

This is the classical probability model. Computing \(\mathbb{P}(A)\) reduces to counting.
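As a quick illustration of counting (the two-dice experiment here is an assumed example, not one from the notes), the probability that two fair dice sum to 7:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered pairs (die1, die2).
omega = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
A = [outcome for outcome in omega if sum(outcome) == 7]

# Classical model: P(A) = |A| / |Omega|.
p = Fraction(len(A), len(omega))
print(p)  # 1/6
```

The `Fraction` type keeps the answer exact, which matches the spirit of counting arguments better than floating point.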
Events are sets, so set operations carry direct probability interpretations:
\(\left(\bigcup_n S_n\right)^c = \bigcap_n S_n^c\) — the complement of a union is the intersection of complements.
\(\left(\bigcap_n S_n\right)^c = \bigcup_n S_n^c\) — the complement of an intersection is the union of complements.
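Both laws are easy to check on small concrete sets (an illustrative check, not a proof); complements are taken relative to a universe playing the role of \(\Omega\):

```python
# A universe and two arbitrary subsets of it.
omega = set(range(10))
S1 = {0, 1, 2, 3}
S2 = {2, 3, 4, 5}

def complement(s):
    """Complement relative to the universe omega."""
    return omega - s

# (S1 ∪ S2)^c == S1^c ∩ S2^c
assert complement(S1 | S2) == complement(S1) & complement(S2)
# (S1 ∩ S2)^c == S1^c ∪ S2^c
assert complement(S1 & S2) == complement(S1) | complement(S2)
print("De Morgan's laws hold on this example")
```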
If a selection is made in \(r\) stages with \(n_i\) choices at stage \(i\), the total number of selections is \(n_1 n_2 \cdots n_r\).
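A small sanity check of the counting principle (the three-stage menu is an assumed example): with \(n_1 = 2\) starters, \(n_2 = 3\) mains, and \(n_3 = 2\) desserts there are \(2 \cdot 3 \cdot 2 = 12\) distinct meals.

```python
from itertools import product

starters = ["soup", "salad"]
mains = ["fish", "beef", "pasta"]
desserts = ["cake", "fruit"]

# Enumerate every stage-by-stage selection explicitly...
meals = list(product(starters, mains, desserts))

# ...and compare against the counting principle n1 * n2 * n3.
assert len(meals) == len(starters) * len(mains) * len(desserts) == 12
print(len(meals))  # 12
```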
The number of ways to order \(n\) distinct elements: \(n! = 1 \cdot 2 \cdot 3 \cdots n\).
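To see the factorial formula in action (a sketch using a 4-element set as an assumed example), enumerate the orderings directly and compare with \(4!\):

```python
import math
from itertools import permutations

# All orderings of 4 distinct elements, enumerated explicitly...
orderings = list(permutations("abcd"))

# ...match the closed form 4! = 1 * 2 * 3 * 4 = 24.
assert len(orderings) == math.factorial(4) == 24
print(len(orderings))  # 24
```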
The number of ways to choose \(k\) elements from \(n\) (order doesn't matter):
\[\binom{n}{k} = \frac{n!}{k!(n-k)!}\]

Given an \(n\)-element set and nonnegative integers \(n_1, \ldots, n_r\) summing to \(n\), the number of ways to partition it into \(r\) disjoint subsets of sizes \(n_1, \ldots, n_r\) is:
\[\frac{n!}{n_1!\, n_2!\, \cdots\, n_r!}\]

The geometric series: \(\sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha}\) for \(|\alpha| < 1\). Finite version: \(\sum_{i=0}^{n-1}\alpha^i = \frac{1-\alpha^n}{1-\alpha}\).
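The binomial and multinomial formulas above can be checked numerically (a sketch; the sizes \(5\) and \((2, 2, 1)\) are arbitrary assumed values, and `multinomial` is a helper defined here, not a library function):

```python
import math

# Binomial coefficient: ways to choose 2 of 5 when order is irrelevant.
assert math.comb(5, 2) == 10

def multinomial(*sizes):
    """n! / (n1! n2! ... nr!): ways to partition an n-element set
    into disjoint subsets of the given sizes."""
    out = math.factorial(sum(sizes))
    for s in sizes:
        out //= math.factorial(s)
    return out

# Partition a 5-element set into subsets of sizes 2, 2, 1:
# 5! / (2! 2! 1!) = 120 / 4 = 30.
assert multinomial(2, 2, 1) == 30
print(math.comb(5, 2), multinomial(2, 2, 1))  # 10 30
```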
Geometric series appear constantly in probability — summing over a geometric random variable's PMF, computing expected values, and verifying that PMFs normalise to 1.
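A numeric sketch of both uses, assuming the standard geometric PMF \(p(k) = (1-\alpha)\,\alpha^{k-1}\) for \(k = 1, 2, \ldots\) and an arbitrary \(\alpha = 0.3\):

```python
# Check the finite geometric-sum formula against a direct partial sum,
# then check that the geometric PMF normalises to (numerically) 1.
a = 0.3
n = 50

partial = sum(a**i for i in range(n))  # sum_{i=0}^{n-1} a^i
closed = (1 - a**n) / (1 - a)          # (1 - a^n) / (1 - a)
assert abs(partial - closed) < 1e-12

# Truncating at k = 199 leaves a tail of a^199, which is negligible here.
pmf_mass = sum((1 - a) * a**(k - 1) for k in range(1, 200))
assert abs(pmf_mass - 1.0) < 1e-12
print(round(pmf_mass, 6))  # 1.0
```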