Markov Chains

The Markov property, transition matrices, recurrence, and convergence to stationarity

1. The Markov Property

A sequence of random variables \(X_0, X_1, X_2, \ldots\) taking values in a state space \(\mathcal{S}\) is a Markov chain if the future depends on the present only — not on the past:

Markov Property

\[\mathbb{P}(X_{n+1} = j \mid X_0, X_1, \ldots, X_n) = \mathbb{P}(X_{n+1} = j \mid X_n)\]

The chain is time-homogeneous if the transition probabilities do not depend on \(n\). We assume this throughout. The transition probability from state \(i\) to state \(j\) is \(p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i)\).

2. Transition Matrix

The transition matrix \(P\) collects all one-step transition probabilities: \(P_{ij} = p_{ij}\). Properties:

- All entries are non-negative: \(P_{ij} \geq 0\).
- Each row sums to 1: \(\sum_j P_{ij} = 1\) (such a matrix is called stochastic).

The \(n\)-step transition probability is given by the \(n\)-th matrix power: \(\mathbb{P}(X_n = j \mid X_0 = i) = (P^n)_{ij}\).
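As a quick numerical check of the matrix-power identity, here is a minimal sketch; the 3-state matrix below is a made-up example, not the chain in Figure 1.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.3, 0.7],
    [0.5, 0.0, 0.5],
])

# n-step transition probabilities are the n-th matrix power of P.
n = 4
Pn = np.linalg.matrix_power(P, n)

# Each row of P^n is still a probability distribution.
print(np.allclose(Pn.sum(axis=1), 1.0))  # True
print(Pn[0, 2])  # P(X_4 = state 3 | X_0 = state 1)
```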

[Figure 1 — A 3-state Markov chain. Each arrow is labeled with its transition probability; each row of probabilities sums to 1.]

3. State Classification

Accessibility and Communication

State \(j\) is accessible from \(i\) (written \(i \to j\)) if \((P^n)_{ij} > 0\) for some \(n \geq 0\). States \(i\) and \(j\) communicate (\(i \leftrightarrow j\)) if each is accessible from the other. Communication is an equivalence relation; it partitions the state space into communicating classes.
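Communicating classes can be computed directly from the reachability relation; the following is a sketch using a Warshall-style transitive closure of the "one-step accessible" relation (the example matrix is made up):

```python
import numpy as np

def communicating_classes(P):
    """Partition states into communicating classes via reachability."""
    n = len(P)
    # reach[i, j]: is j accessible from i? Start from one-step moves
    # (plus n = 0, i.e. every state reaches itself).
    reach = (np.array(P) > 0) | np.eye(n, dtype=bool)
    for k in range(n):  # Warshall transitive closure
        reach |= reach[:, k][:, None] & reach[k, :][None, :]
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        # i and j communicate iff each reaches the other.
        cls = {j for j in range(n) if reach[i, j] and reach[j, i]}
        classes.append(sorted(cls))
        seen |= cls
    return classes

# Hypothetical chain: states 0 and 1 communicate; state 2 is
# reachable from them but absorbing, so it forms its own class.
P = [[0.5, 0.4, 0.1],
     [0.6, 0.4, 0.0],
     [0.0, 0.0, 1.0]]
print(communicating_classes(P))  # [[0, 1], [2]]
```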

Recurrence and Transience

A state \(i\) is recurrent if the chain, started at \(i\), returns to \(i\) with probability 1; otherwise it is transient. Recurrence is a class property: if one state in a communicating class is recurrent (or transient), all states in that class are. A finite irreducible Markov chain is always recurrent.

Periodicity

The period of state \(i\) is \(d(i) = \gcd\{n \geq 1 : (P^n)_{ii} > 0\}\). A state is aperiodic if \(d(i) = 1\). The period is constant on communicating classes, so an irreducible chain is aperiodic as soon as any one state is aperiodic (e.g., if it has a self-loop).
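The definition translates directly into code: collect the return times \(n\) with \((P^n)_{ii} > 0\) up to a cutoff and take their gcd. A sketch (the cutoff `max_n` and both example matrices are assumptions):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, max_n=50):
    """gcd of all n <= max_n with (P^n)_{ii} > 0; 0 if no return seen."""
    returns = []
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[i, i] > 1e-12:
            returns.append(n)
    return reduce(gcd, returns) if returns else 0

# Deterministic 2-cycle 0 -> 1 -> 0 -> ...: returns only at even n.
P_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
print(period(P_cycle, 0))  # 2

# Adding a self-loop at state 0 makes it aperiodic.
P_loop = np.array([[0.5, 0.5], [1.0, 0.0]])
print(period(P_loop, 0))  # 1
```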

4. Stationary Distribution

Stationary Distribution

A distribution \(\pi = (\pi_1, \pi_2, \ldots)\) is stationary if \(\pi P = \pi\), i.e.:

\[\pi_j = \sum_i \pi_i \, p_{ij} \quad \text{for all } j, \qquad \sum_j \pi_j = 1\]

If the chain starts in the stationary distribution, it stays there at all future times. For an irreducible chain on a finite state space, a unique stationary distribution always exists, with \(\pi_i = 1/\mathbb{E}[T_i]\), where \(T_i\) is the return time to state \(i\) (so \(\mathbb{E}[T_i]\) is the mean return time).
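Numerically, \(\pi\) is the solution of the linear system \(\pi P = \pi\), \(\sum_j \pi_j = 1\). A minimal sketch, using a made-up irreducible 3-state matrix: replace one redundant stationarity equation with the normalization constraint and solve by least squares.

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
P = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.3, 0.7],
    [0.5, 0.0, 0.5],
])

# Stack (P^T - I) pi = 0 with the normalization row sum(pi) = 1.
n = len(P)
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(pi @ P, pi))     # stationarity: pi P = pi
print(np.isclose(pi.sum(), 1.0))   # normalization
```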

5. Convergence to Stationarity

Convergence Theorem

For an irreducible, aperiodic chain with stationary distribution \(\pi\):

\[(P^n)_{ij} \to \pi_j \quad \text{as } n \to \infty \quad \text{for all } i, j\]

The chain "forgets" its starting state. The convergence is geometric in \(n\), at a rate governed by the second-largest eigenvalue modulus of \(P\); the difference between it and the top eigenvalue 1 is called the spectral gap.
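Both claims can be checked numerically; the matrix below is the same kind of made-up irreducible, aperiodic example used above.

```python
import numpy as np

P = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.3, 0.7],
    [0.5, 0.0, 0.5],
])

# A stochastic matrix always has eigenvalue 1; the second-largest
# modulus controls the geometric rate of convergence to pi.
eigs = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print(np.isclose(eigs[0], 1.0))   # top eigenvalue modulus is 1
print(eigs[1] < 1.0)              # strict gap: chain is ergodic

# Forgetting the start: for large n, all rows of P^n agree.
Pn = np.linalg.matrix_power(P, 50)
print(np.allclose(Pn[0], Pn[1], atol=1e-6))
```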

Detailed Balance

If there exists a distribution \(\pi\) such that:

\[\pi_i \, p_{ij} = \pi_j \, p_{ji} \quad \text{for all } i, j\]

then \(\pi\) is the stationary distribution. This is the detailed balance (or reversibility) condition — the probability flux from \(i\) to \(j\) equals the flux from \(j\) to \(i\). Chains satisfying this are called reversible.
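Summing the detailed balance condition over \(i\) gives \(\sum_i \pi_i p_{ij} = \pi_j \sum_i p_{ji} = \pi_j\), i.e. stationarity. A sketch verifying both facts on a made-up reversible (birth-death) chain:

```python
import numpy as np

# Hypothetical birth-death chain on {0, 1, 2} with its stationary pi.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.50],
    [0.00, 0.50, 0.50],
])
pi = np.array([0.2, 0.4, 0.4])

# Detailed balance: the flux matrix pi_i * p_ij is symmetric.
flux = pi[:, None] * P
print(np.allclose(flux, flux.T))  # reversible

# Detailed balance implies stationarity (sum the condition over i).
print(np.allclose(pi @ P, pi))
```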

6. Random Walk on a Graph

The simple random walk on a graph \(G = (V, E)\): at each step, move uniformly to one of the current node's neighbours. The transition probability is \(p_{ij} = 1/\deg(i)\) for each neighbour \(j\).

The stationary distribution is \(\pi_i = \deg(i) / (2|E|)\) — nodes with higher degree are visited more often. This satisfies detailed balance.
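A sketch confirming \(\pi_i = \deg(i)/(2|E|)\) and detailed balance on a small made-up graph:

```python
import numpy as np

# Hypothetical graph: path 0 - 1 - 2 - 3 plus the chord 1 - 3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

# Simple random walk: move to a uniformly chosen neighbour.
P = A / deg[:, None]

# Claimed stationary distribution: degree / (2|E|).
pi = deg / (2 * len(edges))
print(np.allclose(pi @ P, pi))    # stationary

# Detailed balance: pi_i / deg(i) = 1/(2|E|) is edge-symmetric.
flux = pi[:, None] * P
print(np.allclose(flux, flux.T))
```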

7. Random Walk on \(\mathbb{Z}\)

The simple random walk on the integers: at each step, move right (+1) with probability \(p\) or left (−1) with probability \(q = 1 - p\). The walk is recurrent if and only if \(p = 1/2\) (the symmetric case); for \(p \neq 1/2\) it has drift \(p - q\) per step and is transient.
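A Monte Carlo sketch of the symmetric case (the step and sample counts are arbitrary choices): after \(n\) steps the walk has mean 0 and variance \(n\), since each step is an independent \(\pm 1\) variable.

```python
import random

def walk(steps, p=0.5, rng=random.Random(0)):
    """Position after `steps` moves of the simple random walk on Z."""
    x = 0
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
    return x

# Symmetric walk: sample mean near 0, sample variance near n.
n = 100
samples = [walk(n) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(abs(mean) < 2.0)       # mean is close to 0
print(80.0 < var < 120.0)    # variance is close to n = 100
```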

8. Absorption Probabilities

In a chain with absorbing states (states the chain can never leave), fix an absorbing state \(s\) and define \(a_i = \mathbb{P}(\text{absorbed into } s \mid X_0 = i)\). Conditioning on the first step gives:

\[a_i = p_{is} + \sum_{j \neq s} p_{ij}\,a_j\]

with boundary conditions \(a_s = 1\) and \(a_t = 0\) for every other absorbing state \(t\).

This gives a linear system that can be solved for all transient states. Expected absorption times satisfy an analogous linear system.
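As a worked instance, consider the classic gambler's ruin chain on \(\{0, \ldots, N\}\): states 0 and \(N\) absorb, and interior states move \(\pm 1\) with probability \(1/2\) each. In the symmetric case the known answer is \(a_i = i/N\), which the linear system reproduces (the value \(N = 5\) below is an arbitrary choice):

```python
import numpy as np

# Gambler's ruin: solve for a_i = P(absorbed at N | start at i).
N = 5
p = 0.5
transient = list(range(1, N))  # interior (non-absorbing) states
m = len(transient)

# One-step equations: a_i = p * a_{i+1} + (1-p) * a_{i-1},
# with boundary conditions a_0 = 0 and a_N = 1.
A = np.eye(m)
b = np.zeros(m)
for k, i in enumerate(transient):
    if i + 1 == N:
        b[k] += p              # neighbour is the target absorber: a_N = 1
    else:
        A[k, k + 1] -= p
    if i - 1 > 0:              # a_0 = 0 contributes nothing
        A[k, k - 1] -= (1 - p)
a = np.linalg.solve(A, b)

print(np.allclose(a, np.array(transient) / N))  # symmetric case: a_i = i/N
```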