Measuring dependence, conditional expectation as a random variable, and the law of total variance
\[\text{Cov}(X,Y) = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\]
Covariance measures how \(X\) and \(Y\) vary together. Positive: they tend to move in the same direction. Negative: opposite directions. Zero: uncorrelated (but not necessarily independent).
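The identity \(\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\) can be sanity-checked by simulation. A minimal sketch with NumPy; the construction \(Y = X + \text{noise}\) (which forces \(\text{Cov}(X,Y) = \text{Var}(X) = 1\)), the sample size, and the seed are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Dependent pair: Y = X + independent noise, so Cov(X, Y) = Var(X) = 1.
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)

# Cov(X, Y) = E[XY] - E[X] E[Y], estimated from the sample.
cov_xy = np.mean(x * y) - x.mean() * y.mean()
```

With 100,000 samples the estimate lands close to the exact value 1.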
For \(n\) random variables (identical distribution, same pairwise covariance):
\[\text{Var}(X_1 + \cdots + X_n) = n\,\text{Var}(X_1) + 2\binom{n}{2}\text{Cov}(X_1, X_2)\]
\[\text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}\]
Correlation is a dimensionless, scale-invariant measure of linear dependence: \(-1 \leq \text{Corr}(X,Y) \leq 1\). \(\text{Corr} = \pm 1\) iff one is an exact linear function of the other. For any constants \(a,b,c,d\) with \(ac > 0\): \(\text{Corr}(aX+b, cY+d) = \text{Corr}(X,Y)\).
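Both the variance-of-a-sum formula and the affine invariance of correlation can be checked numerically. A sketch using equicorrelated normals (the dimension \(n = 5\), common covariance \(\rho = 0.3\), affine coefficients, and seed are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, trials = 5, 0.3, 200_000

# n identically distributed standard normals with common pairwise
# covariance: Var(X_i) = 1, Cov(X_i, X_j) = rho for i != j.
cov = np.full((n, n), rho)
np.fill_diagonal(cov, 1.0)
samples = rng.multivariate_normal(np.zeros(n), cov, size=trials)

# Var(X_1 + ... + X_n) = n Var(X_1) + 2 C(n,2) Cov(X_1, X_2)
sum_var = samples.sum(axis=1).var()
predicted = n * 1.0 + 2 * (n * (n - 1) // 2) * rho  # 5 + 20 * 0.3 = 11

# Correlation is unchanged by affine maps aX + b, cY + d with a, c > 0.
x, y = samples[:, 0], samples[:, 1]
r = np.corrcoef(x, y)[0, 1]
r_scaled = np.corrcoef(3 * x + 5, 0.5 * y - 2)[0, 1]
```

The affine invariance holds exactly (up to floating-point rounding), while the variance check matches the formula up to Monte Carlo error.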
When we condition on a random variable \(Y\) rather than a fixed value, \(\mathbb{E}[X \mid Y]\) is itself a random variable — a function of \(Y\). Write \(\mathbb{E}[X \mid Y] = g(Y)\) where \(g(y) = \mathbb{E}[X \mid Y=y]\).
\[\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]\]
This is the law of total expectation (iterated expectations): the expected value of the conditional expectation equals the unconditional expectation. The outer \(\mathbb{E}\) averages over the randomness in \(Y\).
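A quick simulation makes the law concrete. A sketch under an assumed two-stage model (a fair die roll \(Y\) followed by \(X \sim \text{Binomial}(Y, 0.5)\), so \(g(Y) = \mathbb{E}[X \mid Y] = Y/2\); the example and seed are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Stage 1: Y is a fair die roll. Stage 2: X | Y ~ Binomial(Y, 0.5),
# so the conditional expectation is the random variable g(Y) = Y / 2.
y = rng.integers(1, 7, size=n)
x = rng.binomial(y, 0.5)

e_x = x.mean()          # direct estimate of E[X]
e_gy = (y / 2).mean()   # estimate of E[E[X | Y]] = E[g(Y)]
```

Both estimates converge to \(\mathbb{E}[Y]/2 = 3.5/2 = 1.75\), as the law predicts.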
\[\text{Var}(X) = \mathbb{E}[\text{Var}(X \mid Y)] + \text{Var}(\mathbb{E}[X \mid Y])\]
The law of total variance splits \(\text{Var}(X)\) into two parts: \(\mathbb{E}[\text{Var}(X \mid Y)]\), the average spread of \(X\) remaining within each group \(Y = y\), and \(\text{Var}(\mathbb{E}[X \mid Y])\), the spread of the group means across groups. This is exactly the same within-group/between-group decomposition as ANOVA in statistics.
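The decomposition can be verified numerically whenever both conditional moments are known in closed form. A sketch under an assumed model (die roll \(Y\), then \(X \sim \text{Binomial}(Y, 0.5)\), giving \(\mathbb{E}[X \mid Y] = Y/2\) and \(\text{Var}(X \mid Y) = Y/4\); the example and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000

# Given a fair die roll Y, X ~ Binomial(Y, 0.5):
#   E[X | Y] = Y / 2  and  Var(X | Y) = Y * 0.5 * 0.5 = Y / 4.
y = rng.integers(1, 7, size=n)
x = rng.binomial(y, 0.5)

total = x.var()            # Var(X)
within = (y / 4).mean()    # E[Var(X | Y)]  (average within-group spread)
between = (y / 2).var()    # Var(E[X | Y])  (spread of the group means)
```

The two terms sum to the total variance up to Monte Carlo error (exactly, \(0.875 + 35/48 \approx 1.604\)).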
Let \(Y = X_1 + \cdots + X_N\) where \(N\) is itself a random variable independent of the i.i.d. \(X_i\). Then:
\[\mathbb{E}[Y] = \mathbb{E}[N]\,\mathbb{E}[X]\]
\[\text{Var}(Y) = \mathbb{E}[N]\,\text{Var}(X) + (\mathbb{E}[X])^2\,\text{Var}(N)\]
Derived by conditioning on \(N\) and applying the laws of total expectation and total variance.
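Both identities for the random sum can be checked by simulation. A sketch with assumed distributions (\(N \sim \text{Poisson}(4)\), \(X_i\) exponential with mean 2, so \(\mathbb{E}[Y] = 8\) and \(\text{Var}(Y) = 4\cdot4 + 4\cdot4 = 32\); the choices and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
lam, mean_x = 4.0, 2.0  # N ~ Poisson(4), X_i ~ Exponential(mean 2)

n_counts = rng.poisson(lam, size=trials)

# A sum of N i.i.d. Exponential(mean_x) variables is Gamma(N, scale=mean_x);
# guard N = 0 by sampling with shape 1 and masking the result back to 0.
shape = np.where(n_counts > 0, n_counts, 1)
y = np.where(n_counts > 0, rng.gamma(shape, scale=mean_x), 0.0)

mean_y = y.mean()  # E[N] E[X] = 4 * 2 = 8
var_y = y.var()    # E[N] Var(X) + (E[X])^2 Var(N) = 16 + 16 = 32
```

The Gamma shortcut avoids an explicit loop over trials; a direct loop summing `n_counts[i]` exponential draws gives the same distribution.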
Cauchy–Schwarz inequality: \(|\mathbb{E}[XY]| \leq \sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]}\). Applied to the centered variables \(X - \mathbb{E}[X]\) and \(Y - \mathbb{E}[Y]\), it yields \(|\text{Corr}(X,Y)| \leq 1\).
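A quick numerical check of the inequality; the dependent pair below and the seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(100_000)
y = 0.5 * x + rng.standard_normal(100_000)

# |E[XY]| <= sqrt(E[X^2] E[Y^2])
lhs = abs(np.mean(x * y))
rhs = np.sqrt(np.mean(x**2) * np.mean(y**2))

# The centered version bounds the correlation coefficient.
r = np.corrcoef(x, y)[0, 1]
```

Here `lhs` is about 0.5 while `rhs` is about 1.12, and \(|r| \leq 1\) as required.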
Markov's inequality: for \(X \geq 0\) and \(a > 0\), \(\mathbb{P}(X \geq a) \leq \mathbb{E}[X]/a\).
Chebyshev's inequality: \(\mathbb{P}(|X - \mu| \geq c) \leq \sigma^2/c^2\) for any \(c > 0\), where \(\mathbb{E}[X]=\mu\) and \(\text{Var}(X)=\sigma^2\).
Chebyshev follows from Markov applied to the nonnegative variable \((X-\mu)^2\) with threshold \(c^2\). Both bounds are distribution-free: they hold for every distribution with the required moments, which is exactly why they are usually loose for any particular one.
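Both bounds, and their looseness, are easy to see numerically. A sketch using an exponential variable (\(\mathbb{E}[X] = \text{Var}(X) = 1\), \(X \geq 0\); the thresholds and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
# Exponential with mean 1: E[X] = 1, Var(X) = 1, and X >= 0.
x = rng.exponential(scale=1.0, size=500_000)

a, c = 3.0, 2.0

# Markov: P(X >= a) <= E[X] / a.  True tail is e^{-3} ~ 0.05; bound is ~ 1/3.
p_tail = (x >= a).mean()
markov = x.mean() / a

# Chebyshev: P(|X - mu| >= c) <= sigma^2 / c^2.  Bound is ~ 0.25.
p_dev = (np.abs(x - 1.0) >= c).mean()
cheby = x.var() / c**2
```

Both empirical probabilities sit well below their bounds, illustrating how conservative distribution-free inequalities are for a specific distribution.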