Classical bits, the matrix formalism, Landauer's principle, and the geometry of a qubit's world
Information cannot be stored in a system that is always the same. A sky that is perpetually blue tells you nothing; a coin that always shows heads cannot encode a message. The minimal requirement for encoding any information is that the system has at least two distinguishable states. We call such a system a bit.
Familiar examples: a coin (heads / tails), a switch (on / off), a magnetic domain (polarised up / down). The key insight from information theory is quantitative:
\[\text{If } n \text{ bits are conveyed, the receiver can distinguish between } N = 2^n \text{ possible situations.}\]Conversely, to encode a message that selects one of \(N\) equally likely options, you need \(\log_2 N\) bits. Eight bits (one byte) can represent \(2^8 = 256\) distinct values — enough for every character on an English keyboard.
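The relation between the number of options and the number of bits can be checked with a few lines of Python. This is a minimal sketch; the helper name `bits_needed` is an illustrative choice, and it rounds up to a whole number of bits when \(N\) is not a power of two.

```python
def bits_needed(n_options: int) -> int:
    """Smallest whole number of bits that can label n_options distinct cases."""
    if n_options < 1:
        raise ValueError("need at least one option")
    # (n - 1).bit_length() equals ceil(log2(n)) and avoids floating-point error
    return (n_options - 1).bit_length()

print(bits_needed(256))  # 8
print(bits_needed(26))   # 5
```

One byte covers 256 options exactly, while the 26 letters of the alphabet need 5 bits because \(2^4 = 16 < 26 \le 32 = 2^5\).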
Classical computers operate by Boolean information processing: manipulating strings of bits according to logical rules. The question driving this course is: what happens when we miniaturise a bit down to a single electron spin or a photon's polarisation? The answer is quantum mechanics, and it changes everything.
To describe gates systematically — especially once we move to qubits — we need a language that handles both single- and multi-bit operations. The right language is linear algebra. We represent each bit state as a column vector:
\[0 \equiv \begin{bmatrix}1\\0\end{bmatrix}, \qquad 1 \equiv \begin{bmatrix}0\\1\end{bmatrix}\]This looks wasteful — two numbers to represent one bit — but the payoff is immediate: logic gates become matrix multiplications. Whatever a gate does to a bit, it can be written as a matrix acting on the column vector.
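There are only two reversible operations you can perform on a single bit: do nothing (Identity) or flip it (NOT). In matrix form:
\[I = \begin{bmatrix}1&0\\0&1\end{bmatrix}, \qquad X = \begin{bmatrix}0&1\\1&0\end{bmatrix}\]Writing \(|0\rangle\) and \(|1\rangle\) for the column vectors of 0 and 1 (a notation introduced properly below), they act as:
\(I|0\rangle = |0\rangle,\quad I|1\rangle = |1\rangle\)
\(X|0\rangle = |1\rangle,\quad X|1\rangle = |0\rangle\)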
Verify: \(X \begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}\). The matrix flips the two components, which is exactly a bit-flip. We use the name \(X\) because in quantum information this gate is also called the Pauli-X operator — more on that later.
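The statement that gates are matrix multiplications can be tried out directly. The sketch below uses plain Python lists; the helper `apply` and the constant names are illustrative, not part of any library.

```python
def apply(gate, state):
    """Multiply a square matrix (list of rows) by a column vector (flat list)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]

ZERO = [1, 0]               # bit value 0
ONE = [0, 1]                # bit value 1
IDENTITY = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]        # NOT gate

print(apply(X, ZERO))            # [0, 1]
print(apply(X, ONE))             # [1, 0]
print(apply(IDENTITY, ONE))      # [0, 1]
print(apply(X, apply(X, ZERO)))  # [1, 0]  (NOT applied twice undoes itself)
```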
For two bits, there are four possible states: 00, 01, 10, 11. We represent these as 4-component column vectors via the tensor product (also called the Kronecker product, denoted \(\otimes\)):
\[00 \equiv \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \quad 01 \equiv \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \quad 10 \equiv \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \quad 11 \equiv \begin{bmatrix}0\\0\\0\\1\end{bmatrix}\]In general, \(n\) bits require a \(2^n\)-component column vector. A six-bit string like 001011 is the tensor product of all six individual vectors: a 64-component vector with a single 1 at position 11 (counting from 0), the value of the string read as a binary number. A classical state vector always has this one-hot form. A register of \(n\) qubits lives in the same \(2^n\)-dimensional space, but its state can be a superposition with non-zero amplitudes on all \(2^n\) basis states at once, which is one source of quantum computing's power.
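The tensor-product construction can be sketched in a few lines. The helpers `kron` and `encode` are hypothetical names chosen for this example; `kron` handles only flat column vectors, which is all that is needed here.

```python
def kron(a, b):
    """Kronecker product of two column vectors given as flat lists."""
    return [x * y for x in a for y in b]

BIT = {"0": [1, 0], "1": [0, 1]}

def encode(bits: str):
    """Build the 2**n-component vector for a bit string, leftmost bit first."""
    state = [1]
    for ch in bits:
        state = kron(state, BIT[ch])
    return state

print(encode("10"))        # [0, 0, 1, 0]
v = encode("001011")
print(len(v), v.index(1))  # 64 11
```

The position of the single 1 equals the bit string read as a binary number, which is why the two-bit vectors above are ordered 00, 01, 10, 11.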
The most important two-bit gate is the Controlled-NOT (CNOT), also called CX. It flips the target bit if and only if the control bit is 1. The truth table and matrix:
| Input (control, target) | Output (control, target) |
|---|---|
| 00 | 00 |
| 01 | 01 |
| 10 | 11 ← target flipped |
| 11 | 10 ← target flipped |
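In the basis ordering 00, 01, 10, 11 used above, the matrix is a permutation matrix that swaps the last two basis vectors:
\[\text{CNOT} = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{bmatrix}\]The target output is the XOR of the two inputs, so CNOT is a reversible version of XOR. Later we will see that CNOT is the key gate for generating entanglement between qubits, one of the genuinely non-classical resources of quantum computing.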
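As a check on the truth table, the sketch below applies the standard CNOT matrix to each two-bit basis vector, assuming the ordering 00, 01, 10, 11 with the control as the left bit. The helper names are illustrative.

```python
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
LABELS = ["00", "01", "10", "11"]

def apply(gate, state):
    """Multiply a square matrix (list of rows) by a column vector (flat list)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]

def basis(label):
    """One-hot column vector for a two-bit label."""
    return [1 if name == label else 0 for name in LABELS]

truth_table = {}
for label in LABELS:
    out = apply(CNOT, basis(label))
    truth_table[label] = LABELS[out.index(1)]
    print(label, "->", truth_table[label])
```

Applying CNOT twice returns every input to itself, which is the sense in which the gate is reversible.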
Here is a subtle but profound question: does computation require energy? In principle, gates like NOT and CNOT don't need to dissipate energy — they are reversible operations (you can always undo them). Billiard-ball computers, where elastic collisions of balls implement logic gates, illustrate that reversible computation can in principle be frictionless.
But there is one operation that cannot be made reversible: erasing a bit. Erasing means taking a bit from some unknown state (0 or 1) and resetting it unconditionally to 0. This is irreversible — knowing the output (always 0) tells you nothing about the input.
Erasing one bit of information at temperature \(T\) necessarily dissipates at least
\[E_{\min} = k_B T \ln 2\]as heat into the environment, where \(k_B\) is Boltzmann's constant.
This is not an engineering limitation — it is a fundamental thermodynamic fact. The argument: a bit can be in the left half (state 0) or right half (state 1) of a one-molecule gas cylinder. Erasing means compressing the gas from volume \(V\) to \(V/2\), regardless of which half it was in. An isothermal compression from \(V\) to \(V/2\) at temperature \(T\) releases exactly \(k_B T \ln 2\) as heat:
\[W = \int_{V}^{V/2} P\,dV = k_B T \ln\frac{V/2}{V} = -k_B T \ln 2\]The magnitude \(k_B T \ln 2\) is the minimum energy cost of bit erasure. At room temperature (\(T \approx 300\) K), this is around \(3 \times 10^{-21}\) joules — tiny, but nonzero.
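The room-temperature figure is easy to reproduce. This is a small sketch using the SI value of Boltzmann's constant; the function name `landauer_limit` is an illustrative choice.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K (exact in the 2019 SI)

def landauer_limit(temperature_kelvin: float) -> float:
    """Minimum heat dissipated when erasing one bit, in joules."""
    return K_B * temperature_kelvin * math.log(2)

print(f"{landauer_limit(300.0):.2e} J")  # about 2.87e-21 J
```

The bound scales linearly with temperature, so cooling a device lowers the minimum cost per erased bit in proportion.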
As transistors shrink toward atomic scales, the energy cost of bit erasure becomes a hard physical limit. This is why reversible and quantum computation, which avoid unnecessary erasure, are not just elegant — they may be thermodynamically necessary at the nanoscale. It also resolves Maxwell's Demon: the demon must erase its memory of each measurement, paying the Landauer cost and preserving the second law.
What happens when we push a bit down to the smallest possible physical system — a single electron spin, or a single photon? The object stops behaving like a classical bit and becomes a qubit (quantum bit). To understand why, we need to look at how spins actually behave in experiments.
A spin-1/2 particle (like an electron or a silver atom) has a tiny magnetic moment — it acts like a miniature bar magnet. Classically, you'd expect this magnetic moment to point in some arbitrary direction, giving a continuous range of values when measured. The Stern-Gerlach experiment (1922) tests exactly this.
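The experiment reveals two shattering results. First, the measured component of the magnetic moment is never continuous: every particle is deflected either fully "up" or fully "down", so a measurement along any axis has only two outcomes. Second, the outcome is in general random: a spin prepared at angle \(\theta\) to the measurement axis is found "up" with probability \(\cos^2(\theta/2)\) and "down" with probability \(\sin^2(\theta/2)\), and the preparation direction can be any point on the unit sphere.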
These facts mean the spin cannot be described as a classical bit. It occupies a richer, inherently quantum space of states — and that space is exactly what the qubit formalism captures.
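Paul Dirac devised a compact, elegant notation for quantum states that is now universal in the field. The two key objects are the ket \(|\psi\rangle\), a column vector, and the bra \(\langle\psi|\), its conjugate transpose (a row vector).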
For a qubit, the two computational basis states are:
\[|0\rangle \equiv \begin{bmatrix}1\\0\end{bmatrix}, \qquad |1\rangle \equiv \begin{bmatrix}0\\1\end{bmatrix}\]These are the quantum analogues of classical 0 and 1. The reason we call them a "basis" is that any qubit state can be written as a linear combination of them.
The inner product of two kets \(|\chi\rangle\) and \(|\xi\rangle\) is \(\langle\chi|\xi\rangle\) — multiply the bra of the first by the ket of the second to get a number (possibly complex). The basis states satisfy:
\[\langle 0|0\rangle = 1, \quad \langle 1|1\rangle = 1, \quad \langle 0|1\rangle = 0, \quad \langle 1|0\rangle = 0\]They are orthonormal: normalised (inner product with themselves = 1) and orthogonal to each other (inner product between them = 0). A general state \(|\psi\rangle = \alpha|0\rangle + \beta|1\rangle\) with complex coefficients \(\alpha, \beta\) is normalised when \(\langle\psi|\psi\rangle = 1\), which requires \(|\alpha|^2 + |\beta|^2 = 1\).
From the Stern-Gerlach results, we know a qubit can be in any of infinitely many states — one for each direction on the unit sphere. Let the state be parameterised by two angles \(\theta \in [0, \pi]\) and \(\phi \in [0, 2\pi)\). The state vector of a general qubit is:
\[|\psi(\theta,\phi)\rangle = \alpha|0\rangle + \beta|1\rangle\]where \(\alpha = \cos\frac{\theta}{2}\) and \(\beta = e^{i\phi}\sin\frac{\theta}{2}\) are complex amplitudes satisfying \(|\alpha|^2 + |\beta|^2 = 1\).
This is the most important equation in this lecture. A few things to notice:
How do we extract probabilities from this state vector? The Born rule is the fundamental postulate connecting quantum states to measurable outcomes:
The probability that a system in state \(|\chi\rangle\) is found in state \(|\xi\rangle\) upon measurement is:
\[P_{|\chi\rangle \to |\xi\rangle} = |\langle\chi|\xi\rangle|^2\]This immediately reproduces the Stern-Gerlach result: a spin prepared at angle \(\theta\) to the \(z\)-axis, when measured along \(z\), jumps to \(|0\rangle\) with probability \(\cos^2(\theta/2)\) and to \(|1\rangle\) with probability \(\sin^2(\theta/2)\). The probabilities sum to 1 because the state is normalised.
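The Born rule for a spin prepared at angle \(\theta\) can be evaluated numerically. The sketch below represents kets as two-element lists; `qubit`, `inner` and `born_probability` are hypothetical helper names, and the angles are arbitrary example values.

```python
import cmath
import math

def qubit(theta, phi):
    """State cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1> as a list of amplitudes."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def inner(chi, xi):
    """Inner product <chi|xi>; the first argument is complex-conjugated."""
    return sum(complex(c).conjugate() * x for c, x in zip(chi, xi))

def born_probability(chi, xi):
    """Probability that a system in state chi is found in state xi."""
    return abs(inner(chi, xi)) ** 2

KET0, KET1 = [1, 0], [0, 1]
psi = qubit(math.pi / 3, 0.7)      # theta = 60 degrees, arbitrary phase
p0 = born_probability(psi, KET0)   # cos^2(30 degrees) = 0.75
p1 = born_probability(psi, KET1)   # sin^2(30 degrees) = 0.25
print(round(p0, 6), round(p1, 6), round(p0 + p1, 6))
```

The phase \(\phi\) drops out of both probabilities here because the measurement is along \(z\); it matters for measurements along other axes.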
The state \(|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\) is often described as "50% chance of 0, 50% chance of 1." But this is misleading. A classical coin that lands heads with probability 1/2 has a definite state — we just don't know it. The qubit in \(|+\rangle\) genuinely has no definite value of spin-z before measurement. The difference is observable: measure the qubit along the \(x\)-axis instead, and you get a definite result of \(+1\) every time (since \(|+\rangle\) is the \(x\)-eigenstate). Classical uncertainty gives a different prediction. Quantum superposition is a fundamentally different ontological state — not ignorance, but genuine indefiniteness.
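The difference between superposition and classical ignorance can be made concrete with the Born rule. The sketch below compares \(|+\rangle\) with a classical 50/50 coin that is secretly either \(|0\rangle\) or \(|1\rangle\); the helper names are illustrative, and the classical case is modelled simply by averaging the two Born probabilities.

```python
import math

def inner(chi, xi):
    """Inner product <chi|xi>; the first argument is complex-conjugated."""
    return sum(complex(c).conjugate() * x for c, x in zip(chi, xi))

def prob(state, outcome):
    """Born-rule probability of finding `state` in `outcome`."""
    return abs(inner(outcome, state)) ** 2

s = 1 / math.sqrt(2)
KET0, KET1 = [1, 0], [0, 1]
PLUS, MINUS = [s, s], [s, -s]

quantum_z = prob(PLUS, KET0)   # z measurement on |+>: 1/2
quantum_x = prob(PLUS, PLUS)   # x measurement on |+>: 1
# Classical coin: the system is |0> or |1> with probability 1/2 each.
classical_x = 0.5 * prob(KET0, PLUS) + 0.5 * prob(KET1, PLUS)  # 1/2

print(round(quantum_z, 6), round(quantum_x, 6), round(classical_x, 6))
```

Both descriptions agree for the \(z\) measurement, but only the superposition gives a certain outcome along \(x\).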
Could we get away with real amplitudes? The probabilities in the computational basis only depend on \(|\alpha|^2\) and \(|\beta|^2\), so real numbers seem sufficient. They are not: with real amplitudes the relative phase \(e^{i\phi}\) can only be \(\pm 1\), which covers a single great circle of directions rather than the whole unit sphere. Moreover, quantum gates act on states by matrix multiplication. If we want the modulus and phase of the amplitudes to change under gate operations in a linear (hence tractable) way, we need complex numbers; restricting to reals gives highly non-linear dependences and the algebra breaks.
The general qubit state can always be written as:
\[|\psi(\theta,\phi)\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\phi}\sin\frac{\theta}{2}|1\rangle = |0\rangle\langle 0|\psi\rangle + |1\rangle\langle 1|\psi\rangle\]This means \(|0\rangle\) and \(|1\rangle\) form a complete basis: any qubit state can be expressed as a linear combination of them. Extracting the coefficient of each basis state is done by the inner product: the coefficient of \(|0\rangle\) is \(\langle 0|\psi\rangle = \cos(\theta/2)\).
The operator \(|0\rangle\langle 0| + |1\rangle\langle 1|\) acting on any state returns that state unchanged. This is the completeness relation:
\[|0\rangle\langle 0| + |1\rangle\langle 1| = \mathbb{I}\]The space of 2-component complex column vectors is called the Hilbert space of the qubit, denoted \(\mathcal{H} \cong \mathbb{C}^2\); physical pure states are its unit-norm vectors. Any two linearly independent states (neither is a scalar multiple of the other) span this space, and any two orthonormal states (zero inner product with each other) can serve as a measurement basis. The choice of orthonormal basis is just a choice of measurement axis.
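The completeness relation holds for any orthonormal basis, which can be checked numerically. The sketch below builds the projectors as outer products for the \(Z\) basis and for the \(\{|+\rangle, |-\rangle\}\) basis; all helper names are illustrative.

```python
import math

def outer(ket, other):
    """Outer product |ket><other| as a 2x2 nested list."""
    return [[k * complex(o).conjugate() for o in other] for k in ket]

def add(a, b):
    """Entry-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

IDENTITY = [[1, 0], [0, 1]]
s = 1 / math.sqrt(2)
bases = {"Z": ([1, 0], [0, 1]), "X": ([s, s], [s, -s])}

results = {}
for name, (u, v) in bases.items():
    total = add(outer(u, u), outer(v, v))
    results[name] = close(total, IDENTITY)
    print(name, "basis projectors sum to identity:", results[name])
```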
The qubit has a pure state description: the state vector \(|\psi\rangle\) gives a complete description when the system is in a definite quantum state. When the state is uncertain (a statistical mixture of states \(|\psi_i\rangle\) with probabilities \(p_i\)), one needs the density operator \(\rho = \sum_i p_i|\psi_i\rangle\langle\psi_i|\), which reduces to \(|\psi\rangle\langle\psi|\) for a pure state. This generalisation is introduced in later lectures.
The two angles \(\theta\) and \(\phi\) in the state vector define a point on the surface of a unit sphere. This is the Bloch sphere, a complete geometric picture of all pure qubit states. Every point on the surface corresponds to a qubit state, distinct points correspond to distinct physical states, and state vectors that differ only by an (irrelevant) global phase correspond to the same point.
Key geometric facts to remember:
Suppose you try to prepare the state \(|\psi\rangle\) but end up with \(|\phi\rangle\) instead (due to imperfections in your lab). How well did you do? The natural measure is the fidelity:
\[F(|\phi\rangle,|\psi\rangle) = |\langle\phi|\psi\rangle|^2\]It ranges from 0 (orthogonal states, no overlap) to 1 (identical states, perfect preparation).
Fidelity equals the probability that the prepared state \(|\phi\rangle\) would pass a test for being \(|\psi\rangle\) — i.e., the probability it gives the right answer when measured in the \(\{|\psi\rangle, |\psi^\perp\rangle\}\) basis. It is the standard figure of merit for state preparation in quantum experiments and quantum error correction.
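A short numerical sketch of the fidelity, assuming the target is \(|+\rangle\) and the lab produces a state whose polar angle is off by 0.1 rad. The error size and the helper names are made up for illustration.

```python
import cmath
import math

def qubit(theta, phi):
    """State cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1> as a list of amplitudes."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def fidelity(prepared, target):
    """F = |<prepared|target>|^2 for two pure states."""
    overlap = sum(complex(a).conjugate() * b for a, b in zip(prepared, target))
    return abs(overlap) ** 2

target = qubit(math.pi / 2, 0.0)           # |+>
prepared = qubit(math.pi / 2 + 0.1, 0.0)   # hypothetical 0.1 rad angle error
f = fidelity(prepared, target)
print(round(f, 4))                         # close to 0.9975
print(fidelity([1, 0], [0, 1]))            # orthogonal states give 0
```

For two states on the same meridian of the Bloch sphere the fidelity is \(\cos^2(\Delta\theta/2)\), so a small angular error costs only a second-order loss in fidelity.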
An important limitation: if you are given an unknown qubit state and can only measure in a fixed basis (say the Z basis \(\{|0\rangle, |1\rangle\}\)), you cannot fully reconstruct it from one measurement. Averaged over all possible qubit states, the fidelity of single-measurement reconstruction is only \(2/3\). Full reconstruction — quantum state tomography — requires measurements in multiple bases.
The Hadamard gate \(H\) is arguably the most important single-qubit gate in quantum computing. It maps the computational basis states \(\{|0\rangle, |1\rangle\}\) to the equal superposition states \(\{|+\rangle, |-\rangle\}\):
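\[H|0\rangle = |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \qquad H|1\rangle = |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle), \qquad H = \frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\]The matrix form shows that \(H\) takes a bit in a definite state and creates a perfect quantum superposition. Applied to all \(n\) qubits of a register simultaneously, Hadamard creates a uniform superposition of all \(2^n\) computational basis states. This is the starting point for Grover's search algorithm and Deutsch-Jozsa.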
A crucial property: \(H^2 = \mathbb{I}\). The Hadamard is its own inverse — applying it twice returns you to the original state. Geometrically, \(H\) corresponds to a \(\pi\) rotation about the axis midway between \(x\) and \(z\) on the Bloch sphere.
Beginner intuition says: if H creates a 50-50 superposition, it must be a quarter-turn (\(\pi/2\) rotation). Wrong. Writing \(R_x(\theta)\) for a rotation by \(\theta\) about the \(x\)-axis, applying a \(\pi/2\) rotation twice gives \(R_x(\pi/2)^2 = R_x(\pi) = -iX\), not the identity. Hadamard squares to the identity, so it must be a \(\pi\) rotation, just about a diagonal axis (the \((x+z)/\sqrt{2}\) axis). This is why applying H twice recovers the original state rather than producing a further rotation.
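Both claims can be checked with 2×2 matrix arithmetic. The sketch below assumes the standard convention \(R_x(\theta) = \cos\frac{\theta}{2}\,\mathbb{I} - i\sin\frac{\theta}{2}\,X\); the helper names are illustrative.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def rx(theta):
    """Rotation by theta about the x-axis: cos(theta/2) I - i sin(theta/2) X."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
IDENTITY = [[1, 0], [0, 1]]
MINUS_I_X = [[0, -1j], [-1j, 0]]

h_squared_is_identity = close(matmul(H, H), IDENTITY)
quarter = rx(math.pi / 2)
quarter_squared_is_minus_iX = close(matmul(quarter, quarter), MINUS_I_X)
quarter_squared_is_identity = close(matmul(quarter, quarter), IDENTITY)
print(h_squared_is_identity, quarter_squared_is_minus_iX, quarter_squared_is_identity)
# True True False
```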
The three Pauli operators \(X\), \(Y\), \(Z\) are the fundamental single-qubit gates. Each one implements a \(\pi\)-rotation (180°) about its corresponding axis on the Bloch sphere.
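\(X = \begin{bmatrix}0&1\\1&0\end{bmatrix}\): \(\pi\) rotation about the \(x\)-axis.
Maps \(|0\rangle \leftrightarrow |1\rangle\), \(|+\rangle \to |+\rangle\), \(|-\rangle \to -|-\rangle\).
\(Z = \begin{bmatrix}1&0\\0&-1\end{bmatrix}\): \(\pi\) rotation about the \(z\)-axis.
Maps \(|0\rangle \to |0\rangle\), \(|1\rangle \to -|1\rangle\), \(|+\rangle \leftrightarrow |-\rangle\).
\(Y = \begin{bmatrix}0&-i\\i&0\end{bmatrix}\): \(\pi\) rotation about the \(y\)-axis. Note: \(Y = -iZX\).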
The Pauli operators satisfy a set of elegant algebraic identities that are indispensable in quantum circuit design:
\[X^2 = Y^2 = Z^2 = \mathbb{I}, \qquad Y = -iZX\]The first identity (\(P^2 = \mathbb{I}\)) says each Pauli is its own inverse — applying it twice returns to the original state, since a \(2\pi\) rotation returns to the start. The second identity connects the three Paulis and is useful for simplifying circuits.
The Hadamard interconverts the \(X\) and \(Z\) bases:
\[HZH = X, \qquad HXH = Z\]Reading this geometrically: conjugating by \(H\) swaps the \(x\)- and \(z\)-axes on the Bloch sphere. This is the transformation that converts a \(Z\)-basis measurement into an \(X\)-basis measurement — which is exactly how you measure in the \(\{|+\rangle, |-\rangle\}\) basis using standard \(Z\) detectors.
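The Pauli and Hadamard identities above can be verified by direct multiplication. This is a minimal sketch with the standard matrix forms of \(X\), \(Y\), \(Z\) and \(H\); the helper names are illustrative.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, m):
    """Multiply every entry of a 2x2 matrix by the scalar c."""
    return [[c * entry for entry in row] for row in m]

def close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]

checks = {
    "X^2 = I": close(matmul(X, X), I2),
    "Y^2 = I": close(matmul(Y, Y), I2),
    "Z^2 = I": close(matmul(Z, Z), I2),
    "Y = -i Z X": close(Y, scale(-1j, matmul(Z, X))),
    "H Z H = X": close(matmul(H, matmul(Z, H)), X),
    "H X H = Z": close(matmul(H, matmul(X, H)), Z),
}
for name, ok in checks.items():
    print(name, ok)
```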
A qubit is a 2-state quantum system. Most physical systems are not 2-state — for example, an electron's position in space has a continuous (infinite-dimensional) state space. A system with exactly \(d\) fully distinguishable states is called a qudit (quantum \(d\)-it).
A qudit has basis states \(|0\rangle, |1\rangle, \ldots, |d-1\rangle\) and its most general pure state is:
\[|\psi\rangle = \sum_{j=0}^{d-1} c_j\,|j\rangle, \qquad \sum_{j=0}^{d-1} |c_j|^2 = 1\]The generalisations of the Pauli matrices to a qudit are the operators \(X_d\) and \(Z_d\), defined by their action on basis states:
\[X_d|j\rangle = |j \oplus 1\rangle, \qquad Z_d|j\rangle = e^{i\frac{2\pi j}{d}}|j\rangle\]where \(\oplus\) denotes addition modulo \(d\). For \(d=2\), these reduce to the usual Pauli \(X\) and \(Z\). A pure qudit state needs \(2(d-1)\) real parameters (\(d\) complex amplitudes, less one parameter for normalisation and one for the global phase), which gives the two angles \(\theta, \phi\) for a qubit; a general mixed state needs \(d^2 - 1\). Interest in qudits, particularly qutrits (\(d=3\)), has grown in recent years as physical implementations (photon orbital angular momentum, molecular spin states, superconducting transmons) have made them accessible.
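The definitions of \(X_d\) and \(Z_d\) translate directly into matrices. The sketch below builds them for any \(d\) and checks the qutrit and qubit cases; the function names `shift_matrix` and `clock_matrix` are naming choices made for this example.

```python
import cmath

def shift_matrix(d):
    """X_d: sends |j> to |j+1 mod d>, so column j has a 1 in row (j+1) % d."""
    return [[1 if row == (col + 1) % d else 0 for col in range(d)]
            for row in range(d)]

def clock_matrix(d):
    """Z_d: diagonal matrix with entries exp(2*pi*i*j/d)."""
    return [[cmath.exp(2j * cmath.pi * j / d) if i == j else 0
             for j in range(d)] for i in range(d)]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    """True if two square matrices agree entry by entry within tol."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

def identity(d):
    return [[1 if i == j else 0 for j in range(d)] for i in range(d)]

X3, Z3 = shift_matrix(3), clock_matrix(3)
x3_cubed = matmul(X3, matmul(X3, X3))
z3_cubed = matmul(Z3, matmul(Z3, Z3))
print(close(x3_cubed, identity(3)), close(z3_cubed, identity(3)))  # True True
print(shift_matrix(2))                                             # [[0, 1], [1, 0]]
print(close(clock_matrix(2), [[1, 0], [0, -1]]))                   # True
```

Three applications of the qutrit shift or phase return to the identity, the \(d=3\) counterpart of \(X^2 = Z^2 = \mathbb{I}\) for a qubit.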
| Concept | Key Equation / Fact |
|---|---|
| Bit representation | \(0 \equiv \begin{bmatrix}1\\0\end{bmatrix},\; 1 \equiv \begin{bmatrix}0\\1\end{bmatrix}\) |
| NOT gate | \(X = \begin{bmatrix}0&1\\1&0\end{bmatrix}\); \(X|0\rangle=|1\rangle\) |
| CNOT gate | Flips target iff control is \(|1\rangle\); 4×4 matrix |
| Landauer's principle | Erasing one bit dissipates \(\geq k_BT\ln 2\) as heat |
| General qubit state | \(|\psi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\phi}\sin\frac{\theta}{2}|1\rangle\) |
| Born rule | \(P_{|\chi\rangle\to|\xi\rangle} = |\langle\chi|\xi\rangle|^2\) |
| Completeness | \(|0\rangle\langle 0| + |1\rangle\langle 1| = \mathbb{I}\) |
| Bloch sphere | Every pure qubit state is a point on the unit sphere |
| Hadamard | \(H|0\rangle = |{+}\rangle,\; H|1\rangle = |{-}\rangle,\; H^2 = \mathbb{I}\) |
| Pauli operators | \(X^2=Y^2=Z^2=\mathbb{I}\); each is a π-rotation on Bloch sphere |
| Fidelity | \(F(|\phi\rangle,|\psi\rangle) = |\langle\phi|\psi\rangle|^2 \in [0,1]\) |