Shannon's Information Theory

In a single 1948 paper, Claude Shannon defined what information is — and proved you can send it perfectly through a noisy world.

Before 1948, “information” was a fuzzy word. Engineers building telephones and telegraphs had little quantitative grip on what they were sending. They had hunches: longer messages were heavier, important messages were bigger, noise made things worse. They had no general formula. There was no number you could attach to a sentence and say, this contains exactly that much information.

Then Claude Shannon, a 32-year-old at Bell Labs, published “A Mathematical Theory of Communication.” Almost overnight, information became a measurable physical quantity, like mass or charge. The paper laid down two theorems that still bound everything from your Wi-Fi router to a probe at the edge of the solar system, and a definition of information that has since migrated into physics, biology and neuroscience. It is the most consequential paper most people have never read.

The trick: ignore meaning

Shannon's first move was to throw away the thing everyone thought information was about. The meaning of a message, he wrote, is “irrelevant to the engineering problem.” What matters is only that the receiver reconstructs which message, out of all the possible messages, the sender chose.

This is liberating. It turns information into a question about uncertainty. If I am about to tell you whether my coin came up heads or tails, you have one bit of uncertainty. After I tell you, you have none. The message resolved one bit of uncertainty — so it carried one bit of information. If I am about to tell you whether the sun rose this morning, you have approximately zero uncertainty. The message carries approximately zero information, regardless of how loudly I shout it.

Shannon made this precise. For a source that emits symbol i with probability p_i, define

H = − Σ p_i log₂ p_i

This is the entropy of the source, measured in bits per symbol. It is the average surprise of the next symbol. A fair coin has entropy 1 bit. A coin that lands heads 90% of the time has entropy about 0.47 bits — less surprising on average, less informative. A coin glued to heads has entropy 0.

Figure: Binary entropy. A fair coin (p = 0.5) is maximally surprising. A loaded coin carries less information per flip; a fixed coin, none.
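The coin figures quoted above can be checked directly from the definition of H. The following is a minimal sketch; the function name `entropy_bits` is made up for this illustration and does not come from Shannon's paper.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol.

    Symbols with probability 0 contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair = entropy_bits([0.5, 0.5])    # fair coin
loaded = entropy_bits([0.9, 0.1])  # coin that lands heads 90% of the time
glued = entropy_bits([1.0, 0.0])   # coin glued to heads

print(f"fair coin:   {fair:.3f} bits")
print(f"loaded coin: {loaded:.3f} bits")
print(f"glued coin:  {glued:.3f} bits")
```

The three results come out at 1 bit, roughly 0.47 bits and 0 bits, matching the values in the text.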

Once you have entropy, the first big result follows almost as bookkeeping. Shannon's source coding theorem says: any source of entropy H can be losslessly compressed to an average arbitrarily close to H bits per symbol, and no further. English text has an entropy of roughly one bit per letter, even though a naive encoding of its 26 letters and a space spends about five. The other four bits are redundancy: the slack that lets you read a smudged page or autocomplete a half-typed word. Every ZIP archive is, at heart, an attempt to scrape away that redundancy and squeeze a stream toward its entropy floor. Lossy formats such as JPEG and MP3 go a step further, first discarding detail the eye or ear will not miss and then squeezing what remains the same way.
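To see the entropy floor in action, here is a small sketch using Huffman coding, a classic lossless scheme that postdates Shannon's paper and is used here purely as an illustration. The four-symbol source and its probabilities are invented for the example; they were picked as powers of one half so that the code can meet the floor exactly.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over `probs`."""
    # Heap entries are (probability, tie-breaker, {symbol: depth so far}).
    # The integer tie-breaker keeps Python from ever comparing the dicts.
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)  # the two least likely subtrees
        p2, _, d2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical source, chosen only for illustration.
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = huffman_code_lengths(source)
entropy = -sum(p * math.log2(p) for p in source.values())
avg_len = sum(source[s] * lengths[s] for s in source)

print("code lengths:", lengths)
print(f"entropy:        {entropy:.3f} bits/symbol")
print(f"average length: {avg_len:.3f} bits/symbol")
```

A fixed-length code would spend 2 bits on each of the four symbols. Here the variable-length code averages 1.75 bits, which is also the entropy of this source. For less tidy probabilities a symbol-by-symbol Huffman code lands somewhere between H and H + 1 bits.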

The impossible result: noise without errors

The second theorem is the strange one. Engineers in the 1940s assumed an obvious tradeoff: the noisier the channel, the more errors you make, and the only fix is to slow down or shout louder. Reliability was bought at the price of speed. It felt like a law.

Shannon proved it is not. For every channel (every wire, every radio link, every fibre) there is a number C, the capacity, measured in bits per second. As long as you transmit information at any rate R below C, you can drive your error rate as close to zero as you like by choosing a clever enough code. Above C, you cannot. The catch is that the codes have to be long, which costs delay and decoding effort. But below C there is no forced trade: reliability and speed come apart.
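The text does not commit to a particular channel, so as a concrete case assume the simplest textbook model: a binary symmetric channel that flips each transmitted bit independently with probability p. Its capacity is C = 1 − H(p), where H is the binary entropy, and it is measured in bits per channel use rather than bits per second. The sketch below just evaluates that formula; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity of a binary symmetric channel, in bits per channel use.

    Assumes each bit is flipped independently with probability flip_prob.
    """
    return 1.0 - binary_entropy(flip_prob)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"flip probability {p}: capacity {bsc_capacity(p):.3f} bits per use")
```

Under this model a channel that corrupts one bit in ten still has a capacity of about 0.53 bits per use, so a little under half of what is sent must be redundancy. Only at p = 0.5, where the output is independent of the input, does the capacity fall to zero.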

Figure: Shannon's communication system. The encoder maps the message M to a longer codeword X; noise corrupts it into Y; the decoder uses the redundancy to recover M.

The proof is famously non-constructive. Shannon showed that random codes hit the limit on average, which means good codes must exist — without telling anyone how to build one. The next sixty years of coding theory were a search for codes that could actually be encoded and decoded by physical hardware while staying near C. Hamming codes, convolutional codes, Reed–Solomon, turbo codes, LDPC, polar codes — each generation crept closer to the wall Shannon drew.
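As a taste of what such a code looks like, here is a sketch of the Hamming(7,4) code, the earliest family on that list. It adds three parity bits to every four data bits and can repair any single flipped bit in the resulting seven. It sits far from capacity and is shown only to make the idea of designed-in redundancy concrete; the bit layout follows the common textbook convention with parity bits at positions 1, 2 and 4.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(received):
    """Correct at most one flipped bit, then return the 4 data bits."""
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    # The syndrome, read as a binary number, is the 1-based position
    # of the flipped bit; 0 means no error was detected.
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:
        r[error_pos - 1] ^= 1
    return [r[2], r[4], r[5], r[6]]

message = [1, 0, 1, 1]
codeword = hamming74_encode(message)

corrupted = codeword.copy()
corrupted[4] ^= 1  # the channel flips one bit

recovered = hamming74_decode(corrupted)
print("sent:     ", codeword)
print("received: ", corrupted)
print("recovered:", recovered)
```

The decoder recovers the original four bits even though it never learns which bit the channel flipped. Two flips in the same block would defeat it, which is one reason later codes use much longer blocks.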

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.”
Claude Shannon, “A Mathematical Theory of Communication” (1948)

Why it underwrites the modern world

If you want to feel the weight of the theorem, look up at Voyager 1. It is close to a full light-day from Earth, transmitting with roughly the power of a refrigerator bulb, and by the time its signal arrives it is almost unimaginably faint. By any pre-Shannon intuition, the link is impossible. It works because the spacecraft uses a code (a concatenation of Reed–Solomon and convolutional codes) chosen to get as close to the channel's Shannon capacity as the hardware of its day allowed. The same is true of every cellular handset, every satellite phone, every hard drive, every scratched DVD that still plays. The redundancy was designed in by people who were aiming at C.

The other half of the paper, source coding, runs the modern web. JPEG compresses photos by discarding detail the eye barely registers and then entropy-coding what is left. MP3 and Opus do the same for sound. Genome sequencers, weather satellites, and database systems all lean on the fact that real data has lower entropy than its raw representation, and that the gap can be reclaimed.

And the framework escaped its container. Once you accept that information is −log probability, the same machinery starts to apply to physics (Landauer's principle, that erasing a bit costs at least kT ln 2 of heat; Bekenstein's bound on the bits a region of space can hold), to biology (the channel capacity of a synapse, the bits encoded in a strand of DNA), and to inference itself (Bayesian updating is, in a precise sense, the consumption of bits). Shannon thought he was solving a problem about telephones. He had quietly handed the rest of science a new fundamental quantity.
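To put a number on Landauer's kT ln 2, the sketch below evaluates it at an assumed room temperature of 300 K. The temperature and the function name are choices made for this illustration; k is the Boltzmann constant, whose value is fixed exactly by the SI.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact by SI definition

def landauer_limit_joules(temperature_kelvin, bits=1):
    """Minimum heat dissipated when erasing `bits` bits at a given temperature."""
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

# 300 K is an assumed room temperature, not a figure from the article.
per_bit = landauer_limit_joules(300.0)
print(f"erasing one bit at 300 K: at least {per_bit:.3e} J")
```

That works out to about 2.87 × 10⁻²¹ joules per bit. It is a lower bound, and practical hardware dissipates far more than this for each bit it erases.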

That is the strangest thing about the 1948 paper. It reads like an engineering report. There are no philosophical flourishes, no claims about meaning or mind. And yet, by the time you reach the end, the world has been re-described. Information stops being a metaphor and becomes a thing you can count.
