The Fold
A Science Explainer No. 002 — Structural Biology

Shape is
everything.
A story of 20,000 proteins

Proteins are the machines of life — enzymes, scaffolds, signals, motors, immune weapons. Every one of them works because of its shape. And for 50 years, predicting that shape from a genetic sequence alone was considered one of the hardest problems in science. Then, in 2020, a neural network solved it in an afternoon.

Unique human proteins
~20,000
Possible conformations
10³⁰⁰
Time to fold (real)
Microseconds
AlphaFold DB structures
200M+

§ 01 — The Basics
What a Protein Is

Twenty letters.
Infinite sentences.

A protein is a chain of amino acids — small molecules strung together like beads on a string. There are 20 kinds of amino acid. That's the entire alphabet. From those 20 letters, evolution wrote every enzyme, every hormone, every muscle fiber, every antibody that has ever existed on Earth.

Primary
A G V K L P

Amino acid sequence

The linear order of amino acids, encoded directly by DNA. This alone determines everything that follows.

Secondary
α-HELIX

Local structures

Nearby amino acids hydrogen-bond into regular repeating shapes: alpha-helices (spirals) and beta-sheets (pleated planks).

Tertiary
3-D FOLD

The full 3D shape

The entire chain folds into a compact, specific 3D structure. This shape is the protein's identity — it determines what it can bind, what it catalyzes, what it builds.

Quaternary
MULTI-CHAIN

Multi-chain assembly

Some proteins are complexes of multiple chains working together. Hemoglobin has four. The ribosome combines dozens of proteins with RNA. In many complexes, each subunit folds on its own and then docks.

Fold it yourself

Hydrophobic-Polar model · 20 residues · real physics simulation
Hydrophobic (H) — wants to hide from water
Polar (P) — happy on the surface
H-H contact (stabilising)

The driving force of folding: hydrophobic collapse. Amino acids that hate water bury themselves in the centre, dragging the chain into a compact shape. Polar residues stay on the outside. This simple principle drives the folding of most water-soluble proteins in your body.
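The scoring idea behind a Hydrophobic-Polar model can be sketched in a few lines. This is a minimal illustration, assuming a 2D square lattice and a chain described by unit moves; the lattice, the move alphabet (`U`, `D`, `L`, `R`) and the function names are assumptions made here, not details of the simulation on this page. An H-H contact is counted when two hydrophobic residues sit on neighbouring lattice sites without being neighbours along the chain.

```python
from typing import List, Tuple

# Unit steps on a 2D square lattice (an assumption for this sketch).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def trace_path(moves: str) -> List[Tuple[int, int]]:
    """Turn a move string into lattice coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        raise ValueError("chain overlaps itself")
    return coords

def count_hh_contacts(sequence: str, moves: str) -> int:
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    coords = trace_path(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip bonded neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1
    return contacts

# The same four-residue chain, stretched out and folded into a square.
extended = count_hh_contacts("HPPH", "RRR")
folded = count_hh_contacts("HPPH", "RDL")
print(extended, folded)
```

Folding the chain into a square brings its two H residues side by side, so the folded shape scores one contact and the extended one scores none. In this kind of toy model, "folding" means searching for the shape with the most such contacts.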


§ 02 — Function
Six Things Proteins Do

The molecule that
does everything.

Proteins don't just do one thing. They are simultaneously your muscles, your enzymes, your hormones, your immune system, and the motors inside every cell. Shape determines function — and evolution has found a shape for nearly every job that needs doing.

LOCK + KEY
Catalysis

Enzymes

Proteins that accelerate chemical reactions, often by factors of a million or more. Nearly every metabolic reaction in your body is catalysed by an enzyme. Without them, the chemistry of life would be too slow to support life.

eg. Amylase (digest starch) · DNA Polymerase (copy DNA)
COLLAGEN HELIX
Architecture

Structural proteins

The scaffolding of life. Collagen builds tendons, skin, and bone. Keratin makes hair and nails. Actin and tubulin form the internal skeleton of every cell, giving it shape and letting it move.

eg. Collagen · Keratin · Actin · Tubulin
HEMOGLOBIN
Transport

Carriers

Proteins that bind small molecules and carry them around. Hemoglobin picks up oxygen in the lungs and drops it at tissues. Albumin ferries fatty acids through the blood. Channels in cell membranes move ions across.

eg. Hemoglobin · Albumin · Ion channels
RECEPTOR
Communication

Signalling proteins

Many hormones are proteins (insulin, growth hormone). So are the receptors that detect them. Much of the conversation between cells, from "grow" to "die" to "release glucose", is proteins talking to proteins.

eg. Insulin · GH receptor · GPCR · Kinases
IgG ANTIBODY
Defence

Immune proteins

Antibodies are proteins shaped precisely to grab pathogens. The MHC complex is a protein that presents pathogen fragments to T-cells. Complement proteins punch holes in bacteria. The immune system is essentially a protein army.

eg. IgG antibodies · MHC · Complement
MYOSIN MOTOR
Movement

Motor proteins

Myosin literally walks along actin filaments, converting ATP energy into mechanical steps. This is how muscles contract. Kinesin and dynein carry cargo along microtubule highways inside cells — protein trucks on protein roads.

eg. Myosin · Kinesin · Dynein · ATP synthase

§ 03 — Explorer
Real Molecular Structures

Drag. Rotate.
Actually look.

Every structure below is real, determined by X-ray crystallography or cryo-EM and deposited in the Protein Data Bank. Drag to rotate. Scroll to zoom. Switch proteins and view modes with the controls.

Insulin
Glucose-regulating hormone, 51 amino acids
Signalling · Pancreatic β-cells · Nobel 1958

§ 04 — The Problem
50 Years of Impossible

It folds in microseconds.
We couldn't predict it in a lifetime.

The sequence determines the shape: Christian Anfinsen's refolding experiments established this and earned him the 1972 Nobel Prize in Chemistry. But computing the shape from the sequence? That was a different matter. The search space is so vast it became its own paradox.

Levinthal's Paradox — 1969
Assumed brute-force search rate: 10¹³ conformations checked per second
Search space
10³⁰⁰
Possible conformations for a typical 100-residue protein if each bond can rotate freely. More than atoms in the observable universe.
vs
Actual folding time
10⁻⁶s
One microsecond for the fastest small proteins; many others take milliseconds to seconds. Either way, the protein doesn't search randomly. It follows a funnel-shaped energy landscape, collapsing toward the minimum-energy structure.
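The gap between those two numbers is easy to check with back-of-envelope arithmetic. The sketch below uses the figures quoted above (10³⁰⁰ conformations, 10¹³ checked per second) and works in base-10 logarithms to keep the numbers manageable. The age of the universe in seconds is an added assumption (about 13.8 billion years), and the function name is made up for illustration.

```python
import math

CONFORMATIONS_LOG10 = 300    # 10^300 conformations, from the text
RATE_LOG10 = 13              # 10^13 conformations checked per second, from the text
AGE_OF_UNIVERSE_S = 4.35e17  # assumption: ~13.8 billion years, in seconds

def exhaustive_search_log10_seconds(conf_log10: float, rate_log10: float) -> float:
    """log10 of the seconds needed to visit every conformation exactly once."""
    return conf_log10 - rate_log10

search_log10_s = exhaustive_search_log10_seconds(CONFORMATIONS_LOG10, RATE_LOG10)
universe_ages_log10 = search_log10_s - math.log10(AGE_OF_UNIVERSE_S)

print(f"Exhaustive search: about 10^{search_log10_s:.0f} seconds")
print(f"That is about 10^{universe_ages_log10:.0f} times the age of the universe")
```

Under these assumptions the exhaustive search would take about 10²⁸⁷ seconds, roughly 10²⁶⁹ times the age of the universe. That is the paradox in one line: the protein cannot be trying every option.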
When folding goes wrong

Misfolding is not an edge case. It's a disease category.

When a protein folds into the wrong shape, it can become toxic — aggregating into sticky clumps that damage cells. These "proteopathies" are among the hardest diseases to treat, because the culprit is structurally almost identical to the normal protein.

Neurodegeneration

Alzheimer's disease

Amyloid-β peptides misfold and aggregate into plaques between neurons. Tau protein also misfolds into tangles inside them. Both are hallmarks of the disease, and amyloid build-up can be detected years, sometimes decades, before symptoms.

Neurodegeneration

Parkinson's disease

Alpha-synuclein misfolds and forms Lewy bodies inside dopaminergic neurons. Why it misfolds in some people and not others remains unknown. Genetic variants in the protein itself are one risk factor.

Prion disease

CJD & BSE

Prions are infectious misfolded proteins, the only known infectious agents that replicate with no DNA or RNA. They convert normal copies of the same protein into the misfolded shape on contact. There is no cure.

Genetic

Cystic fibrosis

Deletion of a single amino acid (F508del) causes the CFTR protein to misfold and be destroyed before reaching the cell surface. No channel protein → no ion transport → thick mucus in lungs. It is the most common cystic fibrosis mutation, and cystic fibrosis is among the most common life-shortening inherited diseases in people of European descent.

Metabolic

Type 2 diabetes

IAPP (islet amyloid polypeptide) misfolds and deposits in the pancreas, contributing to β-cell death. Type 1 diabetes is a different, autoimmune disease; amyloid deposits in the islets are characteristic of Type 2.

Cancer

p53 mutations

p53 is the "guardian of the genome", a protein that triggers cell death when DNA is damaged. Roughly 50% of all cancers carry a p53 mutation, and many of these destabilise the protein so that it misfolds and fails to function. Restoring its fold is an active area of drug research.

"If you know the sequence of a protein, you should be able to predict its structure — and from structure, its function."
— Christian Anfinsen, Nobel Lecture, 1972

§ 05 — History
The 60-Year Race

From sequences
to AlphaFold.

1958
First sequence
Sanger sequences insulin

Frederick Sanger completes the amino acid sequence of insulin in the mid-1950s, the first protein ever sequenced, and wins the 1958 Nobel Prize in Chemistry for it. We can now read the letters. Predicting what they fold into is another matter entirely.

1962
First structure
Kendrew & Perutz solve myoglobin and hemoglobin

X-ray crystallography gives us the first 3D protein structures. John Kendrew resolves myoglobin, the oxygen-storage protein in muscle, at near-atomic resolution; Max Perutz resolves hemoglobin. The work takes years of data collection and painstaking calculation. Kendrew and Perutz share the 1962 Nobel Prize in Chemistry.

1969
The Paradox
Levinthal: the problem is impossibly hard

Cyrus Levinthal calculates that if a protein sampled conformations at random, finding the right one would take longer than the age of the universe. Yet proteins fold in microseconds to seconds. This "Levinthal Paradox" defines the problem: there must be a guided pathway, but we don't know what it is.

1972
The Axiom
Anfinsen's thermodynamic hypothesis

Christian Anfinsen receives the Nobel Prize in Chemistry for refolding experiments, carried out from the late 1950s onward, showing that a denatured protein can refold spontaneously into its correct shape. His conclusion: the native structure is determined by the amino acid sequence. This is the founding axiom of the folding problem. It also means that if you could compute the fold, you could predict any protein structure from its gene.

1994
Competition
CASP begins — the protein Olympics

Critical Assessment of Structure Prediction: a biennial competition where teams predict protein structures from sequences alone, and the predictions are then compared with experimentally determined structures. The scoring system (GDT, the Global Distance Test, 0–100) becomes the benchmark. Early winners score around 20–30. A score of around 90 is informally regarded as competitive with experimental methods.

2018
Signal
AlphaFold 1 wins CASP13

DeepMind enters a neural network trained on evolutionary sequence data. It wins the competition with a GDT score of ~61, far ahead of all traditional methods. The biology community notices. Two years of intense development follow.

2020
Solved
AlphaFold 2 — GDT 92.4

At CASP14, AlphaFold 2 predicts structures with a median GDT of 92.4, a level regarded as competitive with experimental methods on many targets. John Moult, who co-founded CASP, says: "This is a watershed moment for biology." The 50-year-old problem is effectively solved. In 2021, Science names AI-powered protein structure prediction its Breakthrough of the Year.

2021
Nature paper
The method is published

DeepMind publishes the AlphaFold 2 paper in Nature and simultaneously releases the code. The paper, which describes an attention-based neural network that turns multiple sequence alignments into 3D geometry, becomes one of the most cited in structural biology. In partnership with EMBL-EBI, DeepMind also releases the first version of the AlphaFold Protein Structure Database.

2022
Database
200 million structures, free

The AlphaFold Protein Structure Database, first released in 2021, expands to predicted structures for nearly every catalogued protein: more than 200 million UniProt entries, including nearly all of the ~20,000 human proteins and the proteomes of key model organisms. All free. All downloadable. Much of the "dark proteome", the proteins never structurally characterised, is illuminated almost overnight.

2024
AlphaFold 3
Proteins + DNA + RNA + small molecules

AlphaFold 3 extends prediction to protein complexes, DNA, RNA, and small molecules: the full cast of biology. Drug discovery is the headline application. Demis Hassabis and John Jumper share one half of the Nobel Prize in Chemistry. The other half goes to David Baker for computational protein design, which means not just predicting natural proteins but designing new ones from scratch.


§ 06 — The CASP Race
26 Years of Progress, Then a Cliff

Decades of inches.
Then AlphaFold.

CASP has run every two years since 1994. For 24 years, the best methods improved only incrementally from one round to the next. Then in 2020, AlphaFold 2 scored 92.4, leaping past the ~90 benchmark in a single jump.

CASP competition — best GDT score by year

GDT (Global Distance Test) · 0 = random · 100 = perfect · ~90 = competitive with experiment
[Chart: best GDT score at each CASP round. Y-axis: GDT, 0 to 100, with a reference line at ~90.]
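For readers who want to see what a GDT number measures, here is a minimal sketch of the commonly used GDT_TS variant: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted Cα position lies within that distance of the experimental one. The sketch assumes the two structures have already been superposed (the real procedure searches for the best superposition at each cutoff), and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance thresholds in ångströms

def gdt_ts(predicted: Sequence[Point], experimental: Sequence[Point]) -> float:
    """Mean over the four cutoffs of the % of residues within that distance.

    Simplification: assumes both lists hold C-alpha positions that are
    already superposed, so no alignment search is performed here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)

# Hypothetical four-residue example: errors of 0.5, 1.5, 3.0 and 10.0 Å.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, experimental)
print(score)
```

In this toy case one residue in four passes the 1 Å cutoff, two pass 2 Å, and three pass both 4 Å and 8 Å, so the score is 56.25. A perfect prediction scores 100.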

How AlphaFold 2 works — conceptually

1
Multiple Sequence Alignment

Start with your protein sequence. Search all known sequences across evolution for similar proteins. Co-evolving positions — pairs of residues that mutate together — reveal which pairs are physically close in 3D space. This is evolutionary geometry.

2
Attention over residues

A transformer-like attention mechanism processes every residue pair simultaneously, learning which positions constrain each other. Unlike older methods that looked locally, this sees the whole sequence globally at once. This is the key architectural insight.

3
Structure module — direct geometry

Instead of predicting distances and then assembling a structure, AlphaFold 2 directly outputs 3D rotation and translation for each residue using equivariant neural networks. Iterative refinement cycles ("recycling") tighten the prediction.

4
Confidence scoring (pLDDT)

For every residue, AlphaFold predicts its own confidence (pLDDT, 0–100). High confidence = reliable. Low confidence often means the region is intrinsically disordered in real life — which is itself biologically meaningful information.
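The co-evolution signal in step 1 can be illustrated with a much older and simpler tool than AlphaFold's network: mutual information between alignment columns. To be clear, this is a swapped-in classical measure, not what AlphaFold 2 computes; the network learns from the alignment directly. The toy alignment and function names below are hypothetical.

```python
import math
from collections import Counter
from typing import List

def column(msa: List[str], i: int) -> List[str]:
    """Residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]

def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information, in bits, between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        joint = count / n
        mi += joint * math.log2(joint / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Hypothetical alignment: positions 1 and 3 always change together (K with E,
# R with D), while position 4 varies independently of position 1.
toy_msa = [
    "AKLEG",
    "AKLES",
    "ARLDG",
    "ARLDS",
]
coupled = mutual_information(toy_msa, 1, 3)
independent = mutual_information(toy_msa, 1, 4)
print(coupled, independent)
```

The coupled pair scores one full bit and the independent pair scores zero. A high score between two distant positions is the kind of hint that they may touch in the folded structure.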

pLDDT confidence coloring (example protein)
Very high (>90): confident
Medium (70–90)
Low (<70): often intrinsically disordered
Disorder ≠ wrong prediction; it's often real biology.

Disordered regions often become structured only when they bind a partner. AlphaFold's low-confidence flag is not a failure — it's pointing at something biologically interesting.
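Reading a confidence track amounts to sorting residues into bands. The sketch below uses the three bands given in this section (above 90, 70 to 90, below 70); the function names and the example scores are hypothetical, and other tools may draw the boundaries differently.

```python
from typing import Dict, List

def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT (0-100) to one of this section's three bands."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score >= 70:
        return "medium"
    return "low"

def summarise(scores: List[float]) -> Dict[str, float]:
    """Fraction of residues that fall into each confidence band."""
    counts = {"very high": 0, "medium": 0, "low": 0}
    for s in scores:
        counts[plddt_band(s)] += 1
    return {band: n / len(scores) for band, n in counts.items()}

# Hypothetical per-residue scores for an eight-residue stretch.
example_scores = [96.1, 93.4, 88.0, 72.5, 55.2, 41.0, 38.7, 91.2]
summary = summarise(example_scores)
print(summary)
```

A long run of "low" residues is worth a second look rather than a discard: as noted above, it may mark a region that only takes shape when it binds a partner.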


§ 07 — Impact
What Solving It Unlocks

A tool for
every disease.

The protein folding problem was not just an intellectual puzzle. It was the lock on progress in drug discovery, vaccine design, enzyme engineering, and our understanding of virtually every disease. Now the key turns.

01

Drug discovery

Immediate · Pharma + Biotech

Knowing the structure of a pathogen's protein reveals candidate binding pockets for drug molecules. AlphaFold structures have already been used in early-stage drug design against malaria, tuberculosis, and antibiotic-resistant bacteria, shortening steps that once took years to a matter of weeks.

02

Plastic-eating enzymes

Environment · Enzyme engineering

PETase, a bacterial enzyme that breaks down plastic bottles, was redesigned using structural knowledge for 6× better efficiency. AlphaFold structures of related enzymes now accelerate this further. Enzymatic plastic recycling may be economically viable within a decade.

03

mRNA vaccine design

Public health · Immunology

Designing vaccines requires knowing the exact shape of the antigen your immune system will train against. COVID mRNA vaccines were designed using knowledge of the spike protein structure. Future vaccines for RSV, HIV, and cancer will be faster to design.

04

Venom antidotes

Neglected disease · Global health

Up to ~138,000 people die from snakebite annually, mostly in the Global South. Antivenoms require milking live snakes and are expensive to produce. AlphaFold structures of venom proteins are enabling rational design of cheap, synthetic, broadly effective antidotes.

05

De novo protein design

David Baker · University of Washington

AlphaFold's inverse problem: given a desired function, design a sequence that folds into a shape that achieves it. Baker's lab (Nobel 2024) has designed proteins that don't exist in nature — vaccines, nanomachines, self-assembling materials — purely from first principles.

06

The dark proteome

Basic science · All diseases

Before AlphaFold, roughly 1/3 of human proteins had no known structure. These "dark proteome" proteins were often the most interesting: disease-relevant and promising as drug targets, but too difficult to crystallise. AlphaFold predicted nearly all of them. Thousands of papers are being written about proteins that were structurally invisible only a few years ago.

"AlphaFold is a once in a generation advance — one of the most important scientific achievements in decades."
— Edith Heard, EMBL Director-General, 2021
§ Coda

50 years of impossible.
Then one afternoon.

The protein folding problem seemed intractable because we tried to solve it as a physics problem: simulate every atom. AlphaFold treated it as pattern recognition, because billions of years of evolution had already run the experiment. The answers were hidden in the sequences all along. We just needed to learn to read them.

Evolution is the world's longest running simulation. AlphaFold learned to play it back.