P versus NP

Some problems are easy to check but seem impossibly hard to solve. Whether those two things are secretly the same is the deepest open question in computer science — and worth a million dollars.

Picture a finished jigsaw puzzle. Glance at it and you can tell in a second whether it's correct: every piece fits, the picture is whole. Now picture the same puzzle dumped out in a heap. Assembling it from scratch is an afternoon of patient, exhausting trial and error. Checking a solution is trivial; finding one is hard.

That gap — between recognising an answer and producing one — is not a quirk of jigsaws. It shows up in scheduling, in circuit design, in protein structure, in cracking codes. And it hides a question that has resisted the world's best minds for half a century: when something is easy to check, must it also, somewhere, be easy to solve? That question is called P versus NP. The Clay Mathematics Institute lists it among its seven Millennium Prize Problems, with a million-dollar reward for a proof either way. Almost every expert bets the answer is “no.” Nobody can prove it.

Easy to check, hard to find

Consider a Sudoku grid. If I hand you a completed grid, you can verify it in moments: scan each row, column, and box for repeats. But if I hand you a blank one — or a large, fiendish variant — finding the solution may demand a great deal of searching. The same asymmetry runs through multiplication and its inverse. Multiplying two large prime numbers is the work of an instant. Going backwards — taking the product and recovering the original primes, the problem of factoring — is so hard that much of the internet's encryption rests on the difficulty.

Computer scientists make this precise with two classes of problem. P is the set of problems a computer can solve quickly — loosely, in a number of steps that grows as a polynomial in the size of the input (like n, n², or n³). NP is the set of problems whose answers a computer can check quickly, given a candidate solution. The Sudoku grid, the assembled jigsaw, the factored number: each is hard to find but fast to verify, so each lives in NP.

One direction is obvious. If you can solve a problem quickly, you can certainly check an answer quickly — just solve it yourself and compare. So everything in P is also in NP; P sits inside NP. The trillion-dollar question is whether the reverse holds. Is every quickly-checkable problem also quickly-solvable? Is NP really just P wearing a disguise?

Drawn the way almost everyone bets it looks: P sits strictly inside NP, with the NP-complete problems marooned at the hard frontier, outside P. If it turned out that P = NP, all three regions would collapse into one.

The keystone: NP-completeness

For a while these were just two interesting categories. Then, in 1971, Stephen Cook proved something that reorganised the whole landscape. He showed that one particular problem — Boolean satisfiability, or SAT, the question of whether a logical formula can be made true by some choice of true/false inputs — is, in a precise sense, the hardest problem in NP. Every other problem in NP can be efficiently translated into it. Leonid Levin reached the same conclusion independently in the Soviet Union, and we call the result the Cook–Levin theorem.

The translation trick is called a reduction. If problem A can be rewritten as problem B using only a quick, polynomial-time transformation, then any fast solution to B hands you a fast solution to A for free. The year after Cook's paper, Richard Karp showed that twenty-one famous problems — among them the Travelling Salesman problem, graph colouring, and the knapsack problem — all reduce to one another in this way. They form an interlocked club: the NP-complete problems.

Here is what makes the club extraordinary. Because every NP-complete problem reduces to every other, and because every problem in NP reduces to any one of them, they rise or fall together. Discover a genuinely fast algorithm for a single NP-complete problem and you would, in one stroke, have a fast algorithm for all of them — and for every problem in NP. P would equal NP. Conversely, prove that just one of them cannot have a fast algorithm, and they are all condemned, and P is forever smaller than NP. Thousands of problems across mathematics, biology, logistics, and economics have since been shown to be NP-complete. They are, deep down, the same problem.

A handful of NP-complete problems. Each can be translated into the others, so a single fast algorithm anywhere in this web would solve every problem in it — and all of NP besides.

Why anyone should care

It would be easy to mistake this for an abstraction that only logicians could love. It is the opposite. Suppose, against all expectation, that P = NP, and that the fast algorithm is not just theoretical but practical. The consequences would be staggering. Most modern cryptography assumes that certain problems — factoring, discrete logarithms — are hard to solve but easy to check. If solving became as easy as checking, secure communication as we know it would dissolve overnight.

But the upheaval would cut both ways, and the second edge is stranger. A vast range of human endeavour comes down to searching an enormous space of possibilities for one that satisfies a condition we can recognise. Designing a more efficient aircraft wing; finding a mathematical proof; folding a protein into its lowest-energy shape; composing a melody that obeys the rules of harmony. In each case we can check a candidate far more easily than we can conceive it. If P = NP, that gap closes. The computer scientist Scott Aaronson put the point memorably:

“If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in ‘creative leaps,’ no fundamental gap between solving a problem and recognizing the solution once it's found.”

To appreciate a symphony would be to be able to compose one; to follow a proof would be to be able to discover it. The very texture of creativity rests, quietly, on the assumption that finding is harder than checking.

Which is why the consensus runs the other way: P almost certainly does not equal NP. The world looks like a place where some things are genuinely hard. The harder question is why, after fifty years, we cannot prove it. The trouble is that proving a problem has no fast algorithm means ruling out every possible clever shortcut, including ones nobody has imagined. Researchers have even mapped out why their own tools keep failing — results with names like relativization and the natural proofs barrier show that the obvious techniques cannot, in principle, settle the matter. A proof, when it comes, will need an idea we do not yet have.

So the question sits there: simple to state, understood by a curious teenager in an afternoon, and utterly immovable. It marks the boundary between what we can dream up and what we can merely admire — and we still do not know exactly where that boundary lies.

P versus NP

Easy to check, hard to find

The keystone: NP-completeness

Why anyone should care

Further reading