The Most Important Unsolved Problem in Computer Science

Here’s a look at the $1-million math problem at the heart of computation


When the Clay Mathematics Institute put individual $1-million prize bounties on seven unsolved mathematical problems, they may have undervalued one entry—by a lot. If mathematicians were to resolve, in the right way, computer science's “P versus NP” question, the result could be worth worlds more than $1 million. They'd be cracking most online-security systems, revolutionizing science and even, in effect, solving the other six of the so-called Millennium Problems, all of which were chosen in the year 2000. It's hard to overstate the stakes surrounding the most important unsolved problem in computer science.

P versus NP concerns the apparent asymmetry between finding solutions to problems and verifying solutions to problems. For example, imagine you're planning a world tour to promote your new book. You pull up Priceline and start testing routes, but each one you try blows your total trip budget. Unfortunately, as the number of cities on your worldwide tour grows, the number of possible routes to check skyrockets exponentially, making it infeasible even for computers to exhaustively search through every case. But when you complain, your book agent writes back with a proposed solution: a sequence of flights. You can easily verify whether their route stays in budget by simply checking that it hits every city and summing the fares to compare against the budget limit. Notice the asymmetry here: finding a solution is hard, but verifying a solution is easy.
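The asymmetry can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the four cities, the fares in the hypothetical `FARES` table, and the requirement that the tour return to its starting city are all invented for the example, and `verify_route` and `find_route` are hypothetical helper names. Checking one route takes a single pass over its legs, while the search may have to try every ordering of the cities, a number that grows factorially.

```python
from itertools import permutations

# Hypothetical round-trip fares in dollars (same price in both directions).
FARES = {
    ("NYC", "LON"): 450, ("NYC", "TYO"): 900, ("NYC", "SYD"): 1200,
    ("LON", "TYO"): 700, ("LON", "SYD"): 1100, ("TYO", "SYD"): 500,
}

def fare(a, b):
    """Look up the fare between two cities in either direction."""
    return FARES.get((a, b), FARES.get((b, a)))

def verify_route(route, cities, budget):
    """Fast check: visits every city exactly once and the loop fits the budget."""
    if sorted(route) != sorted(cities):
        return False
    # Sum each leg, including the final leg back to the starting city.
    total = sum(fare(route[i], route[(i + 1) % len(route)]) for i in range(len(route)))
    return total <= budget

def find_route(cities, budget):
    """Slow search: try every ordering of the remaining cities until one passes."""
    start, rest = cities[0], cities[1:]
    for order in permutations(rest):
        route = [start, *order]
        if verify_route(route, cities, budget):
            return route
    return None
```

With these made-up fares, `find_route(["NYC", "LON", "TYO", "SYD"], 3000)` returns `["NYC", "LON", "TYO", "SYD"]` (a $2,850 loop), while a $2,800 budget returns `None` because no tour fits. Each extra city multiplies the work of the search but barely changes the cost of a single check.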

The P versus NP question asks whether this asymmetry is real or an illusion. If you can efficiently verify a solution to a problem, does that mean you can also efficiently find a solution? It might seem obvious that finding a solution should be harder than verifying one. But researchers have been surprised before. Problems can look similarly difficult, but when you dig deeper you find shortcuts to some and hit brick walls on others. Perhaps a clever shortcut can circumvent searching through zillions of potential routes in the book tour problem. Consider a related problem: if you instead wanted to find a sequence of flights between two specific remote airports while abiding by the budget, you might also throw up your hands at the immense number of possible routes to check. Yet this problem contains enough structure that computer scientists have developed a fast procedure (or algorithm) for it that bypasses the need for an exhaustive search.
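The two-airport variant is a shortest-path problem, and one well-known fast procedure for it is Dijkstra's algorithm. The sketch below is an illustration, not necessarily the procedure the text has in mind; it assumes one-way flights with nonnegative fares, and the `FLIGHTS` list, its airport names and the helper names are hypothetical. Instead of enumerating whole routes, it repeatedly extends the cheapest partial trip found so far.

```python
import heapq

# Hypothetical one-way flights: (from, to, fare in dollars).
FLIGHTS = [("A", "B", 200), ("B", "D", 300), ("A", "C", 100),
           ("C", "B", 50), ("C", "D", 400)]

def cheapest_fare(flights, origin, destination):
    """Dijkstra's algorithm: lowest total fare from origin to destination, or None."""
    graph = {}
    for a, b, price in flights:
        graph.setdefault(a, []).append((b, price))
    best = {origin: 0}
    queue = [(0, origin)]  # min-heap of (cost so far, airport)
    while queue:
        cost, airport = heapq.heappop(queue)
        if airport == destination:
            return cost
        if cost > best.get(airport, float("inf")):
            continue  # stale entry; a cheaper way here was already found
        for nxt, price in graph.get(airport, []):
            new_cost = cost + price
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return None  # destination unreachable

def within_budget(flights, origin, destination, budget):
    """Yes/no version: is there any route that stays within the budget?"""
    cost = cheapest_fare(flights, origin, destination)
    return cost is not None and cost <= budget
```

Here the cheapest trip from A to D goes A → C → B → D for $450, so `within_budget(FLIGHTS, "A", "D", 450)` is `True` and a $449 budget gives `False`. The work grows only modestly with the number of flights, which is why this problem sits comfortably in P.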


The P versus NP question rears its head everywhere we look in the computational world, well beyond the specifics of our travel scenario, so much so that it has come to symbolize a holy grail in our understanding of computation. Yet every attempt to resolve it only further exposes how monumentally difficult it is to prove one way or the other.

In the subfield of theoretical computer science called complexity theory, researchers try to pin down how easily computers can solve various types of problems. P represents the class of problems they can solve efficiently, such as sorting a column of numbers in a spreadsheet or finding the shortest path between two addresses on a map. In contrast, NP represents the class of problems for which computers can verify solutions efficiently. Our book tour problem, which academics call the Traveling Salesperson Problem, lives in NP because we have an efficient procedure for verifying that the agent's solution worked.

Notice that NP actually contains P as a subset because solving a problem outright is one way to verify a solution to it. For example, how would you verify that 27 × 89 = 2,403? You would solve the multiplication problem yourself and check that your answer matches the claimed one. We typically depict the relation between P and NP with a simple Venn diagram: a circle labeled P sitting inside a larger circle labeled NP.

The region inside of NP but not inside of P contains problems that can't be solved with any known efficient algorithm. (Theoretical computer scientists use a technical definition of “efficient,” roughly meaning that the running time grows no faster than a polynomial in the size of the input. That definition can be debated, but it serves as a useful proxy for the colloquial concept.) But we don't know whether that's because such algorithms don't exist or because we just haven't mustered the ingenuity to discover them. This representation provides another way to phrase the P versus NP question: Are these classes actually distinct? Or does the Venn diagram collapse into one circle? Can all NP problems be solved efficiently?
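A quick way to feel the gap between polynomial running time (the usual technical yardstick for “efficient”) and exponential running time is to tabulate both. This is a minimal sketch; `steps` is a hypothetical helper, and n³ and 2ⁿ are chosen only as representative growth rates, not as the cost of any particular algorithm.

```python
def steps(n):
    """Hypothetical step counts: a polynomial algorithm (n**3) vs. an exponential one (2**n)."""
    return n ** 3, 2 ** n

for n in (10, 30, 60, 100):
    poly, expo = steps(n)
    # Format the exponential count in scientific notation so it fits on the line.
    print(f"n = {n:>3}:  n^3 = {poly:>9,}   2^n = {expo:.2e}")
```

By n = 10 the exponential count (1,024) has already passed the cubic one (1,000), and by n = 100 it is larger by a factor of roughly 10²⁴.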

Here are some examples of problems in NP that are not currently known to be in P:

Given a social network, is there a group of a specified size in which all of the people in it are friends with one another?
Given a varied collection of boxes to be shipped, can all of them be fit into a specified number of trucks?
Given a sudoku (generalized to n × n puzzle grids), does it have a solution?
Given a map, can the countries be colored with only three colors such that no two neighboring countries are the same color?
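Verifying proposed answers to the first and fourth questions takes only a short loop over the claimed solution. A minimal sketch, assuming a friendship network stored as a set of unordered pairs and a map stored as a list of bordering country pairs; the names and helper functions are hypothetical. The box-packing and sudoku checks would follow the same pattern of testing each constraint once.

```python
from itertools import combinations

def verify_clique(friends, group, size):
    """Check a claimed group: it has the right size and every pair in it are friends."""
    if len(set(group)) != size:
        return False
    # friends is a set of frozenset pairs, so the order within a pair doesn't matter.
    return all(frozenset(pair) in friends for pair in combinations(group, 2))

def verify_coloring(borders, coloring):
    """Check a claimed map coloring: every country colored, at most three colors,
    and no two neighboring countries share a color."""
    countries = {c for pair in borders for c in pair}
    if not countries.issubset(coloring):
        return False
    if len(set(coloring.values())) > 3:
        return False
    return all(coloring[a] != coloring[b] for a, b in borders)
```

For instance, with friendships Ana–Ben, Ana–Cy, Ben–Cy and Cy–Dee, the group Ana, Ben, Cy passes as a group of three mutual friends, while Ana, Cy, Dee fails because Ana and Dee are not friends. Each check looks at every pair or border only once.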

Ask yourself how you would verify proposed solutions to some of the problems listed and then how you would find a solution. Note that approximating a solution or solving a small instance (most of us can solve a 9 × 9 sudoku) doesn't suffice. To qualify as solving a problem, an algorithm needs to find an exact solution for all instances, including very large ones.

Each of the problems can be solved via brute-force search (for example, try every possible coloring of the map and see whether any of them work), but the number of cases to try grows exponentially with the size of the problem. This means that if we call the size of the problem n (for example, the number of countries on the map or the number of boxes to pack into trucks), then the number of cases to check looks something like 2ⁿ. The world's fastest supercomputers have no hope against exponential growth. Even when n equals 300, a tiny input size by modern data standards, 2³⁰⁰ exceeds the number of atoms in the observable universe. After hitting “go” on such an algorithm, your computer would display a spinning pinwheel that would outlive you and your descendants.
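Brute-force search for a three-coloring can be sketched as follows. It is a minimal illustration with hypothetical country and helper names; with three colors there are 3ⁿ candidate colorings rather than 2ⁿ, but the growth is exponential either way. The last line checks the scale claim in the text: 2³⁰⁰ has 91 decimal digits, while a commonly cited estimate of the number of atoms in the observable universe is about 10⁸⁰.

```python
from itertools import product

def find_coloring(countries, borders, colors=("red", "green", "blue")):
    """Brute force: try all len(colors)**n assignments and return the first
    one in which no two bordering countries share a color, or None."""
    for assignment in product(colors, repeat=len(countries)):
        coloring = dict(zip(countries, assignment))
        if all(coloring[a] != coloring[b] for a, b in borders):
            return coloring
    return None

# Scale check: count the decimal digits of 2**300 (10**80 has 81).
print(len(str(2 ** 300)))  # 91
```

For a map where P, Q and R all border one another and R also borders S, the search returns `{"P": "red", "Q": "green", "R": "blue", "S": "red"}`. For four countries that all border one another it returns `None`, but only after trying all 81 assignments, and each added country triples that count.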

Thousands of other problems belong on our list. From cell biology to game theory, the P versus NP question reaches into far corners of science and industry. If P = NP (that is, our Venn diagram dissolves into a single circle, and we obtain fast algorithms for these seemingly hard problems), then the entire digital economy would become vulnerable to collapse. This is because much of the cryptography that secures such things as your credit card number and passwords works by shrouding private information behind computationally difficult problems that can become easy to solve only if you know the secret key. Online security as we know it rests on unproven mathematical assumptions that crumble if P = NP.

Amazingly, we can even cast mathematics itself as an NP problem because we can program computers to efficiently verify proofs. In fact, legendary mathematician Kurt Gödel first posed the P versus NP problem in a letter to his colleague John von Neumann in 1956. Gödel observed that P = NP “would have consequences of the greatest importance. Namely, it would obviously mean that ... the mental work of a mathematician concerning yes-or-no questions could be completely replaced by a machine.”

If you're a mathematician worried for your job, rest assured that most experts believe that P does not equal NP. Aside from the intuition that sometimes solutions should be harder to find than to verify, thousands of the hardest NP problems that are not known to be in P have sat unsolved across disparate fields, glowing with incentives of fame and fortune, and yet not one person has designed an efficient algorithm for a single one of them.

Of course, gut feeling and a lack of counterexamples don't constitute a proof. To prove that P is different from NP, you somehow have to rule out all potential algorithms for all of the hardest NP problems, a task that appears out of reach for current mathematical techniques. Indeed, the field has coped by proving so-called barrier theorems, which say that entire categories of tempting proof strategies to resolve P versus NP cannot succeed. Not only have we failed to find a proof, but we also have no clue what an eventual proof might look like.