NP-complete

# NP-complete

Overview

Discussion

Encyclopedia

In computational complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

, the complexity class
Complexity class
In computational complexity theory, a complexity class is a set of problems of related resource-based complexity. A typical complexity class has a definition of the form:...

NP-complete (abbreviated NP-C or NPC) is a class of decision problem
Decision problem
In computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...

s. A decision problem L is NP-complete if it is in the set of NP
NP (complexity)
In computational complexity theory, NP is one of the most fundamental complexity classes.The abbreviation NP refers to "nondeterministic polynomial time."...

problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP". A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H...

problems so that any NP problem can be converted into L by a transformation of the inputs in polynomial time.

Although any given solution to such a problem can be verified quickly, there is no known efficient way to locate a solution in the first place; indeed, the most notable characteristic of NP-complete problems is that no fast solution to them is known. That is, the time required to solve the problem using any currently known algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

increases very quickly as the size of the problem grows. As a result, the time required to solve even moderately large versions of many of these problems easily reaches into the billions or trillions of years, using any amount of computing power available today. As a consequence, determining whether or not it is possible to solve these problems quickly, called the P versus NP problem, is one of the principal unsolved problems in computer science today.

While a method for computing the solutions to NP-complete problems using a reasonable amount of time remains undiscovered, computer scientist
Computer scientist
A computer scientist is a scientist who has acquired knowledge of computer science, the study of the theoretical foundations of information and computation and their application in computer systems....

s and programmers still frequently encounter NP-complete problems. NP-complete problems are often addressed by using approximation algorithm
Approximation algorithm
In computer science and operations research, approximation algorithms are algorithms used to find approximate solutions to optimization problems. Approximation algorithms are often associated with NP-hard problems; since it is unlikely that there can ever be efficient polynomial time exact...

s.

## Formal overview

NP-complete is a subset
Subset
In mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...

of NP
NP (complexity)
In computational complexity theory, NP is one of the most fundamental complexity classes.The abbreviation NP refers to "nondeterministic polynomial time."...

, the set of all decision problems whose solutions can be verified in polynomial time; NP may be equivalently defined as the set of decision problems that can be solved in polynomial time on a nondeterministic Turing machine. A problem p in NP is also in NPC if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....

every other problem in NP can be transformed into p in polynomial time. NP-complete can also be used as an adjective: problems in the class NP-complete are known as NP-complete problems.

NP-complete problems are studied because the ability to quickly verify solutions to a problem (NP) seems to correlate with the ability to quickly solve that problem (P
P (complexity)
In computational complexity theory, P, also known as PTIME or DTIME, is one of the most fundamental complexity classes. It contains all decision problems which can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.Cobham's thesis holds...

). It is not known whether every problem in NP can be quickly solved—this is called the P = NP problem. But if any single problem in NP-complete can be solved quickly, then every problem in NP can also be quickly solved, because the definition of an NP-complete problem states that every problem in NP must be quickly reducible to every problem in NP-complete (that is, it can be reduced in polynomial time). Because of this, it is often said that the NP-complete problems are harder or more difficult than NP problems in general.

## Formal definition of NP-completeness

A decision problem is NP-complete if:
1. is in NP, and
2. Every problem in NP is reducible
Reduction (complexity)
In computability theory and computational complexity theory, a reduction is a transformation of one problem into another problem. Depending on the transformation used this can be used to define complexity classes on a set of problems....

to in polynomial time.

can be shown to be in NP by demonstrating that a candidate solution to can be verified in polynomial time.
Note that a problem satisfying condition 2 is said to be NP-hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP". A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H...

, whether or not it satisfies condition 1.

A consequence of this definition is that if we had a polynomial time algorithm (on a UTM
Universal Turing machine
In computer science, a universal Turing machine is a Turing machine that can simulate an arbitrary Turing machine on arbitrary input. The universal machine essentially achieves this by reading both the description of the machine to be simulated as well as the input thereof from its own tape. Alan...

, or any other Turing-equivalent
Turing completeness
In computability theory, a system of data-manipulation rules is said to be Turing complete or computationally universal if and only if it can be used to simulate any single-taped Turing machine and thus in principle any computer. A classic example is the lambda calculus...

abstract machine
Abstract machine
An abstract machine, also called an abstract computer, is a theoretical model of a computer hardware or software system used in automata theory...

) for , we could solve all problems in NP in polynomial time.

## Background

The concept of NP-complete was introduced in 1971 by Stephen Cook
Stephen Cook
Stephen Arthur Cook is a renowned American-Canadian computer scientist and mathematician who has made major contributions to the fields of complexity theory and proof complexity...

in a paper entitled The complexity of theorem-proving procedures on pages 151–158 of the Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, though the term NP-complete did not appear anywhere in his paper. At that computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

conference, there was a fierce debate among the computer scientists about whether NP-complete problems could be solved in polynomial time on a deterministic Turing machine
Turing machine
A Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a...

. John Hopcroft
John Hopcroft
John Edward Hopcroft is an American theoretical computer scientist. His textbooks on theory of computation and data structures are regarded as standards in their fields. He is the IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University.He received his...

brought everyone at the conference to a consensus that the question of whether NP-complete problems are solvable in polynomial time should be put off to be solved at some later date, since nobody had any formal proofs for their claims one way or the other. This is known as the question of whether P=NP.

Nobody has yet been able to determine conclusively whether NP-complete problems are in fact solvable in polynomial time, making this one of the great unsolved problems of mathematics. The Clay Mathematics Institute
Clay Mathematics Institute
The Clay Mathematics Institute is a private, non-profit foundation, based in Cambridge, Massachusetts. The Institute is dedicated to increasing and disseminating mathematical knowledge. It gives out various awards and sponsorships to promising mathematicians. The institute was founded in 1998...

is offering a US\$1 million reward to anyone who has a formal proof that P=NP or that P≠NP.

In the celebrated Cook-Levin theorem
Cook's theorem
In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete...

(independently proved by Leonid Levin
Leonid Levin

), Cook proved that the Boolean satisfiability problem
Boolean satisfiability problem
In computer science, satisfiability is the problem of determining if the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to TRUE...

is NP-complete (a simpler, but still highly technical proof of this is available). In 1972, Richard Karp
Richard Karp
Richard Manning Karp is a computer scientist and computational theorist at the University of California, Berkeley, notable for research in the theory of algorithms, for which he received a Turing Award in 1985, The Benjamin Franklin Medal in Computer and Cognitive Science in 2004, and the Kyoto...

proved that several other problems were also NP-complete (see Karp's 21 NP-complete problems
Karp's 21 NP-complete problems
One of the most important results in computational complexity theory was Stephen Cook's 1971 demonstration of the first NP-complete problem, the boolean satisfiability problem...

); thus there is a class of NP-complete problems (besides the Boolean satisfiability problem). Since Cook's original results, thousands of other problems have been shown to be NP-complete by reductions from other problems previously shown to be NP-complete; many of these problems are collected in Garey
Michael Garey
Michael Randolph Garey is a computer science researcher, and co-author of Computers and Intractability: A Guide to the Theory of NP-completeness. He earned his PhD in computer science in 1970 from the University of Wisconsin–Madison. In 1995 he was inducted as a Fellow of the Association for...

and Johnson's
David S. Johnson
David Stifler Johnson is a computer scientist specializing in algorithms and optimization. He is currently the head of the Algorithms and Optimization Department of AT&T Labs Research. He was awarded the 2010 Knuth Prize....

1979 book Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
In computer science, more specifically computational complexity theory,Computers and Intractability: A Guide to the Theory of NP-Completeness is an influential textbook by Michael Garey and David S. Johnson....

.

## NP-complete problems

An interesting example is the graph isomorphism problem
Graph isomorphism problem
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.Besides its practical importance, the graph isomorphism problem is a curiosity in computational complexity theory as it is one of a very small number of problems belonging to NP...

, the graph theory
Graph theory
In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. A "graph" in this context refers to a collection of vertices or 'nodes' and a collection of edges that connect pairs of...

problem of determining whether a graph isomorphism
Graph isomorphism
In graph theory, an isomorphism of graphs G and H is a bijection between the vertex sets of G and H f \colon V \to V \,\!such that any two vertices u and v of G are adjacent in G if and only if ƒ and ƒ are adjacent in H...

exists between two graphs. Two graphs are isomorphic if one can be transformed
Isomorphism
In abstract algebra, an isomorphism is a mapping between objects that shows a relationship between two properties or operations.  If there exists an isomorphism between two structures, the two structures are said to be isomorphic.  In a certain sense, isomorphic structures are...

into the other simply by renaming vertices
Vertex (graph theory)
In graph theory, a vertex or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges , while a directed graph consists of a set of vertices and a set of arcs...

. Consider these two problems:
• Graph Isomorphism: Is graph G1 isomorphic to graph G2?
• Subgraph Isomorphism: Is graph G1 isomorphic to a subgraph of graph G2?

The Subgraph Isomorphism problem is NP-complete. The graph isomorphism problem is suspected to be neither in P nor NP-complete, though it is in NP. This is an example of a problem that is thought to be hard, but isn't thought to be NP-complete.

The easiest way to prove that some new problem is NP-complete is first to prove that it is in NP, and then to reduce some known NP-complete problem to it. Therefore, it is useful to know a variety of NP-complete problems. The list below contains some well-known problems that are NP-complete when expressed as decision problems.
To the right is a diagram of some of the problems and the reductions
Reduction (complexity)
In computability theory and computational complexity theory, a reduction is a transformation of one problem into another problem. Depending on the transformation used this can be used to define complexity classes on a set of problems....

typically used to prove their NP-completeness. In this diagram, an arrow from one problem to another indicates the direction of the reduction. Note that this diagram is misleading as a description of the mathematical relationship between these problems, as there exists a polynomial-time reduction between any two NP-complete problems; but it indicates where demonstrating this polynomial-time reduction has been easiest.

There is often only a small difference between a problem in P and an NP-complete problem. For example, the 3-satisfiability problem, a restriction of the boolean satisfiability problem, remains NP-complete, whereas the slightly more restricted 2-satisfiability
2-satisfiability
In computer science, 2-satisfiability is the problem of determining whether a collection of two-valued variables with constraints on pairs of variables can be assigned values satisfying all the constraints...

problem is in P (specifically, NL-complete
NL-complete
In computational complexity theory, NL-Complete is a complexity class which is complete for NL. It contains the most "difficult" or "expressive" problems in NL...

), and the slightly more general max. 2-sat. problem is again NP-complete. Determining whether a graph can be colored with 2 colors is in P, but with 3 colors is NP-complete, even when restricted to planar graph
Planar graph
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints...

s. Determining if a graph is a cycle
Cycle graph
In graph theory, a cycle graph or circular graph is a graph that consists of a single cycle, or in other words, some number of vertices connected in a closed chain. The cycle graph with n vertices is called Cn...

or is bipartite
Bipartite graph
In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; that is, U and V are independent sets...

is very easy (in L
L (complexity)
In computational complexity theory, L is the complexity class containing decision problems which can be solved by a deterministic Turing machine using a logarithmic amount of memory space...

), but finding a maximum bipartite or a maximum cycle subgraph is NP-complete. A solution of the knapsack problem
Knapsack problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as...

within any fixed percentage of the optimal solution can be computed in polynomial time, but finding the optimal solution is NP-complete.

## Solving NP-complete problems

At present, all known algorithms for NP-complete problems require time that is superpolynomial in the input size, and it is unknown whether there are any faster algorithms.

The following techniques can be applied to solve computational problems in general, and they often give rise to substantially faster algorithms:
• Approximation
Approximation algorithm
In computer science and operations research, approximation algorithms are algorithms used to find approximate solutions to optimization problems. Approximation algorithms are often associated with NP-hard problems; since it is unlikely that there can ever be efficient polynomial time exact...

: Instead of searching for an optimal solution, search for an "almost" optimal one.
• Randomization
Randomized algorithm
A randomized algorithm is an algorithm which employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits...

: Use randomness to get a faster average running time
Running Time
Running Time may refer to:* Running Time * see Analysis of algorithms...

, and allow the algorithm to fail with some small probability. See Monte Carlo method
Monte Carlo method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...

.
• Restriction: By restricting the structure of the input (e.g., to planar graphs), faster algorithms are usually possible.
• Parameterization
Parameterized complexity
Parameterized complexity is a branch of computational complexity theory in computer science that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input. The complexity of a problem is then measured as a function in those...

: Often there are fast algorithms if certain parameters of the input are fixed.
• Heuristic: An algorithm that works "reasonably well" in many cases, but for which there is no proof that it is both always fast and always produces a good result. Metaheuristic
Metaheuristic
In computer science, metaheuristic designates a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Metaheuristics make few or no assumptions about the problem being optimized and can search very large spaces...

approaches are often used.

One example of a heuristic algorithm is a suboptimal greedy coloring algorithm
Greedy coloring
In the study of graph coloring problems in mathematics and computer science, a greedy coloring is a coloring of the vertices of a graph formed by a greedy algorithm that considers the vertices of the graph in sequence and assigns each vertex its first available color...

used for graph coloring during the register allocation
Register allocation
In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers...

phase of some compilers, a technique called graph-coloring global register allocation. Each vertex is a variable, edges are drawn between variables which are being used at the same time, and colors indicate the register assigned to each variable. Because most RISC machines have a fairly large number of general-purpose registers, even a heuristic approach is effective for this application.

## Completeness under different types of reduction

In the definition of NP-complete given above, the term reduction was used in the technical meaning of a polynomial-time many-one reduction.

Another type of reduction is polynomial-time Turing reduction. A problem is polynomial-time Turing-reducible to a problem if, given a subroutine that solves in polynomial time, one could write a program that calls this subroutine and solves in polynomial time. This contrasts with many-one reducibility, which has the restriction that the program can only call the subroutine once, and the return value of the subroutine must be the return value of the program.

If one defines the analogue to NP-complete with Turing reductions instead of many-one reductions, the resulting set of problems won't be smaller than NP-complete; it is an open question whether it will be any larger.

Another type of reduction that is also often used to define NP-completeness is the logarithmic-space many-one reduction which is a many-one reduction that can be computed with only a logarithmic amount of space. Since every computation that can be done in logarithmic space can also be done in polynomial time it follows that if there is a logarithmic-space many-one reduction then there is also a polynomial-time many-one reduction. This type of reduction is more refined than the more usual polynomial-time many-one reductions and it allows us to distinguish more classes such as P-complete
P-complete
In complexity theory, the notion of P-complete decision problems is useful in the analysis of both:# which problems are difficult to parallelize effectively, and;# which problems are difficult to solve in limited space....

. Whether under these types of reductions the definition of NP-complete changes is still an open problem. All currently known NP-complete problems are NP-complete under log space reductions. Indeed, all currently known NP-complete problems remain NP-complete even under much weaker reductions. It is known, however, that AC0 reductions define a strictly smaller class than polynomial-time reductions.

## Naming

According to Don Knuth, the name "NP-complete" was popularized by Alfred Aho
Alfred Aho
Alfred Vaino Aho is a Canadian computer scientist.-Career:Aho received a B.A.Sc. in Engineering Physics from the University of Toronto and a Ph.D. in Electrical Engineering/Computer Science from Princeton University...

, John Hopcroft
John Hopcroft
John Edward Hopcroft is an American theoretical computer scientist. His textbooks on theory of computation and data structures are regarded as standards in their fields. He is the IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University.He received his...

and Jeffrey Ullman
Jeffrey Ullman
Jeffrey David Ullman is a renowned computer scientist. His textbooks on compilers , theory of computation , data structures, and databases are regarded as standards in their fields.-Early life & Career:Ullman received a Bachelor of Science degree in Engineering...

in their celebrated textbook "The Design and Analysis of Computer Algorithms". He reports that they introduced the change in the galley proofs for the book (from "polynomially-complete"), in accordance with the results of a poll he had conducted of the Theoretical Computer Science
Theoretical computer science
Theoretical computer science is a division or subset of general computer science and mathematics which focuses on more abstract or mathematical aspects of computing....

community. Other suggestions made in the poll included "Herculean", "formidable", Steiglitz
Kenneth Steiglitz
Dr. Kenneth "Ken" Steiglitz is a professor of computer science at Princeton University. He was born in Weehawken, New Jersey on January 30, 1939. He received his Doctor of Engineering Science from New York University in 1963. In 1997 he was inducted as a Fellow of the Association for Computing...

's "hard-boiled" in honor of Cook, and Shen Lin's acronym "PET", which stood for "probably exponential time", but depending on which way the P versus NP problem went, could stand for "provably exponential time" or "previously exponential time".

## Common misconceptions

The following misconceptions are frequent.
• "NP-complete problems are the most difficult known problems." Since NP-complete problems are in NP, their running time is at most exponential. However, some problems provably require more time, for example Presburger arithmetic
Presburger arithmetic
Presburger arithmetic is the first-order theory of the natural numbers with addition, named in honor of Mojżesz Presburger, who introduced it in 1929. The signature of Presburger arithmetic contains only the addition operation and equality, omitting the multiplication operation entirely...

.
• "NP-complete problems are difficult because there are so many different solutions." On the one hand, there are many problems that have a solution space just as large, but can be solved in polynomial time (for example minimum spanning tree
Minimum spanning tree
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees...

). On the other hand, there are NP-problems with at most one solution that are NP-hard under randomized polynomial-time reduction (see Valiant–Vazirani theorem).
• "Solving NP-complete problems requires exponential time." First, this would imply P ≠ NP, which is still an unsolved question. Further, some NP-complete problems actually have algorithms running in superpolynomial, but subexponential time. For example, the Independent set and Dominating set problems are NP-complete when restricted to planar graph
Planar graph
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints...

s, but can be solved in subexponential time on planar graph
Planar graph
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints...

s using the planar separator theorem
Planar separator theorem
In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices...

.
• "All instances of an NP-complete problem are difficult." Often some instances, or even almost all instances, may be easy to solve within polynomial time.

• List of NP-complete problems
• Almost complete
In computational complexity, problems that are in the complexity class NP but are neither in the class P nor NP-complete are called NP-intermediate, and the class of such problems is called NPI. Ladner's theorem, shown in 1975 by Richard Ladner, is a result asserting that, if P ≠ NP, then NPI is...

• Strongly NP-complete
Strongly NP-complete
In computational complexity, strong NP-completeness is a property of computational problems that is a special case of NP-completeness. A general computational problem may have numerical parameters...

• P = NP problem
• NP-hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP". A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H...