In
computational complexity theoryComputational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...
,
NP is one of the most fundamental
complexity classIn computational complexity theory, a complexity class is a set of problems of related resource-based complexity. A typical complexity class has a definition of the form:...
es.
The abbreviation
NP refers to "
nondeterministicIn computer science, a nondeterministic algorithm is an algorithm that can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different...
polynomial time."
Intuitively,
NP is the set of all
decision problemIn computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...
s for which the instances where the answer is "yes" have efficiently verifiable proofs of the fact that the answer is indeed "yes." More precisely, these proofs have to be
verifiable in polynomial time by a deterministic Turing machine.
In an equivalent formal definition,
NP is the set of
decision problemIn computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...
s where the "yes"-instances can be recognized in polynomial time by a
non-deterministic Turing machineIn theoretical computer science, a Turing machine is a theoretical machine that is used in thought experiments to examine the abilities and limitations of computers....
. The equivalence of the two definitions follows from the fact that an algorithm on such a non-deterministic machine consists of two phases, the first of which consists of a guess about the solution which is generated in a non-deterministic way, while the second consists of a deterministic algorithm which verifies or rejects the guess as a valid solution to the problem.
The complexity class
PIn computational complexity theory, P, also known as PTIME or DTIME, is one of the most fundamental complexity classes. It contains all decision problems which can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.Cobham's thesis holds...
is contained in
NP, but
NP contains many important problems, the hardest of which are called
NP-completeIn computational complexity theory, the complexity class NP-complete is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard...
problems, for which no polynomial-time algorithms are known for solving (not verifying) them. The most important open question in complexity theory, the
P =
NP problem, asks whether such algorithms actually exist for
NP-complete, and by corollary, all
NP problems. It is widely believed that this is not the case.
Formal definition
The complexity class
NP can be defined in terms of
NTIMEIn computational complexity theory, the complexity class NTIME is the set of decision problems that can be solved by a non-deterministic Turing machine using time O, and unlimited space....
as follows:
Alternatively,
NP can be defined using deterministic Turing machines as verifiers. A
languageA formal language is a set of words—that is, finite strings of letters, symbols, or tokens that are defined in the language. The set from which these letters are taken is the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar...
L is in
NP if and only if there exist polynomials
p and
q, and a deterministic Turing machine
M, such that
- For all x and y, the machine M runs in time p(|x|) on input (x,y)
- For all x in L, there exists a string y of length q(|x|) such that M(x,y) = 1
- For all x not in L and all strings y of length q(|x|), M(x,y) = 0
Introduction
Many natural
computer scienceComputer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
problems are covered by the class
NP.
In particular, the decision versions of many interesting
search problemIn computational complexity theory and computability theory, a search problem is a type of computational problem represented by a binary relation...
s and optimization problems are contained in
NP.
Verifier-based definition
In order to explain the verifier-based definition of
NP, let us consider the
subset sum problem:
Assume that we are given some
integerThe integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
s, such as {−7, −3, −2, 5, 8}, and we wish to know whether some of these integers sum up to zero. In this example, the answer is "yes", since the subset of integers {−3, −2, 5} corresponds to the sum The task of deciding whether such a subset with sum zero exists is called the
subset sum problem.
As the number of integers that we feed into the algorithm becomes larger, the number of subsets grows exponentially, and in fact the subset sum problem is
NP-complete.
However, notice that, if we are given a particular subset (often called a
certificate), we can easily check or
verify whether the subset sum is zero, by just summing up the integers of the subset. So if the sum is indeed zero, that particular subset is the
proof or
witnessIn mathematical logic, a witness is a specific value t to be substituted for variable x of an existential statement of the form ∃x φ such that φ is true.- Examples :...
for the fact that the answer is "yes". An algorithm that verifies whether a given subset has sum zero is called
verifier. A problem is said to be in
NP if and only ifIn logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....
there exists a verifier for the problem that executes in polynomial time. In case of the subset sum problem, the verifier needs only polynomial time, for which reason the subset sum problem is in
NP.
Note that the verifier-based definition of
NP does
not require an easy-to-verify certificate for the "no"-answers. The class of problems with such certificates for the "no"-answers is called
co-NP. In fact, it is an open question whether all problems in
NP also have certificates for the "no"-answers and thus are in
co-NP.
Machine-definition
Equivalent to the verifier-based definition is the following characterization:
NP is the set of
decision problemIn computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...
s decidable in polynomial time by a
non-deterministic Turing machineIn theoretical computer science, a Turing machine is a theoretical machine that is used in thought experiments to examine the abilities and limitations of computers....
. This definition is equivalent to the verifier-based definition because a non-deterministic Turing machine could solve an NP problem in polynomial time by non-deterministically selecting a certificate and running the verifier on the certificate. Similarly, if such a machine exists, then a polynomial time verifier can naturally be constructed from it.
Examples
This is an incomplete list of problems that are in
NP.
- All problems in P
In computational complexity theory, P, also known as PTIME or DTIME, is one of the most fundamental complexity classes. It contains all decision problems which can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.Cobham's thesis holds...
(For, given a certificate for a problem in P, we can ignore the certificate and just solve the problem in polynomial time. Alternatively, note that a deterministic Turing machine is also trivially a non-deterministic Turing machine that just happens to not use any non-determinism.)
- The decision problem version of the integer factorization problem: given integer n and k, is there a factor f with 1 < f < k and f dividing n?
- The graph isomorphism problem
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.Besides its practical importance, the graph isomorphism problem is a curiosity in computational complexity theory as it is one of a very small number of problems belonging to NP...
of determining whether two graphs can be drawn identically
- All NP-complete
In computational complexity theory, the complexity class NP-complete is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard...
problems, e.g.:
- A variant of the traveling salesman problem, where we want to know if there is a route of some length that goes through all the nodes in a certain network
- The boolean satisfiability problem
In computer science, satisfiability is the problem of determining if the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to TRUE...
, where we want to know whether or not a certain formula in propositional logic with boolean variables is true for some value of the variables
Why some NP problems are hard to solve
Because of the many important problems in this class, there have been extensive efforts to find polynomial-time algorithms for problems in
NP. However, there remain a large number of problems in
NP that defy such attempts, seeming to require super-polynomial time. Whether these problems really aren't decidable in polynomial time is one of the greatest open questions in
computer scienceComputer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
(see
P=NP problem for an in-depth discussion).
An important notion in this context is the set of
NP-completeIn computational complexity theory, the complexity class NP-complete is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard...
decision problems, which is a subset of
NP and might be informally described as the "hardest" problems in
NP. If there is a polynomial-time algorithm for even
one of them, then there is a polynomial-time algorithm for
all the problems in
NP. Because of this, and because dedicated research has failed to find a polynomial algorithm for any
NP-complete problem, once a problem has been proven to be
NP-complete this is widely regarded as a sign that a polynomial algorithm for this problem is unlikely to exist.
However, in practical uses, instead of spending computational resources looking for an optimal solution, a good enough (but suboptimal) solution may often be found in polynomial time. Also, the real life applications of some problems are easier than their theoretical equivalents. For example, when solving the
Travelling salesman problemThe travelling salesman problem is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find the shortest possible tour that visits each city exactly once...
the computer may be unable to use knowledge of
triangle inequalityIn mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side ....
as the graph may be biased beyond the limits of Euclidean plane, unlike real road networks.
Equivalence of definitions
The two definitions of
NP as the class of problems solvable by a nondeterministic
Turing machineA Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a...
(TM) in polynomial time and the class of problems verifiable by a deterministic Turing machine in polynomial time are equivalent. The proof is described by many textbooks, for example Sipser's
Introduction to the Theory of Computation, section 7.3.
To show this, first suppose we have a deterministic verifier. A nondeterministic machine can simply nondeterministically run the verifier on all possible proof strings (this requires only polynomially-many steps because it can nondeterministically choose the next character in the proof string in each step, and the length of the proof string must be polynomially bounded). If any proof is valid, some path will accept; if no proof is valid, the string is not in the language and it will reject.
Conversely, suppose we have a nondeterministic TM called A accepting a given language L. At each of its polynomially-many steps, the machine's
computation treeA computation tree is a representation for the computation steps of a non-deterministic Turing machine on a specified input. A computation tree is a rooted tree of nodes and edges. Each node in the tree represents a single computational state, while each edge represents a transition to the next...
branches in at most a constant number of directions. There must be at least one accepting path, and the string describing this path is the proof supplied to the verifier. The verifier can then deterministically simulate A, following only the accepting path, and verifying that it accepts at the end. If A rejects the input, there is no accepting path, and the verifier will never accept.
Relationship to other classes
NP contains all problems in
PIn computational complexity theory, P, also known as PTIME or DTIME, is one of the most fundamental complexity classes. It contains all decision problems which can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.Cobham's thesis holds...
, since one can verify any instance of the problem by simply ignoring the proof and solving it.
NP is contained in
PSPACEIn computational complexity theory, PSPACE is the set of all decision problems which can be solved by a Turing machine using a polynomial amount of space.- Formal definition :...
—to show this, it suffices to construct a
PSPACE machine that loops over all proof strings and feeds each one to a polynomial-time verifier. Since a polynomial-time machine can only read polynomially-many bits, it cannot use more than polynomial space, nor can it read a proof string occupying more than polynomial space (so we don't have to consider proofs longer than this).
NP is also contained in
EXPTIMEIn computational complexity theory, the complexity class EXPTIME is the set of all decision problems solvable by a deterministic Turing machine in O time, where p is a polynomial function of n....
, since the same algorithm operates in exponential time.
The
complementIn computational complexity theory, the complement of a decision problem is the decision problem resulting from reversing the yes and no answers. Equivalently, if we define decision problems as sets of finite strings, then the complement of this set over some fixed domain is its complement...
of
NP,
co-NP, contains those problems which have a simple proof for
no instances, sometimes called counterexamples. For example,
primality testA primality test is an algorithm for determining whether an input number is prime. Amongst other fields of mathematics, it is used for cryptography. Unlike integer factorization, primality tests do not generally give prime factors, only stating whether the input number is prime or not...
ing trivially lies in co-NP, since one can refute the primality of an integer by merely supplying a nontrivial factor.
NP and co-NP together form the first level in the
polynomial hierarchyIn computational complexity theory, the polynomial hierarchy is a hierarchy of complexity classes that generalize the classes P, NP and co-NP to oracle machines...
, higher only than
P.
NP is defined using only deterministic machines. If we permit the verifier to be probabilistic (specifically, a BPP machine), we get the class
MA solvable using a
Arthur-Merlin protocolIn computational complexity theory, an Arthur–Merlin protocol is an interactive proof system in which the verifier's coin tosses are constrained to be public . This notion was introduced by...
with no communication from Merlin to Arthur.
NP is a class of
decision problemIn computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...
s; the analogous class of function problems is
FNPIn computational complexity theory, the complexity class FNP is the function problem extension of the decision problem class NP. The name is somewhat of a misnomer, since technically it is a class of binary relations, not functions, as the following formal definition explains:This definition does...
.
Other characterizations
In terms of descriptive complexity theory,
NP corresponds precisely to the set of languages definable by existential
second-order logicIn logic and mathematics second-order logic is an extension of first-order logic, which itself is an extension of propositional logic. Second-order logic is in turn extended by higher-order logic and type theory....
(
Fagin's theoremFagin's theorem is a result in descriptive complexity theory that states that the set of all properties expressible in existential second-order logic is precisely the complexity class NP...
).
NP can be seen as a very simple type of
interactive proof systemIn computational complexity theory, an interactive proof system is an abstract machine that models computation as the exchange of messages between two parties. The parties, the verifier and the prover, interact by exchanging messages in order to ascertain whether a given string belongs to a...
, where the prover comes up with the proof certificate and the verifier is a deterministic polynomial-time machine that checks it. It is complete because the right proof string will make it accept if there is one, and it is sound because the verifier cannot accept if there is no acceptable proof string.
A major result of complexity theory is that
NP can be characterized as the problems solvable by probabilistically checkable proofs where the verifier uses O(log
n) random bits and examines only a constant number of bits of the proof string (the class
PCP(log
n, 1)). More informally, this means that the
NP verifier described above can be replaced with one that just "spot-checks" a few places in the proof string, and using a limited number of coin flips can determine the correct answer with high probability. This allows several results about the hardness of
approximation algorithmIn computer science and operations research, approximation algorithms are algorithms used to find approximate solutions to optimization problems. Approximation algorithms are often associated with NP-hard problems; since it is unlikely that there can ever be efficient polynomial time exact...
s to be proven.
Example
The decision version of the traveling salesman problem is in
NP. Given an input matrix of distances between
n cities, the problem is to determine if there is a route visiting all cities with total distance less than
k.
A proof certificate can simply be a list of the cities. Then verification can clearly be done in polynomial time by a deterministic Turing machine. It simply adds the matrix entries corresponding to the paths between the cities.
A nondeterministic Turing machine can find such a route as follows:
- At each city it visits it "guesses" the next city to visit, until it has visited every vertex. If it gets stuck, it stops immediately.
- At the end it verifies that the route it has taken has cost less than k in O(n) time.
One can think of each guess as "forking" a new copy of the Turing machine to follow each of the possible paths forward, and if at least one machine finds a route of distance less than
k, that machine accepts the input. (Equivalently, this can be thought of as a single Turing machine that always guesses correctly)
Binary search on the range of possible distances can convert the decision version of Traveling Salesman to the optimization version, by calling the decision version repeatedly (a polynomial number of times).
External links
- American Scientist
American Scientist is the bimonthly science and technology magazine published since 1913 by Sigma Xi. Each issue includes four to five feature articles written by scientists and engineers. These authors review research in all fields of science...
primer on traditional and recent complexity theory research: "Accidental Algorithms"