Knapsack problem - AbsoluteAstronomy.com

The knapsack problem or rucksack problem is a problem in combinatorial optimization: given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.

The problem often arises in resource allocation with financial constraints. A similar problem also appears in combinatorics, complexity theory, cryptography and applied mathematics.

The decision problem form of the knapsack problem is the question "can a value of at least V be achieved without exceeding the weight W?"

Definition

In the following, we have n kinds of items, 1 through n.
Each kind of item i has a value v_i and a weight w_i.
We usually assume that all values and weights are nonnegative. To simplify the representation, we can also assume that the items are listed in increasing order of weight.
The maximum weight that we can carry in the bag is W.

The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number x_i of copies of each kind of item to zero or one.
Mathematically the 0-1 knapsack problem can be formulated as:

maximize \sum_{i=1}^{n} v_i x_i
subject to \sum_{i=1}^{n} w_i x_i \le W, x_i \in \{0, 1\}

The bounded knapsack problem restricts the number x_i of copies of each kind of item to a maximum integer value c_i.
Mathematically the bounded knapsack problem can be formulated as:

maximize \sum_{i=1}^{n} v_i x_i
subject to \sum_{i=1}^{n} w_i x_i \le W, x_i \in \{0, 1, \ldots, c_i\}

The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item.

Of particular interest is the special case of the problem with these properties:

it is a decision problem,
it is a 0-1 problem,
for each kind of item, the weight equals the value: w_i = v_i.

Notice that in this special case, the problem is equivalent to this: given a set of nonnegative integers, does any subset of it add up to exactly W? Or, if negative weights are allowed and W is chosen to be zero, the problem is: given a set of integers, does any nonempty subset add up to exactly 0? This special case is called the subset sum problem. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem.
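The subset sum special case can be illustrated with a short sketch. The function name subset_sum and the set-of-reachable-sums approach are illustrative choices, not part of the article; the code assumes nonnegative integers and answers only the yes/no question.

```python
def subset_sum(numbers, target):
    """Return True if some subset of the nonnegative integers sums to target.

    Hypothetical helper for illustration: it tracks every sum that can be
    reached without exceeding the target.
    """
    reachable = {0}  # the empty subset sums to 0
    for x in numbers:
        # Each already-reachable sum may be extended by x (or left as is).
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable
```

Because the set of reachable sums never holds more than target + 1 entries, the running time grows with the numeric value of the target rather than with its bit length.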

If multiple knapsacks are allowed, the problem is better thought of as the bin packing problem.

Computational complexity

The knapsack problem is interesting from the perspective of computer science because

there is a pseudo-polynomial time algorithm using dynamic programming
there is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine
the decision form of the problem is NP-complete, so it is expected that no algorithm can be both correct and fast (polynomial-time) on all cases
many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.

The subset sum version of the knapsack problem is commonly known as one of Karp's 21 NP-complete problems.

There have been attempts to use subset sum as the basis for public key cryptography systems, such as the Merkle-Hellman knapsack cryptosystem. These attempts typically used some group other than the integers. Merkle-Hellman and several similar algorithms were later broken, because the particular subset sum problems they produced were in fact solvable by polynomial-time algorithms.

One theme in the research literature is to identify what the "hard" instances of the knapsack problem look like or, viewed another way, to identify which properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests (Vincent Poirriez, Nicola Yanev, Rumen Andonov (2009), "A Hybrid Algorithm for the Unbounded Knapsack Problem", Discrete Optimization, http://dx.doi.org/10.1016/j.disopt.2008.09.004).

Several algorithms are freely available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach, or hybridizations of both approaches (S. Martello, D. Pisinger, P. Toth, "Dynamic programming and strong bounds for the 0-1 knapsack problem", Manag. Sci., 45:414–424, 1999; G. Plateau, M. Elkihel, "A hybrid algorithm for the 0-1 knapsack problem", Methods of Oper. Res., 49:277–293, 1985).

Unbounded knapsack problem

If all weights (w_1, ..., w_n and W) are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming. The following describes a dynamic programming solution for the unbounded knapsack problem.

To simplify things, assume all weights are strictly positive (w_i > 0). We wish to maximize total value subject to the constraint that total weight is less than or equal to W. Then for each w ≤ W, define m[w] to be the maximum value that can be attained with total weight less than or equal to w. m[W] then is the solution to the problem.

Observe that m[w] has the following properties:

m[0] = 0 (the sum of zero items, i.e., the summation of the empty set)
m[w] = \max_{w_i \le w} (v_i + m[w - w_i])

where v_i is the value of the i-th kind of item.

Here the maximum of the empty set is taken to be zero. Tabulating the results from m[0] up through m[W] gives the solution. Since the calculation of each m[w] involves examining n items, and there are W values of m[w] to calculate, the running time of the dynamic programming solution is O(nW). Dividing w_1, w_2, ..., w_n, W by their greatest common divisor is an obvious way to improve the running time.

The O(nW) complexity does not contradict the fact that the knapsack problem is NP-complete, since W, unlike n, is not polynomial in the length of the input to the problem. The length of the input W is proportional to the number of bits in W, that is \log W, not to W itself.
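The tabulation described in this section can be sketched in a few lines. The function name unbounded_knapsack and its argument order are illustrative assumptions; the code assumes strictly positive integer weights and fills m[0] through m[W] using m[w] = max over items with w_i <= w of v_i + m[w - w_i], with the maximum of the empty set taken to be zero.

```python
def unbounded_knapsack(values, weights, capacity):
    """Maximum value for the unbounded knapsack problem (illustrative sketch).

    Assumes all weights are strictly positive integers.
    """
    m = [0] * (capacity + 1)  # m[0] = 0: the sum of zero items
    for w in range(1, capacity + 1):
        # Try every item that fits; default=0 covers the case where none fits.
        m[w] = max(
            (v + m[w - wt] for v, wt in zip(values, weights) if wt <= w),
            default=0,
        )
    return m[capacity]
```

Each of the W table entries examines all n items, which is where the O(nW) running time comes from.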

0-1 knapsack problem

A similar dynamic programming solution for the 0-1 knapsack problem also runs in pseudo-polynomial time. As above, assume w_1, w_2, ..., w_n, W are strictly positive integers. Define m[i, w] to be the maximum value that can be attained with weight less than or equal to w using items up to i.

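We can define m[i, w] recursively as follows:

m[0, w] = 0 (no items are available)
m[i, w] = m[i-1, w] if w_i > w (the new item is more than the current weight limit)
m[i, w] = \max(m[i-1, w], m[i-1, w - w_i] + v_i) if w_i \le w.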

The solution can then be found by calculating m[n, W]. To do this efficiently we can use a table to store previous computations. This solution will therefore run in O(nW) time and O(nW) space. Additionally, if we use only a 1-dimensional array m[w] to store the current optimal values and pass over this array n times (once per item), rewriting from m[W] down to m[1] every time, we get the same result for only O(W) space.
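The space-saving variant with a 1-dimensional array can be sketched as follows. The function name knapsack_01 is an illustrative assumption, and the code assumes strictly positive integer weights. The array is rewritten from high capacities down to low ones so that each item is counted at most once.

```python
def knapsack_01(values, weights, capacity):
    """Maximum value for the 0-1 knapsack problem using O(W) space (sketch).

    Assumes all weights are strictly positive integers.
    """
    m = [0] * (capacity + 1)  # m[w]: best value with weight limit w so far
    for v, wt in zip(values, weights):
        # Iterate downwards: m[w - wt] still refers to the previous item row,
        # so the current item cannot be added twice.
        for w in range(capacity, wt - 1, -1):
            m[w] = max(m[w], m[w - wt] + v)
    return m[capacity]
```

Iterating upwards instead would let an item be reused and would compute the unbounded variant.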

Another algorithm for 0-1 knapsack, discovered in 1974 and sometimes called "meet-in-the-middle" due to parallels to a similarly-named algorithm in cryptography, is exponential in the number of different items but may be preferable to the DP algorithm when W is large compared to n. In particular, if the w_i are nonnegative but not integers, we could still use the dynamic programming algorithm by scaling and rounding (i.e. using fixed-point arithmetic), but if the problem requires d fractional digits of precision to arrive at the correct answer, W will need to be scaled by 10^d, and the DP algorithm will require O(W 10^d) space and O(nW 10^d) time.

The "meet-in-the-middle" algorithm is as follows:

1. Partition the set {1...n} into two sets A and B of approximately equal size.
2. Compute the weights and values of all subsets of each set.
3. For each subset of A, find the "best matching" subset of B, i.e. the subset of B of greatest value such that the combined weight does not exceed W. Keep track of the greatest combined value seen so far.

The algorithm takes O(2^{n/2}) space, and efficient implementations of step 3 (for instance, sorting the subsets of B by weight, discarding subsets of B which weigh more than other subsets of B of greater or equal value, and using binary search to find the best match) result in a runtime of O(n 2^{n/2}). As with the meet-in-the-middle attack in cryptography, this improves on the O(n 2^n) runtime of a naive brute force approach (examining all subsets of {1...n}), at the cost of using exponential rather than constant space.
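A minimal sketch of the meet-in-the-middle procedure follows, using the sort, prune and binary-search strategy mentioned for the matching step. The names knapsack_mitm and _all_subsets are hypothetical, and the subset enumeration is written for clarity rather than speed; weights are assumed nonnegative and need not be integers.

```python
from bisect import bisect_right
from itertools import combinations


def _all_subsets(items):
    """Return a (total weight, total value) pair for every subset of items."""
    result = []
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            result.append((sum(w for w, _ in combo), sum(v for _, v in combo)))
    return result


def knapsack_mitm(values, weights, capacity):
    """0-1 knapsack by meet-in-the-middle (illustrative sketch)."""
    items = list(zip(weights, values))
    half = len(items) // 2
    subsets_a = _all_subsets(items[:half])
    subsets_b = sorted(_all_subsets(items[half:]))  # sorted by weight

    # Discard subsets of B that weigh at least as much as an earlier subset
    # without offering more value: values now strictly increase with weight.
    b_weights, b_values = [], []
    for w, v in subsets_b:
        if not b_values or v > b_values[-1]:
            b_weights.append(w)
            b_values.append(v)

    best = 0
    for w_a, v_a in subsets_a:
        if w_a > capacity:
            continue
        # Heaviest surviving subset of B that still fits next to this one.
        idx = bisect_right(b_weights, capacity - w_a) - 1
        if idx >= 0:
            best = max(best, v_a + b_values[idx])
    return best
```

After pruning, the heaviest subset of B that fits is also the most valuable one that fits, which is why a single binary search per subset of A is enough.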

Greedy approximation algorithm

George Dantzig proposed a greedy approximation algorithm to solve the unbounded knapsack problem. His version sorts the items in decreasing order of value per unit of weight, v_i / w_i. It then proceeds to insert them into the sack, starting with as many copies as possible of the first kind of item until there is no longer space in the sack for more. Provided that there is an unlimited supply of each kind of item, if m is the maximum value of items that fit into the sack, then the greedy algorithm is guaranteed to achieve at least a value of m/2. However, for the bounded problem, where the supply of each kind of item is limited, the algorithm may be far from optimal.
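The greedy procedure can be sketched as below. The function name greedy_unbounded and the returned (total, counts) pair are illustrative assumptions; the code assumes strictly positive integer weights and an unlimited supply of each kind of item.

```python
def greedy_unbounded(values, weights, capacity):
    """Greedy approximation for the unbounded knapsack problem (sketch).

    Returns the total value and the number of copies taken of each item.
    Assumes strictly positive integer weights.
    """
    # Item indices in decreasing order of value per unit of weight.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] / weights[i],
        reverse=True,
    )
    total, remaining = 0, capacity
    counts = [0] * len(values)
    for i in order:
        counts[i] = remaining // weights[i]  # as many copies as still fit
        total += counts[i] * values[i]
        remaining -= counts[i] * weights[i]
    return total, counts
```

For example, with values (7, 5), weights (6, 5) and capacity 10, the sketch takes one copy of the first item for a value of 7, while two copies of the second item would give 10. The result is suboptimal but still at least half of the optimum.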

Dominance relations in the UKP

Solving the unbounded knapsack problem can be made easier by throwing away items which will never be needed. For a given item i, suppose we could find a set of items J such that their total weight is less than the weight of i, and their total value is greater than the value of i. Then i cannot appear in the optimal solution, because we could always improve any potential solution containing i by replacing i with the set J. Therefore we can disregard the i-th item altogether. In such cases, J is said to dominate i. (Note that this does not apply to bounded knapsack problems, since we may have already used up the items in J.)

Finding dominance relations allows us to significantly reduce the size of the search space. There are several different types of dominance relations (Vincent Poirriez, Nicola Yanev, Rumen Andonov (2009), "A Hybrid Algorithm for the Unbounded Knapsack Problem", section 2, Discrete Optimization, http://dx.doi.org/10.1016/j.disopt.2008.09.004), which all satisfy an inequality of the form:

\sum_{j \in J} w_j x_j \le \alpha w_i, and \sum_{j \in J} v_j x_j \ge \alpha v_i for some x \in \mathbb{Z}_+^n

where \alpha \in \mathbb{Z}_+, J \subseteq \{1, \ldots, n\} and i \notin J. The vector x denotes the number of copies of each member of J.

Collective dominance

The i-th item is collectively dominated by J, written as i \ll J, if the total weight of some combination of items in J is less than w_i and their total value is greater than v_i. Formally, \sum_{j \in J} w_j x_j \le w_i and \sum_{j \in J} v_j x_j \ge v_i for some x \in \mathbb{Z}_+^n, i.e. \alpha = 1. Verifying this dominance is computationally hard, so it can only be used with a dynamic programming approach. In fact, this is equivalent to solving a smaller knapsack decision problem where V = v_i, W = w_i, and the items are restricted to J.

Threshold dominance

The i-th item is threshold dominated by J, written as i \prec\prec J, if some number of copies of i are dominated by J. Formally, \sum_{j \in J} w_j x_j \le \alpha w_i, and \sum_{j \in J} v_j x_j \ge \alpha v_i for some x \in \mathbb{Z}_+^n and \alpha \ge 1. This is a generalization of collective dominance and is used in the EDUK algorithm. The smallest such \alpha defines the threshold of the item i, written t_i = (\alpha - 1) w_i. In this case, the optimal solution could contain at most \alpha - 1 copies of i.

Multiple dominance

The i-th item is multiply dominated by a single item j, written as i \ll_m j, if i is dominated by some number of copies of j. Formally, w_j x_j \le w_i, and v_j x_j \ge v_i for some x_j \in \mathbb{Z}_+, i.e. J = \{j\}, \alpha = 1, x_j = \lfloor w_i / w_j \rfloor.
This dominance could be efficiently used during preprocessing because it can be detected relatively easily.
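As a rough illustration of such preprocessing, the sketch below drops every item that is multiply dominated by another single item. The function name remove_multiply_dominated and the quadratic pairwise scan are illustrative choices rather than the method of any particular solver; strictly positive integer weights are assumed.

```python
def remove_multiply_dominated(values, weights):
    """Return the indices of items that survive a multiple-dominance filter.

    Item i is dropped when floor(w_i / w_j) copies of some other item j are
    worth at least v_i (they weigh at most w_i by construction).
    Assumes strictly positive integer weights.
    """
    n = len(values)
    dominated = [False] * n
    for i in range(n):
        for j in range(n):
            # Skip already discarded items so that two equivalent items
            # cannot eliminate each other.
            if i == j or dominated[j]:
                continue
            copies = weights[i] // weights[j]
            if copies * values[j] >= values[i]:
                dominated[i] = True
                break
    return [i for i in range(n) if not dominated[i]]
```

With values (10, 19, 4) and weights (5, 10, 3), the second item is dropped because two copies of the first weigh the same and are worth 20.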

Modular dominance

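Let b be the best item, i.e. v_b / w_b \ge v_i / w_i for all i. This is the item with the greatest density of value.

The i-th item is modularly dominated by a single item j, written as i \ll_\equiv j, if i is dominated by j plus several copies of b. Formally, w_j + t w_b \le w_i, and v_j + t v_b \ge v_i, i.e. J = \{b, j\}, \alpha = 1, x_b = t, x_j = 1.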

Applications

Knapsack problems can be applied to real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selecting capital investments and financial portfolios, selecting assets for asset-backed securitization, and generating keys for the Merkle–Hellman knapsack cryptosystem.

One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. On tests with a homogeneous distribution of point values for each question, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values (that is, when different questions or sections are worth different numbers of points), it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score.
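The scoring scheme can be read as a 0-1 knapsack in which the chosen questions must be worth exactly the target number of points. The sketch below is one possible reading, not the procedure published by Feuerman and Weiss; the function name best_score and the sample data are hypothetical.

```python
def best_score(points, earned, target=100):
    """Best total of earned marks over question subsets worth exactly target.

    points[i] is what question i is worth, earned[i] is what the student
    scored on it. Returns None if no subset is worth exactly target points.
    """
    unreachable = float("-inf")
    best = [unreachable] * (target + 1)
    best[0] = 0  # the empty selection is worth 0 points
    for p, e in zip(points, earned):
        # Downward pass so that each question is selected at most once.
        for t in range(target, p - 1, -1):
            if best[t - p] != unreachable:
                best[t] = max(best[t], best[t - p] + e)
    return None if best[target] == unreachable else best[target]
```

For a 125-point test with questions worth 10, 15, 25, 25 and 50 points and earned marks of 10, 5, 20, 25 and 30, the best 100-point subset leaves out the 10- and 15-point questions and scores 75.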

History

The knapsack problem has been studied for more than a century, with early works dating as far back as 1897. It is not known how the name "knapsack problem" originated, though the problem was referred to as such in the early works of mathematician Tobias Dantzig (1884–1956), suggesting that the name could have existed in folklore before a mathematical problem had been fully defined.

The quadratic knapsack problem was first introduced by Gallo, Hammer, and Simeone in 1980.

A 1998 study of the Stony Brook University algorithms repository showed that, out of 75 algorithmic problems, the knapsack problem was the 18th most popular and the 4th most needed after kd-trees, suffix trees, and the bin packing problem.

External links

Lecture slides on the knapsack problem
PYAsUKP: Yet Another solver for the Unbounded Knapsack Problem, with code taking advantage of the dominance relations in a hybrid algorithm, benchmarks and downloadable copies of some papers.
Home page of David Pisinger with downloadable copies of some papers on the publication list (including "Where are the hard knapsack problems?")
Knapsack Problem solutions in many languages at Rosetta Code
Dynamic Programming algorithm to 0/1 Knapsack problem
0-1 Knapsack Problem in Python
Interactive JavaScript branch-and-bound solver
Solving 0-1-KNAPSACK with Genetic Algorithms in Ruby

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
