Knapsack problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most useful items.

The problem often arises in resource allocation with financial constraints. A similar problem also appears in combinatorics, complexity theory, cryptography and applied mathematics.

The decision problem form of the knapsack problem is the question "can a value of at least V be achieved without exceeding the weight W?"

Definition

In the following, we have n kinds of items, 1 through n.
Each kind of item i has a value $v_i$ and a weight $w_i$.
We usually assume that all values and weights are nonnegative. To simplify the representation, we can also assume that the items are listed in increasing order of weight.
The maximum weight that we can carry in the bag is W.

The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number $x_i$ of copies of each kind of item to zero or one.
Mathematically the 0-1 knapsack problem can be formulated as:
  • maximize $\sum_{i=1}^{n} v_i x_i$
  • subject to $\sum_{i=1}^{n} w_i x_i \le W$, with $x_i \in \{0, 1\}$


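The bounded knapsack problem restricts the number $x_i$ of copies of each kind of item to a maximum integer value $c_i$.
Mathematically the bounded knapsack problem can be formulated as:
  • maximize $\sum_{i=1}^{n} v_i x_i$
  • subject to $\sum_{i=1}^{n} w_i x_i \le W$, with $x_i \in \{0, 1, \ldots, c_i\}$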


The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item.

Of particular interest is the special case of the problem with these properties:
  • it is a decision problem,
  • it is a 0-1 problem,
  • for each kind of item, the weight equals the value: $w_i = v_i$.

Notice that in this special case, the problem is equivalent to this: given a set of nonnegative integers, does any subset of it add up to exactly W? Or, if negative weights are allowed and W is chosen to be zero, the problem is: given a set of integers, does any nonempty subset add up to exactly 0? This special case is called the subset sum problem. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem.
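The subset sum question for nonnegative integers can be answered with a small table of reachable sums. The following is a minimal sketch, not a reference implementation; the function name `subset_sum` and the boolean table `reachable` are illustrative choices, and the inputs are assumed to be nonnegative integers.

```python
def subset_sum(numbers, target):
    """Return True if some subset of `numbers` adds up to exactly `target`.

    Assumes every number and the target are nonnegative integers.
    """
    # reachable[s] is True when some subset seen so far sums to s.
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0

    for x in numbers:
        # Walk downwards so that each number is used at most once.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # 4 + 5 = 9
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # no subset reaches 30
```

The table has target + 1 entries, so the running time depends on the numeric value of the target rather than on the number of bits needed to write it down.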

If multiple knapsacks are allowed, the problem is better thought of as the bin packing problem.

Computational complexity

The knapsack problem is interesting from the perspective of computer science because
  • there is a pseudo-polynomial time algorithm using dynamic programming
  • there is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine
  • the decision problem is NP-complete, so it is expected that no algorithm can be both correct and fast (polynomial-time) on all cases
  • many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.


The subset sum version of the knapsack problem is commonly known as one of Karp's 21 NP-complete problems.

There have been attempts to use subset sum as the basis for public key cryptography systems, such as the Merkle-Hellman knapsack cryptosystem. These attempts typically used some group other than the integers. Merkle-Hellman and several similar algorithms were later broken, because the particular subset sum problems they produced were in fact solvable by polynomial-time algorithms.

One theme in the research literature is to identify what the "hard" instances of the knapsack problem look like or, viewed another way, to identify which properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests [Vincent Poirriez, Nicola Yanev, Rumen Andonov (2009), "A Hybrid Algorithm for the Unbounded Knapsack Problem", Discrete Optimization, http://dx.doi.org/10.1016/j.disopt.2008.09.004].

Several algorithms are freely available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach, or hybridizations of both approaches [S. Martello, D. Pisinger, P. Toth, "Dynamic programming and strong bounds for the 0-1 knapsack problem", Manag. Sci., 45:414–424, 1999; G. Plateau, M. Elkihel, "A hybrid algorithm for the 0-1 knapsack problem", Methods of Oper. Res., 49:277–293, 1985].

Unbounded knapsack problem

If all weights ($w_1, \ldots, w_n$ and $W$) are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming. The following describes a dynamic programming solution for the unbounded knapsack problem.

To simplify things, assume all weights are strictly positive ($w_i > 0$). We wish to maximize total value subject to the constraint that total weight is less than or equal to W. Then for each $w \le W$, define m[w] to be the maximum value that can be attained with total weight less than or equal to w. m[W] then is the solution to the problem.

Observe that m[w] has the following properties:
  • $m[0] = 0$ (the sum of zero items, i.e., the summation of the empty set)
  • $m[w] = \max_{w_i \le w} (v_i + m[w - w_i])$

where $v_i$ is the value of the i-th kind of item.

Here the maximum of the empty set is taken to be zero. Tabulating the results from m[0] up through m[W] gives the solution. Since the calculation of each m[w] involves examining n items, and there are W values of m[w] to calculate, the running time of the dynamic programming solution is $O(nW)$. Dividing $w_1, \ldots, w_n, W$ by their greatest common divisor is an obvious way to improve the running time.
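The tabulation of m[w] for the unbounded problem can be sketched in a few lines. This is an illustrative sketch under the assumptions stated above (strictly positive integer weights, nonnegative values); the function name `unbounded_knapsack` is not taken from the article.

```python
def unbounded_knapsack(values, weights, capacity):
    """Maximum value with unlimited copies of each item.

    Assumes strictly positive integer weights and an integer capacity.
    """
    # m[w] = best value attainable with total weight <= w; m[0] = 0.
    m = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        best = 0  # the maximum over an empty set of candidates is zero
        for v_i, w_i in zip(values, weights):
            if w_i <= w:
                best = max(best, v_i + m[w - w_i])
        m[w] = best
    return m[capacity]


print(unbounded_knapsack([10, 40, 50, 70], [1, 3, 4, 5], 8))
```

The two nested loops make the O(nW) running time visible: the outer loop runs over the W table entries and the inner loop over the n kinds of item.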

The $O(nW)$ complexity does not contradict the fact that the knapsack problem is NP-complete, since $W$, unlike $n$, is not polynomial in the length of the input to the problem. The length of the input to the problem is proportional to the number of bits in $W$, $\log W$, not to $W$ itself.
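A short worked calculation may make the distinction concrete. The symbol $L$ for the number of bits of $W$ is introduced here only for illustration.

```latex
\[
  L = \lceil \log_2 (W + 1) \rceil
  \quad\Longrightarrow\quad
  2^{L-1} \le W < 2^{L},
  \qquad
  nW = \Theta\!\left(n \, 2^{L}\right).
\]
% Example: W = 10^9 fits in L = 30 bits, yet the table m[0..W]
% has about 10^9 entries. Adding one bit to W doubles the work.
```

Measured against the input length $L$, the dynamic programming table therefore grows exponentially, which is why the algorithm is called pseudo-polynomial rather than polynomial.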

0-1 knapsack problem

A similar dynamic programming solution for the 0-1 knapsack problem also runs in pseudo-polynomial time. As above, assume $w_1, \ldots, w_n$ and $W$ are strictly positive integers. Define m[i, w] to be the maximum value that can be attained with weight less than or equal to w using items up to i.

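We can define m[i, w] recursively as follows, starting from $m[0, w] = 0$ (no items, no value):
  • $m[i, w] = m[i-1, w]$ if $w_i > w$ (the new item is more than the current weight limit)
  • $m[i, w] = \max(m[i-1, w],\ m[i-1, w - w_i] + v_i)$ if $w_i \le w$.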


The solution can then be found by calculating m[n, W]. To do this efficiently we can use a table to store previous computations. This solution will therefore run in $O(nW)$ time and $O(nW)$ space. Additionally, if we use only a 1-dimensional array m[w] to store the current optimal values and pass over this array n times, rewriting from m[W] down to m[1] every time, we get the same result for only $O(W)$ space.
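The one-dimensional variant can be sketched as follows. This is a minimal illustration assuming strictly positive integer weights; the function name `knapsack_01` is an illustrative choice.

```python
def knapsack_01(values, weights, capacity):
    """Maximum value when each item may be taken at most once.

    Assumes strictly positive integer weights and an integer capacity.
    """
    # After processing item i, m[w] holds the value m[i, w] of the recursion.
    m = [0] * (capacity + 1)
    for v_i, w_i in zip(values, weights):
        # Rewrite from high weights to low weights so that m[w - w_i]
        # still refers to the previous row, i.e. item i is used only once.
        for w in range(capacity, w_i - 1, -1):
            m[w] = max(m[w], m[w - w_i] + v_i)
    return m[capacity]


print(knapsack_01([60, 100, 120], [10, 20, 30], 50))
```

Iterating upwards instead would let an item be counted several times, which turns the routine into a solver for the unbounded problem.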

Another algorithm for 0-1 knapsack, discovered in 1974 and sometimes called "meet-in-the-middle" due to parallels to a similarly-named algorithm in cryptography, is exponential in the number of different items but may be preferable to the DP algorithm when $W$ is large compared to n. In particular, if the $w_i$ are nonnegative but not integers, we could still use the dynamic programming algorithm by scaling and rounding (i.e. using fixed-point arithmetic), but if the problem requires $d$ fractional digits of precision to arrive at the correct answer, $W$ will need to be scaled by $10^d$, and the DP algorithm will require $O(W 10^d)$ space and $O(nW 10^d)$ time.

The "meet-in-the-middle" algorithm is as follows:
  1. Partition the set {1...n} into two sets A and B of approximately equal size
  2. Compute the weights and values of all subsets of each set.
  3. For each subset of A, find the "best matching" subset of B, i.e. the subset of B of greatest value such that the combined weight does not exceed W. Keep track of the greatest combined value seen so far.


The algorithm takes $O(2^{n/2})$ space, and efficient implementations of step 3 (for instance, sorting the subsets of B by weight, discarding subsets of B which weigh more than other subsets of B of greater or equal value, and using binary search to find the best match) result in a runtime of $O(n 2^{n/2})$. As with the meet-in-the-middle attack in cryptography, this improves on the $O(n 2^n)$ runtime of a naive brute force approach (examining all subsets of {1...n}), at the cost of using exponential rather than constant space.
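The three steps, together with the sorting, pruning and binary search suggested for step 3, can be sketched as below. This is an illustrative sketch, not the 1974 authors' code; the helper `_all_subsets` and the function name `knapsack_mitm` are hypothetical, and weights and values are assumed to be nonnegative (weights need not be integers).

```python
from bisect import bisect_right


def _all_subsets(items):
    """Return a (weight, value) pair for every subset of `items`."""
    sums = [(0, 0)]  # the empty subset
    for w, v in items:
        sums += [(sw + w, sv + v) for sw, sv in sums]
    return sums


def knapsack_mitm(values, weights, capacity):
    """0-1 knapsack by meet-in-the-middle; assumes nonnegative inputs."""
    items = list(zip(weights, values))
    half = len(items) // 2

    # Steps 1 and 2: split the items and enumerate both halves.
    a_subsets = _all_subsets(items[:half])
    b_subsets = sorted(_all_subsets(items[half:]))

    # Drop subsets of B that weigh at least as much as another subset
    # of greater or equal value; values then increase with weight.
    b_weights, b_values = [], []
    for w, v in b_subsets:
        if not b_values or v > b_values[-1]:
            b_weights.append(w)
            b_values.append(v)

    # Step 3: binary search for the best partner of each subset of A.
    best = 0
    for wa, va in a_subsets:
        if wa > capacity:
            continue
        k = bisect_right(b_weights, capacity - wa) - 1
        best = max(best, va + b_values[k])
    return best


print(knapsack_mitm([10, 7, 4], [1.5, 2.25, 0.75], 3.0))
```

Because no table indexed by weight is built, fractional weights need no scaling, which is the situation in which this approach may be preferable to dynamic programming.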

Greedy approximation algorithm

George Dantzig proposed a greedy approximation algorithm to solve the unbounded knapsack problem. His version sorts the items in decreasing order of value per unit of weight, $v_i / w_i$. It then proceeds to insert them into the sack, starting with as many copies as possible of the first kind of item until there is no longer space in the sack for more. Provided that there is an unlimited supply of each kind of item, if $m$ is the maximum value of items that fit into the sack, then the greedy algorithm is guaranteed to achieve at least a value of $m/2$. However, for the bounded problem, where the supply of each kind of item is limited, the algorithm may be far from optimal.
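The greedy procedure for the unbounded problem can be sketched as follows. This is a minimal illustration assuming strictly positive integer weights; the function name `greedy_unbounded` and the returned `counts` list are illustrative choices.

```python
def greedy_unbounded(values, weights, capacity):
    """Greedy approximation for the unbounded knapsack problem.

    Assumes strictly positive integer weights. Returns the total value
    and the number of copies taken of each kind of item.
    """
    # Sort item indices by value per unit of weight, densest first.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    remaining = capacity
    total = 0
    counts = [0] * len(values)
    for i in order:
        copies = remaining // weights[i]  # as many copies as still fit
        counts[i] = copies
        total += copies * values[i]
        remaining -= copies * weights[i]
    return total, counts


# Greedy takes one copy of the dense item (value 9, weight 6), while two
# copies of the other item (value 5, weight 4) would give 10.
print(greedy_unbounded([9, 5], [6, 4], 8))
```

In this small instance the greedy value 9 falls short of the optimum 10 but stays well above half of it, in line with the guarantee stated above.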

Dominance relations in the UKP

Solving the unbounded knapsack problem can be made easier by throwing away items which will never be needed. For a given item i, suppose we could find a set of items J such that their total weight is less than the weight of i, and their total value is greater than the value of i. Then i cannot appear in the optimal solution, because we could always improve any potential solution containing i by replacing i with the set J. Therefore we can disregard the i-th item altogether. In such cases, J is said to dominate i. (Note that this does not apply to bounded knapsack problems, since we may have already used up the items in J.)

Finding dominance relations allows us to significantly reduce the size of the search space. There are several different types of dominance relations [Vincent Poirriez, Nicola Yanev, Rumen Andonov (2009), "A Hybrid Algorithm for the Unbounded Knapsack Problem", section 2, Discrete Optimization, http://dx.doi.org/10.1016/j.disopt.2008.09.004], which all satisfy an inequality of the form:
$\sum_{j \in J} w_j x_j \le \alpha w_i$, and $\sum_{j \in J} v_j x_j \ge \alpha v_i$ for some $x \in \mathbb{Z}_+^n$

where $\alpha \in \mathbb{Z}_+$, $J \subsetneq N$ and $i \notin J$. The vector $x$ denotes the number of copies of each member of J.

Collective dominance

The i-th item is collectively dominated by J, written as $i \ll J$, if the total weight of some combination of items in J is less than $w_i$ and their total value is greater than $v_i$. Formally, $\sum_{j \in J} w_j x_j \le w_i$ and $\sum_{j \in J} v_j x_j \ge v_i$ for some $x \in \mathbb{Z}_+^n$, i.e. $\alpha = 1$. Verifying this dominance is computationally hard, so it can only be used with a dynamic programming approach. In fact, this is equivalent to solving a smaller knapsack decision problem where $V = v_i$, $W = w_i$, and the items are restricted to J.
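The equivalence with a smaller knapsack decision problem can be illustrated with a short check. This is a sketch under the assumption of strictly positive integer weights; the function name `collectively_dominated` is hypothetical, and the test uses the formal (non-strict) inequalities.

```python
def collectively_dominated(i, J, values, weights):
    """Check whether item i is collectively dominated by the items in J.

    Solves an unbounded knapsack restricted to J with capacity w_i and
    asks whether a value of at least v_i can be reached.
    Assumes strictly positive integer weights and i not in J.
    """
    cap = weights[i]
    # m[w] = best value using copies of items in J with total weight <= w.
    m = [0] * (cap + 1)
    for w in range(1, cap + 1):
        for j in J:
            if weights[j] <= w:
                m[w] = max(m[w], values[j] + m[w - weights[j]])
    return m[cap] >= values[i]


# Item 0 (weight 7, value 10) versus items 1 and 2, which together
# weigh 3 + 4 = 7 and are worth 4 + 7 = 11.
print(collectively_dominated(0, [1, 2], [10, 4, 7], [7, 3, 4]))
```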

Threshold dominance

The i-th item is threshold dominated by J, written as $i \prec\prec J$, if some number of copies of i are dominated by J. Formally, $\sum_{j \in J} w_j x_j \le \alpha w_i$, and $\sum_{j \in J} v_j x_j \ge \alpha v_i$ for some $x \in \mathbb{Z}_+^n$ and $\alpha \ge 1$. This is a generalization of collective dominance and is used in the EDUK algorithm. The smallest such $\alpha$ defines the threshold of the item i, written $t_i = (\alpha - 1) w_i$. In this case, the optimal solution could contain at most $\alpha - 1$ copies of i.

Multiple dominance

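The i-th item is multiply dominated by a single item j, written as $i \ll_m j$, if i is dominated by some number of copies of j. Formally, $w_j x_j \le w_i$, and $v_j x_j \ge v_i$ for some $x_j \in \mathbb{Z}_+$, i.e. $J = \{j\}$, $\alpha = 1$, $x_j = \lfloor w_i / w_j \rfloor$.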
This dominance could be efficiently used during preprocessing because it can be detected relatively easily.
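A preprocessing pass based on multiple dominance can be sketched as follows. This is an illustration only, assuming strictly positive integer weights; the function name `remove_multiply_dominated` and the tie-breaking rule for identical items (keep the one with the lower index) are assumptions of this sketch.

```python
def remove_multiply_dominated(values, weights):
    """Return the indices of items not multiply dominated by another item.

    Item i is dropped when floor(w_i / w_j) copies of some other item j
    are worth at least v_i. Assumes strictly positive integer weights.
    """
    n = len(values)
    keep = []
    for i in range(n):
        dominated = False
        for j in range(n):
            if j == i:
                continue
            # Identical items dominate each other; keep the first one.
            if (weights[j], values[j]) == (weights[i], values[i]) and j > i:
                continue
            copies = weights[i] // weights[j]
            if copies >= 1 and copies * values[j] >= values[i]:
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep


# Item 2 (weight 11, value 19) loses to two copies of item 0 (weight 5,
# value 10); item 3 duplicates item 0.
print(remove_multiply_dominated([10, 4, 19, 10], [5, 3, 11, 5]))
```

The check compares every pair of items once, so it costs on the order of n² simple operations and does not depend on W.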

Modular dominance

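Let b be the best item, i.e. $v_b / w_b \ge v_i / w_i$ for all i. This is the item with the greatest density of value.

The i-th item is modularly dominated by a single item j, written as $i \ll_\equiv j$, if i is dominated by j plus several copies of b. Formally, $w_j + t w_b \le w_i$, and $v_j + t v_b \ge v_i$, i.e. $J = \{b, j\}$, $\alpha = 1$, $x_b = t$, $x_j = 1$.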

Applications

Knapsack problems can be applied to real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of capital investments and financial portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle–Hellman knapsack cryptosystem.

One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. On tests with a homogeneous distribution of point values for each question, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values (that is, when different questions or sections are worth different amounts of points) it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score.
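The scoring step can be sketched as a knapsack-style dynamic program over exact point totals. This is not Feuerman and Weiss's own procedure, only an illustration of the idea; the function name `best_scoring_subset`, the sample point values and the sample scores are all made up for the example.

```python
def best_scoring_subset(points, earned, target=100):
    """Pick questions worth exactly `target` points with the best score.

    points[q] is the maximum for question q (a positive integer) and
    earned[q] is what the student scored on it. Returns (score, questions)
    or None if no subset is worth exactly `target` points.
    """
    NEG = float("-inf")
    best = [NEG] * (target + 1)   # best[t]: top score over subsets worth t
    best[0] = 0
    chosen = [[] for _ in range(target + 1)]

    for q, (p, e) in enumerate(zip(points, earned)):
        # Go downwards so that each question is counted at most once.
        for t in range(target, p - 1, -1):
            if best[t - p] != NEG and best[t - p] + e > best[t]:
                best[t] = best[t - p] + e
                chosen[t] = chosen[t - p] + [q]

    if best[target] == NEG:
        return None
    return best[target], chosen[target]


# A hypothetical 125-point test with five questions.
points = [10, 15, 25, 25, 50]
earned = [8, 15, 10, 25, 40]
print(best_scoring_subset(points, earned))
```

Here the student is graded on questions 0, 1, 3 and 4, which are worth exactly 100 points; the weaker 25-point answer is the one left out.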

History

The knapsack problem has been studied for more than a century, with early works dating as far back as 1897. It is not known how the name "knapsack problem" originated, though the problem was referred to as such in the early works of mathematician Tobias Dantzig (1884–1956), suggesting that the name could have existed in folklore before a mathematical problem had been fully defined.

The quadratic knapsack problem was first introduced by Gallo, Hammer, and Simeone in 1980.

A 1998 study of the Stony Brook University algorithms repository showed that, out of 75 algorithmic problems, the knapsack problem was the 18th most popular and the 4th most needed after kd-trees, suffix trees, and the bin packing problem.

See also

  • List of knapsack problems
  • Packing problem
  • Cutting stock problem
  • Continuous knapsack problem

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 