Perfect hash function
Encyclopedia
A perfect hash function for a set S is a hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

 that maps distinct elements in S to a set of integers, with no collisions
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

. A perfect hash function has many of the same applications as other hash functions, but with the advantage that no collision resolution has to be implemented.

Properties and uses

A perfect hash function for a specific set S that can be evaluated in constant time, and with values in a small range, can be found by a randomized algorithm
Randomized algorithm
A randomized algorithm is an algorithm which employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits...

 in a number of operations that is proportional to the size of S. The minimal size of the description of a perfect hash function depends on the range of its function values: The smaller the range, the more space is required. Any perfect hash functions suitable for use with a hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

 require at least a number of bits that is proportional to the size of S.

A perfect hash function with values in a limited range
Range (mathematics)
In mathematics, the range of a function refers to either the codomain or the image of the function, depending upon usage. This ambiguity is illustrated by the function f that maps real numbers to real numbers with f = x^2. Some books say that range of this function is its codomain, the set of all...

 can be used for efficient lookup operations, by placing keys from S (or other associated values) in a table
Lookup table
In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than...

 indexed by the output of the function. Using a perfect hash function is best in situations where there is a frequently queried large set, S, which is seldom updated. Efficient solutions to performing updates are known as dynamic perfect hashing
Dynamic perfect hashing
In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set, S, of elements.-Details:In this method, the entries...

, but these methods are relatively complicated to implement. A simple alternative to perfect hashing, which also allows dynamic updates, is cuckoo hashing
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...

.

Minimal perfect hash function

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers—usually [0..n−1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some finite set K. F is a minimal perfect hash function iff
IFF
IFF, Iff or iff may refer to:Technology/Science:* Identification friend or foe, an electronic radio-based identification system using transponders...

 F(j) =F(k) implies j=k and there exists an integer a such that the range of F is a..a+|K|−1. It has been proved that a general purpose minimal perfect hash scheme requires at least 1.44 bits/key. However the smallest currently use around 2.5 bits/key.

A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., and for any keys aj and ak, j<k implies F(aj)<F(ak). Order-preserving minimal perfect hash functions require necessarily Ω(n log n) bits to be represented.

A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. Monotone minimal perfect hash functions can be represented in very little space.

Further reading

  • Richard J. Cichelli. Minimal Perfect Hash Functions Made Simple, Communications of the ACM, Vol. 23, Number 1, January 1980.
  • Thomas H. Cormen
    Thomas H. Cormen
    Thomas H. Cormen is the co-author of Introduction to Algorithms, along with Charles Leiserson, Ron Rivest, and Cliff Stein. He is a Full Professor of computer science at Dartmouth College and currently Chair of the Dartmouth College Department of Computer Science. Between 2004 and 2008 he directed...

    , Charles E. Leiserson
    Charles E. Leiserson
    Charles Eric Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof; as part of this effort, he developed the Cilk multithreaded language...

    , Ronald L. Rivest, and Clifford Stein
    Clifford Stein
    Clifford Stein, a computer scientist, is currently a professor of industrial engineering and operations research at Columbia University in New York, NY, where he also holds an appointment in the Department of Computer Science. Stein is chair of the Industrial Engineering and Operations Research...

    . Introduction to Algorithms
    Introduction to Algorithms
    Introduction to Algorithms is a book by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. It is used as the textbook for algorithms courses at many universities. It is also one of the most commonly cited references for algorithms in published papers, with over 4600...

    , Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 11.5: Perfect hashing, pp. 245–249.
  • Fabiano C. Botelho, Rasmus Pagh and Nivio Ziviani. "Perfect Hashing for Data Management Applications".
  • Fabiano C. Botelho and Nivio Ziviani. "External perfect hashing for very large key sets". 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.
  • Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. "Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses". In Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), New York, 2009. ACM Press.
  • Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. "Theory and practise of monotone minimal perfect hashing". In Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX). SIAM, 2009.

External links

  • Minimal Perfect Hashing by Bob Jenkins
  • gperf is a Free software
    Free software
    Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

     C and C++ perfect hash generator
  • cmph is Free Software
    Free software
    Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

     implementing many perfect hashing methods
  • Sux4J is Free Software
    Free software
    Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

     implementing perfect hashing, including monotone minimal perfect hashing in Java
  • MPHSharp is Free Software
    Free software
    Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

    implementing many perfect hashing methods in C#
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK