Hash function



 
 
A hash function is any well-defined procedure or mathematical function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index into an array. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

Hash functions are mostly used to speed up table lookup or data comparison tasks, such as finding items in a database, detecting duplicated or similar records in a large file, finding similar stretches in DNA sequences, and so on.

Hash functions are related to (and often confused with) checksums, check digits, fingerprints, randomizing functions, error correcting codes, and cryptographic hash functions. Although these concepts overlap to some extent, each has its own uses and requirements. The HashKeeper database maintained by the National Drug Intelligence Center, for instance, is more aptly described as a catalog of file fingerprints than of hash values.

Applications

Hash functions are mostly used in hash tables, to quickly locate a data record (for example, a dictionary definition) given its search key (the headword). Specifically, the hash function is used to map the search key to the index of a slot in the table where the corresponding record is supposedly stored.

In general, a hashing function may map several different keys to the same hash value. Therefore, each slot of a hash table contains (implicitly or explicitly) a set of records, rather than a single record. For this reason, each slot of a hash table is often called a bucket, and hash values are also called bucket indices.

Thus, the hash function only hints at the record's location — it only tells where one should start looking for it. Still, in a half-full table, a good hash function will typically narrow the search down to only one or two entries.
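The bucket idea can be illustrated with a minimal Python sketch. The class name BucketTable and its methods are hypothetical, and Python's built-in hash() merely stands in for whatever hash function an application would choose.

```python
class BucketTable:
    """Toy hash table: each slot holds a bucket (list) of (key, value) pairs."""

    def __init__(self, num_slots=8):
        self.buckets = [[] for _ in range(num_slots)]

    def _index(self, key):
        # The hash value only tells us which bucket to start looking in.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Several keys may share a bucket, so compare the keys themselves.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default
```

A lookup therefore costs one hash computation plus a scan of a single bucket, which stays short as long as the keys are spread evenly.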

In the Java programming language, for example, the Object parent class provides a standard hashCode method that is required to generate a 32-bit integer hash value of its object. This method is used in several hash-table-based classes, such as HashMap and HashSet.

Finding duplicate records

To find duplicated records in a large unsorted file, one may use a hash function to map each file record to an index into a table T, and collect in each bucket T[i] a list of the numbers of all records with the same hash value i. Once the table is complete, any two duplicate records will end up in the same bucket. The duplicates can then be found by scanning every bucket T[i] which contains two or more members, fetching those records, and comparing them. With a table of appropriate size, this method is likely to be much faster than any alternative approach (such as sorting the file and comparing all consecutive pairs).
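The procedure above might be sketched in Python as follows. The function name find_duplicates and the default table size are assumptions made for illustration, and the built-in hash() stands in for the chosen hash function.

```python
from collections import defaultdict


def find_duplicates(records, num_buckets=1024):
    """Return sorted pairs (i, j), i < j, of record numbers with equal records."""
    table = defaultdict(list)  # T[i] -> numbers of records with hash value i
    for number, record in enumerate(records):
        table[hash(record) % num_buckets].append(number)

    duplicates = []
    for numbers in table.values():
        if len(numbers) < 2:
            continue  # a bucket with one member cannot hold a duplicate
        # Sharing a bucket only means "possibly equal": compare the records.
        for a in range(len(numbers)):
            for b in range(a + 1, len(numbers)):
                i, j = numbers[a], numbers[b]
                if records[i] == records[j]:
                    duplicates.append((i, j))
    return sorted(duplicates)
```

Only records that land in the same bucket are ever compared, which is where the saving over comparing all pairs comes from.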

Finding similar records

Hash functions can also be used to locate table records whose key is similar, but not identical, to a given key; or pairs of records in a large file which have similar keys. For that purpose, one needs a hash function that maps similar keys to hash values that differ by at most m, where m is a small integer (say, 1 or 2). If one builds a table T of all record numbers, using such a hash function, then similar records will end up in the same bucket, or in nearby buckets. Then one need only check the records in each bucket T[i] against those in buckets T[i+k] where k ranges between -m and m.

This class includes the so-called acoustic fingerprint algorithms, which are used to locate similar-sounding entries in large collections of audio files (as in the MusicBrainz song labeling service). For this application, the hash function must be as insensitive as possible to data capture or transmission errors, and to "trivial" changes such as timing and volume changes, compression, etc.
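As a toy illustration of the nearby-bucket search with m = 1, the Python sketch below treats numeric keys as "similar" when they differ by at most a given tolerance. The function name similar_pairs and this notion of similarity are assumptions chosen for the example. It is far simpler than an acoustic fingerprint.

```python
from collections import defaultdict


def similar_pairs(values, tolerance):
    """Return sorted index pairs (a, b), a < b, whose values differ by <= tolerance."""
    def bucket_of(v):
        # Values within `tolerance` of each other fall in the same
        # bucket or in adjacent buckets, so hash values differ by at most 1.
        return int(v // tolerance)

    table = defaultdict(list)
    for idx, v in enumerate(values):
        table[bucket_of(v)].append(idx)

    pairs = set()
    for i, members in table.items():
        for k in (-1, 0, 1):  # check bucket T[i] against T[i+k], k in -m..m
            for a in members:
                for b in table.get(i + k, ()):
                    if a < b and abs(values[a] - values[b]) <= tolerance:
                        pairs.add((a, b))
    return sorted(pairs)
```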

Finding similar substrings

The same techniques can be used to find equal or similar stretches in a large collection of strings, such as a document repository or a genomic database. In this case, the input strings are broken into many small pieces, and a hash function is used to detect potentially equal pieces, as above.

The Rabin-Karp algorithm is a relatively fast string searching algorithm that works in O(n) time on average. It is based on the use of hashing to compare strings.
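A compact Python sketch of the idea, using a polynomial rolling hash, is given below. The base and modulus are illustrative assumptions rather than values prescribed by the algorithm, and the function name rabin_karp is likewise chosen for this example.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_003):
    """Return the start positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; confirm by direct comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return matches
```

Because the window hash is updated in constant time, the characters are compared directly only at positions where the hashes agree.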

Geometric hashing

This principle is widely used in computer graphics, computational geometry and many other disciplines, to solve many proximity problems in the plane or in three-dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of shapes, similar images in an image database, and so on. In these applications, the set of all inputs is some sort of metric space, and the hashing function can be interpreted as a partition of that space into a grid of cells. The table is often an array with two or more indices (called a bucket grid), and the hash function returns an index tuple. This special case of hashing is known as geometric hashing or the grid method. Geometric hashing is also used in telecommunications (usually under the name vector quantization) to encode and compress multi-dimensional signals.
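The grid method can be sketched for points in the plane as follows. The function names cell_of and close_pairs are hypothetical, and the choice of a cell size equal to the search radius is an assumption that keeps the neighbourhood to the 3 × 3 block of surrounding cells.

```python
from collections import defaultdict


def cell_of(point, cell_size):
    """Hash a 2-D point to an index tuple (column, row) of its grid cell."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def close_pairs(points, radius):
    """Return sorted index pairs of points at Euclidean distance <= radius."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, radius)].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        # Points within `radius` must lie in the same or an adjacent cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for a in members:
                    for b in grid.get((cx + dx, cy + dy), ()):
                        if a < b:
                            (ax, ay), (bx, by) = points[a], points[b]
                            if (ax - bx) ** 2 + (ay - by) ** 2 <= radius ** 2:
                                pairs.add((a, b))
    return sorted(pairs)
```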

Properties


Good hash functions, in the original sense of the term, are usually required to satisfy certain properties listed below. Note that different requirements apply to the other related concepts (cryptographic hash functions, checksums, etc.).

Low cost

The cost of computing a hash function must be small enough to make a hashing-based solution more advantageous than other approaches. For instance, binary search can locate an item in a sorted table of n items with about log₂ n key comparisons. Therefore, a hash-table solution will be more efficient than binary search only if computing the hash function for one key costs less than performing log₂ n key comparisons.

Determinism

A hash procedure must be deterministic, meaning that for a given input value it must always generate the same hash value. In other words, it must be a function of the hashed data, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as pseudo-random number generators that depend on the time of day. It also excludes functions that depend on the memory address of the object being hashed, if that address may change during processing (as may happen in systems that use certain methods of garbage collection).

Uniformity

A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same probability. The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of collisions (pairs of inputs that are mapped to the same hash value) increases. Basically, if some hash values are more likely to occur than others, a larger fraction of the lookup operations will have to search through a larger set of colliding table entries.

Note that this criterion only requires the value to be uniformly distributed, not random in any sense. A good randomizing function is usually good for hashing, but the converse need not be true.

Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries.

In other words, if a typical set of m records is hashed to n table slots, the probability of a bucket receiving many more than m/n records should be vanishingly small. In particular, if m is less than n, very few buckets should have more than one or two records. (In an ideal "perfect hash function", no bucket should have more than one record; but a small number of collisions is virtually inevitable, even if n is much larger than m; see the birthday paradox.)

Variable range

In many applications, the range of hash values may be different for each run of the program, or may change along the same run (for instance, when a hash table needs to be expanded). In those situations, one needs a hash function which takes two parameters — the input data z, and the number n of allowed hash values.
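A two-parameter hash function of this kind, together with the redistribution step needed when a table grows, could look like the Python sketch below. The names bucket_index and resize are hypothetical, and the built-in hash() is used as a stand-in for the underlying hash of z.

```python
def bucket_index(z, n):
    """Hash input z into the range 0..n-1 for the current table size n."""
    return hash(z) % n


def resize(buckets, new_n):
    """Redistribute all keys after the number of allowed hash values changes."""
    new_buckets = [[] for _ in range(new_n)]
    for bucket in buckets:
        for key in bucket:
            # The same key may land in a different bucket once n changes.
            new_buckets[bucket_index(key, new_n)].append(key)
    return new_buckets
```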

Data normalization

In some applications, the input data may contain features that are irrelevant for comparison purposes. When looking up a personal name, for instance, it may be desirable to ignore the distinction between upper and lower case letters. For such data, one must use a hash function that is compatible with the data equivalence criterion being used: that is, any two inputs that are considered equivalent must yield the same hash value.
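For the personal-name example, one way to satisfy this requirement is to normalize the input before hashing it, as in the Python sketch below. The function name and the multiplier 31 are illustrative assumptions; any hash applied to the case-folded name would serve.

```python
def case_insensitive_hash(name, n):
    """Hash a name into 0..n-1 so that names differing only in case collide."""
    canonical = name.casefold()  # equivalence criterion: ignore letter case
    value = 0
    for ch in canonical:
        value = (value * 31 + ord(ch)) % n
    return value
```

The lookup code must then compare keys under the same equivalence (here, by comparing case-folded names), otherwise equivalent names would hash alike but still fail to match.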

Continuity

A hash function that is used to search for similar (as opposed to equivalent) data must be as continuous as possible; two inputs that differ by a little should be mapped to equal or nearly equal hash values.

Note that continuity is usually considered a fatal flaw for checksums, cryptographic hash functions, and other related concepts. Continuity is desirable for hash functions only in some applications, such as hash tables that use linear search.

Hash function algorithms

The choice of a hashing function depends strongly on the nature of the input data, and their probability distribution in the intended application.

Trivial hash function

If the data to be hashed is small enough, one can use the data itself (reinterpreted as an integer in binary notation) as the hashed value. The cost of computing this "trivial" (identity) hash function is effectively zero.

The meaning of 'small enough' depends on how much memory is available for the hash table. A typical PC (as of 2008) might have a gigabyte of available memory, meaning that hash values of up to 30 bits could be accommodated. However, there are many applications that can get by with much less. For example, when mapping character strings between upper and lower case, one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ('A' for 'a', '8' for '8', etc.). If each character is stored in 8 bits (as in ASCII or ISO Latin 1), the table has only 2^8 = 256 entries; in the case of Unicode characters stored in 16 bits, the table would have 2^16 = 65536 entries.
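The case-mapping table can be sketched in Python as below. This is a simplified illustration that swaps only the ASCII letters and leaves every other 8-bit code unchanged; the names SWAP_CASE and swap_case_ascii are hypothetical.

```python
# Table indexed directly by the 8-bit character code (the "trivial" hash).
SWAP_CASE = list(range(256))  # by default each code maps to itself
for code in range(ord("a"), ord("z") + 1):
    upper = code - 32  # in ASCII, 'A' is 32 positions below 'a'
    SWAP_CASE[code] = upper
    SWAP_CASE[upper] = code


def swap_case_ascii(data: bytes) -> bytes:
    """Replace each ASCII letter by its alternative case via table lookup."""
    return bytes(SWAP_CASE[b] for b in data)
```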

The same technique can be used to map two-letter country codes like 'us' or 'za' to country names (65536 table entries), 5-digit zip codes like 13083 to city names (100000 entries), etc. Invalid data values (such as the country code 'xx' or the zip code 00000) may be left undefined in the table, or mapped to some appropriate 'null' value.

Injective and perfect hashing

The ideal hashing function should be injective; that is, it should map each valid input to a different hash value. Such a function would directly locate the desired entry in a hash table, without any additional search.

An injective hash function whose range is all integers between 0 and n-1, where n is the number of valid inputs, is said to be perfect. Besides providing single-step lookup, a perfect hash function also results in a compact hash table, without any vacant slots.

Unfortunately, injective and perfect hash functions exist only in very few special situations (such as mapping month names to the integers 0 to 11); and even then they are often too complicated or expensive to be of practical use. Indeed, hash functions are typically required to map a large set of valid potential inputs to a much smaller range of hash values (e.g. for mapping data items from a large set of valid values into elements of a memory array of limited size) and therefore cannot be injective.

Hashing uniformly distributed data

If the inputs are bounded-length strings (such as telephone numbers, car license plates, invoice numbers, etc.), and each input may independently occur with uniform probability, then a hash function need only map roughly the same number of inputs to each hash value. For instance, suppose that each input is an integer z in the range 0 to N-1, and the output must be an integer h in the range 0 to n-1, where N is much larger than n. Then the hash function could be h = z mod n (the remainder of z divided by n), or h = (z × n) ÷ N (the value z scaled down by n/N and truncated to an integer), or many other formulas.
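Both formulas are easy to check on a small case. The Python sketch below uses the hypothetical names hash_mod and hash_scale and the illustrative sizes N = 1000 and n = 10, and counts how many inputs each hash value receives.

```python
from collections import Counter


def hash_mod(z, n):
    """h = z mod n: the remainder of z divided by n."""
    return z % n


def hash_scale(z, n, N):
    """h = (z * n) / N truncated: z scaled down by n/N."""
    return (z * n) // N


N, n = 1000, 10
mod_counts = Counter(hash_mod(z, n) for z in range(N))
scale_counts = Counter(hash_scale(z, n, N) for z in range(N))
```

With every input in 0 to N-1 present exactly once, each of the 10 hash values receives exactly 100 inputs under either formula.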

Hashing data with other distributions

These simple formulas will not do if the input values are not equally likely, or are not independent. For instance, most patrons of a supermarket will live in the same geographic area, so their telephone numbers are likely to begin with the same 3 to 4 digits. In that case, if n is 10000 or so, the division formula (z × n) ÷ N, which depends mainly on the leading digits, will generate a lot of collisions; whereas the remainder formula z mod n, which is quite sensitive to the trailing digits, may still yield a fairly even distribution of hash values.
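The effect can be reproduced with made-up data. The Python sketch below assumes 7-digit telephone numbers (N = 10^7), n = 10000, and a hypothetical set of patrons whose numbers all share the prefix 555; none of these figures come from real data.

```python
def distinct_hash_values(keys, hash_fn):
    """Count how many different hash values the given keys produce."""
    return len({hash_fn(z) for z in keys})


N = 10_000_000  # all 7-digit numbers (assumed)
n = 10_000      # number of hash values
phones = range(5_550_000, 5_560_000)  # hypothetical: same leading digits 555

by_division = distinct_hash_values(phones, lambda z: (z * n) // N)
by_remainder = distinct_hash_values(phones, lambda z: z % n)
```

Here the division formula spreads the 10000 numbers over only 10 hash values, while the remainder formula gives each number its own hash value.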

Hashing variable-length data


When the data values are long (or variable-length) character strings, such as personal names, web page addresses, or mail messages, their distribution is usually very uneven, with complicated dependencies. For example, text in any natural language has highly non-uniform distributions of characters, and character pairs, very characteristic of the language. For such data, it is prudent to use a hash function that depends on all characters of the string, and depends on each character in a different way.

A fairly common scheme for hashing such data is to break the input into a sequence of small units (bits, bytes, words, etc.) and combine all the units b[1], b[2], …, b[m] sequentially, as follows:

    S ← S0;                    // Initialize the state.
    for k in 1, 2, …, m do     // Scan the input data units:
        S ← F(S, b[k]);        // Combine data unit k into the state.
    return G(S, n)             // Extract the hash value from the state.

This schema is also used in many text checksum and fingerprint algorithms. The state variable S may be a 32- or 64-bit unsigned integer; in that case, S0 can be 0, and G(S,n) can be just S mod n. The best choice of F is a complex issue and depends on the nature of the data. If the units b[k] are single bits, then F(S,b) could be, for instance,

    if highbit(S) = 0 then
        return 2 * S + b
    else
        return (2 * S + b) ^ P

Here highbit(S) denotes the most significant bit of S; the '*' operator denotes unsigned integer multiplication with overflow discarded; '^' is the bitwise exclusive or operation applied to words; and P is a suitable fixed word.
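A direct Python transcription of this scheme, with single bits as the data units and a 32-bit state, might read as follows. The fixed word P is set here to 0x04C11DB7 (the generator polynomial used by CRC-32) purely as an assumption for the example, since the text does not specify it, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF   # keeps the state within 32 bits (overflow is discarded)
HIGH_BIT = 0x80000000
P = 0x04C11DB7        # assumed fixed word; the text leaves P unspecified


def combine_bit(state, bit):
    """F(S, b): shift one data bit into the 32-bit state."""
    shifted = (2 * state + bit) & MASK32
    if state & HIGH_BIT:  # highbit(S) = 1
        shifted ^= P
    return shifted


def hash_bits(data: bytes, n: int) -> int:
    """Scan the input bit by bit, then reduce the state with G(S, n) = S mod n."""
    state = 0  # S0
    for byte in data:
        for position in range(7, -1, -1):  # most significant bit first
            state = combine_bit(state, (byte >> position) & 1)
    return state % n
```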

Special-purpose hash functions

In many such cases, one can design a special-purpose (heuristic) hash function that yields many fewer collisions than a good general-purpose hash function. For example, suppose that the input data are file names such as FILE0000.CHK, FILE0001.CHK, FILE0002.CHK, etc., with mostly sequential numbers. For such data, a function that extracts the numeric part k of the file name and returns k mod n would be nearly optimal. Needless to say, a function that is exceptionally good for a specific kind of data may have dismal performance on data with different distribution.
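The file-name example could be written in Python as shown below. The function name chk_file_hash and the decision to reject names that do not fit the pattern are assumptions made for this sketch.

```python
import re


def chk_file_hash(file_name, n):
    """Hash names like FILE0001.CHK by their numeric part k, returning k mod n."""
    match = re.fullmatch(r"FILE(\d+)\.CHK", file_name)
    if match is None:
        # This heuristic is only meaningful for the expected name format.
        raise ValueError(f"unexpected file name: {file_name!r}")
    return int(match.group(1)) % n
```

Sequentially numbered files then fill the n hash values in round-robin order, with no collisions until more than n files exist.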

Hashing with checksum functions

One can obtain good general-purpose hash functions for string data by adapting certain checksum or fingerprinting algorithms. Some of those algorithms will map arbitrarily long string data z, with any typical real-world distribution (no matter how non-uniform and dependent) to a fixed-length bit string, with a fairly uniform distribution. This string can be interpreted as a binary integer k, and turned into a hash value by the formula h = k mod n.

This method will produce a fairly even distribution of hash values, as long as the hash range size n is small compared to the range of the checksum function. Bob Jenkins' algorithm uses a 32-bit checksum. A 64-bit checksum should provide adequate hashing for tables of any feasible size.
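As an illustration of the formula h = k mod n, the Python sketch below uses the CRC-32 checksum from the standard zlib module as the checksum function. CRC-32 is chosen here only because it is readily available; it is not the Bob Jenkins algorithm mentioned above, and the function name checksum_hash is hypothetical.

```python
import zlib


def checksum_hash(data: bytes, n: int) -> int:
    """Interpret the 32-bit CRC of data as an integer k and return k mod n."""
    k = zlib.crc32(data)  # unsigned 32-bit integer in Python 3
    return k % n
```

For the distribution to stay fairly even, n should be small compared with 2^32, the range of the checksum.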

Hashing with cryptographic hash functions

Some cryptographic hash functions, such as SHA-1, have even stronger uniformity guarantees than checksums or fingerprints, and thus can provide very good general-purpose hashing functions. However, the uniformity advantage may be too small to offset their much higher cost.
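The same reduction can be applied to a cryptographic digest. The sketch below uses SHA-1 from Python's standard hashlib module; the name sha1_hash and the big-endian interpretation of the digest are assumptions made for the example.

```python
import hashlib


def sha1_hash(data, n):
    """Interpret the 160-bit SHA-1 digest as an integer k and return k mod n."""
    digest = hashlib.sha1(data).digest()    # 20 bytes
    k = int.from_bytes(digest, "big")
    return k % n
```

Computing a full SHA-1 digest for every table lookup is considerably more work than a simple checksum, which is the cost trade-off described above.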

Origins of the term


The term "hash" comes by way of analogy with its standard meaning in the physical world, to "chop and mix". Indeed, typical hash functions, like the mod operation, "chop" the input domain into many sub-domains that get "mixed" into the output range. Donald Knuth notes that Hans Peter Luhn of IBM appears to have been the first to use the concept, in a memo dated January 1953, and that Robert Morris used the term in a survey paper in CACM which elevated the term from technical jargon to formal terminology.

See also


  • Bloom filter
  • Coalesced hashing
  • Cryptography
  • Cryptographic hash function
  • Distributed hash table
  • Geometric hashing
  • HMAC
  • Linear hash
  • Rabin-Karp string search algorithm
  • Rolling hash
  • Hash list
  • Hash table
  • Hash tree
  • List of hash functions
  • Perfect hash function
  • Transposition table
  • Universal hashing
  • Zobrist hashing


External links

  • by Thomas Wang
  • (PDF) by Mayur Patel
  • by Paul Hsieh
  • by Austin Appleby
  • by Herbert Glarner
  • Fowler, Noll, Vo Hash Function
  • Online Hash Generator (md2,md4,md5,sha1,tiger,snefru,ripemd,whirlpool,haval...)
  • Online Hash Generator with instant hash computation while typing
  • — opensource library
  • A tool to create hashes with more than 40 hashing algorithms, available through a website or Firefox extension
  • MIT OCW lecture Video