All Topics  
Hash join

 

   Email Print
   Bookmark   Link






 

Hash join



 
 
The Hash join is an example of a join algorithm
Join (SQL)

An SQL JOIN clause combines records from two table s in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each....
 and is used in the implementation of a relational
Relational database

A relational database is a database that groups data using common attributes found in the data set. The resulting "clumps" of organized data are much easier for people to understand....
 database management system
Database management system

A database management system is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model....
.

The task of a join algorithm is to find, for each distinct value of the join attribute, the set of tuples
Tuple

In mathematics, a tuple is a sequence of a specific number of values, called the components of the tuple. These components can be any kind of mathematical objects, where each component of a tuple is a value of a specified type....
 in each relation which have that value.

The classic hash join algorithm for an inner join
Join (SQL)

An SQL JOIN clause combines records from two table s in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each....
 of two relations proceeds as follows: first prepare a hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
 for the smaller relation, by applying a hash function
Hash function

A hash function is any algorithm or function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an array index into an array....
 to the join attribute of each row.






Discussion
Ask a question about 'Hash join'
Start a new discussion about 'Hash join'
Answer questions from other users
Full Discussion Forum



Encyclopedia


The Hash join is an example of a join algorithm
Join (SQL)

An SQL JOIN clause combines records from two table s in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each....
 and is used in the implementation of a relational
Relational database

A relational database is a database that groups data using common attributes found in the data set. The resulting "clumps" of organized data are much easier for people to understand....
 database management system
Database management system

A database management system is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model....
.

The task of a join algorithm is to find, for each distinct value of the join attribute, the set of tuples
Tuple

In mathematics, a tuple is a sequence of a specific number of values, called the components of the tuple. These components can be any kind of mathematical objects, where each component of a tuple is a value of a specified type....
 in each relation which have that value.

The classic hash join algorithm for an inner join
Join (SQL)

An SQL JOIN clause combines records from two table s in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each....
 of two relations proceeds as follows: first prepare a hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
 for the smaller relation, by applying a hash function
Hash function

A hash function is any algorithm or function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an array index into an array....
 to the join attribute of each row. Then scan the larger relation and find the relevant rows by looking in the hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
. The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input. [GG] This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:

  1. For each tuple in the build input
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input , and add matching join tuples to the output relation
      2. Reset the hash table
  2. Do a final scan of the probe input and add the resulting join tuples to the output relation


(This is essentially the same as the block nested loop
Block nested loop

A block-nested loop is an algorithm used to join two relations in a relational database.This algorithm is a variation on the simple nested loop join used to join two relations and ....
 join algorithm.) This algorithm scans more times than necessary. A better approach is known as the "grace hash join", after the GRACE database machine for which it was first implemented. This algorithm avoids rescanning the entire relation by first partitioning both and via a hash function, and writing these partitions out to disk. The algorithm then loads pairs of partitions into memory, builds a hash table for the smaller partitioned relation, and probes the other relation for matches with the current hash table. Because the partitions were formed by hashing on the join key, it must be the case that any join output tuples must belong to the same partition. It is possible that one or more of the partitions still does not fit into the available memory, in which case the algorithm is recursively applied: an additional orthogonal hash function is chosen to hash the large partition into sub-partitions, which are then processed as before. Since this is expensive, the algorithm tries to reduce the chance that it will occur by forming as many partitions as possible during the initial partitioning phase.

The hybrid hash join algorithm is a refinement of the grace hash join which takes advantage of more available memory. During the partitioning phase, the hybrid hash join uses the available memory for two purposes:
  1. To hold the current output buffer page for each of the partitions
  2. To hold an entire partition in-memory, known as "partition 0"
Because partition 0 is never written to or read from disk, the hybrid hash join typically performs fewer I/O operations than the grace hash join. Note that this algorithm is memory-sensitive, because there are two competing demands for memory (the hash table for partition 0, and the output buffers for the remaining partitions). Choosing too large a hash table might cause the algorithm to recurse because one of the non-zero partitions is too large to fit into memory.

Hash joins require an equi-join predicate (a predicate comparing values from one table with values from the other table using the equals operator '='). Hash joins can also be evaluated for an anti-join predicate (a predicate selecting values from one table when no related values are found in the other). Applying this algorithm proceeds as follows: first prepare a hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
 for the 'not in' side of the join. Then scan the other table, selecting any rows where the join attribute hashes to an empty entry in the hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
.

External links