Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Binary heap

Binary heap

Overview


A binary heap is a heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...

 data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 created using a binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two children. Typically the first node is known as the parent and the child nodes are called left and right. In type theory, a binary tree with nodes of type A is defined inductively as TA =...

. It can be seen as a binary tree with two additional constraints:
  • The shape property: the tree is an almost complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
  • The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.


“Greater than” means according to whatever comparison function is chosen to sort the heap, not necessarily “greater than” in the mathematical sense (since the quantities are not always numerical).
Discussion
Ask a question about 'Binary heap'
Start a new discussion about 'Binary heap'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia


A binary heap is a heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...

 data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 created using a binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two children. Typically the first node is known as the parent and the child nodes are called left and right. In type theory, a binary tree with nodes of type A is defined inductively as TA =...

. It can be seen as a binary tree with two additional constraints:
  • The shape property: the tree is an almost complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
  • The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.


“Greater than” means according to whatever comparison function is chosen to sort the heap, not necessarily “greater than” in the mathematical sense (since the quantities are not always numerical). Heaps where the comparison function is mathematical “greater than” are called max-heaps; those where the comparison function is mathematical “less than” are called “min-heaps.” Conventionally, min-heaps are used, since they are readily applicable for use in priority queue
Priority queue
A priority queue is an abstract data type in computer programming that supports the following three operations:* InsertWithPriority: add an element to the queue with an associated priority...

s.

Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent can be freely interchanged, as long as this does not violate the shape and heap properties (compare with treap
Treap
In computer science, the treap and the randomized binary search tree are two closely-related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...

).

The binary heap is a special case of the d-ary heap
D-ary heap
The d-ary heap or d-heap is a generalization of the binary heap data structure whose non-leaf nodes have d children, instead of 2. Thus, a binary heap is a 2-heap.According to Jensen et al., d-ary heaps were invented by Johnson in 1975....

 in which d = 2.

It is possible to modify the heap structure to allow extraction of both the smallest and largest element in time. To do this the rows alternate between min heap and max heap. The algorithms are roughly the same, but in each step must consider the alternating rows with alternating comparisons. The performance is roughly the same as a normal single direction heap. This idea can be generalised to a min-max-median heap.

Adding to the heap


If we have a heap, and we add an element, we can perform an operation known as up-heap, bubble-up, percolate-up, sift-up, or heapify-up in order to restore the heap property. We can do this in O
Big O notation
In mathematics, computer science, and related fields, big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions...

(log n) time, using a binary heap, by following this algorithm:
  1. Add the element on the bottom level of the heap.
  2. Compare the added element with its parent; if they are in the correct order, stop.
  3. If not, swap the element with its parent and return to the previous step.


We do this at maximum for each level in the tree — the height of the tree, which is O(log n). However, since approximately 50% of the elements are leaves and 75% are in the bottom two levels, it is likely that the new element to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in average constant time, O(1).

Say we have a max-heap


and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However the heap property is violated since 15 is greater than 8, so we need to swap the 15 and the 8. So, we have the heap looking as follows after the first swap:


However the heap property is still violated since 15 is greater than 11, so we need to swap again:


which is a valid max-heap.

Deleting the root from the heap


The procedure for deleting the root from the heap — effectively extracting the maximum element in a max-heap or the minimum element in a min-heap — starts by replacing it with the last element on the last level. So, if we have the same max-heap as before, we remove the 11 and replace it with the 4.


Now the heap property is violated since 8 is greater than 4. The operation that restores the property is called down-heap, bubble-down, percolate-down, sift-down, or heapify-down. In this case, swapping the two elements 4 and 8, is enough to restore the heap property and we need not swap elements further:


In general, the downward-moving node is swapped with its larger child in a max-heap (in a min-heap it would be swapped with its smaller child), until it satisfies the heap property in its new position. Note that the down-heap operation (without the preceding swap) can be used in general to modify the value of the root, even when an element is not being deleted.

Building a heap


A heap could be built by successive insertions. This approach requires time because each insertion takes time and there are n elements ('lg' denotes a binary logarithm
Binary logarithm
In mathematics, the binary logarithm is the logarithm for base 2. It is the inverse function of . The binary logarithm of n is the power to which the number 2 must be raised to obtain the value n. This makes the binary logarithm useful for anything involving powers of 2...

 here.) However this is not the optimal method. The optimal method starts by arbitrarily putting the elements on a binary tree (which could be represented by an array, see below). Then starting from the lowest level and moving upwards until the heap property is restored by shifting the root of the subtree downward as in the deletion algorithm. More specifically if all the subtrees starting at some height (measured from the bottom) have already been "heapified", the trees at height can be heapified by sending their root down (along the path of maximum children when building a max-heap, or minimum children when building a min-heap). This process takes operations (swaps). In this method most of the heapification takes place in the lower levels. The number of nodes at height is . Therefore, the cost of heapifying all subtrees is:

Heap implementation


It is perfectly acceptable to use a traditional binary tree data structure to implement a binary heap. There is an issue with finding the adjacent element on the last level on the binary heap when adding an element which can be resolved algorithmically or by adding extra data to the nodes, called “threading” the tree — that is, instead of merely storing references to the children, we store the inorder successor of the node as well.


However, a more common approach, and an approach aligned with the theory behind heaps, is to store the heap in an array. Any binary tree can be stored in an array, but because a heap is always an almost complete binary tree, it can be stored compactly. No space is required for pointers; instead, the parent and children of each node can be found by simple arithmetic on array indices. Details depend on the root position (which in turn may depend on constraints of a programming language
Programming language
A programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human...

 used for implementation). If the tree root item has index 0 (n tree elements are a[0] .. a[n−1]), then for each index i, element a[i] has children a[2i+1] and a[2i+2], and the parent a[floor
Floor function
In mathematics and computer science, the floor and ceiling functions map a real number to the next smallest or next largest integer. More precisely, floor is the largest integer not greater than x and ceiling is the smallest integer not less than x.-Notation:Gauss introduced the square bracket...

((i−1)/2)], as shown in the figure. If the root is a[1] (tree elements are a[1] .. a[n]), then for each index i, element a[i] has children a[2i] and a[2i+1], and the parent a[floor((i-1)/2)]. This is a simple example of an implicit data structure
Implicit data structure
In computer science, an implicit data structure is a data structure that uses very little memory besides the actual data elements. It is called "implicit" because most of the structure of the elements is expressed implicitly by their order. Another term used interchangeably is space efficient....

.

This approach is particularly useful in the heapsort
Heapsort
Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case Θ runtime. Heapsort is an in-place algorithm, but is not a stable sort.-...

 algorithm, where it allows the space in the input array to be reused to store the heap (i.e. the algorithm is in-place
In-place algorithm
In computer science, an in-place algorithm is an algorithm which transforms a data structure using a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes...

). However it requires allocating the array before filling it, which makes this method not that useful in priority queues implementation, where the number of tasks (heap elements) is not necessarily known in advance.

The upheap/downheap operations can be stated then in terms of an array as follows: suppose that the heap property holds for the indices b, b+1, ..., e. The sift-down function extends the heap property to b-1, b, b+1, ..., e.
Only index i = b-1 can violate the heap property.
Let j be the index of the largest child of a[i] (for a max-heap, or the smallest child for a min-heap) within the range b, ..., e.
(If no such index exists because 2i > e then the heap property holds for the newly extended range and nothing needs to be done.)
By swapping the values a[i] and a[j] the heap property for position i is established.
At this point, the only problem is that the heap property might not hold for index j.
The sift-down function is applied tail-recursively
Tail recursion
In computer science, tail recursion is a special case of recursion in which the last operation of the function, the tail call, is a recursive call. Such recursions can be easily transformed to iterations. Replacing recursion with iteration, manually or automatically, can drastically decrease the...

 to index j until the heap property is established for all elements.

The sift-down function is fast. In each step it only needs two comparisons and one swap. The index value where it is working doubles in each iteration, so that at most log2 e steps are required.

The merge operation in a binary heap takes Θ(n) for equal-sized heaps. The best you can do is (in case of array implementation) simply concatenating the two heap arrays and build a heap of the result.http://nist.gov/dads/HTML/binaryheap.html When merging is a common task, a different heap implementation is recommended, such as binomial heaps
Binomial heap
In computer science, a binomial heap is a heap similar to a binary heap but also supporting the operation of merging two heaps quickly. This is achieved by using a special tree structure...

, which can be merged in O(log n).

External links