Heapsort

 

Heapsort is a comparison-based sorting algorithm and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case Θ(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort.

Overview

Heapsort works as its name suggests: it begins by building a heap out of the data set, then removes the largest item and places it at the end of the sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays: one to hold the heap and the other to hold the sorted elements.
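The elementary two-array approach can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `heapsort_two_arrays` is hypothetical, and because the standard `heapq` module provides a min-heap, the sketch removes the smallest item each time and fills the sorted array from the front, which is the mirror image of the max-heap description above.

```python
import heapq

def heapsort_two_arrays(items):
    """Return a new sorted list; the input is left untouched."""
    heap = list(items)       # first array: holds the heap
    heapq.heapify(heap)      # build a min-heap from the data set
    result = []              # second array: holds the sorted elements
    while heap:
        # Remove the smallest remaining item; heapq restores the heap.
        result.append(heapq.heappop(heap))
    return result
```

For example, `heapsort_two_arrays([5, 1, 4, 2, 3])` yields the elements in ascending order while the original list is unchanged.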

Heapsort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is extracted repeatedly until none remain, so the values are extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.

During extraction, the only space required is that needed to store the heap. In order to achieve constant space overhead, the heap is stored in the part of the input array that has not yet been sorted. (The structure of this heap is described at Binary heap: Heap implementation.)
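As a rough illustration of how a binary heap can live inside the unsorted part of the array, the sketch below uses the usual implicit layout with zero-based indices (an assumption; one-based layouts are equally common). The helper names `parent`, `left`, `right`, and `is_max_heap` are illustrative only.

```python
def parent(i):
    """Index of the parent of node i (zero-based layout)."""
    return (i - 1) // 2

def left(i):
    """Index of the left child of node i."""
    return 2 * i + 1

def right(i):
    """Index of the right child of node i."""
    return 2 * i + 2

def is_max_heap(a, size=None):
    """Check the max-heap property on the prefix a[:size]."""
    n = len(a) if size is None else size
    return all(a[parent(i)] >= a[i] for i in range(1, n))
```

With `a = [9, 7, 8, 3, 5, 10, 11]`, the first five slots form a valid max-heap (`is_max_heap(a, 5)` is true) while the last two slots already hold sorted output, so no storage beyond the array itself is needed.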

Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.
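A minimal in-place sketch built from exactly these two operations might look like the following. It assumes a max-heap stored in a zero-based array; the names `sift_up`, `delete_root`, and `heapsort` are chosen for illustration and are not taken from any particular library.

```python
def sift_up(a, i):
    """Insertion: move a[i] up until its parent is at least as large."""
    while i > 0:
        p = (i - 1) // 2
        if a[p] >= a[i]:
            break
        a[p], a[i] = a[i], a[p]
        i = p

def delete_root(a, size):
    """Root deletion: move the maximum of the heap a[:size] into
    a[size - 1], then restore the heap property on a[:size - 1]."""
    last = size - 1
    a[0], a[last] = a[last], a[0]
    i = 0
    while True:
        child = 2 * i + 1
        if child >= last:
            break
        # Pick the larger of the two children, if a right child exists.
        if child + 1 < last and a[child + 1] > a[child]:
            child += 1
        if a[i] >= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child

def heapsort(a):
    """Sort the list a in place, in ascending order."""
    # Phase 1: grow the heap by inserting a[1], a[2], ... one at a time.
    for i in range(1, len(a)):
        sift_up(a, i)
    # Phase 2: each extraction fills the last empty location.
    for size in range(len(a), 1, -1):
        delete_root(a, size)
```

After each call to `delete_root`, the prefix `a[:size - 1]` is still a heap of unsorted elements and the suffix holds the largest values in their final positions.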

Variations

  • The most important variation on the simple variant is an improvement by R. W. Floyd that in practice gives about a 25% speed improvement by using only one comparison in each siftup run, which must then be followed by a siftdown for the original child; it is also more elegant to formulate. Heapsort's natural indexing runs from 1 up to the number of items, so the start address of the data should be shifted so that the logic can be implemented without unnecessary ±1 offsets in the code.


  • Ternary heapsort uses a ternary heap instead of a binary heap; that is, each element in the heap has up to three children. It is more complicated to program, but performs fewer swap and comparison operations by a constant factor. Each step of the sift operation in a ternary heap requires three comparisons and one swap, whereas a binary heap requires two comparisons and one swap. Two ternary steps therefore cost six comparisons and two swaps and multiply the index by a factor of 9, while three binary steps cost six comparisons and three swaps and multiply the index by only a factor of 8. Ternary heapsort is about 12% faster than the simple variant of binary heapsort.


  • The smoothsort algorithm is a variation of heapsort developed by Edsger Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.
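To make the ternary variation concrete, here is a hedged sketch of heapsort over a d-ary heap, where `d=3` gives the ternary case. The function names and the zero-based child formula `d * i + 1 … d * i + d` are assumptions for illustration; no speed measurement is implied.

```python
def sift_down(a, i, size, d):
    """Restore the max-heap property below node i in the d-ary heap a[:size]."""
    while True:
        first = d * i + 1                 # index of the first child
        if first >= size:
            return                        # node i is a leaf
        last = min(first + d, size)       # one past the last child
        # Find the largest child (d - 1 comparisons for a full node).
        largest = max(range(first, last), key=a.__getitem__)
        if a[i] >= a[largest]:            # one more comparison with the parent
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def dary_heapsort(a, d=3):
    """Sort the list a in place using a d-ary max-heap (ternary by default)."""
    n = len(a)
    # Build the heap, starting from the parent of the last element.
    for i in range((n - 2) // d, -1, -1):
        sift_down(a, i, n, d)
    # Repeatedly move the maximum to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end, d)
```

Calling `dary_heapsort(data, d=2)` gives ordinary binary heapsort, which makes it easy to compare operation counts between the two layouts.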


Comparison with other sorts

Heapsort primarily competes with quicksort, another very efficient general-purpose, nearly in-place comparison-based sorting algorithm.

Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n²), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem and possible solutions.

Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.

Heapsort also competes with merge sort, which has the same time bounds but requires O(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:
  • Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
  • Merge sort is a stable sort.
  • Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
  • Merge sort can be easily adapted to operate on linked lists and on very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times.


An interesting alternative to heapsort is introsort, which combines quicksort and heapsort to retain the advantages of both: the worst-case speed of heapsort and the average speed of quicksort.
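The idea of combining the two can be sketched as a quicksort that counts its partitioning depth and hands a range over to a heap-based sort once a limit is exceeded. Everything specific here is an assumption made for illustration: the names `introsort`, `_sort`, and `_heapsort_slice`, the middle-element pivot, and the depth limit of roughly 2·log2(n). For brevity the fallback copies the range into a scratch list and uses `heapq`, so unlike a real introsort it is not constant-space.

```python
import heapq

def _heapsort_slice(a, lo, hi):
    """Fallback: sort a[lo:hi] with a heap (scratch copy, for brevity only)."""
    chunk = a[lo:hi]
    heapq.heapify(chunk)
    for k in range(lo, hi):
        a[k] = heapq.heappop(chunk)

def _sort(a, lo, hi, depth):
    """Quicksort a[lo:hi], switching to the heap fallback when depth hits 0."""
    while hi - lo > 1:
        if depth == 0:
            _heapsort_slice(a, lo, hi)
            return
        depth -= 1
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi - 1
        # Hoare-style partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        _sort(a, lo, j + 1, depth)   # recurse on the left part
        lo = i                       # loop on the right part

def introsort(a, depth_limit=None):
    """Sort the list a in place; depth_limit is a hypothetical tuning knob."""
    n = len(a)
    if depth_limit is None:
        depth_limit = 2 * max(n, 1).bit_length()
    _sort(a, 0, n, depth_limit)
```

Passing `depth_limit=0` forces the heap fallback immediately, which is a convenient way to exercise that path on its own.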

Pseudocode


The "simple" way to implement the algorithm uses a swap operation to exchange two elements of the array, with zero-based array indices. The heapify function can be thought of as building a heap from the bottom up, successively sifting downward to establish the heap property. An alternative version that builds the heap top-down and sifts upward is conceptually simpler to grasp: this "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version of heapify is O(n), while the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify.
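The two heap-construction strategies can be compared with a short Python sketch. It is an illustration under stated assumptions rather than the original pseudocode: the function names are hypothetical, arrays are zero-based, and each function returns its swap count purely so the difference in work can be observed.

```python
def heapify_sift_down(a):
    """Bottom-up construction: sift each parent downward, last parent first.
    Returns the number of swaps performed."""
    n = len(a)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            # Choose the larger child when a right child exists.
            if child + 1 < n and a[child + 1] > a[child]:
                child += 1
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            swaps += 1
            i = child
    return swaps

def heapify_sift_up(a):
    """Top-down construction: insert elements one by one, sifting upward.
    Returns the number of swaps performed."""
    swaps = 0
    for end in range(1, len(a)):
        i = end
        while i > 0:
            p = (i - 1) // 2
            if a[p] >= a[i]:
                break
            a[p], a[i] = a[i], a[p]
            swaps += 1
            i = p
    return swaps

def is_max_heap(a):
    """True if every element is no larger than its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```

On an ascending input such as `list(range(1023))`, both functions produce a valid max-heap, but the sift-up version performs noticeably more swaps because every inserted element travels all the way to the root.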
