Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Comparison sort

Comparison sort

Overview

A comparison sort is a type of sorting algorithm
Sorting algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

 that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator obey two of the properties of a total order
Total order
In mathematics and set theory, a total order, linear order, simple order, or ordering is a binary relation on some set X. The relation is transitive, antisymmetric, and total...

:
  1. if ab and bc then ac (transitivity)
  2. for all a and b, either ab or ba (totalness or trichotomy).


It is possible that both ab and ba; in this case either may come first in the sorted list.
Discussion
Ask a question about 'Comparison sort'
Start a new discussion about 'Comparison sort'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia

A comparison sort is a type of sorting algorithm
Sorting algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

 that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator obey two of the properties of a total order
Total order
In mathematics and set theory, a total order, linear order, simple order, or ordering is a binary relation on some set X. The relation is transitive, antisymmetric, and total...

:
  1. if ab and bc then ac (transitivity)
  2. for all a and b, either ab or ba (totalness or trichotomy).


It is possible that both ab and ba; in this case either may come first in the sorted list. In a stable sort, the input order determines the sorted order in this case.

A metaphor for thinking about comparison sorts is that you have a set of unlabelled weights and a balance scale. The goal is to line up the weights in order by their weight without any information except that obtained by placing two weights on the scale and seeing which one is heavier (or if they weigh the same).

Examples


Some of the most well-known comparison sorts include:
  • Quicksort
    Quicksort
    Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes comparisons to sort n items. However, in the worst case, it makes comparisons...

  • Heapsort
    Heapsort
    Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case Θ runtime. Heapsort is an in-place algorithm, but is not a stable sort.-...

  • Merge sort
    Merge sort
    Merge sort is an O comparison-based sorting algorithm. In most implementations it is stable, meaning that it preserves the input order of equal elements in the sorted output. It is an example of the divide and conquer algorithmic paradigm...

  • Introsort
    Introsort
    Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on the number of elements being sorted...

  • Insertion sort
    Insertion sort
    Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:* simple...

  • Selection sort
    Selection sort
    Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort...

  • Bubble sort
    Bubble sort
    Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is...



Some examples of sorts which are not comparison sorts include:
  • Radix sort
    Radix sort
    In computer science, radix sort is a sorting algorithm that sorts integers by processing individual digits, by comparing individual digits sharing the same significant position. A positional notation is required, but because integers can represent strings of characters and specially formatted...

     (examines individual bits of keys)
  • Counting sort
    Counting sort
    Counting sort is a sorting algorithm which takes advantage of knowing the range of the numbers in the array to be sorted . It uses this range to create an array C of this length...

     (indexes using key values)
  • Bucket sort
    Bucket sort
    Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a cousin of radix sort in the most to...

     (examines bits of keys)

Performance limits and advantages of different sorting techniques


There are fundamental limits on the performance of comparison sorts. A comparison sort must have a lower bound of Ω(n log n) comparison operations. This is a consequence of the limited information available through comparisons alone — or, to put it differently, of the vague algebraic structure of totally ordered sets. In this sense, mergesort, heapsort, and introsort are asymptotically optimal
Asymptotically optimal
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm...

 in terms of the number of comparisons they must perform, although this metric neglects other operations. The three non-comparison sorts above achieve O(n) performance by using operations other than comparisons, allowing them to sidestep this lower bound (assuming elements are constant-sized).

Nevertheless, comparison sorts offer the notable advantage that control over the comparison function allows sorting of many different datatypes and fine control over how the list is sorted. For example, reversing the result of the comparison function allows the list to be sorted in reverse; for instance, one can sort a list of tuple
Tuple
In mathematics and computer science a tuple captures the intuitive notion of an ordered list of elements. Depending on the mathematical foundation chosen, the formal notion differs slightly. In set theory, an n-tuple is a sequence of n elements, where n is a...

s in lexicographic order by just creating a comparison function that compares each part in sequence:
function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
if lefta ≠ righta
return compare(lefta, righta)
else if leftb ≠ rightb
return compare(leftb, rightb)
else
return compare(leftc, rightc)

Balanced ternary
Balanced ternary
Balanced ternary is a non-standard positional numeral system , useful for comparison logic. It is a ternary system, but unlike the standard ternary system, the digits have the values −1, 0, and 1...

 notation allows comparisons to be made in one step, whose result will be one of "less than", "greater than" or "equal to".

Comparison sorts generally adapt more easily to complex orders such as the order of floating-point numbers. Additionally, once a comparison function is written, any comparison sort can be used without modification; non-comparison sorts typically require specialized versions for each datatype.

This flexibility, together with the efficiency of the above comparison sorting algorithms on modern computers, has led to widespread preference for comparison sorts in most practical work.

Number of comparisons required to sort a list


The number of comparisons that a comparison sort algorithm requires increases in proportion to , where is the number of elements to sort. This bound is asymptotically tight:

Given a list of distinct numbers (we can assume this because this is a worst-case analysis), there are n factorial
Factorial
In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n...

 permutations exactly one of which is the list in sorted order. The sort algorithm must gain enough information from the comparisons to identify the correct permutations. If the algorithm always completes after at most f(n) steps, it cannot distinguish more than 2f(n) cases because the keys are distinct and each comparison has only two possible outcomes. Therefore,, or equivalently
From Stirling's approximation
Stirling's approximation
In mathematics, Stirling's approximation is an approximation for large factorials. It is named after James Stirling.The formula as typically used in applications is...

 we know that is . This provides the lower-bound part of the claim.

An identical upper bound follows from the existence of the algorithms that attain this bound in the worst case.

The above argument provides an absolute, rather than only asymptotic lower bound on the number of comparisons, namely comparisons. This lower bound is fairly good (it can be approached within a linear tolerance by a simple merge sort), but it is known to be inexact. For example, , but the minimal number of comparisons to sort 13 elements has been proved to be 34
.

Determining the exact number of comparisons needed to sort a given number of entries is a computationally hard problem even for small n, and no simple formula for the solution is known. For some of the few concrete values that have been computed, see .

Lower bound for the average number of comparisons


A similar bound applies to the average number of comparisons. Assuming that
  • all keys are distinct, i.e. every comparison will give either a>b or a, and
  • the input is a random permutation, chosen uniformly from the set of all possible permutations of n elements,

it is impossible to determine which order the input is in with fewer than log2(n!) comparisons on average.

This can be most easily seen using concepts from information theory
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Historically, information theory was developed by Claude E. Shannon to find fundamental limits on compressing and reliably storing and communicating data...

. The Shannon entropy of such a random permutation is log2(n!) bits. Since a comparison can give only two results, the maximum amount of information it provides is 1 bit. Therefore after k comparisons the remaining entropy of the permutation, given the results of those comparisons, is at least log2(n!) - k bits on average. To perform the sort, complete information is needed, so the remaining entropy must be 0. It follows that k must be at least log2(n!).

Note that this differs from the worst case argument given above, in that it does not allow rounding up to the nearest integer. For example, for n = 3, the lower bound for the worst case is 3, the lower bound for the average case as shown above is approximately 2.58, while the highest lower bound for the average case is 8/3, approximately 2.67.

In the case that multiple items may have the same key, there is no obvious statistical interpretation for the term "average case", so an argument like the above cannot be applied without making specific assumptions about the distribution of keys.