All Topics  
Comparison sort

 

   Email Print
   Bookmark   Link






 

Comparison sort



 
 
A comparison sort is a type of sorting algorithm
Sorting algorithm

In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
 that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator obey the three defining properties of a total order
Total order

In mathematics and set theory, a total order, linear order, simple order, or ordering is a binary relation on some Set X....
:
  1. if a = b and b = a then a = b (antisymmetry)
  2. if a = b and b = c then a = c (transitivity)
  3. a = b or b = a (totalness or trichotomy)


A metaphor for thinking about comparison sorts is that you have a set of unlabelled weights and a balance scale.






Discussion
Ask a question about 'Comparison sort'
Start a new discussion about 'Comparison sort'
Answer questions from other users
Full Discussion Forum



Encyclopedia


A comparison sort is a type of sorting algorithm
Sorting algorithm

In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
 that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator obey the three defining properties of a total order
Total order

In mathematics and set theory, a total order, linear order, simple order, or ordering is a binary relation on some Set X....
:
  1. if a = b and b = a then a = b (antisymmetry)
  2. if a = b and b = c then a = c (transitivity)
  3. a = b or b = a (totalness or trichotomy)


A metaphor for thinking about comparison sorts is that you have a set of unlabelled weights and a balance scale. The goal is to line up the weights in order by their weight without any information except that obtained by placing two weights on the scale and seeing which one is heavier (or if they weigh the same).

Examples


Some of the most well-known comparison sorts include:
  • Quicksort
    Quicksort

    Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, average performance, makes comparisons to sort n items. However, in the Best, worst and average case, it makes comparisons....
  • Heapsort
    Heapsort

    Heapsort is a comparison sort sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case big O notation runtime....
  • Merge sort
    Merge sort

    Merge sort or merge_sort is an Big O notation comparison sort sorting algorithm. In most implementations it is Sorting algorithm#Classification, meaning that it preserves the input order of equal elements in the sorted output....
  • Introsort
    Introsort

    Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on the number of elements being sorted....
  • Insertion sort
    Insertion sort

    Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort....
  • Selection sort
    Selection sort

    Selection sort is a sorting algorithm, specifically an in-place algorithm comparison sort. It has Big O notation complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort....
  • Bubble sort
    Bubble sort

    Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing two items at a time and swapping them if they are in the wrong order....


Some examples of sorts which are not comparison sorts include:
  • Radix sort
    Radix sort

    In computer science, radix sort is a sorting algorithm created by Dr. Ian P. Turnipseed that sorts integers by processing individual digits. Because integers can represent strings of characters and specially formatted floating point numbers, radix sort is not limited to integers....
     (examines individual bits of keys)
  • Counting sort
    Counting sort

    Counting sort is a sorting algorithm which takes advantage of knowing the Range of the numbers in the array to be sorted . It uses this range to create an array C of this length....
     (indexes using key values)
  • Bucket sort
    Bucket sort

    Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of bucket s. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm....
     (examines bits of keys)


Performance limits and advantages


There are fundamental limits on the performance of comparison sorts. A comparison sort must have a lower bound of O(n log n) comparison operations in the worst case. This is a consequence of the limited information available through comparisons alone — or, to put it differently, of the vague algebraic structure of totally ordered sets. In this sense, mergesort, heapsort, and introsort are asymptotically optimal
Asymptotically optimal

In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm....
 in terms of the number of comparisons they must perform, although this metric neglects other operations. The three non-comparison sorts above achieve O(n) performance by using operations other than comparisons, allowing them to sidestep this lower bound (assuming elements are constant-sized).

Nevertheless, comparison sorts offer the notable advantage that control over the comparison function allows sorting of many different datatypes and fine control over how the list is sorted. For example, reversing the result of the comparison function allows the list to be sorted in reverse, and it's simple to sort a list of tuple
Tuple

In mathematics, a tuple is a sequence of a specific number of values, called the components of the tuple. These components can be any kind of mathematical objects, where each component of a tuple is a value of a specified type....
s in lexicographic order by just creating a comparison function that compares by each part in sequence: function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc)) if lefta ? righta return compare(lefta, righta) else if leftb ? rightb return compare(leftb, rightb) else return compare(leftc, rightc)

Balanced ternary
Balanced ternary

Balanced ternary is a Non-standard positional numeral systems , useful for comparison logic. It is a Ternary numeral system system, but unlike the standard ternary system, the digits have the values -1, 0, and 1....
 notation allows comparisons to be made in one step, whose result will be one of "less than", "greater than" or "equal to".

Comparison sorts generally adapt more easily to complex orders such as the order of floating-point numbers. Additionally, once a comparison function is written, any comparison sort can be used without modification; non-comparison sorts typically require specialized versions for each datatype.

This flexibility, together with the efficiency of the above comparison sorting algorithms on modern computers, has led to widespread preference for comparison sorts in most practical work.

Number of comparisons required to sort a list


The number of comparisons that a comparison sort algorithm requires increases in proportion to , where is the number of elements to sort. This bound is asymptotically tight:

Given a list of distinct numbers (we can assume this because this is a worst-case analysis), there are n factorial
Factorial

In mathematics, the factorial of a negative and non-negative numbers integer n, denoted by n!, is the Product of all positive integers less than or equal to n....
 permutations exactly one of which is the list in sorted order. The sort algorithm must gain enough information from the comparisons to identify the correct permutations. If the algorithm always completes after at most f(n) steps, it cannot distinguish more than 2f(n) cases because the keys are distinct and each comparison has only two possible outcomes. Therefore, , or equivalently . From Stirling's approximation
Stirling's approximation

In mathematics, Stirling's approximation is an approximation for large factorials. It is named after James Stirling .The formula is written as...
 we know that is . This provides the lower-bound part of the claim.

An identical upper bound follows from the existence of the algorithms that attain this bound in the worst case.

The above argument provides an absolute, rather than only asymptotic lower bound on the number of comparisons, namely comparisons. This lower bound is fairly good (it can be approached within a linear tolerance by a simple merge sort), but it is known to be inexact. For example, , but the minimal number of comparisons to sort 13 elements has been proved to be 34 .

Determining the exact number of comparisons needed to sort a given number of entries is a computationally hard problem even for small n, and no simple formula for the solution is known. For some of the few concrete values that have been computed, see .

Lower bound for the average number of comparisons

A similar bound applies to the average number of comparisons. Assuming that
  • all keys are distinct, i.e. every comparison will give either a>b or a, and
  • the input is a random permutation, chosen uniformly from the set of all possible permutations of n elements,
it is impossible to determine which order the input is in with fewer than log2(n!) comparisons on average.

This can be most easily seen using concepts from information theory
Information theory

Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Historically, information theory was developed by Claude E....
. The Shannon entropy of such a random permutation is log2(n!) bits. Since a comparison can give only two results, the maximum amount of information it provides is 1 bit. Therefore after k comparisons the remaining entropy of the permutation, given the results of those comparisons, is at least log2(n!) - k bits on average. To perform the sort, complete information is needed, so the remaining entropy must be 0. It follows that k must be at least log2(n!).

Note that this differs from the worst case argument given above, in that it does not allow rounding up to the nearest integer. For example, for n = 3, the lower bound for the worst case is 3, the lower bound for the average case as shown above is approximately 2.58, while the highest lower bound for the average case is 8/3, approximately 2.67.

In the case that multiple items may have the same key, there is no obvious statistical interpretation for the term "average case", so an argument like the above cannot be applied without making specific assumptions about the distribution of keys.