All Topics  
Quicksort

 

   Email Print
   Bookmark   Link






 

Quicksort



 
 
Quicksort is a well-known sorting algorithm
Sorting algorithm

In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
 developed by C. A. R. Hoare
C. A. R. Hoare

Sir Charles Antony Richard Hoare , commonly known as Tony Hoare or C.A.R. Hoare, is a United Kingdom computer science, probably best known for the development in 1960 of Quicksort , one of the world's most widely used sorting algorithms....
 that, on average, makes (big O notation
Big O notation

In mathematics, big O notation describes the asymptotic analysis of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions....
) comparisons to sort n items. However, in the worst case
Best, worst and average case

In computer science, best, worst and average cases of a given algorithm express what the Resource usage is at least, at most and on average, respectively....
, it makes comparisons. Typically, quicksort is significantly faster in practice than other algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data, it is possible to make design choices which minimize the probability of requiring quadratic time.

Quicksort is a comparison sort
Comparison sort

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list....
 and, in efficient implementations, is not a stable sort
Sorting algorithm

In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
.

quicksort algorithm was developed by C. A. R. Hoare
C. A. R. Hoare

Sir Charles Antony Richard Hoare , commonly known as Tony Hoare or C.A.R. Hoare, is a United Kingdom computer science, probably best known for the development in 1960 of Quicksort , one of the world's most widely used sorting algorithms....
 in 1962 while working for the small British scientific computer manufacturer Elliott Brothers
Elliott Brothers (computer company)

Elliott Brothers Ltd was an early computer company of the 1950s–60s in the United Kingdom, tracing its descent from a firm of instrument makers founded by William Elliott in London around 1804....
.

  • Pick an element, called a pivot
    Pivot element

    The pivot or pivot element is the element of a Matrix , which is selected first by an algorithm , to do certain calculations with the matrix....
    , from the list.
  • Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way).






    Discussion
    Ask a question about 'Quicksort'
    Start a new discussion about 'Quicksort'
    Answer questions from other users
    Full Discussion Forum



    Recent Posts









    Encyclopedia


    Quicksort is a well-known sorting algorithm
    Sorting algorithm

    In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
     developed by C. A. R. Hoare
    C. A. R. Hoare

    Sir Charles Antony Richard Hoare , commonly known as Tony Hoare or C.A.R. Hoare, is a United Kingdom computer science, probably best known for the development in 1960 of Quicksort , one of the world's most widely used sorting algorithms....
     that, on average, makes (big O notation
    Big O notation

    In mathematics, big O notation describes the asymptotic analysis of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions....
    ) comparisons to sort n items. However, in the worst case
    Best, worst and average case

    In computer science, best, worst and average cases of a given algorithm express what the Resource usage is at least, at most and on average, respectively....
    , it makes comparisons. Typically, quicksort is significantly faster in practice than other algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data, it is possible to make design choices which minimize the probability of requiring quadratic time.

    Quicksort is a comparison sort
    Comparison sort

    A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list....
     and, in efficient implementations, is not a stable sort
    Sorting algorithm

    In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a List in a certain Total order. The most-used orders are numerical order and lexicographical order....
    .

    History

    The quicksort algorithm was developed by C. A. R. Hoare
    C. A. R. Hoare

    Sir Charles Antony Richard Hoare , commonly known as Tony Hoare or C.A.R. Hoare, is a United Kingdom computer science, probably best known for the development in 1960 of Quicksort , one of the world's most widely used sorting algorithms....
     in 1962 while working for the small British scientific computer manufacturer Elliott Brothers
    Elliott Brothers (computer company)

    Elliott Brothers Ltd was an early computer company of the 1950s–60s in the United Kingdom, tracing its descent from a firm of instrument makers founded by William Elliott in London around 1804....
    .

    Algorithm


    Quicksort sorts by employing a divide and conquer
    Divide and conquer algorithm

    In computer science, divide and conquer is an important algorithm design paradigm based on multi-branched recursion. A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same type, until these become simple enough to be solved directly....
     strategy to divide a list
    List (computing)

    In computer science, a list is an ordered Multiset of entity/items.In the context of object-oriented programming languages, a list is defined as an instance of an abstract data type , formalizing the concept of an order theoryed Collection class of entity....
     into two sub-lists.

    The steps are:
    1. Pick an element, called a pivot
      Pivot element

      The pivot or pivot element is the element of a Matrix , which is selected first by an algorithm , to do certain calculations with the matrix....
      , from the list.
    2. Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
    3. Recursively
      Recursion (computer science)

      Recursion is a way of thinking about and solving problems. In fact, Recursion_ is one of the central ideas of computer science. Solving a problem using recursion means the solution depends on solutions to smaller instances of the same problem....
       sort the sub-list of lesser elements and the sub-list of greater elements.


    The base case of the recursion are lists of size zero or one, which are always sorted.

    In simple pseudocode
    Pseudocode

    Pseudocode is a compact and informal high-level description of a computer programming algorithm that uses the structural conventions of some programming language, but is intended for human reading rather than machine reading....
    , the algorithm might be expressed as this:

    Notice that we only examine elements by comparing them to other elements. This makes quicksort a comparison sort
    Comparison sort

    A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list....
    . This version is also a stable sort (considering that the "for each" method retrieves elements in original order, and the pivot selected is the last among those of equal value).

    The correctness of the partition algorithm is based on the following two arguments:
    • At each iteration, all the elements processed so far are in the desired position: before the pivot if less than or equal to the pivot's value, after the pivot otherwise (loop invariant).
    • Each iteration leaves one fewer element to be processed (loop variant).


    The correctness of the overall algorithm follows from inductive reasoning: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than or equal to the pivot and elements greater than it, themselves sorted by the recursive hypothesis.

    The disadvantage of the simple version above is that it requires extra storage space, which is as bad as merge sort
    Merge sort

    Merge sort or merge_sort is an Big O notation comparison sort sorting algorithm. In most implementations it is Sorting algorithm#Classification, meaning that it preserves the input order of equal elements in the sorted output....
    . The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using space use on average (for the call stack
    Call stack

    In computer science, a call stack is a dynamic Stack data structure that stores information about the active subroutines of a computer program....
    ):

    Partition Example
    This is the in-place partition algorithm. It partitions the portion of the array between indexes left and right, inclusively, by moving all elements less than or equal to a[pivotIndex] to the beginning of the subarray, leaving all the greater elements following them. In the process it also finds the final position for the pivot element, which it returns. It temporarily moves the pivot element to the end of the subarray, so that it doesn't get in the way. Because it only uses exchanges, the final list has the same elements as the original list. Notice that an element may be exchanged multiple times before reaching its final place.

    This form of the partition algorithm is not the original form; multiple variations can be found in various textbooks, such as versions not having the storeIndex. However, this form is probably the easiest to understand.

    Once we have this, writing quicksort itself is easy:

    However, since partition reorders elements within a partition, this version of quicksort is not a stable sort.

    Parallelizations


    Like merge sort
    Merge sort

    Merge sort or merge_sort is an Big O notation comparison sort sorting algorithm. In most implementations it is Sorting algorithm#Classification, meaning that it preserves the input order of equal elements in the sorted output....
    , quicksort can also be easily parallelized
    Parallel algorithm

    In computer science, a parallel algorithm, as opposed to a traditional sequential algorithm, is one which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result....
     due to its divide-and-conquer nature. Individual in-place partition operations are difficult to parallelize, but once divided, different sections of the list can be sorted in parallel. If we have processors, we can divide a list of elements into sublists in average time, then sort each of these in average time. Ignoring the preprocessing, this is linear speedup. Given processors, only time is required overall.

    One advantage of parallel quicksort over other parallel sort algorithms is that no synchronization is required. A new thread is started as soon as a sublist is available for it to work on and it does not communicate with other threads. When all threads complete, the sort is done.

    Other more sophisticated parallel sorting algorithms can achieve even better time bounds. For example, in 1991 David Powers described a parallelized quicksort that can operate in time given enough processors by performing partitioning implicitly.

    Formal analysis


    From the initial description it's not obvious that quicksort takes time on average. It's not hard to see that the partition operation, which simply loops over the elements of the array once, uses time. In versions that perform concatenation, this operation is also .

    In the best case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only nested calls before we reach a list of size 1. This means that the depth of the call tree
    Call stack

    In computer science, a call stack is a dynamic Stack data structure that stores information about the active subroutines of a computer program....
     is . But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only time all together (each call has some constant overhead, but since there are only calls at each level, this is subsumed in the factor). The result is that the algorithm uses only time.

    An alternate approach is to set up a recurrence relation
    Recurrence relation

    In mathematics, a recurrence relation is an equation that defines a sequence recursion: each term of the sequence is defined as a Function of the preceding terms....
     for the factor, the time needed to sort a list of size . Because a single quicksort call involves factor work plus two recursive calls on lists of size in the best case, the relation would be:

    The master theorem
    Master theorem

    In the analysis of algorithms, the master theorem, which is a specific case of the Akra-Bazzi method, provides a cookbook solution in asymptotic terms for recurrence relations of types that occur in practice....
     tells us that .

    In fact, it's not necessary to divide the list this precisely; even if each pivot splits the elements with 99% on one side and 1% on the other (or any other fixed fraction), the call depth is still limited to , so the total running time is still .

    In the worst case, however, the two sublists have size 1 and (for example, if the array consists of the same element by value), and the call tree becomes a linear chain of nested calls. The th call does work, and . The recurrence relation is:

    This is the same relation as for insertion sort
    Insertion sort

    Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort....
     and selection sort
    Selection sort

    Selection sort is a sorting algorithm, specifically an in-place algorithm comparison sort. It has Big O notation complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort....
    , and it solves to . Given knowledge of which comparisons are performed by the sort, there are adaptive algorithms that are effective at generating worst-case input for quicksort on-the-fly, regardless of the pivot selection strategy.

    Randomized quicksort expected complexity


    Randomized quicksort has the desirable property that it requires only expected
    Expected value

    In probability theory and statistics, the expected value of a random variable is the Lebesgue integral of the random variable with respect to its probability measure....
     time, regardless of the input. But what makes random pivots a good choice?

    Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most times before reaching lists of size 1, yielding an algorithm.

    A random choice will only choose from these middle parts half the time. However, this is good enough. Imagine that you are flipping a coin over and over until you get heads. Although this could take a long time, on average only flips are required, and the chance that you won't get heads after flips is highly improbable. By the same argument, quicksort's recursion will terminate on average at a call depth of only . But if its average call depth is , and each level of the call tree processes at most elements, the total amount of work done on average is the product, . Note that the algorithm does not have to verify that the pivot is in the middle half - if we hit it any constant fraction of the times, that is enough for the desired complexity.

    The outline of a formal proof of the expected time complexity follows. Assume that there are no duplicates as duplicates could be handled with linear time pre- and post-processing, or considered cases easier than the analyzed. Choosing a pivot, uniformly at random from to , is then equivalent to choosing the size of one particular partition, uniformly at random from to . With this observation, the continuation of the proof is analogous to the one given in the average complexity section.

    Average complexity


    Even if pivots aren't chosen randomly, quicksort still requires only time over all possible permutations of its input. Because this average is simply the sum of the times over all permutations of the input divided by factorial, it's equivalent to choosing a random permutation of the input. When we do this, the pivot choices are essentially random, leading to an algorithm with the same running time as randomized quicksort.

    More precisely, the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation:

    Here, is the number of comparisons the partition uses. Since the pivot is equally likely to fall anywhere in the sorted list order, the sum is averaging over all possible splits.

    This means that, on average, quicksort performs only about 39% worse than the ideal number of comparisons, which is its best case. In this sense it is closer to the best case than the worst case. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.

    Space complexity


    The space used by quicksort depends on the version used.

    Quicksort has a space complexity of , even in the worst case, when it is carefully implemented such that
    • in-place partitioning is used. This requires .
    • After partitioning, the partition with the fewest elements is (recursively) sorted first, requiring at most space. Then the other partition is sorted using tail recursion
      Tail recursion

      In computer science, tail recursion is a special case of Recursion_ in which the last operation of the function, the tail call, is a recursive call....
       or iteration. (This idea is commonly attributed to R.Sedgewick )


    The version of quicksort with in-place partitioning uses only constant additional space before making any recursive call. However, if it has made nested recursive calls, it needs to store a constant amount of information from each of them. Since the best case makes at most nested recursive calls, it uses space. The worst case makes nested recursive calls, and so needs space; Sedgewick's improved version using tail recursion requires space in the worst case.

    We are eliding a small detail here, however. If we consider sorting arbitrarily large lists, we have to keep in mind that our variables like left and right can no longer be considered to occupy constant space; it takes bits to index into a list of items. Because we have variables like this in every stack frame, in reality quicksort requires bits of space in the best and average case and space in the worst case. This isn't too terrible, though, since if the list contains mostly distinct elements, the list itself will also occupy bits of space.

    The not-in-place version of quicksort uses space before it even makes any recursive calls. In the best case its space is still limited to , because each level of the recursion uses half as much space as the last, and

    Its worst case is dismal, requiring

    space, far more than the list itself. If the list elements are not themselves constant size, the problem grows even larger; for example, if most of the list elements are distinct, each would require about bits, leading to a best-case and worst-case space requirement.

    Selection-based pivoting

    A selection algorithm
    Selection algorithm

    In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list This includes the cases of finding the minimum, maximum, and median elements....
     chooses the kth smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, except that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist which contains the desired element. This small change lowers the average complexity to linear or time, and makes it an in-place algorithm
    In-place algorithm

    In computer science, an in-place algorithm is an algorithm which transforms a data structure using a small, constant amount of extra storage space....
    . A variation on this algorithm brings the worst-case time down to (see selection algorithm
    Selection algorithm

    In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list This includes the cases of finding the minimum, maximum, and median elements....
     for more information).

    Conversely, once we know a worst-case selection algorithm is available, we can use it to find the ideal pivot (the median) at every step of quicksort, producing a variant with worst-case running time. In practical implementations, however, this variant is considerably slower on average.

    Another variant is to choose the Median of Medians as the pivot element instead of the median itself for partitioning the elements. While maintaining the asymptotically optimal run time complexity of (by preventing worst case partitions), it is also considerably faster than the variant that chooses the median as pivot.

    The C implementation of the above algorithm:

    Variants

    There are three variants of quicksort that are worth mentioning:

    • Balanced quicksort: choose a pivot likely to represent the middle of the values to be sorted, and then follow the regular quicksort algorithm.
    • External quicksort: The same as regular quicksort except the pivot is replaced by a buffer. First, read the M/2 first and last elements into the buffer and sort them. Read the next element from the beginning or end to balance writing. If the next element is less than the least of the buffer, write it to available space at the beginning. If greater than the greatest, write it to the end. Otherwise write the greatest or least of the buffer, and put the next element in the buffer. Keep the maximum lower and minimum upper keys written to avoid resorting middle elements that are in order. When done, write the buffer. Recursively sort the smaller partition, and loop to sort the remaining partition.
    • Three-way radix quicksort (also called multikey quicksort): is a combination of radix sort
      Radix sort

      In computer science, radix sort is a sorting algorithm created by Dr. Ian P. Turnipseed that sorts integers by processing individual digits. Because integers can represent strings of characters and specially formatted floating point numbers, radix sort is not limited to integers....
       and quicksort. Pick an element from the array (the pivot) and consider the first character (key) of the string (multikey). Partition the remaining elements into three sets: those whose corresponding character is less than, equal to, and greater than the pivot's character. Recursively sort the "less than" and "greater than" partitions on the same character. Recursively sort the "equal to" partition by the next character (key).


    Comparison with other sorting algorithms

    Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive calls. The algorithms make exactly the same comparisons, but in a different order.

    The most direct competitor of quicksort is heapsort
    Heapsort

    Heapsort is a comparison sort sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case big O notation runtime....
    . Heapsort is typically somewhat slower than quicksort, but the worst-case running time is always . Quicksort is usually faster, though there remains the chance of worst case performance except in the introsort
    Introsort

    Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on the number of elements being sorted....
     variant. If it's known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.

    Quicksort also competes with mergesort, another recursive sort algorithm but with the benefit of worst-case running time. Mergesort is a stable sort, unlike quicksort and heapsort, and can be easily adapted to operate on linked list
    Linked list

    In computer science, a linked list is one of the fundamental data structures, and can be used to implement other data structures. It consists of a sequence of node s, each containing arbitrary data Field s and one or two reference s pointing to the next and/or previous nodes....
    s and very large lists stored on slow-to-access media such as disk storage
    Disk storage

    Disk storage is a general category of a computer storage mechanisms, in which data is recorded on planar, round and rotating surfaces . A disk drive is a peripheral device used to record and retrieve information....
     or network attached storage. Although quicksort can be written to operate on linked lists, it will often suffer from poor pivot choices without random access. The main disadvantage of mergesort is that, when operating on arrays, it requires auxiliary space in the best case, whereas the variant of quicksort with in-place partitioning and tail recursion uses only space. (Note that when operating on linked lists, mergesort only requires a small, constant amount of auxiliary storage.)

    Bucket sort
    Bucket sort

    Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of bucket s. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm....
     with two buckets is very similar to quicksort; the pivot in this case is effectively the value in the middle of the value range, which does well on average for uniformly distributed inputs.

    See also

    • Introsort
      Introsort

      Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a level based on the number of elements being sorted....
    • Flashsort
      Flashsort

      Flashsort is a sorting algorithm with extremely good Big O notation efficiency for balanced data sets published in 1998 by Karl-Dietrich Neubert....


    External links

    • – graphical demonstration and discussion of quick sort
    • – graphical demonstration and discussion of 3-way partition quick sort
    • with "level-order" recursive calls to help improve algorithm analysis
    • on LiteratePrograms
    • which allows experimentation with initial state and shows statistics
    • that explains bubble sort and quick sort and compares their performance.


  •