Merge algorithm
Encyclopedia
Merge algorithms are a family of algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s that run sequentially over multiple sorted
Sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

 lists, typically producing more sorted lists as output. This is well-suited for machines with tape drive
Tape drive
A tape drive is a data storage device that reads and performs digital recording, writes data on a magnetic tape. Magnetic tape data storage is typically used for offline, archival data storage. Tape media generally has a favorable unit cost and long archival stability.A tape drive provides...

s. Use has declined due to large random access memories, and many applications of merge algorithms have faster alternatives when a random-access memory is available.

The general merge algorithm has a set of pointers p0..n that point to positions in a set of lists L0..n. Initially they point to the first item in each list. The algorithm is as follows:

While any of p0..n still point to data inside of L0..n instead of past the end:
  1. do something with the data items p0..n point to in their respective lists
  2. find out which of those pointers points to the item with the lowest key; advance one of those pointers to the next item in its list

Analysis

Merge algorithms generally run in time proportional to the sum of the lengths of the lists; merge algorithms that operate on large numbers of lists at once will multiply the sum of the lengths of the lists by the time to figure out which of the pointers points to the lowest item, which can be accomplished with a heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...

-based priority queue
Priority queue
A priority queue is an abstract data type in computer programming.It is exactly like a regular queue or stack data structure, but additionally, each element is associated with a "priority"....

 in O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(log n) time, for O(m log n) time, where n is the number of lists being merged and m is the sum of the lengths of the lists. When merging two lists of length m, there is a lower bound of 2m − 1 comparisons required in the worst case.

The classic merge (the one used in merge sort
Merge sort
Merge sort is an O comparison-based sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. It is a divide and conquer algorithm...

) outputs the data item with the lowest key at each step; given some sorted lists, it produces a sorted list containing all the elements in any of the input lists, and it does so in time proportional to the sum of the lengths of the input lists.

Language support

The C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

's Standard Template Library
Standard Template Library
The Standard Template Library is a C++ software library which later evolved into the C++ Standard Library. It provides four components called algorithms, containers, functors, and iterators. More specifically, the C++ Standard Library is based on the STL published by SGI. Both include some...

 has the function std::merge, which merges two sorted ranges of iterators, and std::inplace_merge, which merges two consecutive sorted ranges in-place. In addition, the std::list (linked list) class has its own merge method which merges another list into itself. The type of the elements merged must support the less-than (<) operator, or it must be provided with a custom comparator.

Python (programming language)
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

's standard library (since 2.6) also has a merge function in the heapq module, that takes multiple sorted iterables, and merges them into a single iterator.http://docs.python.org/library/heapq.html#heapq.merge

Parallel Merge

In parallel computing
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

, arrays of sorted values may be merged efficiently using an all nearest smaller values
All nearest smaller values
In computer science, the all nearest smaller values problem involves computing, for each value in a sequence of numbers, the previous smaller value that has the closest position in the sequence to the given value...

computation.

Parallel merge can also be implemented using a divide-and-conquer algorithm, developed and shown in pseudo-code in. This algorithm performs well when combined with a fast sequential merge as a base case for merging of small arrays. Implementation using Intel's Threading Building Blocks (TBB) and Microsoft's Parallel Pattern Library (PPL) to run on multi-core processors is shown to perform well in practice.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK