Soft heap
In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time for five types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a certain fixed percentage of values in the heap. The constant-time operations are the following; a rough interface sketch follows the list:
  • create(S): Create a new soft heap
  • insert(S, x): Insert an element into a soft heap
  • meld(S, S'): Combine the contents of two soft heaps into one, destroying both
  • delete(S, x): Delete an element from a soft heap
  • findmin(S): Get the element with minimum key in the soft heap
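
For illustration, here is a minimal Python sketch of this interface. The naive list-based internals are a hypothetical stand-in: they mirror the five operations above but achieve neither the amortized bounds nor any actual corruption.

import math

class SoftHeap:
    # Hypothetical interface sketch only; method names mirror the five
    # operations listed above, but this stand-in is not Chazelle's structure.
    def __init__(self, eps):      # create(S), with error rate ε = eps
        self.eps = eps
        self.items = []

    def insert(self, x):          # insert(S, x)
        self.items.append(x)

    def meld(self, other):        # meld(S, S'): combines both, emptying the other
        self.items += other.items
        other.items = []

    def delete(self, x):          # delete(S, x)
        self.items.remove(x)

    def findmin(self):            # findmin(S): element with minimum
        return min(self.items)    # (possibly corrupted) key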


It was designed by Bernard Chazelle in 2000. The term "corruption" in the structure is the result of what Chazelle called "carpooling" in a soft heap. Each node in the soft heap contains a linked list of keys and one common key. The common key is an upper bound on the values of the keys in the linked list. Once a key is added to the linked list, it is considered corrupted because its value is never again relevant in any of the soft heap operations: only the common keys are compared. It is unpredictable which keys will be corrupted in this manner; it is only known that at most a fixed percentage will be corrupted. This is what makes soft heaps "soft": one cannot be sure whether any particular value put into the structure will be corrupted. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.
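
As a rough illustration of the carpooling idea, the sketch below shows one hypothetical node shape in Python: several keys travel together under a single common key that bounds them from above, and only the common key is ever compared. The class and field names are invented for illustration, and the tree structure of the real data structure is omitted.

class SoftHeapNode:
    # Hypothetical node sketch: keys that "carpool" in this node share
    # one common key (ckey), an upper bound on all of them.
    def __init__(self):
        self.key_list = []    # keys carpooling in this node
        self.ckey = None      # common key: upper bound on key_list

    def absorb(self, key):
        # Once absorbed, `key` itself is never compared again; it is
        # effectively corrupted upward to the common key ckey.
        self.key_list.append(key)
        if self.ckey is None or key > self.ckey:
            self.ckey = key   # keep ckey an upper bound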

Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. The percentage of values which are corrupted can be chosen freely, but the lower this is set, the more time insertions require (O(log 1/ε) for an error rate of ε).

Applications

Surprisingly, soft heaps are useful in the design of deterministic algorithms, despite their unpredictable nature. They were used to achieve the best complexity to date for finding a minimum spanning tree. They can also be used to easily build an optimal selection algorithm, as well as near-sorting algorithms, which are algorithms that place every element near its final position, a situation in which insertion sort is fast.

One of the simplest examples is the selection algorithm. Say we want to find the kth smallest of a group of n numbers. First, we choose an error rate of 1/3; that is, at most a third of the keys we insert will be corrupted. Now, we insert all n elements into the heap; at this point, at most n/3 keys are corrupted. Next, we delete the minimum element from the heap about n/3 times. Because this decreases the size of the heap, it cannot increase the number of corrupted elements, so at most n/3 keys are still corrupted.

Now at least 2n/3 − n/3 = n/3 of the remaining keys are not corrupted, so each must be larger than every element we removed. Let L be the removed element with the largest (actual) value; this is not necessarily the last element that we removed, because the last element removed could have had its key corrupted, that is increased, beyond the value of an element removed earlier. L is larger than all of the other n/3 elements that we removed and smaller than the n/3 uncorrupted elements remaining in the soft heap. Therefore, L divides the elements somewhere between 33%/66% and 66%/33%. We then partition the set about L using the partition algorithm from quicksort and apply the same algorithm again to either the set of numbers less than L or the set of numbers greater than L, neither of which can exceed 2n/3 elements. Since each insertion and deletion requires O(1) amortized time, the total deterministic time is T(n) = T(2n/3) + O(n). Using case 3 of the master theorem (with ε = 1 and c = 2/3), we know that T(n) = Θ(n).
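
Alternatively, the recurrence can be solved by unrolling it directly; here c denotes the constant hidden in the O(n) term:

    T(n) ≤ c·n + T(2n/3) ≤ c·n·(1 + 2/3 + (2/3)² + ⋯) = 3c·n,

so T(n) = O(n), and since the partition step alone takes Ω(n) time, T(n) = Θ(n).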

The final algorithm looks like this:

function softHeapSelect(a[1..n], k)
    if k = 1 then return minimum(a[1..n])
    create(S)
    for i from 1 to n
        insert(S, a[i])
    x := -∞                          // largest actual value removed so far
    for i from 1 to n/3
        y := findmin(S)
        delete(S, y)
        if y > x then x := y         // the pivot is the largest removed value
    xIndex := partition(a, x)        // returns new index of pivot x
    if k < xIndex
        return softHeapSelect(a[1..xIndex-1], k)
    else
        return softHeapSelect(a[xIndex..n], k-xIndex+1)
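
For concreteness, here is a runnable Python sketch of the same routine. No standard Python library provides a soft heap, so an ordinary binary heap (heapq) stands in for S, corresponding to an error rate of ε = 0; the selection logic is unchanged, but the stand-in's O(log n) delete-min makes this version O(n log n) rather than the linear time a true soft heap gives.

import heapq

def soft_heap_select(a, k):
    # Return the k-th smallest element of a (k is 1-indexed).
    if len(a) <= 3:                        # small base case: sort directly
        return sorted(a)[k - 1]

    S = list(a)                            # create(S) and insert all n elements
    heapq.heapify(S)

    # Delete the minimum n/3 times; the pivot x is the largest value removed.
    x = max(heapq.heappop(S) for _ in range(len(a) // 3))

    # Quicksort-style three-way partition around x.
    less = [v for v in a if v < x]
    equal = [v for v in a if v == x]
    greater = [v for v in a if v > x]

    if k <= len(less):
        return soft_heap_select(less, k)
    elif k <= len(less) + len(equal):
        return x
    else:
        return soft_heap_select(greater, k - len(less) - len(equal))

For example, soft_heap_select([9, 1, 8, 2, 7, 3], 4) returns 7, the fourth-smallest element.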