Binary search tree

Binary search tree

Discussion
 Ask a question about 'Binary search tree' Start a new discussion about 'Binary search tree' Answer questions from other users Full Discussion Forum

Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, a binary search tree (BST), which may sometimes also be called an ordered or sorted binary tree, is a node-based
Node (computer science)
A node is a record consisting of one or more fields that are links to other nodes, and a data field. The link and data fields are often implemented by pointers or references although it is also quite common for the data to be embedded directly in the node. Nodes are used to build linked, often...

binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...

data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

which has the following properties:
• The left subtree of a node contains only nodes with keys less than the node's key.
• The right subtree of a node contains only nodes with keys greater than the node's key.
• Both the left and right subtrees must also be binary search trees.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.

The major advantage of binary search trees over other data structures is that the related sorting algorithm
Sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

s and search algorithm
Search algorithm
In computer science, a search algorithm is an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots...

s such as in-order traversal can be very efficient.

Binary search trees are a fundamental data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

used to construct more abstract data structures such as sets
Set (computer science)
In computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set...

, multisets, and associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

s.

Operations

Operations on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

that computes the total order (linear order) on any two values. This comparator can be explicitly or implicitly defined, depending on the language in which the BST is implemented.

Searching

Searching a binary search tree for a specific value can be a recursive
Recursion (computer science)
Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science....

or iterative process. This explanation covers a recursive method.

We begin by examining the root node. If the tree is null, the value we are searching for does not exist in the tree. Otherwise, if the value equals the root, the search is successful. If the value is less than the root, search the left subtree. Similarly, if it is greater than the root, search the right subtree. This process is repeated until the value is found or the indicated subtree is null. If the searched value is not found before a null subtree is reached, then the item must not be present in the tree.

Here is the search algorithm in the Python programming language
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

:
1. 'node' refers to the parent-node in this case

def search_binary_tree(node, key):
if node is None:
if key < node.key:
return search_binary_tree(node.leftChild, key)
elif key > node.key:
return search_binary_tree(node.rightChild, key)
else: # key is equal to node key
return node.value # found key

… or equivalent Haskell
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

:

searchBinaryTree _ NullNode = Nothing
searchBinaryTree key (Node nodeKey nodeValue (leftChild, rightChild)) =
case compare key nodeKey of
LT -> searchBinaryTree key leftChild
GT -> searchBinaryTree key rightChild
EQ -> Just nodeValue

This operation requires O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(log n) time in the average case, but needs O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(n) time in the worst case, when the unbalanced tree resembles a linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

(degenerate tree).

Assuming that BinarySearchTree is a class with a member function "search(int)" and a pointer to the root node, the algorithm is also easily implemented in terms of an iterative approach. The algorithm enters a loop, and decides whether to branch left or right depending on the value of the node at each parent node.

bool BinarySearchTree::search(int val)
{
Node *next = this->root;

while (next != NULL) {
if (val next->value) {
return true;
} else if (val < next->value) {
next = next->left;
} else {
next = next->right;
}
}

return false;
}

Insertion

Insertion begins as a search would begin; if the root is not equal to the value, we search the left or right subtrees as before. Eventually, we will reach an external node and add the value as its right or left child, depending on the node's value. In other words, we examine the root and recursively insert the new node to the left subtree if the new value is less than the root, or the right subtree if the new value is greater than or equal to the root.

Here's how a typical binary search tree insertion might be performed in C++:

/* Inserts the node pointed to by "newNode" into the subtree rooted at "treeNode" */
void InsertNode(Node* &treeNode, Node *newNode)
{
if (treeNode NULL)
treeNode = newNode;
else if (newNode->key < treeNode->key)
InsertNode(treeNode->left, newNode);
else
InsertNode(treeNode->right, newNode);
}

The above "destructive" procedural variant modifies the tree in place. It uses only constant space, but the previous version of the tree is lost. Alternatively, as in the following Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

example, we can reconstruct all ancestors of the inserted node; any reference to the original tree root remains valid, making the tree a persistent data structure
Persistent data structure
In computing, a persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not update the structure in-place, but instead always yield a new updated structure...

:

def binary_tree_insert(node, key, value):
if node is None:
return TreeNode(None, key, value, None)
if key node.key:
return TreeNode(node.left, key, value, node.right)
if key < node.key:
return TreeNode(binary_tree_insert(node.left, key, value), node.key, node.value, node.right)
else:
return TreeNode(node.left, node.key, node.value, binary_tree_insert(node.right, key, value))

The part that is rebuilt uses Θ(log n) space in the average case and O(n) in the worst case (see big-O notation).

In either version, this operation requires time proportional to the height of the tree in the worst case, which is O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(log n) time in the average case over all trees, but O(n) time in the worst case.

Another way to explain insertion is that in order to insert a new node in the tree, its value is first compared with the value of the root. If its value is less than the root's, it is then compared with the value of the root's left child. If its value is greater, it is compared with the root's right child. This process continues, until the new node is compared with a leaf node, and then it is added as this node's right or left child, depending on its value.

There are other ways of inserting nodes into a binary tree, but this is the only way of inserting nodes at the leaves and at the same time preserving the BST structure.

Here is an iterative approach to inserting into a binary search tree in Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

:

private Node m_root;

public void insert(int data) {
if (m_root null) {
m_root = new TreeNode(data, null, null);
return;
}
Node root = m_root;
while (root != null) {
// Not the same value twice
if (data

root.getData) { return; } else if (data < root.getData) { // insert left if (root.getLeft

null) {
root.setLeft(new TreeNode(data, null, null));
return;
} else {
root = root.getLeft;
}
} else {
// insert right
if (root.getRight null) {
root.setRight(new TreeNode(data, null, null));
return;
} else {
root = root.getRight;
}
}
}
}

Below is a recursive approach to the insertion method.

private Node m_root;

public void insert(int data){
if (m_root null) {
m_root = TreeNode(data, null, null);
}else{
internalInsert(m_root, data);
}
}

private static void internalInsert(Node node, int data){
// Not the same value twice
if (data

node.getValue) { return; } else if (data < node.mValue) { if (node.getLeft

null) {
node.setLeft(new TreeNode(data, null, null));
}else{
internalInsert(node.getLeft, data);
}
}else{
if (node.getRight null) {
node.setRight(new TreeNode(data, null, null));
}else{
internalInsert(node.getRight, data);
}
}
}

Deletion

There are three possible cases to consider:
• Deleting a leaf (node with no children): Deleting a leaf is easy, as we can simply remove it from the tree.
• Deleting a node with one child: Remove the node and replace it with its child.
• Deleting a node with two children: Call the node to be deleted N. Do not delete N. Instead, choose either its in-order
Tree traversal
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...

successor node or its in-order predecessor node, R. Replace the value of N with the value of R, then delete R.

As with all binary trees, a node's in-order successor is the left-most child of its right subtree, and a node's in-order predecessor is the right-most child of its left subtree. In either case, this node will have zero or one children. Delete it according to one of the two simpler cases above.
Consistently using the in-order successor or the in-order predecessor for every instance of the two-child case can lead to an unbalanced tree, so good implementations add inconsistency to this selection.

Running Time Analysis:
Although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and does not visit any node twice.

Here is the code in Python:

def findMin(self):

Finds the smallest element that is a child of *self*

current_node = self
while current_node.left_child:
current_node = current_node.left_child
return current_node

def replace_node_in_parent(self, new_value=None):

Removes the reference to *self* from *self.parent* and replaces it with *new_value*.

if self.parent:
if self self.parent.left_child:
self.parent.left_child = new_value
else:
self.parent.right_child = new_value
if new_value:
new_value.parent = self.parent

def binary_tree_delete(self, key):
if key < self.key:
self.left_child.binary_tree_delete(key)
elif key > self.key:
self.right_child.binary_tree_delete(key)
else: # delete the key here
if self.left_child and self.right_child: # if both children are present
# get the smallest node that's bigger than *self*
successor = self.right_child.findMin
self.key = successor.key
# if *successor* has a child, replace it with that
# at this point, it can only have a *right_child*
# if it has no children, *right_child* will be "None"
successor.replace_node_in_parent(successor.right_child)
elif self.left_child or self.right_child: # if the node has only one child
if self.left_child:
self.replace_node_in_parent(self.left_child)
else:
self.replace_node_in_parent(self.right_child)
else: # this node has no children
self.replace_node_in_parent(None)

Source code in C++ (from http://www.algolist.net/Data_structures/Binary_search_tree). This URL also explains the operation nicely using diagrams.

bool BinarySearchTree::remove(int value) {
if (root

NULL) return false; else { if (root->getValue

value) {
BSTNode auxRoot(0);
auxRoot.setLeftChild(root);
BSTNode* removedNode = root->remove(value, &auxRoot);
root = auxRoot.getLeft;
if (removedNode != NULL) {
delete removedNode;
return true;
} else
return false;
} else {
BSTNode* removedNode = root->remove(value, NULL);
if (removedNode != NULL) {
delete removedNode;
return true;
} else
return false;
}
}
}

BSTNode* BSTNode::remove(int value, BSTNode *parent) {
if (value < this->value) {
if (left != NULL)
return left->remove(value, this);
else
return NULL;
} else if (value > this->value) {
if (right != NULL)
return right->remove(value, this);
else
return NULL;
} else {
if (left != NULL && right != NULL) {
this->value = right->minValue;
return right->remove(this->value, this);
} else if (parent->left

this) { parent->left = (left != NULL) ? left : right; return this; } else if (parent->right

this) {
parent->right = (left != NULL) ? left : right;
return this;
}
}
}

int BSTNode::minValue {
if (left NULL)
return value;
else
return left->minValue;
}

Traversal

Once the binary search tree has been created, its elements can be retrieved in-order by recursively
Recursion
Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...

traversing the left subtree of the root node, accessing the node itself, then recursively traversing the right subtree of the node, continuing this pattern with each node in the tree as it's recursively accessed. As with all binary trees, one may conduct a pre-order traversal or a post-order traversal, but neither are likely to be useful for binary search trees.

The code for in-order traversal in Python is given below. It will call callback for every node in the tree.

def traverse_binary_tree(node, callback):
if node is None:
return
traverse_binary_tree(node.leftChild, callback)
callback(node.value)
traverse_binary_tree(node.rightChild, callback)

Traversal requires Ω(n) time, since it must visit every node. This algorithm is also O(n), so it is asymptotically optimal
Asymptotically optimal
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm...

.
The Code for in-order traversal in Language C is given below.

void InOrderTraversal(struct Node *n)
{
struct Node *Cur, *Pre;
if(nNULL)
return;

Cur = n;
while(Cur != NULL)
{
if(Cur->lptr

NULL) { printf("\t%d",Cur->val); Cur= Cur->rptr; } else { Pre = Cur->lptr; while(Pre->rptr !=NULL && Pre->rptr != Cur) Pre = Pre->rptr; if (Pre->rptr

NULL)
{
Pre->rptr = Cur;
Cur = Cur->lptr;
}
else
{
Pre->rptr = NULL;
printf("\t%d",Cur->val);
Cur = Cur->rptr;
}
}
}
}

Sort

A binary search tree can be used to implement a simple but efficient sorting algorithm
Sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...

. Similar to heapsort
Heapsort
Heapsort is a comparison-based sorting algorithm to create a sorted array , and is part of the selection sort family. Although somewhat slower in practice on most machines than a well implemented quicksort, it has the advantage of a more favorable worst-case O runtime...

, we insert all the values we wish to sort into a new ordered data structure—in this case a binary search tree—and then traverse it in order, building our result:

def build_binary_tree(values):
tree = None
for v in values:
tree = binary_tree_insert(tree, v)
return tree

def get_inorder_traversal(root):

Returns a list containing all the values in the tree, starting at *root*.
Traverses the tree in-order(leftChild, root, rightChild).

result = []
traverse_binary_tree(root, lambda element: result.append(element))
return result

The worst-case time of `build_binary_tree` is —if you feed it a sorted list of values, it chains them into a linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

with no left subtrees. For example, `build_binary_tree([1, 2, 3, 4, 5])` yields the tree `(1 (2 (3 (4 (5)))))`.

There are several schemes for overcoming this flaw with simple binary trees; the most common is the self-balancing binary search tree
Self-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....

. If this same procedure is done using such a tree, the overall worst-case time is O
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(nlog n), which is asymptotically optimal
Asymptotically optimal
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm...

for a comparison sort
Comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list...

. In practice, the poor cache
CPU cache
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations...

performance and added overhead in time and space for a tree-based sort (particularly for node allocation) make it inferior to other asymptotically optimal sorts such as heapsort
Heapsort
Heapsort is a comparison-based sorting algorithm to create a sorted array , and is part of the selection sort family. Although somewhat slower in practice on most machines than a well implemented quicksort, it has the advantage of a more favorable worst-case O runtime...

for static list sorting. On the other hand, it is one of the most efficient methods of incremental sorting, adding items to a list over time while keeping the list sorted at all times.

Types

There are many types of binary search trees. AVL tree
AVL tree
In computer science, an AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O time in both the average and worst...

s and red-black tree
Red-black tree
A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named "symmetric binary B-tree," but acquired its modern name in a paper in 1978 by...

s are both forms of self-balancing binary search tree
Self-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....

s. A splay tree
Splay tree
A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O amortized time. For many sequences of nonrandom operations, splay trees perform...

is a binary search tree that automatically moves frequently accessed elements nearer to the root. In a treap
Treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...

("tree heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...

"), each node also holds a (randomly chosen) priority and the parent node has higher priority than its children. Tango Trees
Tango Trees
A Tango tree is an online binary search tree that is O-competitive proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Patrascu in 2004.-Overview:...

are trees optimized for fast searches.

Two other titles describing binary search trees are that of a complete and degenerate tree.

A complete tree is a tree with n levels, where for each level d <= n - 1, the number of existing nodes at level d is equal to 2d. This means all possible nodes exist at these levels. An additional requirement for a complete binary tree is that for the nth level, while every node does not have to exist, the nodes that do exist must fill from left to right.

A degenerate tree is a tree where for each parent node, there is only one associated child node. What this means is that in a performance measurement, the tree will essentially behave like a linked list data structure.

Performance comparisons

D. A. Heger (2004) presented a performance comparison of binary search trees. Treap
Treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...

was found to have the best average performance, while red-black tree
Red-black tree
A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named "symmetric binary B-tree," but acquired its modern name in a paper in 1978 by...

was found to have the smallest amount of performance fluctuations.

Optimal binary search trees

If we don't plan on modifying a search tree, and we know exactly how often each item will be accessed, we can construct an optimal binary search tree, which is a search tree where the average cost of looking up an item (the expected search cost) is minimized.

Even if we only have estimates of the search costs, such a system can considerably speed up lookups on average. For example, if you have a BST of English words used in a spell checker
Spell checker
In computing, a spell checker is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone capable of operating on a block of text, or as part of a larger application, such as a word processor, email client, electronic dictionary,...

, you might balance the tree based on word frequency in text corpora
Text corpus
In linguistics, a corpus or text corpus is a large and structured set of texts...

, placing words like "the" near the root and words like "agerasia" near the leaves. Such a tree might be compared with Huffman trees, which similarly seek to place frequently-used items near the root in order to produce a dense information encoding; however, Huffman trees only store data elements in leaves and these elements need not be ordered.

If we do not know the sequence in which the elements in the tree will be accessed in advance, we can use splay tree
Splay tree
A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O amortized time. For many sequences of nonrandom operations, splay trees perform...

s which are asymptotically as good as any static search tree we can construct for any particular sequence of lookup operations.

Alphabetic trees are Huffman trees with the additional constraint on order, or, equivalently, search trees with the modification that all elements are stored in the leaves. Faster algorithms exist for optimal alphabetic binary trees (OABTs).

Example:
```
procedure Optimum Search Tree(f, f´, c):
for j = 0 to n do
c[j, j] = 0, F[j, j] = f´j
for d = 1 to n do
for i = 0 to (n − d) do
j = i + d
F[i, j] = F[i, j − 1] + f´ + f´j
c[i, j] = MIN(i
```

• Donald Knuth
Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 6.2.2: Binary Tree Searching, pp. 426–458.
• Thomas H. Cormen
Thomas H. Cormen
Thomas H. Cormen is the co-author of Introduction to Algorithms, along with Charles Leiserson, Ron Rivest, and Cliff Stein. He is a Full Professor of computer science at Dartmouth College and currently Chair of the Dartmouth College Department of Computer Science. Between 2004 and 2008 he directed...

, Charles E. Leiserson
Charles E. Leiserson
Charles Eric Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof; as part of this effort, he developed the Cilk multithreaded language...

, Ronald L. Rivest, and Clifford Stein
Clifford Stein
Clifford Stein, a computer scientist, is currently a professor of industrial engineering and operations research at Columbia University in New York, NY, where he also holds an appointment in the Department of Computer Science. Stein is chair of the Industrial Engineering and Operations Research...

. Introduction to Algorithms
Introduction to Algorithms
Introduction to Algorithms is a book by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. It is used as the textbook for algorithms courses at many universities. It is also one of the most commonly cited references for algorithms in published papers, with over 4600...

, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 12: Binary search trees, pp. 253–272. Section 15.5: Optimal binary search trees, pp. 356–363.