Binary tree

# Binary tree

Overview

Discussion
 Ask a question about 'Binary tree' Start a new discussion about 'Binary tree' Answer questions from other users Full Discussion Forum

Encyclopedia

In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, a binary tree is a tree data structure
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at root node and repeatedly following references to either the left or right child.

Binary trees are used to implement binary search tree
Binary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...

s and binary heap
Binary heap
A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:*The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one are fully filled, and, if the last level of...

s.

## Definitions for rooted trees

• A directed edge refers to the link from the parent to the child (the arrows in the picture of the tree).
• The root node of a tree is the node
Node (computer science)
A node is a record consisting of one or more fields that are links to other nodes, and a data field. The link and data fields are often implemented by pointers or references although it is also quite common for the data to be embedded directly in the node. Nodes are used to build linked, often...

with no parents. There is at most one root node in a rooted tree.
• A leaf node has no children.
• The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero.
• The height of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a height of zero.
• Siblings are nodes that share the same parent node.
• A node p is an ancestor of a node q if it exists on the path from q to the root. The node q is then termed a descendant of p.
• The size of a node is the number of descendants it has including itself.
• In-degree of a node is the number of edges arriving at that node.
• Out-degree of a node is the number of edges leaving that node.
• The root is the only node in the tree with In-degree = 0.

## Types of binary trees

• A rooted binary tree is a tree with a root node in which every node has at most two children.
• A full binary tree (sometimes proper binary tree or 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.
• A perfect binary tree is a full binary tree in which all leaves are at the same depth or same level, and in which every parent has two children. (This is ambiguously also called a complete binary tree.)
• A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.

• An infinite complete binary tree is a tree with a countably infinite number of levels, in which every node has two children, so that there are 2d nodes at level d. The set of all nodes is countably infinite, but the set of all infinite paths from the root is uncountable: it has the cardinality of the continuum
Cardinality of the continuum
In set theory, the cardinality of the continuum is the cardinality or “size” of the set of real numbers \mathbb R, sometimes called the continuum. It is an infinite cardinal number and is denoted by |\mathbb R| or \mathfrak c ....

. These paths corresponding by an order preserving bijection to the points of the Cantor set
Cantor set
In mathematics, the Cantor set is a set of points lying on a single line segment that has a number of remarkable and deep properties. It was discovered in 1875 by Henry John Stephen Smith and introduced by German mathematician Georg Cantor in 1883....

, or (through the example of the Stern–Brocot tree) to the set of positive irrational number
Irrational number
In mathematics, an irrational number is any real number that cannot be expressed as a ratio a/b, where a and b are integers, with b non-zero, and is therefore not a rational number....

s.
• A balanced binary tree is commonly defined as a binary tree in which the height of the two subtrees of every node never differ by more than 1, although in general it is a binary tree where no leaf is much farther away from the root than any other leaf. (Different balancing schemes allow different definitions of "much farther"). Binary trees that are balanced according to this definition have a predictable depth (how many nodes are traversed from the root to a leaf, root counting as node 0 and subsequent as 1, 2, ..., depth). This depth is equal to the integer part of where is the number of nodes on the balanced tree. Example 1: balanced tree with 1 node, (depth = 0). Example 2: balanced tree with 3 nodes, (depth=1). Example 3: balanced tree with 5 nodes, (depth of tree is 2 nodes).
• A rooted complete binary tree can be identified with a free magma.
• A degenerate tree is a tree where for each parent node, there is only one associated child node. This means that in a performance measurement, the tree will behave like a linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

data structure.

Note that this terminology often varies in the literature, especially with respect to the meaning "complete" and "full".

## Properties of binary trees

• The number of nodes in a perfect binary tree can be found using this formula: where is the height of the tree.
• The number of nodes in a complete binary tree is at least and at most where is the height of the tree.
• The number of leaf nodes in a perfect binary tree can be found using this formula: where is the height of the tree.
• The number of nodes in a perfect binary tree can also be found using this formula: where is the number of leaf nodes in the tree.
• The number of null links (absent children of nodes) in a complete binary tree of n nodes is (n+1).
• The number of internal nodes in a Complete Binary Tree of n nodes is .
• For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.
Proof:
Let n = the total number of nodes
B = number of branches
n0, n1, n2 represent the number of nodes with no children, a single child, and two children respectively.

B = n - 1 (since all nodes except the root node come from a single branch)
B = n1 + 2*n2
n = n1+ 2*n2 + 1
n = n0 + n1 + n2
n1+ 2*n2 + 1 = n0 + n1 + n2

## > n0 = n2 + 1

Common operations
There are a variety of different operations that can be performed on binary trees. Some are mutator
Mutator method
In computer science, a mutator method is a method used to control changes to a variable.The mutator method, sometimes called a "setter", is most often used in object-oriented programming, in keeping with the principle of encapsulation...

operations, while others simply return useful information about the tree.

### Insertion

Nodes can be inserted into binary trees in between two other nodes or added after an external node. In binary trees, a node that is inserted is specified as to which child it is.

#### External nodes

Say that the external node being added on to is node A. To add a new node after node A, A assigns the new node as one of its children and the new node assigns node A as its parent.

#### Internal nodes

Insertion on internal nodes is slightly more complex than on external nodes. Say that the internal node is node A and that node B is the child of A. (If the insertion is to insert a right child, then B is the right child of A, and similarly with a left child insertion.) A assigns its child to the new node and the new node assigns its parent to A. Then the new node assigns its child to B and B assigns its parent as the new node.

### Deletion

Deletion is the process whereby a node is removed from the tree. Only certain nodes in a binary tree can be removed unambiguously.

#### Node with zero or one children

Say that the node to delete is node A. If a node has no children (external node), deletion is accomplished by setting the child of A's parent to null. If it has one child, set the parent of A's child to A's parent and set the child of A's parent to A's child.

#### Node with two children

In a binary tree, a node with two children cannot be deleted unambiguously. However, in certain binary trees these nodes can be deleted, including binary search tree
Binary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...

s.

### Iteration

Often, one wishes to visit each of the nodes in a tree and examine the value there, a process called iteration
Iteration
Iteration means the act of repeating a process usually with the aim of approaching a desired goal or target or result. Each repetition of the process is also called an "iteration," and the results of one iteration are used as the starting point for the next iteration.-Mathematics:Iteration in...

or enumeration
Enumeration
In mathematics and theoretical computer science, the broadest and most abstract definition of an enumeration of a set is an exact listing of all of its elements . The restrictions imposed on the type of list used depend on the branch of mathematics and the context in which one is working...

. There are several common orders in which the nodes can be visited, and each has useful properties that are exploited in algorithms based on binary trees:
• Pre-Order: Root first, Left child, Right child
• Post-Order: Left Child, Right child, root
• In-Order: Left child, root, right child.

#### Pre-order, in-order, and post-order traversal

Pre-order, in-order, and post-order traversal visit each node in a tree by recursively visiting each node in the left and right subtrees of the root. If the root node is visited before its subtrees, this is pre-order; if after, post-order; if between, in-order. In-order traversal is useful in binary search tree
Binary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...

s, where this traversal visits the nodes in increasing order.

#### Depth-first order

In depth-first order, we always attempt to visit the node farthest from the root that we can, but with the caveat that it must be a child of a node we have already visited. Unlike a depth-first search on graphs, there is no need to remember all the nodes we have visited, because a tree cannot contain cycles. Pre-order is a special case of this. See depth-first search
Depth-first search
Depth-first search is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root and explores as far as possible along each branch before backtracking....

Contrasting with depth-first order is breadth-first order, which always attempts to visit the node closest to the root that it has not already visited. See breadth-first search
In graph theory, breadth-first search is a graph search algorithm that begins at the root node and explores all the neighboring nodes...

for more information. Also called a level-order traversal.
Type theory

## In type theoryType theoryIn mathematics, logic and computer science, type theory is any of several formal systems that can serve as alternatives to naive set theory, or the study of such formalisms in general..., a binary tree with nodes of type A is defined inductively as TA = μα. 1 + A × α × α.

Definition in graph theory
For each binary tree data structure, there is equivalent rooted binary tree in graph theory.

Graph theorists
Graph theory
In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. A "graph" in this context refers to a collection of vertices or 'nodes' and a collection of edges that connect pairs of...

use the following definition: A binary tree is a connected acyclic graph such that the degree
Degree (graph theory)
In graph theory, the degree of a vertex of a graph is the number of edges incident to the vertex, with loops counted twice. The degree of a vertex v is denoted \deg. The maximum degree of a graph G, denoted by Δ, and the minimum degree of a graph, denoted by δ, are the maximum and minimum degree...

of each vertex
Vertex (graph theory)
In graph theory, a vertex or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges , while a directed graph consists of a set of vertices and a set of arcs...

is no more than three. It can be shown that in any binary tree of two or more nodes, there are exactly two more nodes of degree one than there are of degree three, but there can be any number of nodes of degree two. A rooted binary tree is such a graph that has one of its vertices of degree no more than two singled out as the root.

With the root thus chosen, each vertex will have a uniquely defined parent, and up to two children; however, so far there is insufficient information to distinguish a left or right child. If we drop the connectedness requirement, allowing multiple connected components
Connected component (graph theory)
In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices. For example, the graph shown in the illustration on the right has three connected components...

in the graph, we call such a structure a forest.

Another way of defining binary trees is a recursive definition on directed graphs. A binary tree is either:
• A single vertex.
• A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.

This also does not establish the order of children, but does fix a specific root node.
Combinatorics
In combinatorics one considers the problem of counting the number of full binary trees of a given size. Here the trees have no values attached to their nodes (this would just multiply the number of possible trees by an easily determined factor), and trees are distinguished only by their structure; however the left and right child of any node are distinguished (if they are different trees, then interchanging them will produce a tree distinct from the original one). The size of the tree is taken to be the number n of internal nodes (those with two children); the other nodes are leaf nodes and there are of them. The number of such binary trees of size n is equal to the number of ways of fully parenthesizing a string of symbols (representing leaves) separated by n binary operators (representing internal nodes), so as to determine the argument subexpressions of each operator. For instance for one has to parenthesize a string like , which is possible in five ways:

The correspondence to binary trees should be obvious, and the addition of redundant parentheses (around an already parenthesized expression or around the full expression) is disallowed (or at least not counted as producing a new possibility).

There is a unique binary tree of size 0 (consisting of a single leaf), and any other binary tree is characterized by the pair of its left and right children; if these have sizes i and j respectively, the full tree has size . Therefore the number of binary trees of size n has the following recursive description , and for any positive integer n. It follows that is the Catalan number
Catalan number
In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involvingrecursively defined objects...

of index n.

The above parenthesized strings should not be confused with the set of words of length 2n in the Dyck language
Dyck language
In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of parentheses [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of parentheses, such as arithmetic...

, which consist only of parentheses in such a way that they are properly balanced. The number of such strings satisfies the same recursive description (each Dyck word of length 2n is determined by the Dyck subword enclosed by the initial '(' and its matching ')' together with the Dyck subword remaining after that closing parenthesis, whose lengths 2i and 2j satisfy ); this number is therefore also the Catalan number . So there are also five Dyck words of length 10:
.

These Dyck words do not correspond in an obvious way to binary trees. A bijective correspondence can nevertheless be defined as follows: enclose the Dyck word in a extra pair of parentheses, so that the result can be interpreted as a Lisp
Lisp
A lisp is a speech impediment, historically also known as sigmatism. Stereotypically, people with a lisp are unable to pronounce sibilants , and replace them with interdentals , though there are actually several kinds of lisp...

list expression (with the empty list as only occurring atom); then the dotted-pair expression for that proper list is a fully parenthesized expression (with NIL as symbol and '.' as operator) describing the corresponding binary tree (which is in fact the internal representation of the proper list).

The ability to represent binary trees as strings of symbols and parentheses implies that binary trees can represent the elements of a free magma on a singleton set.
Methods for storing binary trees
Binary trees can be constructed from programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

primitives in several ways.

### Nodes and references

In a language with records
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

and reference
Reference (computer science)
In computer science, a reference is a value that enables a program to indirectly access a particular data item, such as a variable or a record, in the computer's memory or in some other storage device. The reference is said to refer to the data item, and accessing those data is called...

s, binary trees are typically constructed by having a tree node structure which contains some data and references to its left child and its right child. Sometimes it also contains a reference to its unique parent. If a node has fewer than two children, some of the child pointers may be set to a special null value, or to a special sentinel node
Sentinel node
A sentinel node is a specifically designated node used with linked lists and trees as a traversal path terminator. A sentinel node does not hold or reference any data managed by the data structure...

.

In languages with tagged union
Tagged union
In computer science, a tagged union, also called a variant, variant record, discriminated union, or disjoint union, is a data structure used to hold a value that could take on several different, but fixed types. Only one of the types can be in use at any one time, and a tag field explicitly...

s such as ML, a tree node is often a tagged union of two types of nodes, one of which is a 3-tuple of data, left child, and right child, and the other of which is a "leaf" node, which contains no data and functions much like the null value in a language with pointers.

### Arrays

Binary trees can also be stored in breadth-first order as an implicit data structure
Implicit data structure
In computer science, an implicit data structure is a data structure that uses very little memory besides the actual data elements i.e. very little information other than main data is stored in these structures. These are storage schemes which retain no pointers and represent the file of n k-key...

in arrays, and if the tree is a complete binary tree, this method wastes no space. In this compact arrangement, if a node has an index i, its children are found at indices (for the left child) and (for the right), while its parent (if any) is found at index (assuming the root has index zero). This method benefits from more compact storage and better locality of reference
Locality of reference
In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...

, particularly during a preorder traversal. However, it is expensive to grow and wastes space proportional to 2h - n for a tree of height h with n nodes.

This method of storage is often used for binary heaps
Binary heap
A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:*The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one are fully filled, and, if the last level of...

. No space is wasted because nodes are added in breadth-first order.

### Succinct encodings

A succinct data structure
Succinct data structure
In computer science, a succinct data structure is data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, trees, and planar...

is one which takes the absolute minimum possible space, as established by information theoretical
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...

lower bounds. The number of different binary trees on nodes is , the th Catalan number
Catalan number
In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involvingrecursively defined objects...

(assuming we view trees with identical structure as identical). For large , this is about ; thus we need at least about bits to encode it. A succinct binary tree therefore would occupy only 2 bits per node.

One simple representation which meets this bound is to visit the nodes of the tree in preorder, outputting "1" for an internal node and "0" for a leaf. http://theory.csail.mit.edu/classes/6.897/spring03/scribe_notes/L12/lecture12.pdf If the tree contains data, we can simply simultaneously store it in a consecutive array in preorder. This function accomplishes this:

function EncodeSuccinct(node n, bitstring structure, array data) {
if n = nil then
append 0 to structure;
else
append 1 to structure;
append n.data to data;
EncodeSuccinct(n.left, structure, data);
EncodeSuccinct(n.right, structure, data);
}

The string structure has only bits in the end, where is the number of (internal) nodes; we don't even have to store its length. To show that no information is lost, we can convert the output back to the original tree like this:

function DecodeSuccinct(bitstring structure, array data) {
remove first bit of structure and put it in b
if b = 1 then
create a new node n
remove first element of data and put it in n.data
n.left = DecodeSuccinct(structure, data)
n.right = DecodeSuccinct(structure, data)
return n
else
return nil
}

More sophisticated succinct representations allow not only compact storage of trees but even useful operations on those trees directly while they're still in their succinct form.

### Encoding general trees as binary trees

There is a one-to-one mapping between general ordered trees and binary trees, which in particular is used by Lisp to represent general ordered trees as binary trees. To convert a general ordered tree to binary tree, we only need to represent the general tree in left child-sibling way. The result of this representation will be automatically binary tree, if viewed from a different perspective. Each node N in the ordered tree corresponds to a node N' in the binary tree; the left child of N' is the node corresponding to the first child of N, and the right child of N' is the node corresponding to N 's next sibling --- that is, the next node in order among the children of the parent of N. This binary tree representation of a general order tree is sometimes also referred to as a left child-right sibling binary tree
Left child-right sibling binary tree
In computer science, a left child-right sibling binary tree is a binary tree representation of a k-ary tree. The process of converting from a k-ary tree to an LC-RS binary tree is not reversible in general without additional information....

(LCRS tree), or a doubly chained tree, or a Filial-Heir chain.

One way of thinking about this is that each node's children are in a linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

, chained together with their right fields, and the node only has a pointer to the beginning or head of this list, through its left field.

For example, in the tree on the left, A has the 6 children {B,C,D,E,F,G}. It can be converted into the binary tree on the right.

The binary tree can be thought of as the original tree tilted sideways, with the black left edges representing first child and the blue right edges representing next sibling. The leaves of the tree on the left would be written in Lisp as:
I J) C D ((P) (Q)) F (M))

which would be implemented in memory as the binary tree on the right, without any letters on those nodes that have a left child.
• 2-3 tree
2-3 tree
In computer science, a 2-3 tree is a type of data structure, a tree where every node with children has either two children and one data element or three children and two data elements...

• 2-3-4 tree
2-3-4 tree
In computer science, a 2-3-4 tree is a self-balancing data structure that is commonly used to implement dictionaries...

• AA tree
AA tree
An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named for Arne Andersson, their inventor....

• B-tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

• Binary space partitioning
Binary space partitioning
In computer science, binary space partitioning is a method for recursively subdividing a space into convex sets by hyperplanes. This subdivision gives rise to a representation of the scene by means of a tree data structure known as a BSP tree.Originally, this approach was proposed in 3D computer...

• Elastic binary tree
• Huffman tree
• Kraft's inequality
Kraft's inequality
In coding theory, Kraft's inequality, named after Leon Kraft, gives a necessary and sufficient condition for the existence of a uniquely decodable code for a given set of codeword lengths...

• Random binary tree
Random binary tree
In computer science and probability theory, a random binary tree refers to a binary tree selected at random from some probability distribution on binary trees...

• Recursion (computer science)
Recursion (computer science)
Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science....

• Red-black tree
Red-black tree
A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named "symmetric binary B-tree," but acquired its modern name in a paper in 1978 by...

• Rope (computer science)
Rope (computer science)
In computer programming a rope, or cord, is a data structure for efficiently storing and manipulating a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done...

• Self-balancing binary search tree
Self-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....

• Strahler number
Strahler number
In mathematics, the Strahler number or Horton–Strahler number of a mathematical tree is a numerical measure of its branching complexity....