Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems.... , a kd-tree (short for k-dimensional tree) is a space-partitioning
Space partitioning
In mathematics, space partitioning is the process of dividing a space into two or more disjoint subsets . In other words, space partitioning divides a space into non-overlapping regions.... data structure
Data structure
A data structure in computer science is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data.... for organizing point
Point (geometry)
In geometry, topology and related branches of mathematics a spatial point describes a specific object within a given space that consists of neither volume, area, length, nor any other higher dimensional analogue.... s in a k-dimensional space
Euclidean space
Around 300 Before Christ, the Ancient Greece mathematician Euclid undertook a study of relationships among distances and angles, first in a plane and then in space.... . kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbour searches).
Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems.... , a kd-tree (short for k-dimensional tree) is a space-partitioning
Space partitioning
In mathematics, space partitioning is the process of dividing a space into two or more disjoint subsets . In other words, space partitioning divides a space into non-overlapping regions.... data structure
Data structure
A data structure in computer science is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data.... for organizing point
Point (geometry)
In geometry, topology and related branches of mathematics a spatial point describes a specific object within a given space that consists of neither volume, area, length, nor any other higher dimensional analogue.... s in a k-dimensional space
Euclidean space
Around 300 Before Christ, the Ancient Greece mathematician Euclid undertook a study of relationships among distances and angles, first in a plane and then in space.... . kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbour searches). kd-trees are a special case of BSP trees.
In computer science, a binary tree is a Tree in which each node has at most two child node. Typically the child nodes are called left and right.... in which every node is a k-dimensional point. Every non-leaf node generates a splitting hyperplane
Hyperplane
A hyperplane is a concept in geometry. It is a higher-dimensional generalization of the concepts of a line in the plane and a plane in 3-dimensional space.... that divides the space into two subspaces. Points left to the hyperplane represent the left sub-tree of that node and the points right to the hyperplane by the right sub-tree. The hyperplane direction is chosen in the following way: every node split to sub-trees is associated with one of the k-dimensions, such that the hyperplane is perpendicular to that dimension vector. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right sub tree.
Operations on
kd-trees
Construction
Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways to construct kd-trees. The canonical method of kd-tree construction has the following constraints:
As one moves down the tree, one cycles through the axes used to select the splitting planes. (For example, the root would have an
x-aligned plane, the root's children would both have y-aligned planes, the root's grandchildren would all have z-aligned planes, the next level would have an x-aligned plane, and so on.)
In probability theory and statistics, a median is described as the number separating the higher half of a sample, a population, or a probability distribution, from the lower half.... of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane. (Note the assumption that we feed the entire set of points into the algorithm up-front.)
This method leads to a balanced kd-tree, in which each leaf node is about the same distance from the root. However, balanced trees are not necessarily optimal for all applications.
Note also that it is not required to select the median point. In that case, the result is simply that there is no guarantee that the tree will be balanced. A simple heuristic to avoid coding a complex linear-time median-finding algorithm nor using an O(n log n) sort is to use sort to find the median of a fixed number of randomly selected points to serve as the cut line. Practically this technique often results in nicely balanced trees.
In mathematics, computing, linguistics and related subjects, an algorithm is a sequence of finite instructions, often used for calculation and data processing.... will construct a balanced kd-tree containing those points.
function kdtree (list of points pointList, int depth)
It is common that points "after" the median include only ones that are greater than or equal to the median. Another approach is to define a "superkey" function that compares the points in other dimensions. Lastly, it may be acceptable to let points equal to the median lie on either side.
Python is a general-purpose high-level programming language. Its design philosophy emphasizes code readability. Python's core syntax and semantics are Minimalism , while the standard library is large and comprehensive.... is as follows:
class Node:pass
def kdtree(pointList, depth=0):
if not pointList:
return
# Select axis based on depth so that axis cycles through all valid values
k = len(pointList[0]) # assumes all points have the same dimension
axis = depth % k
# Sort point list and choose median as pivot element
pointList.sort(key=lambda point: point[axis])
median = len(pointList)/2 # choose median
In Computer science, a predicate that, if true, will remain true throughout a specific sequence of Operation , is called invariant to that sequence.... that for any node, all the nodes in the left subtree are on one side of a splitting plane
Plane (mathematics)
In mathematics, a plane is a curvature surface. Planes can arise as subspaces of some higher dimensional space, as with the walls of a room, or they may enjoy an independent existence in their own right, as in the setting of Euclidean geometry.... , and all the nodes in the right subtree are on the other side. Points that lie on the splitting plane may appear on either side. The splitting plane of a node goes through the point associated with that node (referred to in the code as node.location).
Adding elements
One adds a new point to a kd-tree in the same way as one adds an element to any other search tree
Binary search tree
In computer science, a binary search tree is a binary tree data structurewhich has the following properties:* Each node has a distinct value.... . First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to a leaf node, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new node.
Removing elements
To remove a point from an existing kd-tree, without breaking the invariant, the easiest way is to form the set of all nodes and leaves from the children of the target node, and recreate that part of tree. This differs from regular search trees in that no child can be selected for a "promotion", since the splitting plane for lower-level nodes is not along the required axis for the current tree level.
Balancing
Balancing a kd-tree requires care. Because kd-trees are sorted in multiple dimensions, the tree rotation
Tree rotation
A tree rotation is an operation on a binary search tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down.... technique cannot be used to balance them — this may break the invariant.
Nearest neighbor search
The nearest neighbor (NN) algorithm aims to find the point in the tree which is nearest to a given input point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space.
Searching for a nearest neighbor in a kd-tree proceeds as follows:
Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes right or left depending on whether the point is greater or less than the current node in the split dimension).
Once the algorithm reaches a leaf node, it saves that node point as the "current best"
The algorithm unwinds the recursion of the tree, performing the following steps at each node:
If the current node is closer than the current best, then it becomes the current best.
The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane
Hyperplane
A hyperplane is a concept in geometry. It is a higher-dimensional generalization of the concepts of a line in the plane and a plane in 3-dimensional space.... with a hypersphere
Hypersphere
In mathematics, an n-sphere is a generalization of the surface of an ordinary sphere to arbitrary dimension. For any natural number n, an n-sphere of radius r is defined as the set of points in -dimensional Euclidean space which are at distance r from a central point, where the radius r may be any positive real num... around the search node that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate and the search point is less than the distance from the search point to the current best.
If the sphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
When the algorithm finishes this process for the root node, then the search is complete.
Generally the algorithm uses squared distances for comparison to avoid computing square roots. Additionally, it can save computation by holding the squared current best distance in a variable for comparison.
Finding the nearest point is an O(log N) operation in the case of randomly distributed points if N. Analyses of binary search trees has found that the worst case search time for an k-dimensional KD tree containing M nodes is given by the following equation.
These asymptotic behaviors only apply when N is much greater than the number of dimensions. In very high dimensional spaces, the curse of dimensionality causes the algorithm to need to visit many more branches than in lower dimensional spaces. In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a linear search of all of the points.
The algorithm can be extended in several ways by simple modifications. It can provide the k-Nearest Neighbors to a point by maintaining k current bests instead of just one. Branches are only eliminated when they can't have points closer than any of the k current bests.
It can also be converted to an approximiation algorithm to run faster. For example, approximate nearest neighbour searching can be achieved by simply setting an upper bound on the number points to examine in the tree, or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations). Nearest neighbour for points that are in the tree already can be achieved by not updating the refinement for nodes that give zero distance as the result, this has the downside of discarding points that are not unique, but are co-located with the original search point.
Approximate nearest neighbor is useful in real time applications such as robotics due to the significant speed increase gained by not searching for the best point exhaustively. One of its implementations is Best Bin First
Best Bin First
Best Bin First is a computer algorithm which is designed to efficiently find an approximate solution to the Nearest neighbor search in very high dimensional spaces.... .
High-Dimensional Data
kd-trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is D, then number of points in the data, N, should be N >> 2D. Otherwise, when kd-trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search. The problem of finding NN in high-dimensional data is thought to be NP-hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems informally "at least as hard as the hardest problems in NP ." A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial-time Turing reduction to H, i.e.... ., and approximate nearest-neighbour methods are used instead.
In mathematics, big O notation describes the asymptotic analysis of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions.... (n log 2n) time if an O
Big O notation
In mathematics, big O notation describes the asymptotic analysis of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions.... (n log n) sort is used to compute the median at each level. The complexity is O
Big O notation
In mathematics, big O notation describes the asymptotic analysis of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions.... (n log n) if a linear median-finding
Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list This includes the cases of finding the minimum, maximum, and median elements.... algorithm such as the one described in Cormen et al is used.
Inserting a new point into a balanced
kd-tree takes O(log n) time.
Removing a point from a balanced
kd-tree takes O(log n) time.
Querying an axis-parallel range in a balanced
kd-tree takes O(n1-1/d +k) time, where k is the number of the reported points, and d the dimension of the kd-tree.
Variations
Instead of points
Instead of points, a kd-tree can also contain rectangle
Rectangle
In geometry, a rectangle is a Closed set planar quadrilateral with four right angles. A rectangle with vertices ABCD would be denoted as .A rectangle with adjacent sides of lengths a and b has area ab and diagonals of equal length .... s or hyperrectangles. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves. In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect with the search rectangle and so can be pruned. Otherwise both branches should be traversed. See also interval tree
Interval tree
In computer science, an interval tree, also called a segment tree or segtree, is an ordered tree data structure data structure to hold intervals.... , which is a 1-dimensional special case.
An implicit kd-tree is a kd-tree defined implicitly above a rectilinear grid. Its splitting Plane s' positions and orientations are not given explicitly but implicitly by some recursive splitting-function defined on the hyperrectangles belonging to the tree's node s....
A min/max kd-tree is a kd tree with two scalar values - a minimum and a maximum - assigned to its nodes. The minimum/maximum of an inner node is equal the minimum/maximum of its children's minima/maxima....
A quadtree is a tree data structure in which each internal node has up to four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions....
An octree is a tree data structure in which each internal node has up to eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants....
A bounding interval hierarchy is a partitioning data structure similar to that of bounding volume hierarchies or kd-trees. Bounding interval hierarchies can be used in high performance Ray tracing and may be especially useful for dynamic scenes....
Nearest neighbor search , also known as proximity search, similarity search or closest point search, is an optimization problem for finding closest points in metric spaces....
In computational geometry, Klee's measure problem is the problem of determining how efficiently the measure of a union of rectangular ranges can be computed....
A kd-trie is a spatial data structure for indexing points in k-dimensional space. It is a variation on the Kd-tree that only stores points in the leaf nodes, sometimes in small collections called bins or buckets....