K-d Tree

Subscribe Send me a message home page tags


In this post we provide a brief description of K-d tree. K-d tree is a geometric data structure that provides efficient support for the following queries

Related Readings


There are two common ways to implement the K-d tree. The main difference is the choice of the separator. We could use the point to

represent the separator line, which implies the node in the K-d tree is the point in the data set. Or we could use nodes in K-d tree to represent the separator line directly. The two approaches are illustrated in the diagram below:


To construct the tree, we first need to sort points in each dimension. Soring takes \(O(nlogn)\) time and there are \(d\) dimensions so overall it takes \(O(dnlogn)\) to sort points. Then we choose an axis and split the tree evenly. One way to choose the axis is the round-robin approach. Suppose we use x-axis to split the tree. As we can see in the diagram below, we will have two groups of the points: pink group and yellow group. Because we use x-axis to split the tree, the points in the same group are grouped automatically in the x row. It's not the case for the y row. Therefore, we need to regroup points for other dimensions. Then we can process the subtree recursively. The regrouping process takes \(O(dn)\) time. We have the following formula for time complexity

$$ T(n,d) \leq 2T(n/2, d) + O(dn) $$

which gives \(T(n,d) \approx O(dnlogn)\).


Application: Nearest Neighbor Search

There are two components in the nearest neighbor search algorithm:


On average, the time complexity of nearest neighbot search is O(log N) but the worst case is N even if K-d tree is balanced.

----- END -----

Welcome to join reddit self-learning community.
Send me a message Subscribe to blog updates

Want some fun stuff?