Paper title

Scalable $k$-d trees for distributed data

Authors

Aritra Chakravorty, William S. Cleveland, Patrick J. Wolfe

Abstract

Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest neighbors search, local regression, and so forth. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees of the quality of approximation using this approach, along with a simulation study quantifying the accuracy and scalability of our proposed approach in practice.
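The idea of building a $k$-d tree from approximate medians can be illustrated with a minimal sketch. The abstract does not say how the medians are approximated, so the histogram-merging scheme below is an assumption chosen for illustration, not necessarily the authors' method. Each shard (a hypothetical stand-in for one machine's partition of the data) reports only per-bin counts, and the split point is interpolated from the merged counts. The names `shard_histogram`, `approx_median`, `build_kdtree`, and the parameters `bins` and `leaf_size` are all hypothetical.

```python
import random

def shard_histogram(shard, axis, lo, width, bins):
    """Per-shard work: count points per bin along `axis`.
    In a distributed setting only these counts would be communicated."""
    counts = [0] * bins
    for p in shard:
        idx = min(int((p[axis] - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def approx_median(shards, axis, bins=64):
    """Approximate the global median along `axis` from merged histograms.
    Returns None when every value on this axis is identical."""
    nonempty = [s for s in shards if s]
    lo = min(min(p[axis] for p in s) for s in nonempty)
    hi = max(max(p[axis] for p in s) for s in nonempty)
    if lo == hi:
        return None
    width = (hi - lo) / bins
    merged = [0] * bins
    for s in nonempty:
        for i, c in enumerate(shard_histogram(s, axis, lo, width, bins)):
            merged[i] += c
    target = sum(merged) / 2  # rank of the median
    cum = 0
    for i, c in enumerate(merged):
        if cum + c >= target:
            # Linear interpolation inside the bin that holds the median rank.
            return lo + (i + (target - cum) / c) * width
        cum += c
    return hi

def build_kdtree(shards, depth=0, leaf_size=32, bins=64):
    """Recursively split all shards at an approximate median, cycling axes."""
    total = sum(len(s) for s in shards)
    if total <= leaf_size:
        return {"points": [p for s in shards for p in s]}
    k = len(next(s for s in shards if s)[0])  # dimensionality
    axis = depth % k
    split = approx_median(shards, axis, bins)
    if split is not None:
        left = [[p for p in s if p[axis] < split] for s in shards]
        right = [[p for p in s if p[axis] >= split] for s in shards]
        if any(left) and any(right):
            return {
                "axis": axis,
                "split": split,
                "left": build_kdtree(left, depth + 1, leaf_size, bins),
                "right": build_kdtree(right, depth + 1, leaf_size, bins),
            }
    # Degenerate split (assumed fallback): stop and keep the points in a leaf.
    return {"points": [p for s in shards for p in s]}

# Usage: 1000 random 2-D points spread over 4 simulated shards.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
tree = build_kdtree([points[i::4] for i in range(4)])
```

In this sketch the split lands in the histogram bin that contains the true median rank, so the imbalance at each node is bounded by the number of points in that bin. Raising `bins` tightens the approximation at the cost of more data exchanged per split.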
