Paper Title

High Performance Multivariate Geospatial Statistics on Manycore Systems

Authors

Salvaña, Mary Lai O., Abdulah, Sameh, Huang, Huang, Ltaief, Hatem, Sun, Ying, Genton, Marc G., Keyes, David E.

Abstract

Modeling and inferring spatial relationships and predicting missing values of environmental data are some of the main tasks of geospatial statisticians. These routine tasks are accomplished using multivariate geospatial models and the cokriging technique. The latter requires the evaluation of the expensive Gaussian log-likelihood function, which has impeded the adoption of multivariate geospatial models for large multivariate spatial datasets. However, this large-scale cokriging challenge provides a fertile ground for supercomputing implementations for the geospatial statistics community as it is paramount to scale computational capability to match the growth in environmental data coming from the widespread use of different data collection technologies. In this paper, we develop and deploy large-scale multivariate spatial modeling and inference on parallel hardware architectures. To tackle the increasing complexity in matrix operations and the massive concurrency in parallel systems, we leverage low-rank matrix approximation techniques with task-based programming models and schedule the asynchronous computational tasks using a dynamic runtime system. The proposed framework provides both the dense and the approximated computations of the Gaussian log-likelihood function. It demonstrates accuracy robustness and performance scalability on a variety of computer systems. Using both synthetic and real datasets, the low-rank matrix approximation shows better performance compared to exact computation, while preserving the application requirements in both parameter estimation and prediction accuracy. We also propose a novel algorithm to assess the prediction accuracy after the online parameter estimation. The algorithm quantifies prediction performance and provides a benchmark for measuring the efficiency and accuracy of several approximation techniques in multivariate spatial modeling.
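
To make the cost of the Gaussian log-likelihood concrete, the following is a minimal sketch of a dense (exact) evaluation using a Cholesky factorization, which is the O(n^3) step that dominates for large n. It is not the paper's implementation: the function names `exp_cov` and `gaussian_loglik`, the univariate exponential covariance, and the parameter values are all assumptions chosen for illustration, whereas the paper works with multivariate models and task-based, low-rank variants.

```python
import numpy as np


def exp_cov(locs, sigma2, beta):
    """Exponential covariance matrix (illustrative, univariate assumption).

    locs   : (n, d) array of spatial locations
    sigma2 : marginal variance
    beta   : spatial range parameter
    """
    # Pairwise Euclidean distances between all locations.
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-dist / beta)


def gaussian_loglik(z, cov):
    """Zero-mean Gaussian log-likelihood of observations z under covariance cov.

    Computes -0.5 * (n*log(2*pi) + log|cov| + z^T cov^{-1} z)
    through the Cholesky factor L, where cov = L L^T.
    """
    n = z.shape[0]
    chol = np.linalg.cholesky(cov)                    # O(n^3) dense factorization
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))      # log|cov| from diagonal of L
    y = np.linalg.solve(chol, z)                      # y = L^{-1} z
    quad = y @ y                                      # z^T cov^{-1} z
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    locs = rng.uniform(size=(200, 2))                 # hypothetical 2-D locations
    cov = exp_cov(locs, sigma2=1.0, beta=0.1)
    z = np.linalg.cholesky(cov) @ rng.standard_normal(200)  # synthetic field
    print(gaussian_loglik(z, cov))
```

In a parameter-estimation loop this function would be called once per candidate parameter vector, so the factorization cost is paid repeatedly; that repeated cost is what the low-rank approximation described in the abstract is meant to reduce.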
