Paper Title
Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes
Paper Authors
Paper Abstract
This paper presents a unified framework for supervised learning and inference procedures using the divide-and-conquer approach for high-dimensional correlated outcomes. We propose a general class of estimators that can be implemented in a fully distributed and parallelized computational scheme. Modelling, computational and theoretical challenges related to high-dimensional correlated outcomes are overcome by dividing data at both outcome and subject levels, estimating the parameter of interest from blocks of data using a broad class of supervised learning procedures, and combining block estimators in a closed-form meta-estimator. This meta-estimator is asymptotically equivalent to the estimator obtained by Hansen's (1982) generalized method of moments (GMM), and it does not require the entire data set to be reloaded on a common server. We provide rigorous theoretical justifications for the use of distributed estimators with correlated outcomes by studying the asymptotic behaviour of the combined estimator with both fixed and diverging numbers of data divisions. Simulations illustrate the finite sample performance of the proposed method, and we provide an R package for ease of implementation.
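The divide, estimate, and combine steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method or its R package: it assumes a simple linear model with a regression parameter shared across outcomes, fits ordinary least squares on each block, and combines the block estimates with a precision-weighted average. That combination rule is a simplified stand-in for the GMM-based meta-estimator, because it ignores the correlation between outcome blocks that the paper's estimator is designed to handle; the point estimate remains reasonable here, but the implied standard errors would not be valid. All names (`block_ols`, `combine`, the block counts) are hypothetical.

```python
import numpy as np

def block_ols(X, y):
    """OLS estimate and its estimated precision (inverse covariance) for one block."""
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, XtX / sigma2

def combine(estimates):
    """Closed-form precision-weighted combination of block estimates.

    Assumption: blocks are treated as independent, which is NOT true for
    correlated outcomes; the paper's meta-estimator accounts for this.
    """
    total_precision = sum(P for _, P in estimates)
    weighted_sum = sum(P @ b for b, P in estimates)
    return np.linalg.solve(total_precision, weighted_sum)

# Simulated data: n subjects, p covariates, q correlated outcomes.
rng = np.random.default_rng(0)
n, p, q = 600, 3, 4
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
cov = 0.5 * np.eye(q) + 0.5  # unit variances, pairwise correlation 0.5
errors = rng.multivariate_normal(np.zeros(q), cov, size=n)
Y = (X @ beta_true)[:, None] + errors

# Divide at both the subject level (rows) and the outcome level (columns).
subject_blocks = np.array_split(np.arange(n), 3)
outcome_blocks = np.array_split(np.arange(q), 2)

estimates = []
for rows in subject_blocks:
    for cols in outcome_blocks:
        # Stack the block's outcomes into one long response vector,
        # repeating the covariates once per outcome.
        Xb = np.tile(X[rows], (len(cols), 1))
        yb = Y[np.ix_(rows, cols)].T.reshape(-1)
        estimates.append(block_ols(Xb, yb))

# Only the small per-block summaries are needed for the combination step.
beta_hat = combine(estimates)
print(beta_hat)
```

Each block contributes only a length-p estimate and a p-by-p matrix, so the combination step never needs the full data on one machine, which is the communication property the abstract emphasizes.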