Paper Title
A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data
Paper Authors
Paper Abstract
The use of high-dimensional data for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice-sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.
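The two-layered clustering described in the abstract can be illustrated with a small generative simulation. This is a minimal sketch of one reading of a common atoms construction, not the authors' implementation: each unit is assigned to a distributional cluster, every distributional cluster has its own weights over a single shared set of atoms, and each observation picks one of those atoms. The truncation levels `K` and `L`, the Gaussian kernel and base measure, the concentration parameters `alpha` and `beta`, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def stick_breaking(concentration, size, rng):
    """Truncated stick-breaking weights, renormalized to sum to one."""
    betas = rng.beta(1.0, concentration, size=size)
    # Length of stick left before each break: prod_{m<k} (1 - beta_m).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()


def simulate_common_atoms(n_units=6, n_obs=50, K=10, L=15,
                          alpha=1.0, beta=1.0, seed=0):
    """Simulate nested data from a truncated common-atoms mixture (illustrative)."""
    rng = np.random.default_rng(seed)

    # One set of atoms shared by every distributional cluster
    # (assumed Gaussian base measure).
    atoms = rng.normal(0.0, 5.0, size=L)

    # Weights over the K distributional clusters.
    pi = stick_breaking(alpha, K, rng)

    # Each distributional cluster has its own weights over the shared atoms.
    omega = np.vstack([stick_breaking(beta, L, rng) for _ in range(K)])

    # Layer 1: assign each unit to a distributional cluster.
    dist_cluster = rng.choice(K, size=n_units, p=pi)

    # Layer 2: assign each observation to an atom, then draw the data
    # (assumed Gaussian kernel with unit variance).
    obs_cluster = np.empty((n_units, n_obs), dtype=int)
    y = np.empty((n_units, n_obs))
    for j in range(n_units):
        obs_cluster[j] = rng.choice(L, size=n_obs, p=omega[dist_cluster[j]])
        y[j] = rng.normal(atoms[obs_cluster[j]], 1.0)

    return dist_cluster, obs_cluster, y


dist_cluster, obs_cluster, y = simulate_common_atoms()
print("distributional clusters of the units:", dist_cluster)
print("atoms used per unit:", [len(np.unique(row)) for row in obs_cluster])
```

Because the atoms are shared, observations from units in different distributional clusters can still fall in the same observational cluster. Under this sketch, that is how units can differ over only a fraction of their observations. The code only simulates data; posterior inference with the nested slice sampler mentioned in the abstract is not shown.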