Flexible Regularized Estimation in High-Dimensional Mixed Membership Models

Title

Flexible Regularized Estimation in High-Dimensional Mixed Membership Models

Authors

Marco, Nicholas; Şentürk, Damla; Jeste, Shafali; DiStefano, Charlotte; Dickinson, Abigail; Telesca, Donatello

Abstract

Mixed membership models are an extension of finite mixture models, where each observation can partially belong to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured through approximations of a tensor covariance structure through multivariate eigen-approximations with adaptive regularization imposed through shrinkage priors. Conditional weak posterior consistency is established on an unconstrained model, allowing for a simple posterior sampling scheme while keeping many of the desired theoretical properties of our model. The model is motivated by two biomedical case studies: a case study on functional brain imaging of children with autism spectrum disorder (ASD) and a case study on gene expression data from breast cancer tissue. These applications highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, may often be restrictive in several applications, leading to unnatural interpretations of data features.
