Parameter identifiability for a profile mixture model of protein evolution

Paper Title

Parameter identifiability for a profile mixture model of protein evolution

Authors

Samaneh Yourdkhani, Elizabeth S. Allman, John A. Rhodes

Abstract

A Profile Mixture Model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend in part on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here we show that a Profile Mixture Model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
