Paper title
Parameter identifiability for a profile mixture model of protein evolution
Authors
Abstract
A Profile Mixture Model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend in part on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here we show that a Profile Mixture Model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
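To make the structure described in the abstract concrete, the following is a minimal sketch of how a profile mixture is commonly written. It is an illustration under stated assumptions, not necessarily the paper's exact parameterization. All symbols are assumptions introduced here: the number of profiles K, the mixture weights w_k, the shared exchangeabilities s_{ij}, the profiles \pi^{(k)}, and the tree with edge lengths T.

```latex
% Assumed notation (not taken from the abstract):
%   T          : a single tree with edge lengths, shared by all sites
%   K          : number of profiles (mixture classes)
%   w_k        : weight of class k
%   \pi^{(k)}  : amino acid profile (a distribution on the 20 states) of class k
%   s_{ij}     : symmetric exchangeabilities, assumed shared across classes
%   x          : a site pattern, one amino acid per taxon

% Each class has its own rate matrix, built from the shared exchangeabilities
% and the class-specific profile:
\[
  Q_k(i,j) = s_{ij}\,\pi^{(k)}_j \quad (i \neq j),
  \qquad
  Q_k(i,i) = -\sum_{j \neq i} Q_k(i,j).
\]

% The probability of a site pattern is the weighted sum over classes,
% all evaluated on the same tree T:
\[
  P(x) = \sum_{k=1}^{K} w_k \, P\bigl(x \mid T, Q_k\bigr),
  \qquad
  w_k > 0, \quad \sum_{k=1}^{K} w_k = 1.
\]
```

In this notation, identifiability asks whether the distribution P determines the tree topology and the numerical parameters. The abstract's result covers generic parameter choices for trees relating 9 or more taxa when the number of profiles is less than 74.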