Paper title
Using physical features of protein core packing to distinguish real proteins from decoys
Authors
Abstract
Consistently distinguishing real protein structures from computationally generated model decoys remains an unsolved problem. One route to distinguishing real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. As a dataset of decoys to compare with real protein structures, we studied submissions to the biennial CASP competition (specifically CASP11, 12, and 13), in which researchers attempt to predict the structure of a protein knowing only its amino acid sequence. Our analysis reveals that many of the submissions possess cores that do not recapitulate the features that define real proteins. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a deep learning method, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoys from the CASP competitions as well as, if not better than, state-of-the-art methods that incorporate many additional features.
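To make two of the core features named in the abstract concrete (the number of residues in the core and the share of hydrophobic residues there), here is a minimal Python sketch. It is an illustration only, not the authors' pipeline. It assumes the per-residue relative solvent-accessible surface area (rSASA) has already been computed by some external tool. The function name `core_features`, the `core_cutoff` value, and the `HYDROPHOBIC` residue set are hypothetical choices made for this example; the abstract does not specify any of them.

```python
# Illustrative sketch only. The rSASA cutoff and the hydrophobic residue set
# are assumptions made for this example, not values taken from the abstract.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}


def core_features(residues, rsasa, core_cutoff=0.05):
    """Return (core_fraction, hydrophobic_fraction_of_core).

    residues    -- three-letter residue names, one per residue
    rsasa       -- relative solvent-accessible surface area per residue (0..1)
    core_cutoff -- residues with rSASA <= this value are counted as core
                   (hypothetical threshold)
    """
    if len(residues) != len(rsasa):
        raise ValueError("residues and rsasa must have the same length")
    if not residues:
        raise ValueError("at least one residue is required")

    # Residues buried enough to count as core under the assumed cutoff.
    core = [name for name, s in zip(residues, rsasa) if s <= core_cutoff]
    core_fraction = len(core) / len(residues)

    # Share of core residues that are hydrophobic; 0.0 if the core is empty.
    if core:
        hydrophobic_fraction = sum(name in HYDROPHOBIC for name in core) / len(core)
    else:
        hydrophobic_fraction = 0.0
    return core_fraction, hydrophobic_fraction


if __name__ == "__main__":
    names = ["LEU", "VAL", "SER", "LYS"]
    values = [0.0, 0.02, 0.01, 0.6]
    print(core_features(names, values))
```

Under these assumptions, a decoy with an unusually small core fraction, or with few hydrophobic residues in its core, would stand out from real structures. Per-structure numbers of this kind are the sort of input a learned scoring model could take alongside a packing measure.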