Paper title
Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes
Paper authors
Paper abstract
Predicting the structure of multi-protein complexes is a grand challenge in biochemistry, with major implications for basic science and drug discovery. Computational structure prediction methods generally leverage pre-defined structural features to distinguish accurate structural models from less accurate ones. This raises the question of whether it is possible to learn characteristics of accurate models directly from atomic coordinates of protein complexes, with no prior assumptions. Here we introduce a machine learning method that learns directly from the 3D positions of all atoms to identify accurate models of protein complexes, without using any pre-computed physics-inspired or statistical terms. Our neural network architecture combines multiple ingredients that together enable end-to-end learning from molecular structures containing tens of thousands of atoms: a point-based representation of atoms, equivariance with respect to rotation and translation, local convolutions, and hierarchical subsampling operations. When used in combination with previously developed scoring functions, our network substantially improves the identification of accurate structural models among a large set of possible models. Our network can also be used to predict the accuracy of a given structural model in absolute terms. The architecture we present is readily applicable to other tasks involving learning on 3D structures of large atomic systems.
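The ingredients named in the abstract (a point-based representation, local convolutions, hierarchical subsampling, and symmetry under rotation and translation) can be illustrated with a toy sketch. This is not the authors' network: it uses only scalar, distance-based features, which are invariant (the simplest special case of equivariance), and every name, radius, and coordinate below (`local_conv`, `toy_score`, `ATOMS`, the stride-2 subsampling) is a hypothetical choice made for illustration.

```python
import math

# Hypothetical toy "structure": each atom is just a 3D point.
ATOMS = [
    (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
    (0.3, 1.1, 0.9), (2.8, 0.4, 1.7), (3.1, 2.0, 0.2),
]

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def local_conv(points, feats, radius, width):
    """Local convolution on a point cloud.

    Each output is a Gaussian-weighted sum of the features of all points
    within `radius` (including the point itself). Only interatomic
    distances are used, so rigid motions of the input leave it unchanged.
    """
    out = []
    for p in points:
        total = 0.0
        for q, f in zip(points, feats):
            d = math.dist(p, q)
            if d <= radius:
                total += f * math.exp(-((d / width) ** 2))
        out.append(total)
    return out

def toy_score(points):
    """Two-level hierarchy: fine convolution, subsample, coarse convolution, pool."""
    feats = local_conv(points, [1.0] * len(points), radius=2.0, width=1.0)
    # Hierarchical subsampling (assumed rule: keep every second point).
    coarse_points, coarse_feats = points[::2], feats[::2]
    coarse_feats = local_conv(coarse_points, coarse_feats, radius=4.0, width=2.0)
    return sum(coarse_feats) / len(coarse_feats)

if __name__ == "__main__":
    moved = [translate(rotate_z(p, 0.7), (5.0, -3.0, 2.0)) for p in ATOMS]
    print(toy_score(ATOMS), toy_score(moved))
```

The two printed scores should agree up to floating-point error, because the rigidly moved copy has the same interatomic distances as the original. The check here covers only a rotation about the z-axis plus a translation.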