Paper Title

Group SELFIES: A Robust Fragment-Based Molecular String Representation

Authors

Austin Cheng, Andy Cai, Santiago Miret, Gustavo Malkomes, Mariano Phielipp, Alán Aspuru-Guzik

Abstract

We introduce Group SELFIES, a molecular string representation that leverages group tokens to represent functional groups or entire substructures while maintaining chemical robustness guarantees. Molecular string representations, such as SMILES and SELFIES, serve as the basis for molecular generation and optimization in chemical language models, deep generative models, and evolutionary methods. While SMILES and SELFIES leverage atomic representations, Group SELFIES builds on top of the chemical robustness guarantees of SELFIES by enabling group tokens, thereby adding flexibility to the representation. Moreover, the group tokens in Group SELFIES can take advantage of inductive biases of molecular fragments that capture meaningful chemical motifs. The advantages of capturing chemical motifs and flexibility are demonstrated in our experiments, which show that Group SELFIES improves distribution learning of common molecular datasets. Further experiments also show that random sampling of Group SELFIES strings improves the quality of generated molecules compared to regular SELFIES strings. Our open-source implementation of Group SELFIES is available online; we hope it will aid future research in molecular generation and optimization.
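To make the idea of "group tokens on top of robustness guarantees" concrete, here is a deliberately simplified toy sketch. It is not the Group SELFIES algorithm and does not use the token syntax or API of the authors' open-source library. Assumptions (all hypothetical): molecules are linear chains; tokens such as `[C]`, `[=O]`, `[:phenyl]` and `[:carboxyl]`, the `GROUPS` table and the `decode` function are invented for illustration; each group has one entry bond and a fixed free valence at its exit point. In the spirit of SELFIES, every requested bond order is clamped to the valence still available at both ends, and decoding stops once the chain end is saturated, so any token sequence, including a randomly sampled one, yields a valence-respecting SMILES string.

```python
import random

# Hypothetical toy vocabulary, for illustration only.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
# Group name -> (SMILES fragment, max bond order at its entry point, free valence at its exit point)
GROUPS = {
    "phenyl": ("c1ccc(cc1)", 1, 1),  # para-substituted benzene: enter at C1, leave from C4
    "carboxyl": ("C(=O)O", 1, 0),    # -COOH caps the chain
}
BOND_SYMBOL = {0: "", 1: "", 2: "=", 3: "#"}

# Token string -> (requested bond order, unit name); e.g. "[=O]" -> (2, "O").
TOKENS = {f"[{sym}{el}]": (order, el)
          for order, sym in ((1, ""), (2, "="), (3, "#")) for el in VALENCE}
TOKENS.update({f"[:{g}]": (1, g) for g in GROUPS})


def decode(tokens):
    """Decode a toy token sequence into a SMILES string without violating valence."""
    parts = []
    free = None  # free valence at the current chain end (None before the first unit)
    for tok in tokens:
        requested, unit = TOKENS[tok]
        if unit in GROUPS:
            fragment, entry_cap, exit_free = GROUPS[unit]
        else:
            fragment, entry_cap, exit_free = unit, VALENCE[unit], None
        if free is None:
            bond = 0  # the first unit has no incoming bond
        elif free == 0:
            break  # chain end is saturated: ignore the remaining tokens
        else:
            bond = min(requested, free, entry_cap)  # clamp to what both ends allow
        parts.append(BOND_SYMBOL[bond] + fragment)
        # Atoms keep (valence - incoming bond); groups expose a fixed exit valence.
        free = exit_free if exit_free is not None else VALENCE[unit] - bond
    return "".join(parts)


print(decode(["[C]", "[:phenyl]", "[=C]", "[:carboxyl]"]))  # Cc1ccc(cc1)CC(=O)O
print(decode(["[O]", "[#C]", "[C]"]))  # O=CC (requested triple bond clamped to double)

# Random sampling: every sampled sequence still decodes.
random.seed(0)
alphabet = sorted(TOKENS)
for _ in range(3):
    sample = random.choices(alphabet, k=8)
    print(" ".join(sample), "->", decode(sample))
```

The single `[:phenyl]` token stands in for six aromatic atoms and their ring closure, which is the kind of chemically meaningful motif the abstract says group tokens can capture; the clamping rule is what keeps random strings valid in this toy.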
