Paper title
Predicting second virial coefficients of organic and inorganic compounds using Gaussian Process Regression
Paper authors
Paper abstract
We show that, by using intuitive and accessible molecular features, it is possible to predict the temperature-dependent second virial coefficient of organic and inorganic compounds using Gaussian process regression. In particular, we built a low-dimensional representation of features based on intrinsic molecular properties, topology, and physical properties relevant to the characterization of molecule-molecule interactions. The featurization was used to predict second virial coefficients in the interpolative regime with a relative error $\lesssim 1\%$ and to extrapolate the prediction to temperatures outside the training range for each compound in the dataset with a relative error of 2.14\%. Additionally, the model's predictive abilities were extended to organic molecules unseen in the training process, yielding a prediction with a relative error of 2.66\%. Therefore, apart from being robust, the present Gaussian process regression model is extensible to a variety of organic and inorganic compounds.
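To make the interpolative setting concrete, the following is a minimal sketch of Gaussian process regression on a temperature-dependent second virial coefficient. It is not the authors' model: temperature is the only feature, the training data are synthetic values from the van der Waals expression $B(T) = b - a/(RT)$ with arbitrary toy constants `a` and `b`, and the squared-exponential kernel, its length scale, and the noise term are all assumptions chosen for illustration.

```python
import numpy as np

R = 8.314462618  # molar gas constant, J/(mol K)

def vdw_second_virial(T, a=0.5, b=4.0e-5):
    """van der Waals B(T) = b - a/(R T) in m^3/mol; a and b are arbitrary toy values."""
    return b - a / (R * T)

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays (assumed kernel choice)."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with unit prior variance."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(x_test, x_train, length_scale)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, var

# Training grid and held-out temperatures inside the training range (kelvin).
T_train = np.arange(200.0, 601.0, 50.0)
T_test = np.arange(225.0, 576.0, 50.0)
B_train = vdw_second_virial(T_train) * 1e6  # converted to cm^3/mol
B_test = vdw_second_virial(T_test) * 1e6

# Standardize inputs and targets using training statistics only.
T_mu, T_sd = T_train.mean(), T_train.std()
B_mu, B_sd = B_train.mean(), B_train.std()
mean_std, var_std = gp_predict(
    (T_train - T_mu) / T_sd,
    (B_train - B_mu) / B_sd,
    (T_test - T_mu) / T_sd,
)
B_pred = mean_std * B_sd + B_mu       # back to cm^3/mol
B_std = np.sqrt(var_std) * B_sd       # predictive standard deviation

# Relative error on the held-out temperatures.
rel_err = np.abs((B_pred - B_test) / B_test)
print(f"mean relative error (interpolation): {100 * rel_err.mean():.3f}%")
```

The sketch evaluates only temperatures inside the training range. A stationary kernel such as this one reverts to the prior mean away from the data, so extrapolation in temperature or to unseen molecules would need the richer molecular features described in the abstract.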