Title
Cycle-StarNet: Bridging the gap between theory and data by leveraging large datasets
Authors
Abstract
Advances in stellar spectroscopy data acquisition call for comparable improvements in efficient data analysis techniques. Current automated methods for analyzing spectra are either (a) data-driven, which requires prior knowledge of the stellar parameters and elemental abundances, or (b) based on theoretical synthetic models, which are susceptible to the gap between theory and practice. In this study, we present a hybrid generative domain adaptation method that turns simulated stellar spectra into realistic spectra by applying unsupervised learning to large spectroscopic surveys. We apply our technique to APOGEE H-band spectra at a resolution of R = 22,500 and to Kurucz synthetic models. As a proof of concept, two case studies are presented. The first is the calibration of synthetic data to make it consistent with observations: synthetic models are morphed into spectra that resemble observations, thereby reducing the gap between theory and observations. Fitting the observed spectra with the calibrated models improves the average reduced chi-squared, $\chi^2_R$, from 1.97 to 1.22 and brings the mean residual in normalized flux from 0.16 to -0.01. The second case study is the identification of the elemental source of spectral lines that are missing from the synthetic models. A mock dataset is used to show that absorption lines can be recovered when they are absent from one of the domains. This method can be applied to other fields that use large datasets and are currently limited by modelling accuracy. The code used in this study is publicly available on GitHub.
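The abstract reports two fit statistics, the average reduced chi-squared and the mean residual in normalized flux. The minimal sketch below shows how such statistics are typically computed. It assumes the standard definition of the reduced chi-squared: the sum of squared, error-normalized residuals divided by the degrees of freedom (used pixels minus fitted parameters). It also assumes that the residual is observed minus model flux. The abstract does not state either convention. The function names `reduced_chi2` and `mean_residual` and the optional pixel `mask` are hypothetical.

```python
import numpy as np


def reduced_chi2(obs_flux, model_flux, flux_err, n_params, mask=None):
    """Reduced chi-squared between an observed and a model spectrum.

    Assumed standard definition: sum(((obs - model) / err)**2) divided by
    the degrees of freedom (number of used pixels minus fitted parameters).
    `mask` is an optional boolean array selecting the pixels to include.
    """
    obs = np.asarray(obs_flux, dtype=float)
    model = np.asarray(model_flux, dtype=float)
    err = np.asarray(flux_err, dtype=float)
    if mask is None:
        mask = np.ones(obs.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    resid = (obs[mask] - model[mask]) / err[mask]  # error-normalized residuals
    dof = resid.size - n_params
    if dof <= 0:
        raise ValueError("need more unmasked pixels than fitted parameters")
    return float(np.sum(resid ** 2) / dof)


def mean_residual(obs_flux, model_flux, mask=None):
    """Mean of (observed - model) normalized flux over the selected pixels."""
    obs = np.asarray(obs_flux, dtype=float)
    model = np.asarray(model_flux, dtype=float)
    if mask is None:
        mask = np.ones(obs.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    return float(np.mean(obs[mask] - model[mask]))


# Toy four-pixel spectrum (made-up numbers, not APOGEE data).
obs = np.array([1.00, 0.90, 0.80, 0.95])
model = np.array([0.98, 0.92, 0.78, 0.95])
err = np.full(4, 0.01)
print(reduced_chi2(obs, model, err, n_params=1))  # approximately 4.0
print(mean_residual(obs, model))                  # approximately 0.005
```

A reduced chi-squared near 1 means the model matches the data to within the quoted flux uncertainties. A mean residual near 0 means the model has no systematic offset in flux. Under these assumed definitions, this is why moving from 1.97 to 1.22 and from 0.16 to -0.01 indicates a better fit.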
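The name Cycle-StarNet suggests a cycle-consistency objective of the kind used in CycleGAN-style unpaired domain adaptation. However, the abstract does not describe the architecture, so the sketch below is only a generic, assumed illustration and not the paper's network. Random orthogonal matrices stand in for trained neural-network encoders and decoders of the synthetic and observed domains, and every name in it is hypothetical. The sketch encodes a synthetic spectrum into a latent code, renders it as an "observed-like" spectrum, re-encodes it, and decodes it back. The cycle loss then measures how far this round trip drifts from the input.

```python
import numpy as np


def cycle_loss(x_synth, enc_synth, dec_obs, enc_obs, dec_synth):
    """Mean squared error of a synthetic -> observed -> synthetic round trip.

    Linear maps (matrices) stand in for neural-network encoders/decoders;
    this is a hypothetical toy, not the Cycle-StarNet implementation.
    """
    z = enc_synth @ x_synth          # encode the synthetic spectrum into a latent code
    x_obs_like = dec_obs @ z         # decode the code as an "observed-like" spectrum
    z_cycle = enc_obs @ x_obs_like   # re-encode with the observed-domain encoder
    x_back = dec_synth @ z_cycle     # decode back into the synthetic domain
    return float(np.mean((x_synth - x_back) ** 2))


rng = np.random.default_rng(0)
n_pix = 8  # toy spectrum length; the latent code has the same size here

# Orthogonal "encoders" whose transposes act as exact inverse "decoders".
q_synth, _ = np.linalg.qr(rng.standard_normal((n_pix, n_pix)))
q_obs, _ = np.linalg.qr(rng.standard_normal((n_pix, n_pix)))
x = 1.0 - 0.1 * rng.random(n_pix)  # made-up normalized flux values

# Mutually consistent mappings: the round trip reproduces the input.
consistent = cycle_loss(x, q_synth, q_obs.T, q_obs, q_synth.T)

# Perturbing the observed-domain decoder breaks the cycle.
noisy_dec_obs = q_obs.T + 0.1 * rng.standard_normal((n_pix, n_pix))
perturbed = cycle_loss(x, q_synth, noisy_dec_obs, q_obs, q_synth.T)
print(consistent, perturbed)  # first is essentially zero, second is clearly positive
```

In an actual training setup, a penalty like this would be minimized jointly with other losses (for example, reconstruction or adversarial terms) over learned, nonlinear networks. This toy only shows what the penalty measures.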