Learning to detect RFI in radio astronomy without seeing it

Title


Learning to detect RFI in radio astronomy without seeing it

Authors

Mesarcik, Michael; Boonstra, Albert-Jan; Ranguelova, Elena; van Nieuwpoort, Rob V.

Abstract


Radio Frequency Interference (RFI) corrupts astronomical measurements, thus affecting the performance of radio telescopes. To address this problem, supervised segmentation models have been proposed as candidate solutions for RFI detection. However, large labelled datasets are unavailable because annotation is prohibitively expensive, which makes these solutions unusable in practice. To overcome this shortcoming, we focus on the inverse problem: training models only on uncontaminated emissions, thereby learning to discriminate RFI from all known astronomical signals and system noise. We use Nearest-Latent-Neighbours (NLN), a novelty-detection algorithm that uses both the reconstructions of the nearest neighbours and the latent distances to them in the latent space of generative autoencoding models. The uncontaminated regions are selected using weak labels in the form of RFI flags, which are generated by classical RFI flagging methods and are available at no additional cost from most radio astronomical data archives. We evaluate performance on two independent datasets: one simulated from the HERA telescope and another consisting of real observations from the LOFAR telescope. Additionally, we provide a small expert-labelled LOFAR dataset (i.e., strong labels) for evaluating our method and others. Performance is measured using AUROC, AUPRC and the maximum F1-score for a fixed threshold. On the simulated HERA dataset, we outperform the current state of the art by approximately 1% in AUROC and 3% in AUPRC. On the LOFAR dataset, our algorithm offers a 4% increase in both AUROC and AUPRC at the cost of a lower F1-score, without any manual labelling.
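The weak-label step described in the abstract, keeping only regions that a classical flagger left unflagged, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a single 2-D spectrogram (time × frequency) with a boolean flag mask of the same shape, and the function name `extract_clean_patches` and the non-overlapping square-patch layout are hypothetical choices.

```python
import numpy as np


def extract_clean_patches(spectrogram, flags, patch_size):
    """Hypothetical helper: split a spectrogram into non-overlapping square
    patches and keep only those containing no weak RFI flags.

    spectrogram: 2-D array (time x frequency), e.g. visibility magnitudes.
    flags:       boolean array of the same shape, True where a classical
                 flagger marked RFI (assumed input format).
    patch_size:  patch edge length in pixels (illustrative choice).
    """
    if spectrogram.shape != flags.shape:
        raise ValueError("spectrogram and flags must have the same shape")
    n_t = spectrogram.shape[0] // patch_size
    n_f = spectrogram.shape[1] // patch_size
    clean = []
    for i in range(n_t):
        for j in range(n_f):
            rows = slice(i * patch_size, (i + 1) * patch_size)
            cols = slice(j * patch_size, (j + 1) * patch_size)
            # A patch counts as "uncontaminated" only if no pixel is flagged.
            if not flags[rows, cols].any():
                clean.append(spectrogram[rows, cols])
    if not clean:
        return np.empty((0, patch_size, patch_size), dtype=spectrogram.dtype)
    return np.stack(clean)


# Toy usage: one flagged pixel removes exactly one of the four 2x2 patches.
spec = np.arange(16, dtype=float).reshape(4, 4)
flags = np.zeros((4, 4), dtype=bool)
flags[0, 0] = True
patches = extract_clean_patches(spec, flags, 2)
print(patches.shape)  # (3, 2, 2)
```

Discarding any patch that touches a flag is deliberately conservative: weak flags may miss faint RFI, but they cost nothing to obtain, which is the trade-off the abstract relies on.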
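A rough sketch of an NLN-style novelty score, based on our reading of the abstract, follows. It encodes a test patch, finds its k nearest training latents, and combines the mean latent distance with the error between the input and the decoded neighbours. The paper uses generative autoencoders. Here a PCA projection (`fit_linear_autoencoder`) stands in for a trained model so the example runs on its own. The helper names, `k`, and the weighting `alpha` are assumptions, not values taken from the paper.

```python
import numpy as np


def fit_linear_autoencoder(train, n_latent):
    """Stand-in for a trained generative autoencoder: PCA on flattened
    patches. Returns (encode, decode) callables. Hypothetical helper."""
    x = train.reshape(len(train), -1)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions; keep the first n_latent.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    w = vt[:n_latent]

    def encode(batch):
        return (batch.reshape(len(batch), -1) - mean) @ w.T

    def decode(z):
        return z @ w + mean

    return encode, decode


def nln_scores(test, train, encode, decode, k=3, alpha=0.5):
    """Per-sample novelty score from the k nearest latent neighbours.

    alpha (hypothetical) weights the latent-distance term against the
    neighbour-reconstruction term.
    """
    z_train = encode(train)
    z_test = encode(test)
    x_test = test.reshape(len(test), -1)
    # Pairwise Euclidean distances between test and training latents.
    d = np.linalg.norm(z_test[:, None, :] - z_train[None, :, :], axis=-1)
    nn_idx = np.argsort(d, axis=1)[:, :k]
    latent_term = np.take_along_axis(d, nn_idx, axis=1).mean(axis=1)
    # Decode the neighbours' latents and compare them with each test input.
    recon_nn = decode(z_train[nn_idx].reshape(-1, z_train.shape[1]))
    recon_nn = recon_nn.reshape(len(test), k, -1)
    recon_term = np.abs(x_test[:, None, :] - recon_nn).mean(axis=(1, 2))
    return alpha * latent_term + (1 - alpha) * recon_term


# Toy usage: train on noise-only patches (axes: time, frequency), then score
# a clean patch and the same patch with a persistent narrow-band line.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 0.1, size=(50, 4, 4))
clean = rng.normal(0.0, 0.1, size=(1, 4, 4))
rfi = clean.copy()
rfi[0, :, 1] += 10.0  # bright signal in one frequency channel at all times
encode, decode = fit_linear_autoencoder(train, n_latent=4)
scores = nln_scores(np.concatenate([clean, rfi]), train, encode, decode)
print(scores[1] > scores[0])  # True
```

Because the model only ever sees uncontaminated patches, an input with structure absent from training lies far from its neighbours' reconstructions and therefore receives a higher score.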
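The three reported metrics can be computed from per-pixel scores and ground-truth RFI masks as sketched below. The function names are hypothetical. AUPRC is computed as step-wise average precision, which is one common convention and may differ from the paper's. "Maximum F1" is interpreted as the best F1 over all candidate thresholds. The pairwise AUROC is quadratic in the number of pixels, so for full spectrograms a library routine such as `sklearn.metrics.roc_auc_score` would be the practical choice.

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a random RFI pixel outscores a random clean pixel
    (ties count one half), i.e. the Mann-Whitney form of the ROC area."""
    labels = labels.astype(bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def pr_points(scores, labels):
    """Precision and recall at every distinct threshold, highest first."""
    labels = labels.astype(bool)
    precision, recall = [], []
    for t in np.unique(scores)[::-1]:
        pred = scores >= t
        tp = np.sum(pred & labels)
        precision.append(tp / np.sum(pred))
        recall.append(tp / np.sum(labels))
    return np.array(precision), np.array(recall)


def auprc(scores, labels):
    # Step-wise average precision: sum over thresholds of (R_n - R_{n-1}) * P_n.
    p, r = pr_points(scores, labels)
    return float(np.sum(np.diff(np.concatenate([[0.0], r])) * p))


def max_f1(scores, labels):
    # F1 at every threshold; 0 where precision and recall are both 0.
    p, r = pr_points(scores, labels)
    f1 = np.where(p + r > 0, 2 * p * r / np.maximum(p + r, 1e-12), 0.0)
    return float(f1.max())


scores = np.array([0.9, 0.8, 0.35, 0.3, 0.1])
labels = np.array([1, 0, 1, 0, 0], dtype=bool)
print(round(auroc(scores, labels), 4))   # 0.8333
print(round(auprc(scores, labels), 4))   # 0.8333
print(round(max_f1(scores, labels), 4))  # 0.8
```

AUROC and AUPRC summarise performance over all thresholds, whereas the F1-score depends on choosing one. This is why a method can improve the first two while its best F1 drops, as the abstract reports for LOFAR.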
