Paper title
A Noise-tolerant Differentiable Learning Approach for Single Occurrence Regular Expression with Interleaving
Paper authors
Paper abstract
We study the problem of learning a single occurrence regular expression with interleaving (SOIRE) from a set of text strings that may contain noise. SOIRE fully supports interleaving and covers a large portion of the regular expressions used in practice. Learning SOIREs is challenging because it requires heavy computation and because text strings usually contain noise in practice. Most previous studies learn only restricted SOIREs and are not robust to noisy data. To tackle these issues, we propose SOIREDL, a noise-tolerant differentiable learning approach for SOIREs. We design a neural network to simulate SOIRE matching and theoretically prove that certain assignments of the set of parameters learnt by the neural network, called faithful encodings, correspond one-to-one to SOIREs of bounded size. Based on this correspondence, we interpret the target SOIRE from an assignment of the set of parameters of the neural network by exploring the nearest faithful encodings. Experimental results show that SOIREDL outperforms the state-of-the-art approaches, especially on noisy data.
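
To make the two defining properties in the abstract concrete, the following is a minimal illustrative sketch and not the SOIREDL method. It assumes the usual meaning of interleaving (every shuffle of two strings that keeps each string's internal order) and of single occurrence (each alphabet symbol appears at most once in the expression). The function names `interleavings` and `is_single_occurrence` are hypothetical and do not come from the paper.

```python
from typing import List, Set


def interleavings(u: str, v: str) -> Set[str]:
    """Return every shuffle of u and v that preserves the order inside u and inside v."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The next symbol comes either from the front of u or from the front of v.
    take_u = {u[0] + w for w in interleavings(u[1:], v)}
    take_v = {v[0] + w for w in interleavings(u, v[1:])}
    return take_u | take_v


def is_single_occurrence(symbols_in_expr: List[str]) -> bool:
    """Check that no alphabet symbol is used more than once in an expression.

    The expression is represented (as an assumption of this sketch) by the
    list of alphabet symbols it contains, in order of appearance.
    """
    return len(symbols_in_expr) == len(set(symbols_in_expr))


if __name__ == "__main__":
    # Strings matched by the interleaving of "ab" and "c".
    print(sorted(interleavings("ab", "c")))
    # An expression over a, b, c with no repeats is single occurrence.
    print(is_single_occurrence(["a", "b", "c"]))
    # Reusing a breaks the single occurrence restriction.
    print(is_single_occurrence(["a", "b", "a"]))
```

The brute-force enumeration grows quickly with string length, which hints at why the abstract describes learning and matching SOIREs as computationally heavy.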