TransRegex: Multi-modal Regular Expression Synthesis by Generate-and-Repair

Paper Title

TransRegex: Multi-modal Regular Expression Synthesis by Generate-and-Repair

Authors

Li, Yeting; Li, Shuaimin; Xu, Zhiwu; Cao, Jialun; Chen, Zixuan; Hu, Yun; Chen, Haiming; Cheung, Shing-Chi

Abstract

Since regular expressions (abbrev. regexes) are difficult to understand and compose, automatically generating regexes has been an important research problem. This paper introduces TransRegex, a technique for automatically constructing regexes from both natural language descriptions and examples. To the best of our knowledge, TransRegex is the first approach to treat NLP-and-example-based regex synthesis as NLP-based synthesis followed by regex repair. To this end, we present novel algorithms for both NLP-based synthesis and regex repair. We evaluate TransRegex against ten relevant state-of-the-art tools on three publicly available datasets. The results show that the accuracy of TransRegex is 17.4%, 35.8%, and 38.9% higher than that of NLP-based approaches on the three datasets, respectively. Furthermore, TransRegex achieves 10% to 30% higher accuracy than state-of-the-art multi-modal techniques on all three datasets. The results also indicate that TransRegex uses natural language and examples more effectively.
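The abstract frames multi-modal synthesis as two phases. First, candidates are generated from the natural language description. Then any candidate that disagrees with the examples is repaired. The sketch below only illustrates that framing, under several assumptions:

* The ranked candidate regexes are assumed to come from some NL-to-regex model, which is not shown.
* `repair` is a hypothetical callable standing in for TransRegex's repair algorithm, which the abstract does not describe.
* Consistency is checked with Python's `re.fullmatch` against positive and negative examples.

The names `consistent` and `generate_and_repair` are illustrative and are not the authors' API.

```python
import re
from typing import Callable, Iterable, Optional, Sequence

def consistent(regex: str, positives: Sequence[str], negatives: Sequence[str]) -> bool:
    """True if `regex` fully matches every positive example and no negative example."""
    try:
        pattern = re.compile(regex)
    except re.error:
        return False  # syntactically invalid candidates are rejected
    return (all(pattern.fullmatch(s) for s in positives)
            and not any(pattern.fullmatch(s) for s in negatives))

def generate_and_repair(
    candidates: Iterable[str],
    positives: Sequence[str],
    negatives: Sequence[str],
    # Hypothetical stand-in for a repair algorithm: returns a fixed regex or None.
    repair: Callable[[str, Sequence[str], Sequence[str]], Optional[str]],
) -> Optional[str]:
    candidates = list(candidates)
    # Generate phase: accept the first NL-derived candidate that already fits the examples.
    for regex in candidates:
        if consistent(regex, positives, negatives):
            return regex
    # Repair phase: try to fix candidates in rank order and re-check each result.
    for regex in candidates:
        fixed = repair(regex, positives, negatives)
        if fixed is not None and consistent(fixed, positives, negatives):
            return fixed
    return None

# Hypothetical ranked output of an NL model for "letters followed by one digit".
cands = ["[0-9]+", "[a-z]+[0-9]"]
no_repair = lambda r, p, n: None  # placeholder: performs no repair
print(generate_and_repair(cands, ["ab1", "x9"], ["12"], no_repair))  # prints [a-z]+[0-9]
```

Every returned regex, including a repaired one, is re-verified against the examples. A real repair procedure would therefore only need to propose edits, not guarantee them.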
