Paper Title
SemMT: A Semantic-based Testing Approach for Machine Translation Systems
Paper Authors
Paper Abstract
Machine translation has wide applications in daily life. In mission-critical applications such as translating official documents, an incorrect translation can have unpleasant or sometimes catastrophic consequences. This has motivated recent research on testing methodologies for machine translation systems. Existing methodologies mostly rely on metamorphic relations designed at the textual level (e.g., Levenshtein distance) or the syntactic level (e.g., the distance between grammar structures) to determine the correctness of translation results. However, these metamorphic relations do not consider whether the original and translated sentences have the same meaning (i.e., semantic similarity). Therefore, in this paper, we propose SemMT, an automatic testing approach for machine translation systems based on semantic similarity checking. SemMT applies round-trip translation and measures the semantic similarity between the original and translated sentences. Our insight is that the semantics expressed by the logical and numeric constraints in sentences can be captured using regular expressions (or deterministic finite automata), for which efficient equivalence/similarity checking algorithms are available. Leveraging this insight, we propose three semantic similarity metrics and implement them in SemMT. The experimental results show that SemMT achieves higher effectiveness than state-of-the-art works, with increases of 21% and 23% in accuracy and F-score, respectively. We also explore potential improvements that can be achieved when proper combinations of metrics are adopted. Finally, we discuss a solution for locating the suspicious trip in round-trip translation, which may shed light on further exploration.
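The idea of comparing sentence semantics through regular expressions can be illustrated with a minimal Python sketch. It is not one of SemMT's three metrics, which the abstract does not specify. Instead of a DFA-based algorithm, it approximates similarity by enumerating all strings up to a bounded length and taking the Jaccard similarity of the two accepted sets. The function names, the alphabet, the length bound, and the threshold are all hypothetical, and the two regular expressions are assumed to have already been derived from the original and round-trip-translated sentences.

```python
import re
from itertools import product


def accepted_strings(pattern, alphabet, max_len):
    """Return every string over `alphabet`, up to `max_len` characters,
    that fully matches `pattern`."""
    compiled = re.compile(pattern)
    accepted = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                accepted.add(candidate)
    return accepted


def bounded_similarity(regex_a, regex_b, alphabet="ab", max_len=4):
    """Jaccard similarity of the two regex languages, restricted to short
    strings. This is an illustrative approximation, not a SemMT metric."""
    set_a = accepted_strings(regex_a, alphabet, max_len)
    set_b = accepted_strings(regex_b, alphabet, max_len)
    union = set_a | set_b
    if not union:
        return 1.0  # both languages are empty within the bound
    return len(set_a & set_b) / len(union)


def round_trip_is_suspicious(original_regex, round_trip_regex, threshold=1.0):
    """Flag a round trip whose regex semantics differ from the original's.
    The threshold is a hypothetical parameter."""
    return bounded_similarity(original_regex, round_trip_regex) < threshold


# "at least two a's" written two equivalent ways: same meaning.
same = bounded_similarity("a{2,}", "aaa*")
# "at least two a's" versus "exactly two a's": the meaning has changed.
changed = bounded_similarity("a{2,}", "a{2}")
print(same, changed)
```

Under this bound, the first pair accepts exactly the same strings, while the second pair shares only one of three accepted strings, so a round trip that turned "at least two" into "exactly two" would be flagged. A real checker would compare the automata directly rather than enumerate strings, since enumeration grows exponentially with the length bound.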