Paper Title
Seq2Edits: Sequence Transduction Using Span-level Edit Operations
Paper Authors
Paper Abstract
We propose Seq2Edits, an open-vocabulary approach to sequence editing for natural language processing (NLP) tasks with a high degree of overlap between input and output texts. In this approach, each sequence-to-sequence transduction is represented as a sequence of edit operations, where each operation either replaces an entire source span with target tokens or keeps it unchanged. We evaluate our method on five NLP tasks (text normalization, sentence fusion, sentence splitting & rephrasing, text simplification, and grammatical error correction) and report competitive results across the board. For grammatical error correction, our method speeds up inference by up to 5.2x compared to full sequence models because inference time depends on the number of edits rather than the number of target tokens. For text normalization, sentence fusion, and grammatical error correction, our approach improves explainability by associating each edit operation with a human-readable tag.
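To make the span-level representation concrete, the following minimal Python sketch shows how a single grammatical error correction example might be encoded as a sequence of edit operations. The (tag, span_end, replacement) tuple layout and the tag names are illustrative assumptions for exposition, not the paper's exact data format or interface.

```python
# A minimal sketch (assumed representation, not the paper's exact format) of span-level
# edits: each edit is a (tag, source_span_end, replacement) triple. A "SELF" tag keeps
# the source span unchanged; any other tag replaces the span with the given target tokens.

def apply_edits(source_tokens, edits):
    """Reconstruct the target by walking the source left to right and applying each edit."""
    target, start = [], 0
    for tag, span_end, replacement in edits:
        if tag == "SELF":                      # copy the source span unchanged
            target.extend(source_tokens[start:span_end])
        else:                                  # replace the span with target tokens
            target.extend(replacement)
        start = span_end
    return target

# Grammatical error correction example: "She go to school ." -> "She goes to school ."
source = ["She", "go", "to", "school", "."]
edits = [
    ("SELF", 1, None),                 # keep "She"
    ("VERB:SVA", 2, ["goes"]),         # replace "go" with "goes" (tag name is illustrative)
    ("SELF", 5, None),                 # keep "to school ."
]
assert apply_edits(source, edits) == ["She", "goes", "to", "school", "."]
```

Because only two of the three operations modify the source, a model predicting such edits emits far fewer steps than one regenerating every target token, which is the intuition behind the reported inference speed-up; the human-readable tags (here the illustrative "VERB:SVA") are what provide the explainability mentioned above.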