Paper Title
Substance over Style: Document-Level Targeted Content Transfer
Authors
Abstract
Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model (GPT-2) and train it on a large number of roughly aligned recipe pairs (https://github.com/microsoft/document-level-targeted-content-transfer). Both automatic and human evaluations show that our model outperforms existing methods by generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model's rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.