Paper Title
Sparse Text Generation
Paper Authors
Paper Abstract
Current state-of-the-art text generators build on powerful language models such as GPT-2, achieving impressive performance. However, to avoid degenerate text, they require sampling from a modified softmax, via temperature parameters or ad hoc truncation techniques, as in top-$k$ or nucleus sampling. This creates a mismatch between training and testing conditions. In this paper, we use the recently introduced entmax transformation to train and sample from a natively sparse language model, avoiding this mismatch. The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text. In order to evaluate our model, we propose three new metrics for comparing sparse or truncated distributions: $\epsilon$-perplexity, sparsemax score, and Jensen-Shannon divergence. Human-evaluated experiments in story completion and dialogue generation show that entmax sampling leads to more engaging and coherent stories and conversations.
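For context, entmax (Peters et al., 2019) generalizes softmax as $\operatorname{entmax}_\alpha(z) = \operatorname*{argmax}_{p \in \Delta} p^\top z + H_\alpha(p)$, where $H_\alpha$ is the Tsallis $\alpha$-entropy: $\alpha = 1$ recovers softmax, while $\alpha > 1$ yields sparse distributions that put exactly zero probability on low-scoring tokens. The sketch below contrasts nucleus sampling's truncate-and-renormalize step with sampling from an entmax output. It is a minimal illustration, assuming the open-source `entmax` PyTorch package (github.com/deep-spin/entmax) and a 1-D tensor of vocabulary logits; the function names are illustrative, not the paper's released code.

```python
import torch
from entmax import entmax15  # pip install entmax


def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    """Ad hoc truncation: keep the smallest set of top tokens whose
    cumulative softmax mass reaches top_p, zero out the rest, renormalize."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p
    # (the top-1 token is always kept).
    sorted_probs[cumulative - sorted_probs >= top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]


def entmax_sample(logits: torch.Tensor) -> torch.Tensor:
    """Natively sparse: entmax15 (alpha = 1.5) already assigns exactly
    zero probability to low-scoring tokens, so no truncation or
    renormalization step is needed before sampling."""
    probs = entmax15(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)


# Toy usage with random logits over a 10-token vocabulary.
logits = torch.randn(10)
print(nucleus_sample(logits), entmax_sample(logits))
```

The contrast mirrors the abstract's point: nucleus sampling edits the softmax output after the fact, whereas entmax produces sparsity directly, so the same transformation can be used in the training loss and at sampling time, removing the train/test mismatch.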