Paper Title
Sparse Text Generation
Paper Authors
Paper Abstract
Current state-of-the-art text generators build on powerful language models such as GPT-2, achieving impressive performance. However, to avoid degenerate text, they require sampling from a modified softmax, via temperature parameters or ad hoc truncation techniques, as in top-$k$ or nucleus sampling. This creates a mismatch between training and testing conditions. In this paper, we use the recently introduced entmax transformation to train and sample from a natively sparse language model, avoiding this mismatch. The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text. In order to evaluate our model, we propose three new metrics for comparing sparse or truncated distributions: $\epsilon$-perplexity, sparsemax score, and Jensen-Shannon divergence. Human-evaluated experiments in story completion and dialogue generation show that entmax sampling leads to more engaging and coherent stories and conversations.
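For context, entmax (Peters et al., 2019) generalizes softmax as $\operatorname{entmax}_\alpha(z) = \operatorname*{argmax}_{p \in \Delta} p^\top z + H_\alpha(p)$, where $H_\alpha$ is the Tsallis $\alpha$-entropy: $\alpha = 1$ recovers softmax, while $\alpha > 1$ yields sparse distributions that put exactly zero probability on low-scoring tokens. The sketch below contrasts nucleus sampling's truncate-and-renormalize step with sampling from an entmax output. It is a minimal illustration, assuming the open-source `entmax` PyTorch package (github.com/deep-spin/entmax) and a 1-D tensor of vocabulary logits; the function names are illustrative, not the paper's released code.

```python
import torch
from entmax import entmax15  # pip install entmax


def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    """Ad hoc truncation: keep the smallest set of top tokens whose
    cumulative softmax mass reaches top_p, zero out the rest, renormalize."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p
    # (the top-1 token is always kept).
    sorted_probs[cumulative - sorted_probs >= top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]


def entmax_sample(logits: torch.Tensor) -> torch.Tensor:
    """Natively sparse: entmax15 (alpha = 1.5) already assigns exactly
    zero probability to low-scoring tokens, so no truncation or
    renormalization step is needed before sampling."""
    probs = entmax15(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)


# Toy usage with random logits over a 10-token vocabulary.
logits = torch.randn(10)
print(nucleus_sample(logits), entmax_sample(logits))
```

The contrast mirrors the abstract's point: nucleus sampling edits the softmax output after the fact, whereas entmax produces sparsity directly, so the same transformation can be used in the training loss and at sampling time, removing the train/test mismatch.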