Paper Title

Unsupervised Evaluation of Interactive Dialog with DialoGPT

Authors

Shikib Mehri, Maxine Eskenazi

Abstract

It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research. Standard language generation metrics have been shown to be ineffective for dialog. This paper introduces the FED metric (fine-grained evaluation of dialog), an automatic evaluation metric which uses DialoGPT, without any fine-tuning or supervision. It also introduces the FED dataset which is constructed by annotating a set of human-system and human-human conversations with eighteen fine-grained dialog qualities. The FED metric (1) does not rely on a ground-truth response, (2) does not require training data and (3) measures fine-grained dialog qualities at both the turn and whole dialog levels. FED attains moderate to strong correlation with human judgement at both levels.
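The abstract only sketches the mechanism. The underlying idea is to score a dialog quality by how likely DialoGPT rates hand-written positive versus negative follow-up utterances given the dialog context. The snippet below is a minimal sketch of that scoring scheme with made-up log-likelihood values; the follow-up strings and the mean-difference aggregation are illustrative assumptions, not the paper's exact utterance list.

```python
from typing import List

# Hypothetical follow-up utterances probing one quality ("interesting").
# In the actual metric, DialoGPT would assign each of these a
# log-likelihood as a continuation of the dialog context.
POSITIVE_FOLLOWUPS = ["Wow! That's really cool!", "Tell me more!"]
NEGATIVE_FOLLOWUPS = ["That's boring.", "I don't want to talk about that."]

def quality_score(pos_loglikes: List[float], neg_loglikes: List[float]) -> float:
    """Combine follow-up log-likelihoods into one quality score:
    higher when positive follow-ups are more likely than negative ones."""
    pos = sum(pos_loglikes) / len(pos_loglikes)
    neg = sum(neg_loglikes) / len(neg_loglikes)
    return pos - neg

# Made-up numbers standing in for DialoGPT log-likelihoods (no model call):
score_good = quality_score([-4.0, -5.0], [-9.0, -8.0])  # positive likelier -> positive score
score_bad = quality_score([-9.0, -8.0], [-4.0, -5.0])   # negative likelier -> negative score
```

In practice the log-likelihoods would come from a pretrained DialoGPT language model (e.g. via the Hugging Face `transformers` library) by scoring each follow-up as a continuation of the conversation, with no fine-tuning, which is what makes the metric unsupervised.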
