Title

A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation

Authors

Giovanni Campagna, Sina J. Semnani, Ryan Kearns, Lucas Jun Koba Sato, Silei Xu, Monica S. Lam

Abstract

Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty of acquiring a high-quality, manually annotated training set. Approaches based only on dialogue synthesis are insufficient, as dialogues generated from state-machine-based models are poor approximations of real-life conversations. Furthermore, previously proposed dialogue state representations are ambiguous and lack the precision necessary for building an effective agent. This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations. We extended the ThingTalk representation to capture all the information an agent needs to respond properly. Our training strategy is sample-efficient: we combine (1) few-shot data sparsely sampling the full dialogue space and (2) synthesized data covering a subset space of dialogues generated by a succinct state-based dialogue model. The completeness of the extended ThingTalk language is demonstrated with a fully operational agent, which is also used in training data synthesis. We demonstrate the effectiveness of our methodology on MultiWOZ 3.0, a reannotation of the MultiWOZ 2.1 dataset in ThingTalk. ThingTalk can represent 98% of the test turns, while the simulator can emulate 85% of the validation set. We train a contextual semantic parser using our strategy and obtain 79% turn-by-turn exact match accuracy on the reannotated test set.
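The headline result is reported as turn-by-turn exact match accuracy. The sketch below shows how such a metric could be computed, under stated assumptions: each turn is scored independently, and a prediction counts as correct only if it equals the gold annotation after whitespace tokenization. The function names, the normalization step, and the example strings are hypothetical placeholders, not the paper's evaluation code and not real ThingTalk syntax.

```python
from typing import Sequence


def normalize(code: str) -> tuple[str, ...]:
    """Hypothetical normalization: compare representations as whitespace-separated tokens."""
    return tuple(code.split())


def turn_exact_match(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of turns whose predicted representation exactly matches the gold one."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must cover the same turns")
    if not gold:
        return 0.0
    # Each turn contributes 1 if its token sequence matches exactly, else 0.
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Placeholder strings for illustration only; not actual MultiWOZ 3.0 annotations.
    gold = ["state A slot = x", "state B slot = y"]
    pred = ["state A  slot = x", "state B slot = z"]
    print(turn_exact_match(pred, gold))  # 0.5
```

Exact match is strict by design: a single wrong slot value makes the whole turn incorrect, which is why it is a demanding measure of whether the parser recovers the full dialogue state.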
