Paper Title

Understanding the Properties of Generated Corpora

Authors

Zwerdling, Naama; Shlomov, Segev; Goldbraich, Esther; Kour, George; Carmeli, Boaz; Tepper, Naama; Ronen, Inbal; Zabershinsky, Vitaly; Anaby-Tavor, Ateret

Abstract

Models for text generation have become focal for many research tasks and especially for the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools on various generated corpora allowed us to gain new insights into the properties of the generative models. As part of our characterization process, we found remarkable differences in the corpora generated by two leading generative technologies.
