DualLip: A System for Joint Lip Reading and Generation

Paper Title

DualLip: A System for Joint Lip Reading and Generation

Authors

Weicong Chen, Xu Tan, Yingce Xia, Tao Qin, Yu Wang, Tie-Yan Liu

Abstract

Lip reading aims to recognize text from talking lips, while lip generation aims to synthesize talking lips from text; lip generation is a key component of talking face generation and is the dual task of lip reading. In this paper, we develop DualLip, a system that jointly improves lip reading and generation by leveraging this task duality and using unlabeled text and lip video data. The key ideas of DualLip are: 1) generating lip video from unlabeled text with a lip generation model and using the resulting pseudo pairs to improve lip reading; 2) generating text from unlabeled lip video with a lip reading model and using the resulting pseudo pairs to improve lip generation. We further extend DualLip to talking face generation with two additional components: lip-to-face generation and text-to-speech generation. Experiments on GRID and TCD-TIMIT demonstrate that DualLip effectively uses unlabeled data to improve lip reading, lip generation, and talking face generation. Specifically, the lip generation model in our DualLip system, trained with only 10% of the paired data, surpasses the same model trained with all of the paired data. On the GRID lip reading benchmark, we achieve a 1.16% character error rate and a 2.71% word error rate, outperforming state-of-the-art models that use the same amount of paired data.
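The two ideas in the abstract, plus the metrics it reports, can be illustrated with a minimal sketch. This is not the authors' implementation. `make_pseudo_pairs`, `edit_distance`, and `error_rate` are hypothetical helpers, and the "models" are placeholder callables (text to video, video to text) standing in for real neural networks. The metric part computes character and word error rates as Levenshtein distance divided by reference length; whether spaces count as characters differs between evaluation scripts, and this sketch keeps them.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Video = TypeVar("Video")  # hypothetical stand-in for a lip video (e.g. a frame tensor)


def make_pseudo_pairs(
    unlabeled_text: Sequence[str],
    unlabeled_video: Sequence[Video],
    lip_generate: Callable[[str], Video],
    lip_read: Callable[[Video], str],
) -> Tuple[List[Tuple[Video, str]], List[Tuple[str, Video]]]:
    """Build pseudo-labeled pairs by exploiting the duality of the two tasks.

    - Reader pairs (input video, target text): synthesize a video from each
      unlabeled sentence; the real text becomes the reading target.
    - Generator pairs (input text, target video): transcribe each unlabeled
      video; the real video becomes the generation target.
    In a full system these pairs would be mixed with the real paired data
    when (re)training each model (an assumption of this sketch).
    """
    reader_pairs = [(lip_generate(t), t) for t in unlabeled_text]
    generator_pairs = [(lip_read(v), v) for v in unlabeled_video]
    return reader_pairs, generator_pairs


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               tokenize: Callable[[str], Sequence]) -> float:
    """Corpus-level error rate: total edits / total reference tokens."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return errors / total


if __name__ == "__main__":
    refs = ["bin blue at f two now"]
    hyps = ["bin blue at f too now"]
    print("WER:", error_rate(refs, hyps, str.split))  # one wrong word out of 6
    print("CER:", error_rate(refs, hyps, list))       # one wrong char out of 21
```

Passing `str.split` as the tokenizer yields WER and passing `list` yields CER, so both metrics share one edit-distance routine.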
