AlignNet: A Unifying Approach to Audio-Visual Alignment

Paper Title

AlignNet: A Unifying Approach to Audio-Visual Alignment

Authors

Jianren Wang, Zhaoyuan Fang, Hang Zhao

Abstract

We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments. AlignNet learns the end-to-end dense correspondence between each frame of a video and an audio. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and affinity function. Together with the model, we release a dancing dataset Dance50 for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods. Project video and code are available at https://jianrenw.github.io/AlignNet.
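The abstract names an affinity function and warping as two of the model's building blocks. The sketch below is one way to make those two ideas concrete. It is a generic soft-argmax alignment and does not reproduce the AlignNet architecture. It assumes per-frame video and audio feature vectors have already been computed, and it leaves out the learned encoders, attention layers and pyramidal processing that the abstract mentions. The function names (`affinity_matrix`, `soft_alignment`, `warp`) and the temperature value are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def affinity_matrix(video: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Hypothetical affinity: A[i][j] = similarity of video frame i and audio frame j."""
    return [[cosine(v, a) for a in audio] for v in video]


def softmax(row: Vector, temperature: float) -> Vector:
    """Numerically stable softmax with a temperature (lower = sharper)."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_alignment(video: List[Vector], audio: List[Vector],
                   temperature: float = 0.1) -> Vector:
    """For each video frame, the expected (fractional) audio-frame index
    under attention weights derived from the affinity row (soft-argmax)."""
    positions = []
    for row in affinity_matrix(video, audio):
        weights = softmax(row, temperature)
        positions.append(sum(j * w for j, w in enumerate(weights)))
    return positions


def warp(audio: List[Vector], positions: Vector) -> List[Vector]:
    """Resample audio features at fractional positions by linear interpolation,
    producing one audio feature vector per video frame."""
    out = []
    for p in positions:
        p = min(max(p, 0.0), len(audio) - 1)  # clamp to the valid index range
        lo = int(math.floor(p))
        hi = min(lo + 1, len(audio) - 1)
        t = p - lo
        out.append([(1 - t) * x + t * y for x, y in zip(audio[lo], audio[hi])])
    return out


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]  # same content, reordered
    positions = soft_alignment(video, audio)
    print([round(p) for p in positions])  # [1, 2, 0]
    aligned_audio = warp(audio, positions)
```

The soft-argmax keeps the whole pipeline differentiable, which is why expected indices are used instead of a hard argmax. A learned model would replace the toy vectors with encoder outputs and train the affinity end to end.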
