Paper Title

AlignNet: A Unifying Approach to Audio-Visual Alignment

Authors

Jianren Wang, Zhaoyuan Fang, Hang Zhao

Abstract

We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments. AlignNet learns the end-to-end dense correspondence between each frame of a video and an audio. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and affinity function. Together with the model, we release a dancing dataset Dance50 for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods. Project video and code are available at https://jianrenw.github.io/AlignNet.
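To make the idea of a dense per-frame correspondence more concrete, the following is a minimal toy sketch and not the AlignNet architecture. It assumes that per-frame feature vectors for video and audio are already available (the hand-written vectors below are hypothetical), uses cosine similarity as a stand-in affinity function, and picks the best-matching audio frame for every video frame. The attention, pyramidal processing, and learned warping named in the abstract are omitted, and all function names are illustrative.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors (stand-in affinity function)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + eps)

def affinity_matrix(video_feats, audio_feats):
    """Entry [i][j] scores how well video frame i matches audio frame j."""
    return [[cosine(v, a) for a in audio_feats] for v in video_feats]

def dense_correspondence(video_feats, audio_feats):
    """For each video frame, return the index of the highest-affinity audio frame."""
    aff = affinity_matrix(video_feats, audio_feats)
    return [max(range(len(row)), key=row.__getitem__) for row in aff]

def warp(audio_feats, correspondence):
    """Resample the audio frames so that frame i lines up with video frame i."""
    return [audio_feats[j] for j in correspondence]

# Hypothetical toy features: four distinct audio frames.
audio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

# Simulate a non-uniform misalignment: the video repeats some audio frames
# and visits others out of step.
true_idx = [0, 0, 1, 3, 3, 2]
video = [audio[j] for j in true_idx]

estimated = dense_correspondence(video, audio)
aligned_audio = warp(audio, estimated)
```

In this toy setting each video frame is an exact copy of one audio frame, so the recovered indices match the simulated misalignment. With real features the affinities would be noisy, which is one reason a learned model with a coarse-to-fine (pyramidal) search is attractive.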
