AlignNet: A Unifying Approach to Audio-Visual Alignment




Authors Hang Zhao, Zhaoyuan Fang, Jianren Wang
Journal/Conference Name Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
Paper Abstract We present AlignNet, a model that synchronizes videos with reference audio under non-uniform and irregular misalignments. AlignNet learns an end-to-end dense correspondence between each frame of a video and an audio track. Our method is designed according to simple and well-established principles: attention, pyramidal processing, warping, and an affinity function. Together with the model, we release a dancing dataset, Dance50, for training and evaluation. Qualitative, quantitative and subjective evaluation results on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms the state-of-the-art methods. Project video and code are available at https://jianrenw.github.io/AlignNet.
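The abstract mentions an affinity function and attention used to recover a dense correspondence between video frames and audio frames. As a rough illustration only (not the released AlignNet code; feature dimensions and the cosine-affinity choice here are assumptions), the idea of turning a frame-to-frame affinity matrix into a soft alignment can be sketched as:

```python
import numpy as np

# Hypothetical feature sequences: T video frames and S audio frames,
# each embedded into a d-dimensional space (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
video_feats = rng.standard_normal((8, 16))   # (T, d)
audio_feats = rng.standard_normal((12, 16))  # (S, d)

def affinity(a, b):
    """Cosine-similarity affinity matrix between two feature sequences."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # shape (T, S)

def soft_alignment(aff, temperature=0.1):
    """Attention-style soft correspondence: each video frame attends
    over all audio frames; each row sums to 1."""
    logits = aff / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

A = affinity(video_feats, audio_feats)
W = soft_alignment(A)
# Expected audio index for each video frame: a dense, possibly
# non-uniform correspondence along the time axis.
corr = W @ np.arange(audio_feats.shape[0])
print(A.shape, W.shape, corr.shape)
```

The soft correspondence `corr` could then drive a warping step that resamples one stream onto the other's timeline; the actual model applies this coarse-to-fine over a feature pyramid.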
Date of publication 2020
Code Programming Language Python
