SpeechVGG: A deep feature extractor for speech processing

Authors Milos Cernak, Pierre Beckmann, Hugues Saltini, Mikolaj Kegler
Journal/Conference Name arXiv preprint
Paper Abstract Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In particular, readily available models pre-trained on large datasets are key for the efficient transfer of knowledge. They can be applied as feature extractors for data preprocessing, fine-tuned to perform a variety of tasks, or used for computing feature losses in the training of deep learning systems. While applications of transfer learning are common in computer vision and natural language processing, audio and speech processing still lack readily available, transferable models. Here, we introduce speechVGG, a flexible, transferable feature extractor tailored for integration with deep learning frameworks for speech processing. Our transferable model adopts the classic VGG-16 architecture and is trained on a spoken word classification task. We demonstrate the application of the pre-trained model in four speech processing tasks: speech enhancement; language identification; speech, noise, and music classification; and speaker identification. In each case, we compare the performance of our approach to existing baselines. Our results confirm that the representation of natural speech captured using speechVGG is transferable and generalizable across various speech processing problems and datasets. Notably, relatively simple applications of our pre-trained model are capable of achieving competitive results.
Date of publication 2019
Code Programming Language Multiple