Notes on nonnegative tensor factorization of the spectrogram for audio source separation~: statistical insights and towards self-clustering of the spatial cues

View Researcher II's Other Codes

Disclaimer: “The provided code links for this paper are external links. Science Nest has no responsibility for the accuracy, legality or content of these links. Also, by downloading this code(s), you agree to comply with the terms of use as set out by the author(s) of the code(s).”

Please contact us in case of a broken link from here

Authors C. FĂ©votte & A. Ozerov
Journal/Conference Name 7th International Symposium on Computer Music Modeling and Retrieval (CMMR)
Paper Category
Paper Abstract Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al as a mean of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgeral et al requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
Date of publication 2010
Code Programming Language MATLAB
Comment

Copyright Researcher II 2021