Fuzzy-Clustering-Based Decision Tree Approach for Large Population Speaker Identification
MATLAB code for the paper: “Fuzzy-Clustering-Based Decision Tree Approach for Large Population Speaker Identification”.
Authors | Yakun Hu, Dapeng Wu, and Antonio Nucci |
Journal/Conference Name | IEEE Transactions on Audio, Speech, and Language Processing |
Paper Category | Speaker Recognition |
Paper Abstract | In this paper, we address the problem of large population speaker identification under noisy conditions. Major techniques for speaker identification are based on Mel-Frequency Cepstral Coefficients (MFCC), Gaussian Mixture Models (GMM), and a Universal Background Model (UBM), which we call MFCC+GMM and MFCC+GMM+UBM. These approaches are known to perform very well for small population identification under low-noise conditions. However, an increase in population size can degrade the performance of these schemes under noisy conditions. To mitigate this limitation, we propose a fuzzy-clustering-based decision tree approach. The key idea of our approach is to 1) use a decision tree to hierarchically partition the whole population into small groups and determine which leaf-node speaker group a speaker under test belongs to, and 2) apply MFCC+GMM to the selected speaker group for speaker identification. The advantage of our approach is that we use features that are independent of MFCC to partition speakers into groups, and we apply MFCC+GMM only to speaker groups at the leaf level. The key challenge in our design is how to achieve a low error probability for decision-tree-based classification. To address this, we adopt fuzzy clustering in constructing the tree for population partitioning, i.e., at each level, a speaker may belong to multiple groups. Such redundancy increases the probability of classifying a speaker under test into the correct group/node of the tree. Another novelty of this paper is that we use pitch and five vocal source features to construct a six-level decision tree. Experimental results demonstrate that our approach outperforms MFCC+GMM and MFCC+GMM+UBM with higher accuracy and lower complexity for large population identification under additive white Gaussian noise (AWGN) conditions. |
Date of publication | 2013 |
Code Programming Language | MATLAB |
Comment |
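The abstract's core mechanism is fuzzy partitioning at one tree level, where a speaker near a group boundary is kept in more than one group. It can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. It rests on several assumptions:

- Each speaker is summarised at this level by a single scalar feature, such as mean pitch in Hz.
- Clustering uses standard fuzzy c-means with fuzzifier `m = 2`, as a stand-in for the paper's exact procedure.
- The membership threshold (`0.2`) and the pitch values are hypothetical.
- The helper names `memberships`, `fuzzy_c_means_1d`, `fuzzy_groups`, and `route` are made up for this example.

```python
# One level of a fuzzy-clustering-based speaker partition (illustrative sketch).
# Assumptions (not taken from the paper's code): one scalar feature per speaker at
# this level, standard fuzzy c-means with fuzzifier m, and a hypothetical threshold.


def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of scalar x in each cluster (values sum to 1)."""
    eps = 1e-12  # avoids division by zero when x coincides with a center
    d = [abs(x - c) + eps for c in centers]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]


def fuzzy_c_means_1d(xs, n_clusters, m=2.0, n_iter=100):
    """Standard fuzzy c-means on scalars; returns sorted cluster centers."""
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    lo, hi = min(xs), max(xs)
    # Deterministic initialisation: centers evenly spaced over the data range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    for _ in range(n_iter):
        u = [memberships(x, centers, m) for x in xs]
        # Each center becomes the u^m-weighted mean of all points.
        centers = [
            sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(n_clusters)
        ]
    return sorted(centers)


def fuzzy_groups(features, centers, threshold=0.2, m=2.0):
    """Put each speaker in every group whose membership is >= threshold.

    A speaker is always kept in its highest-membership group, so nobody is lost;
    borderline speakers land in several groups (the redundancy described above).
    """
    groups = [[] for _ in centers]
    for spk, x in features.items():
        u = memberships(x, centers, m)
        best = max(range(len(centers)), key=lambda i: u[i])
        for i, ui in enumerate(u):
            if ui >= threshold or i == best:
                groups[i].append(spk)
    return groups


def route(x_test, centers, m=2.0):
    """Send a test utterance's feature value to its highest-membership group."""
    u = memberships(x_test, centers, m)
    return max(range(len(centers)), key=lambda i: u[i])


# Hypothetical per-speaker mean pitch values (Hz), invented for illustration only.
pitch = {"s1": 100, "s2": 110, "s3": 120, "s4": 140,
         "s5": 160, "s6": 170, "s7": 210, "s8": 220, "s9": 230}
centers = fuzzy_c_means_1d(list(pitch.values()), n_clusters=3)
groups = fuzzy_groups(pitch, centers, threshold=0.2)
candidates = groups[route(105, centers)]  # only these would be scored by MFCC+GMM
print([round(c, 1) for c in centers])
print(groups)
print(candidates)
```

Here the borderline speaker `s4` (140 Hz) sits between the low and middle clusters and is placed in both groups. A test utterance near either cluster can therefore still reach `s4`, at the cost of slightly larger groups. In a full tree, this step would presumably be repeated on each child group with the next feature, since the paper uses six features (pitch and five vocal source features) for six levels. MFCC+GMM scoring would then run only over the speakers in the leaf reached by the test utterance.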