Neural Network Language Modeling with Letter-based Features and Importance Sampling

Authors Hainan Xu, Daniel Povey, Sanjeev Khudanpur, Shiyin Kang, Yiming Wang, Jian Wang, Ke Li, Xie Chen
Journal/Conference Name ICASSP 2018 4
Paper Abstract In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks. We combine the use of subword features (letter n-grams) and one-hot encoding of frequent words so that the models can handle large vocabularies containing infrequent words. We propose a new objective function that allows for training of unnormalized probabilities. An importance sampling based method is supported to speed up training when the vocabulary is large. Experimental results on five corpora show that Kaldi-RNNLM rivals other recurrent neural network language model toolkits both on performance and training speed.
Date of publication 2018
Code Programming Language Shell
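
The abstract describes the word representation as letter n-grams combined with a one-hot encoding of frequent words. The following is a minimal Python sketch of that idea, not the toolkit's own code: the function names, the boundary markers `^` and `$`, the `ngram:` and `word:` key prefixes, and the default n-gram orders are all assumptions made for illustration, and the feature set actually used by Kaldi-RNNLM may differ.

```python
from collections import Counter


def letter_ngrams(word, max_n=3, bos="^", eos="$"):
    """Return all letter n-grams (n = 1..max_n) of a word.

    The word is padded with hypothetical boundary markers so that
    word-initial and word-final n-grams are distinguishable.
    """
    padded = bos + word + eos
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def word_features(word, frequent_words, max_n=3):
    """Build a sparse feature dict for one word.

    Every word gets letter n-gram counts; words in `frequent_words`
    additionally get a one-hot style indicator feature.
    """
    feats = Counter("ngram:" + g for g in letter_ngrams(word, max_n))
    if word in frequent_words:
        feats["word:" + word] = 1
    return dict(feats)


if __name__ == "__main__":
    frequent = {"the", "speech"}
    print(word_features("the", frequent, max_n=2))
    print(word_features("kaldi", frequent, max_n=2))
```

In this sketch an infrequent word such as "kaldi" still receives a non-empty representation through its n-grams, which is the property the abstract relies on for large vocabularies containing infrequent words.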