Semi-Supervised Training in Deep Learning Acoustic Model

We studied the semi-supervised training in a fully connected deep neural network (DNN), unfolded recurrent neural network (RNN), and long short-term memory recurrent neural network (LSTM-RNN) with respect to the transcription quality, the importance data sampling, and the training data amount. We found that DNN, unfolded RNN, and LSTM-RNN are increasingly more sensitive to labeling errors. For example, with the simulated erroneous training transcription at 5%, 10%, or 15% word error rate (WER) level, the semi-supervised DNN yields 2.37%, 4.84%, or 7.46% relative WER increase against the baseline model trained with the perfect transcription; in comparison, the corresponding WER increase is 2.53%, 4.89%, or 8.85% in an unfolded RNN and 4.47%, 9.38%, or 14.01% in an LSTM-RNN. We further found that the importance sampling has similar impact on all three…


Link to Full Article: Semi-Supervised Training in Deep Learning Acoustic Model