Title: Tandem and Hybrid Speech Recognition Systems based on a General ANN Extension in HTK
Speaker: Chao Zhang, Ph.D., Cambridge University
Time: 10:00-12:00, March 18
Place: 1-315, FIT Building
Organizer: Research Institute of Information Technology (RIIT), Tsinghua University
Chao Zhang received his B.E. and M.S. degrees in computer science and technology from Tsinghua University in 2009 and 2012, respectively. During his master's studies at the Center for Speech and Language Technologies, he worked on accented Mandarin speech recognition using discrete variable discriminative training and speech attribute detection approaches. Since 2012, he has been pursuing his Ph.D. at the Cambridge University Engineering Department, working on DNN-based speech recognition systems and their joint optimisation, and has created the ANN modules in HTK. He received the best student paper award at both NCMMSC 2011 and ICASSP 2014 and was a best paper candidate at ASRU 2015, along with other paper awards. He has also participated in a number of project evaluations and challenges, including IARPA Babel 2013, DARPA BOLT 2014, and the ASRU 2015 MGB challenge, building either the single most important or the best-performing systems.
Since 2010, various kinds of artificial neural network (ANN) models have become increasingly prominent in automatic speech recognition. This talk introduces the ANN extension to the HTK hidden Markov model toolkit, which provides very general architectures for both acoustic modelling and feature extraction. Many ANN-based speech processing techniques are supported, including discriminative sequence training, speaker adaptation, model stacking, and joint system optimisation. This extension has recently been released in HTK v3.5. The talk will focus on state-of-the-art deep neural network (DNN) tandem and hybrid systems for conversational telephone speech and multi-genre broadcast transcription. Related research, such as joint decoding of tandem and hybrid systems and parameterised activation functions for speaker-independent and speaker-dependent modelling, will also be covered.