[2016 Academic Seminar 01] Tandem and Hybrid Speech Recognition Systems based on a General ANN

Research Institute of Information Technology, Tsinghua University: 2016 Academic Seminar Series, No. 01

 

Title: Tandem and Hybrid Speech Recognition Systems based on a General ANN Extension in HTK

 

Speaker: Chao Zhang, Ph.D., Cambridge University

 

Time: 10:00-12:00, March 18

 

Place: 1-315, FIT Building

 

Language: English

 

Organizer: Research Institute of Information Technology (RIIT), Tsinghua University

 

 

Biography:

        Chao Zhang received his B.E. and M.S. degrees in 2009 and 2012 respectively, both in computer science and technology from Tsinghua University. For his master's degree at the Center for Speech and Language Technologies, he worked on accented Mandarin speech recognition using discrete variable discriminative training and speech attribute detection approaches. Since 2012, he has been pursuing his Ph.D. at the Cambridge University Engineering Department, working on DNN-based speech recognition systems and their joint optimisation, and has created the ANN modules in HTK. He received the best student paper award at both NCMMSC 2011 and ICASSP 2014 and was a best paper candidate at ASRU 2015, along with other paper awards. He has also taken part in several project evaluations and challenges, including IARPA Babel 2013, DARPA BOLT 2014, and ASRU 2015 MGB, where he built systems that were either the single most important or the best performing.

 

Abstract:

        Since 2010, various kinds of artificial neural network (ANN) models have become increasingly prominent in automatic speech recognition. This talk introduces the ANN functions in the HTK hidden Markov model toolkit, which support very general architectures for both acoustic modelling and feature extraction. Many ANN-based speech processing techniques are supported, including discriminative sequence training, speaker adaptation, model stacking, and system joint optimisation. This ANN extension has recently been released in HTK v3.5. The talk will focus on state-of-the-art deep neural network (DNN) tandem and hybrid systems for conversational telephony speech and multi-genre broadcast transcription. Related research, including tandem and hybrid system joint decoding and parameterised activation functions for speaker independent and dependent modelling, will also be covered.

 

[Published: 2016-07-07]