September 16, 2015
Speaker: Tara Sainath, Google Research
In this talk, I will discuss various efforts in our group at Google towards replacing various parts of the acoustic modeling pipeline with neural networks. First, I will describe a new modeling approach known as Convolutional, Long Short-Term Memory, Deep Neural Networks (CLDNNs), and why this architecture makes sense for speech tasks. Next, I will talk about using CLDNNs for raw-waveform modeling, allowing us to remove front-end log-mel filterbank feature computation. Finally, I will discuss CTC, which allows us to remove the need for a prior alignment and CD states.
Hosted by Colin Raffel.