Speech Recognition Research Papers

  •   

    (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding

    Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

    Proceedings of the IEEE ICASSP (2018)

  •    

    An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model

    Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar

    ICASSP (2018)

  •    

    Decoding the auditory brain with canonical component analysis

    Alain de Cheveigné, Daniel D. E. Wong, Giovanni M. Di Liberto, Jens Hjortkjaer, Malcolm Slaney, Edmund Lalor

    NeuroImage (2018)

  •   

    Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

    Rohit Prabhavalkar, Tara Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

    ICASSP 2018 (to appear)

  •    

    Multilingual Speech Recognition with a Single End-to-End Model

    Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

    ICASSP (2018)

  •    

    On Using Backpropagation for Speech Texture Generation and Voice Conversion

    Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio

    ICASSP (2018)

  •    

    Sound source separation using phase difference and reliable mask selection

    Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard M. Stern

    ICASSP (2018) (to appear)

  •    

    Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition

    Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani

    ICASSP 2018 (2018)

  •    

    State-of-the-art Speech Recognition With Sequence-to-Sequence Models

    Chung-Cheng Chiu, Tara Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Katya Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

    ICASSP (2018) (to appear)

  •    

    A Cascade Architecture for Keyword Spotting on Mobile Devices

    Alexander Gruenstein, Raziel Alvarez, Chris Thornton, Mohammadali Ghodrat

    31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)

  •    

    A Comparison of Sequence-to-Sequence Models for Speech Recognition

    Rohit Prabhavalkar, Kanishka Rao, Tara Sainath, Bo Li, Leif Johnson, Navdeep Jaitly

    Interspeech 2017, ISCA (2017)

  •   

    A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition

    Herman Kamper, Aren Jansen, Sharon Goldwater

    Computer Speech and Language (2017) (to appear)

  •   

    A more general method for pronunciation learning

    Antoine Bruguier, Dan Gnanapragasam, Francoise Beaufays, Kanishka Rao, Leif Johnson

    Interspeech 2017 (2017)

  •    

    Acoustic Modeling for Google Home

    Bo Li, Tara Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean Chin, Khe Chai Sim, Ron J. Weiss, Kevin Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Rick Rose, Matt Shannon

    INTERSPEECH 2017 (2017)

  •    

    An Analysis of "Attention" in Sequence-to-Sequence Models

    Rohit Prabhavalkar, Tara Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly

    Interspeech 2017, ISCA (2017)

  •    

    Approaches for Neural-Network Language Model Adaptation

    Fadi Biadsy, Michael Alexander Nirschl, Min Ma, Shankar Kumar

    Interspeech 2017, Stockholm, Sweden (2017)

  •    

    Areal and Phylogenetic Features for Multilingual Speech Synthesis

    Alexander Gutkin, Richard Sproat

    Proc. of Interspeech 2017, ISCA, August 20–24, 2017, Stockholm, Sweden, pp. 2078-2082

  •    

    Attention-Based Models for Text-Dependent Speaker Verification

    F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

    (2017)

  •    

    Binaural processing for robust speech recognition of degraded speech

    Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern

    IEEE Automatic Speech Recognition and Understanding Workshop (2017)

  •    

    Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals

    Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro

    Interspeech 2017 (2017)

  •    

    Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

    Chanwoo Kim, Ehsan Variani, Arun Narayanan, Michiel Bacchiani

    arXiv (2017)

  •    

    End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow

    Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani

    Interspeech 2017 (2017)

  •   

    Endpoint detection using grid long short-term memory networks for streaming speech recognition

    Bo Li, Carolina Parada, Gabor Simko, Shuo-yiin Chang, Tara Sainath

    In Proc. Interspeech 2017 (to appear)

  •    

    Generalized End-to-End Loss for Speaker Verification

    Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

    (2017)

  •    

    Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home

    Chanwoo Kim, Ananya Misra, Kean Chin, Thad Hughes, Arun Narayanan, Tara Sainath, Michiel Bacchiani

    Interspeech 2017 (2017), pp. 379-383

  •    

    Generative Model-Based Text-to-Speech Synthesis

    Heiga Zen

    MIT (2017)

  •   

    Google's next-generation real-time unit-selection synthesizer using sequence-to-sequence LSTM-based autoencoders

    Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vit

    Interspeech (2017)

  •    

    Highway-LSTM and Recurrent Highway Networks for Speech Recognition

    Golan Pundak, Tara Sainath

    Proc. Interspeech 2017, ISCA

  •   

    Human and Machine Hearing: Extracting Meaning from Sound

    Richard F. Lyon

    Cambridge University Press (2017)

  •   

    Improved end-of-query detection for streaming speech recognition

    Carolina Parada, Gabor Simko, Matt Shannon, Shuo-yiin Chang

    Proc. Interspeech 2017 (2017) (to appear)

  •   

    Incoherent idempotent ambisonics rendering

    W. Bastiaan Kleijn, Andrew Allen, Jan Skoglund, Felicia Lim

    2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)

  •   

    Joint Wideband Source Localization and Acquisition Based on a Grid-Shift Approach

    Christos Tzagkarakis, Bastiaan Kleijn, Jan Skoglund

    2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)

  •    

    Keyword Spotting for Google Assistant Using Contextual Speech Recognition

    Assaf Michaely, Carolina Parada, Frank Zhang, Gabor Simko, Petar Aleksic

    ASRU 2017, IEEE

  •    

    Language Modeling in the Era of Abundant Data

    Ciprian Chelba

    AI With the Best online conference. (2017)

  •   

    Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models

    Hasim Sak, Kanishka Rao

    ICASSP 2017 (to appear)

  •    

    Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

    Tara Sainath, Ron J. Weiss, Kevin Wilson, Bo Li, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim

    IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25 (2017), pp. 965-979

  •    

    On Lattice Generation for Large Vocabulary Speech Recognition

    David Rybach, Johan Schalkwyk, Michael Riley

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan (2017)

  •    

    Optimizing expected word error rate via sampling for speech recognition

    Matt Shannon

    Proc. Interspeech 2017 (2017) (to appear)

  •    

    Parallel WaveNet: Fast High-Fidelity Speech Synthesis

    Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis Carlos Cobo Rus, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alexander Graves, Helen King, Thomas Walters, Dan Belov, Demis Hassabis

    Google DeepMind (2017)

  •  

    Practically Efficient Nonlinear Acoustic Echo Cancellers Using Cascaded Block RLS and FLMS Adaptive Filters

    Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

    ICASSP (2017)

  •    

    Raw Multichannel Processing Using Deep Neural Networks

    Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Bo Li, Ehsan Variani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim

    New Era for Robust Speech Recognition: Exploiting Deep Learning, Springer (2017)

  •    

    Robust Speech Recognition Based on Binaural Auditory Processing

    Anjali Menon, Chanwoo Kim, Richard M. Stern

    INTERSPEECH 2017 (2017), pp. 3872-3876

  •   

    Robust and low-complexity blind source separation for meeting rooms

    W. Bastiaan Kleijn, Felicia Lim

    Proceedings Fifth Joint Workshop on Hands-free Speech Communication and Microphone Arrays (2017)

  •    

    Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap

    Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy

    The 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 2725-2729 (to appear)

  •    

    Speaker Diarization with LSTM

    Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

    (2017)

  •   

    Streaming Small-Footprint Keyword Spotting Using Sequence-to-Sequence Models

    Yanzhang (Ryan) He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw

    Automatic Speech Recognition and Understanding (ASRU), 2017 IEEE Workshop on

  •   

    Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM

    Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro Moreno

    ASRU 2017

  •    

    Tacotron: Towards End-to-End Speech Synthesis

    Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous

    Interspeech (2017)

  •    

    Trainable Frontend For Robust and Far-Field Keyword Spotting

    Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous

    Proc. IEEE ICASSP 2017, New Orleans, LA

  •   

    Uncovering Latent Style Factors for Expressive Speech Synthesis

    Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous

    NIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio) (2017) (to appear)

  •    

    Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

    Alexander Gutkin

    Proc. of Interspeech 2017, ISCA, August 20–24, Stockholm, Sweden, pp. 2183-2187

  •    

    Wavenet based low rate speech coding

    W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters

    arXiv preprint arXiv:1712.01120 (2017)

  •    

    A subband-based stationary-component suppression method using harmonics and power ratio for reverberant speech recognition

    Byung Joon Cho, Haeyong Kwon, Ji-Won Cho, Chanwoo Kim, Richard M. Stern, Hyung-Min Park

    IEEE Signal Processing Letters, vol. 23 (2016), pp. 780-784

  •  

    An Acoustic Keystroke Transient Canceler for Speech Communication Terminals Using a Semi-Blind Adaptive Filter Model

    Herbert Buchner, Simon Godsill, Jan Skoglund

    ICASSP (2016)

  •    

    AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

    Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley

    NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop (to appear)

  •    

    Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition

    Mortaza Doulaty, Richard Rose, Olivier Siohan

    Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)

  •  

    Bi-Magnitude Processing Framework for Nonlinear Acoustic Echo Cancellation on Android Devices

    Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

    International Workshop on Acoustic Signal Enhancement 2016 (IWAENC2016)

  •    

    Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla

    Alexander Gutkin, Linne Ha, Martin Jansche, Oddur Kjartansson, Knot Pipatsrisawat, Richard Sproat

    SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, 09-12 May 2016, Yogyakarta, Indonesia; Procedia Computer Science, Elsevier B.V., pp. 194-200

  •    

    Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling

    Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani

    Interspeech 2016 (2016)

  •    

    Contextual prediction models for speech recognition

    Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Baeuml

    Proceedings of Interspeech 2016

  •    

    Cross-lingual projection for class-based language models

    Beat Gfeller, Vlad Schogol, Keith Hall

    ACL2016

  •    

    Directly Modeling Voiced and Unvoiced Components in Speech Waveforms by Neural Networks

    Keiichi Tokuda, Heiga Zen

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2016), pp. 5640-5644

  •   

    Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition

    Austin Waters, Yevgen Chebotar

    Interspeech (2016)

  •    

    Distributed representation and estimation of WFST-based n-gram models

    Cyril Allauzen, Michael Riley, Brian Roark

    Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41

  •    

    End-to-End Text-Dependent Speaker Verification

    Georg Heigold, Ignacio Moreno, Samy Bengio, Noam M. Shazeer

    International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)

  •   

    Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs

    Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani

    International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)

  •    

    Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

    Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak

    Proc. Interspeech, San Francisco, CA, USA (2016)

  •   

    Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection

    Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada

  •   

    Flatstart-CTC: a new acoustic model training procedure for speech recognition

    Andrew Senior, Hasim Sak, Kanishka Rao

    ICASSP 2016

  •  

    Globally Optimized Least-Squares Post-Filtering for Microphone Array Speech Enhancement

    Yiteng (Arden) Huang, Alejandro Luebs, Jan Skoglund, W. Bastiaan Kleijn

    ICASSP (2016)

  •    

    High quality agreement-based semi-supervised training data for acoustic modeling

    Félix de Chaumont Quitry, Asa Oines, Pedro Moreno, Eugene Weinstein

    2016 IEEE Workshop on Spoken Language Technology

  •  

    Learning Compact Recurrent Neural Networks

    Zhiyun Lu, Vikas Sindhwani, Tara Sainath

    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016

  •    

    Learning N-gram Language Models from Uncertain Data

    Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark

    Interspeech (2016)

  •    

    Learning Personalized Pronunciations for Contact Names Recognition

    Tony Bruguier, Fuchun Peng, Francoise Beaufays

    Interspeech 2016 (to appear)

  •    

    Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition

    William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

    ICASSP (2016)

  •    

    Lower Frame Rate Neural Network Acoustic Models

    Golan Pundak, Tara Sainath

    Interspeech (2016)

  •    

    Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks

    Tara N. Sainath, Bo Li

    Proc. Interspeech, ISCA (2016) (to appear)

  •    

    Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN based Statistical Parametric Speech Synthesis

    Bo Li, Heiga Zen

    Proc. Interspeech, ISCA (2016) (to appear)

  •    

    Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition

    Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani

    Proc. Interspeech, ISCA (2016)

  •   

    Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

    Hagen Soltau, Hank Liao, Hasim Sak

    ArXiv e-prints (2016)

  •  

    On Pre-Filtering Strategies for the GCC-PHAT Algorithm

    Hong-Goo Kang, Michael Graczyk, Jan Skoglund

    International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)

  •   

    On The Compression Of Recurrent Neural Networks With An Application To LVCSR Acoustic Modeling For Embedded Speech Recognition
