Hsiao et al., 2015 - Google Patents

Unsupervised adaptation for deep neural network using linear least square method.

Document ID: 764634963486754278
Author: Hsiao R; Ng T; Tsakalidis S; Nguyen L; Schwartz R
Publication year: 2015
Publication venue: Interspeech

Snippet

In this paper, we propose a novel model-based adaptation method for deep neural networks based on a linear least squares method. Our proposed algorithm can perform unsupervised adaptation even when the automatic transcripts have a word error rate of 60-70%. We evaluate our …

Continue reading at www.isca-archive.org (PDF)

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
          - G10L15/063—Training
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
          - G10L2015/088—Word spotting
        - G10L15/28—Constructional details of speech recognition systems
          - G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/04—Segmentation; Word boundary detection
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/06—Decision making techniques; Pattern matching strategies

Similar Documents

Publication	Publication Date	Title
US9542948B2 (en)	2017-01-10	Text-dependent speaker identification
Saon et al.	2013	Speaker adaptation of neural network acoustic models using i-vectors
US9401148B2 (en)	2016-07-26	Speaker verification using neural networks
Ravuri et al.	2015	Recurrent neural network and LSTM models for lexical utterance classification.
Ferrer et al.	2015	Study of senone-based deep neural network approaches for spoken language recognition
KR102167719B1 (en)	2020-10-19	Method and apparatus for training language model, method and apparatus for recognizing speech
US9653093B1 (en)	2017-05-16	Generative modeling of speech using neural networks
US20200152179A1 (en)	2020-05-14	Time-frequency convolutional neural network with bottleneck architecture for query-by-example processing
US20150149174A1 (en)	2015-05-28	Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition
Tüske et al.	2014	Multilingual MRASTA features for low-resource keyword search and speech recognition systems
CN108766445A (en)	2018-11-06	Method for recognizing sound-groove and system
Kitza et al.	2019	Cumulative adaptation for BLSTM acoustic models
Zhang et al.	2014	Standalone training of context-dependent deep neural network acoustic models
Yu et al.	2014	Deep neural network-hidden Markov model hybrid systems
Huang et al.	2016	Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
Samarakoon et al.	2016	Subspace LHUC for fast adaptation of deep neural network acoustic models
Hartmann et al.	2014	Comparing decoding strategies for subword-based keyword spotting in low-resourced languages.
Hsiao et al.	2013	Discriminative semi-supervised training for keyword search in low resource languages
Juan et al.	2015	Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban
Tsakalidis et al.	2014	The 2013 BBN Vietnamese telephone speech keyword spotting system
Scheffer et al.	2014	Content matching for short duration speaker recognition.
Samarakoon et al.	2015	Learning factorized feature transforms for speaker normalization
Zhang et al.	2017	End-to-end text-independent speaker verification with flexibility in utterance duration
Golik et al.	2015	Multilingual features based keyword search for very low-resource languages.
US8639510B1 (en)	2014-01-28	Acoustic scoring unit implemented on a single FPGA or ASIC