Hsiao et al., 2015 - Google Patents
Unsupervised adaptation for deep neural network using linear least square method. Hsiao et al., 2015
- Document ID
- 764634963486754278
- Author
- Hsiao R
- Ng T
- Tsakalidis S
- Nguyen L
- Schwartz R
- Publication year
- 2015
- Publication venue
- Interspeech
Snippet
In this paper, we propose a novel model-based adaptation for deep neural networks based on a linear least square method. Our proposed algorithm can perform unsupervised adaptation even when the automatic transcripts have a word error rate of 60-70%. We evaluate our …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9542948B2 (en) | Text-dependent speaker identification | |
| Saon et al. | Speaker adaptation of neural network acoustic models using i-vectors | |
| US9401148B2 (en) | Speaker verification using neural networks | |
| Ravuri et al. | Recurrent neural network and LSTM models for lexical utterance classification. | |
| Ferrer et al. | Study of senone-based deep neural network approaches for spoken language recognition | |
| KR102167719B1 (en) | Method and apparatus for training language model, method and apparatus for recognizing speech | |
| US9653093B1 (en) | Generative modeling of speech using neural networks | |
| US20200152179A1 (en) | Time-frequency convolutional neural network with bottleneck architecture for query-by-example processing | |
| US20150149174A1 (en) | Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition | |
| Tüske et al. | Multilingual MRASTA features for low-resource keyword search and speech recognition systems | |
| CN108766445A (en) | Method for recognizing sound-groove and system | |
| Kitza et al. | Cumulative adaptation for BLSTM acoustic models | |
| Zhang et al. | Standalone training of context-dependent deep neural network acoustic models | |
| Yu et al. | Deep neural network-hidden Markov model hybrid systems | |
| Huang et al. | Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code | |
| Samarakoon et al. | Subspace LHUC for fast adaptation of deep neural network acoustic models | |
| Hartmann et al. | Comparing decoding strategies for subword-based keyword spotting in low-resourced languages. | |
| Hsiao et al. | Discriminative semi-supervised training for keyword search in low resource languages | |
| Juan et al. | Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban | |
| Tsakalidis et al. | The 2013 BBN Vietnamese telephone speech keyword spotting system | |
| Scheffer et al. | Content matching for short duration speaker recognition. | |
| Samarakoon et al. | Learning factorized feature transforms for speaker normalization | |
| Zhang et al. | End-to-end text-independent speaker verification with flexibility in utterance duration | |
| Golik et al. | Multilingual features based keyword search for very low-resource languages. | |
| US8639510B1 (en) | Acoustic scoring unit implemented on a single FPGA or ASIC |