Hsiao et al., 2015 - Google Patents

Unsupervised adaptation for deep neural network using linear least square method.

Document ID
764634963486754278
Author
Hsiao R
Ng T
Tsakalidis S
Nguyen L
Schwartz R
Publication year
2015
Publication venue
Interspeech

Snippet

In this paper, we propose a novel model-based adaptation for deep neural networks based on a linear least square method. Our proposed algorithm can perform unsupervised adaptation even if the auto transcripts have a word error rate of 60-70%. We evaluate our …
Continue reading at www.isca-archive.org (PDF) (other versions)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection

Similar Documents

Publication Publication Date Title
US9542948B2 (en) Text-dependent speaker identification
Saon et al. Speaker adaptation of neural network acoustic models using i-vectors
US9401148B2 (en) Speaker verification using neural networks
Ravuri et al. Recurrent neural network and LSTM models for lexical utterance classification.
Ferrer et al. Study of senone-based deep neural network approaches for spoken language recognition
KR102167719B1 (en) Method and apparatus for training language model, method and apparatus for recognizing speech
US9653093B1 (en) Generative modeling of speech using neural networks
US20200152179A1 (en) Time-frequency convolutional neural network with bottleneck architecture for query-by-example processing
US20150149174A1 (en) Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition
Tüske et al. Multilingual MRASTA features for low-resource keyword search and speech recognition systems
CN108766445A (en) Method for recognizing sound-groove and system
Kitza et al. Cumulative adaptation for BLSTM acoustic models
Zhang et al. Standalone training of context-dependent deep neural network acoustic models
Yu et al. Deep neural network-hidden markov model hybrid systems
Huang et al. Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
Samarakoon et al. Subspace LHUC for fast adaptation of deep neural network acoustic models
Hartmann et al. Comparing decoding strategies for subword-based keyword spotting in low-resourced languages.
Hsiao et al. Discriminative semi-supervised training for keyword search in low resource languages
Juan et al. Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban
Tsakalidis et al. The 2013 BBN Vietnamese telephone speech keyword spotting system
Scheffer et al. Content matching for short duration speaker recognition.
Samarakoon et al. Learning factorized feature transforms for speaker normalization
Zhang et al. End-to-end text-independent speaker verification with flexibility in utterance duration
Golik et al. Multilingual features based keyword search for very low-resource languages.
US8639510B1 (en) Acoustic scoring unit implemented on a single FPGA or ASIC