Panahi, 2023 - Google Patents
DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech EnhancementPanahi, 2023
View HTML- Document ID
- 17675015329314436706
- Author
- Panahi I
- Publication year
- Publication venue
- arXiv (Cornell University)
External Links
Snippet
Invariance to microphone array configuration is a rare attribute in neural beamformers. Filter- and-sum (FS) methods in this class define the target signal with respect to a reference channel. However, this not only complicates formulation in reverberant conditions but also …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Wang et al. | Complex spectral mapping for single-and multi-channel speech enhancement and robust ASR | |
| Li et al. | Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement | |
| Tan et al. | Neural spectrospatial filtering | |
| Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research | |
| Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation | |
| Krueger et al. | Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation | |
| Huang et al. | Advances in microphone array processing and multichannel speech enhancement | |
| Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 | |
| Roman et al. | Binaural segregation in multisource reverberant environments | |
| Aroudi et al. | Dbnet: Doa-driven beamforming network for end-to-end reverberant sound source separation | |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement | |
| Liu et al. | Inplace gated convolutional recurrent neural network for dual-channel speech enhancement | |
| Subramanian et al. | An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions | |
| Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target | |
| Song et al. | An integrated multi-channel approach for joint noise reduction and dereverberation | |
| Priyanka et al. | Multi-channel speech enhancement using early and late fusion convolutional neural networks | |
| Kovalyov et al. | Dsenet: Directional signal extraction network for hearing improvement on edge devices | |
| Sainath et al. | Raw multichannel processing using deep neural networks | |
| Kovalyov et al. | Dfsnet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement | |
| Yang et al. | Guided speech enhancement network | |
| Meng et al. | Deep Kronecker product beamforming for large-scale microphone arrays | |
| Liu et al. | A new neural beamformer for multi-channel speech separation | |
| CN117121104A (en) | Estimating optimized masks for processing acquired sound data | |
| Li et al. | Speech enhancement based on binaural sound source localization and cosh measure wiener filtering | |
| Togami et al. | Real-time stereo speech enhancement with spatial-cue preservation based on dual-path structure |