Panahi, 2023 - Google Patents

DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement

Document ID: 17675015329314436706
Author: Panahi I
Publication year: 2023
Publication venue: arXiv (Cornell University)

Snippet

Invariance to microphone array configuration is a rare attribute in neural beamformers. Filter-and-sum (FS) methods in this class define the target signal with respect to a reference channel. However, this not only complicates formulation in reverberant conditions but also …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification

Similar Documents

Publication	Publication Date	Title
Wang et al.	2020	Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
Li et al.	2022	Embedding and beamforming: All-neural causal beamformer for multichannel speech enhancement
Tan et al.	2022	Neural spectrospatial filtering
Kinoshita et al.	2016	A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
Xiao et al.	2016	Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation
Krueger et al.	2010	Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation
Huang et al.	2025	Advances in microphone array processing and multichannel speech enhancement
Xiao et al.	2014	The NTU-ADSC systems for reverberation challenge 2014
Roman et al.	2006	Binaural segregation in multisource reverberant environments
Aroudi et al.	2021	DBnet: DOA-driven beamforming network for end-to-end reverberant sound source separation
Nakatani et al.	2013	Dominance based integration of spatial and spectral features for speech enhancement
Liu et al.	2021	Inplace gated convolutional recurrent neural network for dual-channel speech enhancement
Subramanian et al.	2019	An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions
Dadvar et al.	2019	Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Song et al.	2021	An integrated multi-channel approach for joint noise reduction and dereverberation
Priyanka et al.	2023	Multi-channel speech enhancement using early and late fusion convolutional neural networks
Kovalyov et al.	2023	Dsenet: Directional signal extraction network for hearing improvement on edge devices
Sainath et al.	2017	Raw multichannel processing using deep neural networks
Kovalyov et al.	2023	DFSNet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement
Yang et al.	2023	Guided speech enhancement network
Meng et al.	2024	Deep Kronecker product beamforming for large-scale microphone arrays
Liu et al.	2022	A new neural beamformer for multi-channel speech separation
CN117121104A (en)	2023-11-24	Estimating optimized masks for processing acquired sound data
Li et al.	2022	Speech enhancement based on binaural sound source localization and cosh measure Wiener filtering
Togami et al.	2024	Real-time stereo speech enhancement with spatial-cue preservation based on dual-path structure