Wang et al., 2024 - Google Patents

Watch your mouth: silent speech recognition with depth sensing

Document ID: 14780443601759146232
Author: Wang X; Su Z; Rekimoto J; Zhang Y
Publication year: 2024
Publication venue: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

Snippet

Silent speech recognition is a promising technology that decodes human speech without requiring audio signals, enabling private human-computer interactions. In this paper, we propose Watch Your Mouth, a novel method that leverages depth sensing to enable …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions

Similar Documents

Publication	Publication Date	Title
US12452390B2 (en)	2025-10-21	Word flow annotation
US11482134B2 (en)	2022-10-25	Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner
Ong et al.	2005	Automatic sign language analysis: A survey and the future beyond lexical meaning
Luettin	1997	Visual speech and speaker recognition
US12430833B2 (en)	2025-09-30	Realtime AI sign language recognition with avatar
Von Agris et al.	2008	Recent developments in visual sign language recognition
Su et al.	2023	LipLearner: Customizable silent speech interactions on mobile devices
Zhang et al.	2021	SpeeChin: A smart necklace for silent speech recognition
WO2017112813A1 (en)	2017-06-29	Multi-lingual virtual personal assistant
US11861778B1 (en)	2024-01-02	Apparatus and method for generating a virtual avatar
KR20120120858A (en)	2012-11-02	Service and method for video call, server and terminal thereof
Rathipriya et al.	2024	A comprehensive review of recent advances in deep neural networks for lipreading with sign language recognition
Lim et al.	2025	SpellRing: Recognizing continuous fingerspelling in American Sign Language using a ring
Yau	2024	Video analysis of mouth movement using motion templates for computer-based lip-reading
Zhang et al.	2024	Speech-driven personalized gesture synthetics: Harnessing automatic fuzzy feature inference
Liu et al.	2023	A survey on deep multi-modal learning for body language recognition and generation
Kunhoth et al.	2023	VisualAid+: Assistive system for visually impaired with TinyML enhanced object detection and scene narration
CN111191490A (en)	2020-05-22	Lip reading research method based on Kinect vision
Innocente et al.	2025	Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation
Cai et al.	2025	SignGlass: First-Person View Comprehensive and Generalizable ASL Translation Using Wearable Glass
Chokchaitam et al.	2025	A System for Detecting and Translating Thai Sign Language Using Image Processing and Artificial Intelligence
CN120877390B (en)	2026-01-06	Sign language translation methods, devices, computer equipment and storage media
Vaishnavi et al.	2025	AI Powered Trifocals for Sign Language Detection and Speech Recognition
Su et al.	2025	Multimodal Silent Speech-based Text Entry with Word-initials Conditioned LLM