Wang et al., 2024 - Google Patents
Watch your mouth: silent speech recognition with depth sensing
Wang et al., 2024
- Document ID
- 14780443601759146232
- Author
- Wang X
- Su Z
- Rekimoto J
- Zhang Y
- Publication year
- 2024
- Publication venue
- Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
Snippet
Silent speech recognition is a promising technology that decodes human speech without requiring audio signals, enabling private human-computer interactions. In this paper, we propose Watch Your Mouth, a novel method that leverages depth sensing to enable …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Wang et al. | Watch your mouth: silent speech recognition with depth sensing | |
| US12452390B2 (en) | Word flow annotation | |
| US11482134B2 (en) | Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner | |
| Ong et al. | Automatic sign language analysis: A survey and the future beyond lexical meaning | |
| Luettin | Visual speech and speaker recognition | |
| US12430833B2 (en) | Realtime AI sign language recognition with avatar | |
| Von Agris et al. | Recent developments in visual sign language recognition | |
| Su et al. | Liplearner: Customizable silent speech interactions on mobile devices | |
| Zhang et al. | Speechin: A smart necklace for silent speech recognition | |
| WO2017112813A1 (en) | Multi-lingual virtual personal assistant | |
| US11861778B1 (en) | Apparatus and method for generating a virtual avatar | |
| KR20120120858A (en) | Service and method for video call, server and terminal thereof | |
| Rathipriya et al. | A comprehensive review of recent advances in deep neural networks for lipreading with sign language recognition | |
| Lim et al. | Spellring: Recognizing continuous fingerspelling in american sign language using a ring | |
| Yau | Video analysis of mouth movement using motion templates for computer-based lip-reading | |
| Zhang et al. | Speech-driven personalized gesture synthetics: Harnessing automatic fuzzy feature inference | |
| Liu et al. | A survey on deep multi-modal learning for body language recognition and generation | |
| Kunhoth et al. | VisualAid+: Assistive system for visually impaired with TinyML enhanced object detection and scene narration | |
| CN111191490A (en) | Lip reading research method based on Kinect vision | |
| Innocente et al. | Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation. | |
| Cai et al. | SignGlass: First-Person View Comprehensive and Generalizable ASL Translation Using Wearable Glass | |
| Chokchaitam et al. | A System for Detecting and Translating Thai Sign Language Using Image Processing and Artificial Intelligence | |
| CN120877390B (en) | Sign language translation methods, devices, computer equipment and storage media | |
| Vaishnavi et al. | AI Powered Trifocals for Sign Language Detection and Speech Recognition | |
| Su et al. | Multimodal Silent Speech-based Text Entry with Word-initials Conditioned LLM |