arxiv:2601.00664

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Published on Jan 2 · Submitted by taesiri on Jan 5
#3 Paper of the day
Authors: Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang

Project page: https://taekyungki.github.io/AvatarForcing/ · Code: https://github.com/TaekyungKi/AvatarForcing

Abstract

AI-generated summary: Avatar Forcing framework enables real-time interactive head avatar generation with low latency and expressive motion through diffusion forcing and label-free preference optimization.

Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real time under causal constraints, and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency, enabling instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500 ms), achieving a 6.8× speedup over the baseline, and produces reactive, expressive avatar motion that is preferred in over 80% of comparisons against the baseline.
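Two of the abstract's ideas are concrete enough to sketch. First, diffusion forcing generates motion causally, chunk by chunk: each new chunk is denoised while earlier chunks stay fixed, so the avatar can react to streaming audio and motion with only a few denoising steps of delay. The Python below is a minimal illustrative sketch, not the authors' code; the model interface (encode_user, denoise_step, chunk_shape) is an assumption for illustration.

```python
import torch

@torch.no_grad()
def stream_avatar_motion(model, audio_stream, user_stream, n_steps=4):
    """Causally generate avatar motion chunk by chunk (diffusion-forcing style)."""
    history = []  # clean motion latents generated so far; fixed causal context
    for audio_chunk, user_chunk in zip(audio_stream, user_stream):
        # Encode the user's live verbal and non-verbal cues (speech, nods, ...).
        cond = model.encode_user(audio_chunk, user_chunk)
        # Only the newest chunk starts from noise; the past is never revisited.
        x = torch.randn(model.chunk_shape)
        for t in reversed(range(n_steps)):
            x = model.denoise_step(x, t, past=history, cond=cond)
        history.append(x)
        yield x  # emit immediately so per-chunk latency stays low
```

Second, the label-free preference optimization can be read as standard DPO with a manufactured pair: the fully conditioned generation is the winner, and a generation with the user's audio and motion dropped (which tends to come out flat and unreactive) is the synthetic loser, so no human preference labels are required. Again a hedged sketch: `generate`, `log_prob`, and the temperature `BETA` are hypothetical names; only the DPO loss itself is the standard formulation.

```python
import torch
import torch.nn.functional as F

BETA = 0.1  # DPO temperature; an assumed hyperparameter

def label_free_dpo_loss(policy, reference, avatar_audio, user_audio, user_motion):
    """DPO loss whose losing sample is synthesized by dropping user conditions."""
    win = policy.generate(avatar_audio, user_audio, user_motion)   # reactive
    lose = policy.generate(avatar_audio, None, None)               # unreactive

    # Score both samples under the full conditioning, for the trainable
    # policy and a frozen reference model.
    logp_w = policy.log_prob(win, avatar_audio, user_audio, user_motion)
    logp_l = policy.log_prob(lose, avatar_audio, user_audio, user_motion)
    with torch.no_grad():
        ref_w = reference.log_prob(win, avatar_audio, user_audio, user_motion)
        ref_l = reference.log_prob(lose, avatar_audio, user_audio, user_motion)

    # Push the policy to prefer the conditioned (expressive) sample.
    margin = BETA * ((logp_w - ref_w) - (logp_l - ref_l))
    return -F.logsigmoid(margin).mean()
```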

Community

https://arxivexplained.com/papers/avatar-forcing-real-time-interactive-head-avatar-generation-for-natural-conversation

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking (https://huggingface.co/papers/2512.09327)
  • StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars (https://huggingface.co/papers/2512.22065)
  • TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation (https://huggingface.co/papers/2512.20296)
  • Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics (https://huggingface.co/papers/2512.15340)
  • IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer (https://huggingface.co/papers/2511.22167)
  • DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model (https://huggingface.co/papers/2512.24408)
  • Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length (https://huggingface.co/papers/2512.04677)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/avatar-forcing-real-time-interactive-head-avatar-generation-for-natural-conversation-5659-07bcf5f4

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.00664 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.00664 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.00664 in a Space README.md to link it from this page.

Collections including this paper 8