Paper page - Efficient Autoregressive Video Diffusion with Dummy Head

Papers
arxiv:2601.20499

Efficient Autoregressive Video Diffusion with Dummy Head

Published on Jan 28 Β· Submitted by Hang Guo on Feb 5
Authors: Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu
Abstract

Autoregressive video diffusion models suffer from inefficient attention mechanisms that underutilize historical frames, but a new method called Dummy Forcing improves efficiency through heterogeneous memory allocation and dynamic head programming while maintaining quality.

AI-generated summary

The autoregressive video diffusion model has recently gained considerable research interest due to its causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of the heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building upon this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, our Dummy Forcing delivers up to 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than 0.5% quality drop. Project page is available at https://csguoh.github.io/project/DummyForcing/.
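The head-classification idea in the abstract (flagging heads whose attention mass concentrates on current-frame tokens) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the `threshold` value and the mass-on-current-frame criterion are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dummy_head_mask(q, k, n_current, threshold=0.9):
    """Flag "dummy" heads: heads whose average attention mass on the
    last `n_current` (current-frame) key positions exceeds `threshold`.
    q: (heads, t_cur, d) queries for the current frame
    k: (heads, t_ctx + t_cur, d) keys for history + current frame
    """
    d = q.shape[-1]
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    # fraction of attention each query places on current-frame keys,
    # averaged over the current frame's queries -> one score per head
    cur_mass = attn[..., -n_current:].sum(-1).mean(-1)
    return cur_mass > threshold
```

Heads flagged by such a mask would be the candidates whose historical KV caches can be discarded with, per the paper, only minor quality loss.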

Community

Paper author Paper submitter

Dummy Forcing is built on the observation that about 25% of the attention heads in existing autoregressive video diffusion models are "dummy": they attend almost exclusively to the current frame despite having access to historical context. Based on this observation, Dummy Forcing automatically identifies dummy heads and allocates a varying amount of context to each head. Leveraging this "dummy property" enables:
1. Efficient video generation at 24.3 FPS real-time speed.
2. High-resolution video generation, supporting 720P & 1080P with a 2.0x speedup.
3. Long-context video generation, enlarging the context window by 6.58x without losing efficiency.
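The heterogeneous memory allocation described above (full history for normal heads, only the current frame for dummy heads) can be sketched as a per-head cache-pruning step. This is a minimal sketch under assumptions: the exact retention policy, data layout, and function name are hypothetical, not the paper's implementation.

```python
import numpy as np

def prune_kv_cache(k_cache, v_cache, dummy_mask, n_current):
    """Per-head KV-cache pruning: normal heads keep the full history,
    dummy heads keep only the last `n_current` positions.
    k_cache, v_cache: (heads, T, d) arrays
    dummy_mask: length-`heads` booleans (True = dummy head)
    Returns ragged per-head lists, since retained lengths now differ.
    """
    pruned_k, pruned_v = [], []
    for h, is_dummy in enumerate(dummy_mask):
        keep = slice(-n_current, None) if is_dummy else slice(None)
        pruned_k.append(k_cache[h, keep])
        pruned_v.append(v_cache[h, keep])
    return pruned_k, pruned_v
```

With ~25% of heads dummy, this retention scheme would shrink the KV cache toward ~75% of its original size as the context grows, which is consistent with the reported speedups at long context lengths.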

arXivLens breakdown of this paper πŸ‘‰ https://arxivlens.com/PaperView/Details/efficient-autoregressive-video-diffusion-with-dummy-head-750-aac1a39d

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications



Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.20499 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.20499 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.20499 in a Space README.md to link it from this page.

Collections including this paper 3