Efficient Autoregressive Video Diffusion with Dummy Head
arXiv: 2601.20499
Authors: Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu
Organization: Microsoft Research
Published: 2026-01-28
Project page: https://csguoh.github.io/project/DummyForcing/
GitHub: https://github.com/csguoh/DummyForcing
AI-generated summary
Autoregressive video diffusion models suffer from inefficient attention mechanisms that underutilize historical frames, but a new method called Dummy Forcing improves efficiency through heterogeneous memory allocation and dynamic head programming while maintaining quality.
The autoregressive video diffusion model has recently gained considerable research interest due to its causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building upon this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, our Dummy Forcing delivers up to 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than 0.5% quality drop. Project page is available at https://csguoh.github.io/project/DummyForcing/.
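The core observation above — that some heads place almost all of their attention mass on the current frame — can be sketched as a simple diagnostic. This is a minimal illustration, not the paper's implementation: the function name, the 0.9 threshold, and the assumption that current-frame tokens occupy the last key positions are all hypothetical choices for the example.

```python
import torch

def find_dummy_heads(attn_weights: torch.Tensor,
                     current_frame_len: int,
                     threshold: float = 0.9) -> torch.Tensor:
    """Flag heads whose attention mass falls almost entirely on the
    current frame's tokens.

    attn_weights: (num_heads, query_len, key_len) softmaxed attention,
        where the last `current_frame_len` key positions belong to the
        frame being denoised and the rest are cached history.
    Returns a boolean mask of shape (num_heads,), True for "dummy" heads.
    """
    # Per head: mass each query places on current-frame keys, then
    # average over queries -> shape (num_heads,)
    current_mass = attn_weights[..., -current_frame_len:].sum(dim=-1).mean(dim=-1)
    return current_mass >= threshold
```

A head flagged this way contributes little through its historical KV cache, which is why (per the abstract) discarding those caches costs only minor quality.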
Dummy Forcing is built on the observation that about 25% of the attention heads in existing autoregressive video diffusion models are "dummy": they attend almost exclusively to the current frame despite having access to historical context. Based on this observation, Dummy Forcing develops a technique to automatically identify dummy heads and allocate a varying amount of context per head. Leveraging this "dummy property" enables:
1. Efficient video generation at 24.3 FPS real-time speed.
2. High-resolution video generation, supporting 720P and 1080P with a 2.0x speedup.
3. Long-context video generation, enlarging the context window by 6.58x without losing efficiency.
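Allocating varying context per head can be pictured as a heterogeneous KV cache: dummy heads retain only the current frame's entries while the remaining heads keep full history. The sketch below is a hedged illustration under that assumption — the function name and the per-head list layout are ours, not the repository's API.

```python
import torch

def compress_kv_cache(k_cache: torch.Tensor,
                      v_cache: torch.Tensor,
                      dummy_mask: torch.Tensor,
                      current_frame_len: int):
    """Heterogeneous per-head cache: dummy heads keep only the current
    frame's KV entries; other heads keep the full history.

    k_cache, v_cache: (num_heads, seq_len, head_dim)
    dummy_mask: (num_heads,) boolean, True for dummy heads.
    Returns a per-head list of (k, v) pairs, since the retained
    sequence lengths now differ across heads.
    """
    compressed = []
    for h in range(k_cache.shape[0]):
        if dummy_mask[h]:
            # Drop historical entries for this head entirely.
            k = k_cache[h, -current_frame_len:]
            v = v_cache[h, -current_frame_len:]
        else:
            k, v = k_cache[h], v_cache[h]
        compressed.append((k, v))
    return compressed
```

With a quarter of heads compressed this way, the cache footprint and attention cost shrink roughly in proportion, which is the kind of saving behind the reported 2.0x speedup.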