Paper page - Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Papers
arxiv:2602.04789

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Published on Feb 4 · Submitted by Yushi Huang on Feb 6
Authors:
Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang

Abstract

Light Forcing introduces a novel sparse attention mechanism for autoregressive video generation that improves efficiency while maintaining quality through chunk-aware growth and hierarchical sparse attention strategies.

AI-generated summary

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying them to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk, which determines its sparsity allocation. This progressive sparsity-increase strategy enables the current chunk to inherit prior knowledge from earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in quality (e.g., 84.5 on VBench) and efficiency (e.g., 1.2×–1.3× end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a 2.3× speedup and 19.7 FPS on an RTX 5090 GPU. Code will be released at https://github.com/chengtao-lv/LightForcing.
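The coarse-to-fine, two-level mask selection described in the abstract can be illustrated with a minimal sketch: first keep the most relevant historical frames, then keep the most relevant blocks inside each kept frame. This is a hypothetical illustration, not the paper's implementation; the function name, the plain-float scores, and the top-k budgets are all assumptions (in the actual method, scores would come from pooled attention estimates).

```python
def hierarchical_mask(frame_scores, block_scores, top_frames, top_blocks):
    """Illustrative two-level (frame-then-block) sparse mask selection.

    frame_scores: list[float], one relevance score per historical frame.
    block_scores: list[list[float]], relevance scores of blocks within each frame.
    Returns a set of (frame, block) index pairs kept by the sparse mask.
    """
    # Coarse level: keep the top-scoring frames.
    keep_frames = sorted(range(len(frame_scores)),
                         key=lambda f: frame_scores[f],
                         reverse=True)[:top_frames]
    mask = set()
    # Fine level: within each kept frame, keep the top-scoring blocks.
    for f in keep_frames:
        keep_blocks = sorted(range(len(block_scores[f])),
                             key=lambda b: block_scores[f][b],
                             reverse=True)[:top_blocks]
        mask.update((f, b) for b in keep_blocks)
    return mask


# Toy example: frame 1 scores lowest, so all of its blocks are dropped.
frame_scores = [0.9, 0.1, 0.6]
block_scores = [[0.2, 0.8, 0.5],   # blocks of frame 0
                [0.3, 0.1, 0.2],   # blocks of frame 1 (pruned at frame level)
                [0.7, 0.4, 0.6]]   # blocks of frame 2
mask = hierarchical_mask(frame_scores, block_scores, top_frames=2, top_blocks=2)
print(sorted(mask))  # → [(0, 1), (0, 2), (2, 0), (2, 2)]
```

Pruning at the frame level first means block scores never need to be examined for frames that are discarded wholesale, which is what makes the coarse-to-fine ordering cheaper than a flat block-level top-k.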

Community

Paper author Paper submitter

Light Forcing introduces a novel sparse attention mechanism for autoregressive video generation that improves efficiency while maintaining quality through chunk-aware growth and hierarchical sparse attention strategies.
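The "chunk-aware growth" idea above (progressively increasing sparsity so early chunks stay dense and pass their knowledge forward) can be sketched as a simple schedule. This is a hypothetical linear schedule for illustration only; the paper allocates sparsity from estimated per-chunk contributions, and the function name and bounds here are assumptions.

```python
def chunk_sparsity_schedule(num_chunks, s_min=0.1, s_max=0.7):
    """Linearly grow sparsity from s_min (earliest chunks, nearly dense
    attention to preserve prior knowledge) to s_max (latest chunks).
    Returns one sparsity ratio per chunk."""
    if num_chunks == 1:
        return [s_min]
    step = (s_max - s_min) / (num_chunks - 1)
    return [s_min + i * step for i in range(num_chunks)]


print(chunk_sparsity_schedule(4))  # sparsity grows across the 4 chunks
```

A monotonically increasing schedule like this captures the key property the abstract claims: later chunks can afford more aggressive pruning because the informative context has already been consolidated by denser attention in earlier chunks.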

arXivLens breakdown of this paper: https://arxivlens.com/PaperView/Details/light-forcing-accelerating-autoregressive-video-diffusion-via-sparse-attention-7253-048e49c3

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • Efficient Autoregressive Video Diffusion with Dummy Head (2026) — https://huggingface.co/papers/2601.20499
  • Past- and Future-Informed KV Cache Policy with Salience Estimation in Autoregressive Video Diffusion (2026) — https://huggingface.co/papers/2601.21896
  • Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention (2026) — https://huggingface.co/papers/2602.01801
  • HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming (2025) — https://huggingface.co/papers/2512.21338
  • Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization (2026) — https://huggingface.co/papers/2602.02958
  • PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers (2026) — https://huggingface.co/papers/2602.01077
  • Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation (2026) — https://huggingface.co/papers/2602.02214

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.04789 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.04789 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.04789 in a Space README.md to link it from this page.

Collections including this paper 2