Paper page - LIVE: Long-horizon Interactive Video World Modeling

Project page: https://junchao-cs.github.io/LIVE-demo/
Technical Paper: https://arxiv.org/pdf/2602.03747

\n","updatedAt":"2026-02-04T14:50:49.606Z","author":{"_id":"68961e2a83f79bb28f78baa3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/bZmicB1lDqFjIVbpsg_8c.png","fullname":"junchao-cuhk","name":"junchao-cuhk","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.3887362480163574},"editors":["junchao-cuhk"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/bZmicB1lDqFjIVbpsg_8c.png"],"reactions":[],"isReport":false}},{"id":"6983f516bb7c1f67ebf02f0b","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2026-02-05T01:40:38.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models](https://huggingface.co/papers/2512.12080) (2025)\n* [End-to-End Training for Autoregressive Video Diffusion via Self-Resampling](https://huggingface.co/papers/2512.15702) (2025)\n* [VideoAR: Autoregressive Video Generation via Next-Frame&Scale Prediction](https://huggingface.co/papers/2601.05966) (2026)\n* [LongVie 2: Multimodal Controllable Ultra-Long Video World Model](https://huggingface.co/papers/2512.13604) (2025)\n* [Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation](https://huggingface.co/papers/2512.18741) (2025)\n* [Endless World: Real-Time 3D-Aware Long Video Generation](https://huggingface.co/papers/2512.12430) (2025)\n* [Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation](https://huggingface.co/papers/2512.21734) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2026-02-05T01:40:38.112Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6458009481430054},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}},{"id":"69878458e143f25e927704e5","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false},"createdAt":"2026-02-07T18:28:40.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"arXivLens breakdown of this paper ๐Ÿ‘‰ https://arxivlens.com/PaperView/Details/live-long-horizon-interactive-video-world-modeling-7592-67b8df5c\n- Executive Summary\n- Detailed Breakdown\n- Practical Applications","html":"

arXivLens breakdown of this paper ๐Ÿ‘‰ https://arxivlens.com/PaperView/Details/live-long-horizon-interactive-video-world-modeling-7592-67b8df5c

\n
    \n
  • Executive Summary
  • \n
  • Detailed Breakdown
  • \n
  • Practical Applications
  • \n
\n","updatedAt":"2026-02-07T18:28:40.794Z","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6466903686523438},"editors":["avahal"],"editorAvatarUrls":["/avatars/743a009681d5d554c27e04300db9f267.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.03747","authors":[{"_id":"6982cb3f9084cb4f0ecb578f","user":{"_id":"68961e2a83f79bb28f78baa3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/bZmicB1lDqFjIVbpsg_8c.png","isPro":false,"fullname":"junchao-cuhk","user":"junchao-cuhk","type":"user"},"name":"Junchao Huang","status":"claimed_verified","statusLastChangedAt":"2026-02-04T12:28:01.292Z","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5790","name":"Ziyang Ye","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5791","name":"Xinting Hu","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5792","user":{"_id":"619b7b1cab4c7b7f16a7d59e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/619b7b1cab4c7b7f16a7d59e/6TvXaAqBghAMYO1-j5l4v.jpeg","isPro":false,"fullname":"Tianyu He","user":"deeptimhe","type":"user"},"name":"Tianyu He","status":"claimed_verified","statusLastChangedAt":"2026-02-06T18:56:14.561Z","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5793","name":"Guiyu Zhang","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5794","name":"Shaoshuai Shi","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5795","name":"Jiang Bian","hidden":false},{"_id":"6982cb3f9084cb4f0ecb5796","name":"Li Jiang","hidden":false}],"publishedAt":"2026-02-03T17:10:03.000Z","submittedOnDailyAt":"2026-02-04T12:20:49.591Z","title":"LIVE: Long-horizon Interactive Video World Modeling","submittedOnDailyBy":{"_id":"68961e2a83f79bb28f78baa3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/bZmicB1lDqFjIVbpsg_8c.png","isPro":false,"fullname":"junchao-cuhk","user":"junchao-cuhk","type":"user"},"summary":"Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods alleviate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is subsequently computed on the reconstructed terminal state, providing an explicit constraint on long-horizon error propagation. Moreover, we provide an unified view that encompasses different approaches and introduce progressive training curriculum to stabilize training. 
Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.","upvotes":12,"discussionId":"6982cb409084cb4f0ecb5797","projectPage":"https://junchao-cs.github.io/LIVE-demo/","githubRepo":"https://github.com/Junchao-cs/LIVE","githubRepoAddedBy":"user","ai_summary":"LIVE is a long-horizon video world model that uses cycle-consistency and diffusion loss to control error accumulation during extended video generation.","ai_keywords":["video world models","autoregressive models","error accumulation","cycle-consistency objective","diffusion loss","forward rollout","reverse generation","progressive training curriculum"],"githubStars":18,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft Research","avatar":"https://cdn-uploads.huggingface.co/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"68961e2a83f79bb28f78baa3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/bZmicB1lDqFjIVbpsg_8c.png","isPro":false,"fullname":"junchao-cuhk","user":"junchao-cuhk","type":"user"},{"_id":"652845524def261e0febbf5b","avatarUrl":"/avatars/92743e595cbce4e37a8fe7400b5051ba.svg","isPro":false,"fullname":"Xinting Hu","user":"kaleidudu","type":"user"},{"_id":"66147ae0e2ba01441b59b437","avatarUrl":"/avatars/b0fc7d3f00c5a51eacf2ffb5fa8e1508.svg","isPro":false,"fullname":"Henry Wong","user":"krumo","type":"user"},{"_id":"65800b80c7cb69069ac6b87a","avatarUrl":"/avatars/d92b7fc68b028d9ccc21056446b29731.svg","isPro":false,"fullname":"yunbai lee","user":"yunbai","type":"user"},{"_id":"6895907f539447bd1daa3d80","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6895907f539447bd1daa3d80/nVr6xKPeVhPfCeWe4wt2c.jpeg","isPro":false,"fullname":"Qingyu Wu","user":"Vertsineu","type":"user"},{"_id":"66d2d9255ae47374c29c3d26","avatarUrl":"/avatars/3bce1bb35316a1c476c80c8e46ca8b62.svg","isPro":false,"fullname":"Yiyan Xu","user":"Grainn","type":"user"},{"_id":"666c06ca9ed9e91df03e7e27","avatarUrl":"/avatars/3101ad4f1a3a243b75080d06c5e59e9c.svg","isPro":false,"fullname":"Ouxiang Li","user":"lioooox","type":"user"},{"_id":"68c7c936379c90c084e5216d","avatarUrl":"/avatars/f784a5732c02cc840fe8bdf983bc3c56.svg","isPro":false,"fullname":"Jingxin Wang","user":"Jason2Jason","type":"user"},{"_id":"632634c0a3f07c8e16950f9b","avatarUrl":"/avatars/d92c15bb9296317d48f78e7d5e9d166a.svg","isPro":false,"fullname":"Yuan Xin","user":"SuperYuan","type":"user"},{"_id":"619b7b1cab4c7b7f16a7d59e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/619b7b1cab4c7b7f16a7d59e/6TvXaAqBghAMYO1-j5l4v.jpeg","isPro":false,"fullname":"Tianyu He","user":"deeptimhe","type":"user"},{"_id":"665433f061422fc89738a14a","avatarUrl":"/avatars/9e9c65dc10768961b60022139f209309.svg","isPro":false,"fullname":"ChanYanKi","user":"Louis0411","type":"user"},{"_id":"672af78a6d08049618ee69c7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/QLjH76C9v9NUvzsg5mk8Y.png","isPro":false,"fullname":"nithin","user":"nithin12342","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft 
Research","avatar":"https://cdn-uploads.huggingface.co/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"}}">
arxiv:2602.03747

LIVE: Long-horizon Interactive Video World Modeling

Published on Feb 3 · Submitted by junchao-cuhk on Feb 4

Authors: Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, Li Jiang

Abstract

LIVE is a long-horizon video world model that uses cycle-consistency and diffusion loss to control error accumulation during extended video generation.

AI-generated summary

Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as small prediction errors accumulate over time. Prior methods alleviate this by introducing pre-trained teacher models and sequence-level distribution matching, which incur additional computational cost and fail to prevent error propagation beyond the training horizon. In this work, we propose LIVE, a Long-horizon Interactive Video world modEl that enforces bounded error accumulation via a novel cycle-consistency objective, thereby eliminating the need for teacher-based distillation. Specifically, LIVE first performs a forward rollout from ground-truth frames and then applies a reverse generation process to reconstruct the initial state. The diffusion loss is subsequently computed on the reconstructed terminal state, providing an explicit constraint on long-horizon error propagation. Moreover, we provide a unified view that encompasses different approaches and introduce a progressive training curriculum to stabilize training. Experiments demonstrate that LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
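
The cycle-consistency objective described above can be sketched as a simple training step: roll the model forward from ground-truth frames under a sequence of actions, then roll a reverse process back to the starting point and penalize the reconstruction error of the initial state. The toy code below is a minimal sketch of that loop under assumed interfaces: TinyWorldModel, step(), and reverse_step() are hypothetical stand-ins, and a plain MSE substitutes for the diffusion loss the paper computes on the reconstructed state; it is not the authors' implementation.

```python
# Toy sketch of the cycle-consistency training step described in the abstract.
# TinyWorldModel, step(), and reverse_step() are hypothetical stand-ins, and a
# plain MSE replaces the diffusion loss; this is NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyWorldModel(nn.Module):
    """Placeholder action-conditioned next-frame predictor over flattened frames."""

    def __init__(self, frame_dim: int = 64, action_dim: int = 4):
        super().__init__()
        self.forward_net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, frame_dim)
        )
        self.reverse_net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, frame_dim)
        )

    def step(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """Predict the next frame from the current frame and the action taken."""
        return self.forward_net(torch.cat([frame, action], dim=-1))

    def reverse_step(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """Predict the previous frame, conditioning on the same action."""
        return self.reverse_net(torch.cat([frame, action], dim=-1))


def cycle_consistency_loss(model: TinyWorldModel, init_frame: torch.Tensor,
                           actions: list[torch.Tensor]) -> torch.Tensor:
    """Forward rollout from a ground-truth frame, reverse rollout back to the
    start, then penalize the reconstruction error of the initial state."""
    frame = init_frame
    # Forward rollout: autoregressively predict future frames under the actions.
    for action in actions:
        frame = model.step(frame, action)
    # Reverse generation: walk back from the terminal prediction to the start.
    recon = frame
    for action in reversed(actions):
        recon = model.reverse_step(recon, action)
    # Stand-in for the diffusion loss on the state reconstructed at the end of
    # the cycle (i.e., the initial frame).
    return F.mse_loss(recon, init_frame)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyWorldModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    init_frame = torch.randn(8, 64)                  # batch of flattened frames
    actions = [torch.randn(8, 4) for _ in range(6)]  # a 6-step action sequence

    optimizer.zero_grad()
    loss = cycle_consistency_loss(model, init_frame, actions)
    loss.backward()
    optimizer.step()
    print(f"cycle-consistency loss: {loss.item():.4f}")
```

Bounding this round-trip reconstruction error gives an explicit handle on how much drift can accumulate over a rollout, which is the property the abstract credits for stable generation beyond the training horizon.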

Community

Paper author Paper submitter

Project page: https://junchao-cs.github.io/LIVE-demo/
Technical Paper: https://arxiv.org/pdf/2602.03747

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models (2025) - https://huggingface.co/papers/2512.12080
  • End-to-End Training for Autoregressive Video Diffusion via Self-Resampling (2025) - https://huggingface.co/papers/2512.15702
  • VideoAR: Autoregressive Video Generation via Next-Frame&Scale Prediction (2026) - https://huggingface.co/papers/2601.05966
  • LongVie 2: Multimodal Controllable Ultra-Long Video World Model (2025) - https://huggingface.co/papers/2512.13604
  • Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation (2025) - https://huggingface.co/papers/2512.18741
  • Endless World: Real-Time 3D-Aware Long Video Generation (2025) - https://huggingface.co/papers/2512.12430
  • Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation (2025) - https://huggingface.co/papers/2512.21734

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/live-long-horizon-interactive-video-world-modeling-7592-67b8df5c

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03747 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03747 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03747 in a Space README.md to link it from this page.

Collections including this paper 1