Paper page - End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Authors: Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, Dahua Lin (ByteDance Seed)
Published: 2025-12-17 · arXiv: 2512.15702
Project page: https://guoyww.github.io/projects/resampling-forcing/
AI-generated summary

Resampling Forcing is introduced as a teacher-free framework to train autoregressive video diffusion models with improved temporal consistency using self-resampling and history routing.
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent works address this via post-training, they typically rely on a bidirectional teacher model or online discriminator. To achieve an end-to-end solution, we introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale. Central to our approach is a self-resampling scheme that simulates inference-time model errors on history frames during training. Conditioned on these degraded histories, a sparse causal mask enforces temporal causality while enabling parallel training with frame-level diffusion loss. To facilitate efficient long-horizon generation, we further introduce history routing, a parameter-free mechanism that dynamically retrieves the top-k most relevant history frames for each query. Experiments demonstrate that our approach achieves performance comparable to distillation-based baselines while exhibiting superior temporal consistency on longer videos owing to native-length training.
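The abstract's history routing mechanism retrieves, for each query, the top-k most relevant history frames without adding parameters. The paper does not spell out the relevance score here, so the sketch below is an illustrative assumption: frame-level relevance computed as a dot product between a query feature and per-frame history features, followed by top-k selection.

```python
import numpy as np

def history_routing(query, history_feats, k):
    """Hypothetical sketch of parameter-free top-k history retrieval.

    query:         (d,) feature vector of the current query frame
    history_feats: (T, d) per-frame features of the history
    k:             number of history frames to keep

    Returns the indices of the k most relevant history frames,
    most relevant first. The dot-product relevance score is an
    assumption for illustration, not the paper's exact formulation.
    """
    scores = history_feats @ query            # (T,) relevance per history frame
    topk = np.argsort(scores)[-k:][::-1]      # k highest scores, descending
    return topk

# Toy example with hand-built features so the result is deterministic:
# frames 0 and 3 point roughly in the query's direction.
history = np.array([
    [1.0, 0.0, 0.0],   # frame 0
    [0.0, 1.0, 0.0],   # frame 1
    [0.0, 0.0, 1.0],   # frame 2
    [0.9, 0.1, 0.0],   # frame 3
    [0.0, 0.9, 0.1],   # frame 4
])
q = np.array([0.95, 0.05, 0.0])
print(history_routing(q, history, k=2))  # → [0 3]
```

Only the selected frames would then be attended to by the query, which keeps long-horizon attention cost bounded by k rather than by the full history length.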
The method is proposed to overcome the limitations of Self-Forcing (reliance on a teacher model, a GAN loss, etc.). Why does it need a warmup that adopts the Self-Forcing objective? What are the results without the Self-Forcing warmup?
The abstract claims to enable training an AR model from scratch. Are there any results without pretrained weights?