DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
\n","updatedAt":"2026-02-10T01:43:18.374Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6927594542503357},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.06949","authors":[{"_id":"69894c74beecc443208d25db","name":"Shenyuan Gao","hidden":false},{"_id":"69894c74beecc443208d25dc","name":"William Liang","hidden":false},{"_id":"69894c74beecc443208d25dd","name":"Kaiyuan Zheng","hidden":false},{"_id":"69894c74beecc443208d25de","name":"Ayaan Malik","hidden":false},{"_id":"69894c74beecc443208d25df","name":"Seonghyeon Ye","hidden":false},{"_id":"69894c74beecc443208d25e0","name":"Sihyun Yu","hidden":false},{"_id":"69894c74beecc443208d25e1","name":"Wei-Cheng Tseng","hidden":false},{"_id":"69894c74beecc443208d25e2","name":"Yuzhu Dong","hidden":false},{"_id":"69894c74beecc443208d25e3","name":"Kaichun Mo","hidden":false},{"_id":"69894c74beecc443208d25e4","name":"Chen-Hsuan Lin","hidden":false},{"_id":"69894c74beecc443208d25e5","name":"Qianli Ma","hidden":false},{"_id":"69894c74beecc443208d25e6","name":"Seungjun Nah","hidden":false},{"_id":"69894c74beecc443208d25e7","name":"Loic Magne","hidden":false},{"_id":"69894c74beecc443208d25e8","name":"Jiannan Xiang","hidden":false},{"_id":"69894c74beecc443208d25e9","name":"Yuqi Xie","hidden":false},{"_id":"69894c74beecc443208d25ea","name":"Ruijie Zheng","hidden":false},{"_id":"69894c74beecc443208d25eb","name":"Dantong Niu","hidden":false},{"_id":"69894c74beecc443208d25ec","name":"You Liang Tan","hidden":false},{"_id":"69894c74beecc443208d25ed","name":"K. R. Zentner","hidden":false},{"_id":"69894c74beecc443208d25ee","name":"George Kurian","hidden":false},{"_id":"69894c74beecc443208d25ef","name":"Suneel Indupuru","hidden":false},{"_id":"69894c74beecc443208d25f0","name":"Pooya Jannaty","hidden":false},{"_id":"69894c74beecc443208d25f1","name":"Jinwei Gu","hidden":false},{"_id":"69894c74beecc443208d25f2","name":"Jun Zhang","hidden":false},{"_id":"69894c74beecc443208d25f3","name":"Jitendra Malik","hidden":false},{"_id":"69894c74beecc443208d25f4","name":"Pieter Abbeel","hidden":false},{"_id":"69894c74beecc443208d25f5","name":"Ming-Yu Liu","hidden":false},{"_id":"69894c74beecc443208d25f6","name":"Yuke Zhu","hidden":false},{"_id":"69894c74beecc443208d25f7","name":"Joel Jang","hidden":false},{"_id":"69894c74beecc443208d25f8","name":"Linxi \"Jim\" Fan","hidden":false}],"mediaUrls":["https://cdn-uploads.huggingface.co/production/uploads/6039478ab3ecf716b1a5fd4d/MN-A84kxkw1l1lyftyRTR.mp4"],"publishedAt":"2026-02-06T18:49:43.000Z","submittedOnDailyAt":"2026-02-09T00:32:34.350Z","title":"DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos","submittedOnDailyBy":{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},"summary":"Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. 
However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As an endeavor towards this end, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our data mixture represents the largest video dataset to date for world model pretraining, spanning a wide range of daily scenarios with diverse objects and skills. To address the scarcity of action labels, we introduce continuous latent actions as unified proxy actions, enhancing interaction knowledge transfer from unlabeled videos. After post-training on small-scale target robot data, DreamDojo demonstrates a strong understanding of physics and precise action controllability. We also devise a distillation pipeline that accelerates DreamDojo to a real-time speed of 10.81 FPS and further improves context consistency. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning. Systematic evaluation on multiple challenging out-of-distribution (OOD) benchmarks verifies the significance of our method for simulating open-world, contact-rich tasks, paving the way for general-purpose robot world models.","upvotes":34,"discussionId":"69894c74beecc443208d25f9","projectPage":"https://dreamdojo-world.github.io/","ai_summary":"DreamDojo is a foundation world model trained on 44k hours of egocentric human videos that enables efficient simulation of dexterous robotic tasks through continuous latent actions and real-time distillation.","ai_keywords":["world model","egocentric videos","continuous latent actions","action labels","distillation pipeline","real-time speed","teleoperation","policy evaluation","model-based planning","out-of-distribution benchmarks"],"organization":{"_id":"60262b67268c201cdc8b7d43","name":"nvidia","fullname":"NVIDIA","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1613114437487-60262a8e0703121c822a80b6.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"61e52be53d6dbb1da842316a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/61e52be53d6dbb1da842316a/gx0WGPcOCClXPymoKglc4.jpeg","isPro":false,"fullname":"Börje Karlsson","user":"tellarin","type":"user"},{"_id":"63bbf972d8d676a2299cdb44","avatarUrl":"/avatars/366d6ca7a4e19e42d2ec236a38d74ebd.svg","isPro":false,"fullname":"Sangwon","user":"agwmon","type":"user"},{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},{"_id":"677921d46f370093aa0d26e4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/rkf4tEQRqm3NLVZKRzn9U.png","isPro":false,"fullname":"LIU Zhening","user":"ZincL","type":"user"},{"_id":"64705d224be5cf1f3348d6bc","avatarUrl":"/avatars/270bff7c7cb326528dc192fc38561a8b.svg","isPro":false,"fullname":"Chi-Pin 
Huang","user":"jasper0314-huang","type":"user"},{"_id":"6463554dd2044cd1d7c6e0bf","avatarUrl":"/avatars/d7653623117268c545a7063fec69664b.svg","isPro":false,"fullname":"Bingzheng Wei","user":"Bingzheng","type":"user"},{"_id":"63ca8e060609f1def7e6548a","avatarUrl":"/avatars/1da7947840cb87d5f77c0af9ee11f9c2.svg","isPro":true,"fullname":"Yi Jung","user":"YJ-142150","type":"user"},{"_id":"67349dff5c8f30a50a211208","avatarUrl":"/avatars/89603590203e97fb58b86fd146c1c43b.svg","isPro":false,"fullname":"Yongkun Yang","user":"Yyk040316","type":"user"},{"_id":"646f17ff6b3df773a2c80697","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/646f17ff6b3df773a2c80697/W6hgBFkyDJaEpzNm_wGp6.png","isPro":false,"fullname":"Yingjie Lei","user":"ChaceLei2004","type":"user"},{"_id":"67136093d2e50f1e8c9fad52","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/0q49MyGuav8lJ9CIeyLhu.png","isPro":false,"fullname":"Donghao Zhou","user":"donghao-zhou","type":"user"},{"_id":"654a31c073416a223f3b5fca","avatarUrl":"/avatars/bab382c46787eaf7889ed241e12775ee.svg","isPro":false,"fullname":"Shenyuan Gao","user":"Little-Podi","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"60262b67268c201cdc8b7d43","name":"nvidia","fullname":"NVIDIA","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1613114437487-60262a8e0703121c822a80b6.png"}}">
AI-generated summary
DreamDojo is a foundation world model trained on 44k hours of egocentric human videos that enables efficient simulation of dexterous robotic tasks through continuous latent actions and real-time distillation.

Abstract
Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As a step toward this goal, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous control from 44k hours of egocentric human videos. Our data mixture is the largest video dataset to date for world model pretraining, spanning a wide range of daily scenarios with diverse objects and skills. To address the scarcity of action labels, we introduce continuous latent actions as unified proxy actions, enhancing the transfer of interaction knowledge from unlabeled videos. After post-training on small-scale target robot data, DreamDojo demonstrates a strong understanding of physics and precise action controllability. We also devise a distillation pipeline that accelerates DreamDojo to a real-time speed of 10.81 FPS and further improves context consistency. Our work enables several important applications of generative world models, including live teleoperation, policy evaluation, and model-based planning. Systematic evaluation on multiple challenging out-of-distribution (OOD) benchmarks verifies the effectiveness of our method for simulating open-world, contact-rich tasks, paving the way for general-purpose robot world models.
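The continuous latent actions described above can be pictured as the output of an inverse-dynamics encoder trained jointly with a forward model on unlabeled frame pairs: the latent must carry whatever changed between frames, so it behaves as a proxy action even without labels. The sketch below illustrates this idea in PyTorch; the architecture, dimensions, and the use of precomputed frame features are illustrative assumptions, not DreamDojo's actual design.

```python
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    """Minimal sketch of learning continuous latent actions from
    unlabeled video. Operates on precomputed frame features
    (an assumption for brevity), not raw pixels."""

    def __init__(self, frame_dim=512, action_dim=16):
        super().__init__()
        # Inverse dynamics: (frame_t, frame_{t+1}) -> latent action z_t
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        # Forward dynamics: (frame_t, z_t) -> predicted frame_{t+1}
        self.decoder = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def forward(self, f_t, f_next):
        z = self.encoder(torch.cat([f_t, f_next], dim=-1))
        pred_next = self.decoder(torch.cat([f_t, z], dim=-1))
        return z, pred_next

# Reconstructing the next frame forces z to encode the change between
# frames -- i.e., a continuous proxy action usable across data sources.
model = LatentActionModel()
f_t, f_next = torch.randn(8, 512), torch.randn(8, 512)
z, pred = model(f_t, f_next)
loss = nn.functional.mse_loss(pred, f_next)
loss.backward()
```

Because the latent action space is shared across human videos and robot data, post-training on a small robot dataset can ground these proxy actions in the robot's real action space.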
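For the policy-evaluation and model-based-planning applications, the world model replaces the real environment in a closed loop: the policy observes a generated frame, and its action conditions the model's next prediction. Below is a minimal sketch of that loop with hypothetical stand-ins for both components; neither reflects DreamDojo's released interface.

```python
import torch

@torch.no_grad()
def rollout(world_model, policy, init_frame, horizon=50):
    """Hypothetical closed-loop rollout for policy evaluation: the
    policy acts only on frames the world model generates, so no
    physical robot is in the loop."""
    frame, frames = init_frame, [init_frame]
    for _ in range(horizon):
        action = policy(frame)               # policy proposes an action
        frame = world_model(frame, action)   # model predicts next frame
        frames.append(frame)
    return torch.stack(frames)  # trajectory for scoring / success checks

# Stub usage. With the distilled model running at ~10.81 FPS, a
# 50-step rollout would take roughly 5 s of wall-clock time.
world_model = lambda f, a: f + 0.01 * a.mean() * torch.ones_like(f)
policy = lambda f: torch.tanh(f[:8])
traj = rollout(world_model, policy, torch.zeros(16))
```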