MIND: Benchmarking Memory Consistency and Action Control in World Models
\n","updatedAt":"2026-02-13T01:47:35.700Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7236065864562988},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.08025","authors":[{"_id":"698c7597eb12ea7453916837","user":{"_id":"68fc3ddcdc9e5cbf49cbc716","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/68fc3ddcdc9e5cbf49cbc716/gzvksq-XgWnekB6Xl25pw.jpeg","isPro":false,"fullname":"EasonYe","user":"EasonUwU","type":"user"},"name":"Yixuan Ye","status":"claimed_verified","statusLastChangedAt":"2026-02-12T13:57:38.731Z","hidden":false},{"_id":"698c7597eb12ea7453916838","name":"Xuanyu Lu","hidden":false},{"_id":"698c7597eb12ea7453916839","name":"Yuxin Jiang","hidden":false},{"_id":"698c7597eb12ea745391683a","name":"Yuchao Gu","hidden":false},{"_id":"698c7597eb12ea745391683b","name":"Rui Zhao","hidden":false},{"_id":"698c7597eb12ea745391683c","name":"Qiwei Liang","hidden":false},{"_id":"698c7597eb12ea745391683d","name":"Jiachun Pan","hidden":false},{"_id":"698c7597eb12ea745391683e","name":"Fengda Zhang","hidden":false},{"_id":"698c7597eb12ea745391683f","name":"Weijia Wu","hidden":false},{"_id":"698c7597eb12ea7453916840","name":"Alex Jinpeng Wang","hidden":false}],"mediaUrls":["https://cdn-uploads.huggingface.co/production/uploads/6345a93afe134dfd7a0cfabd/bwNXUxQqslgNNPGQj_FzL.mp4"],"publishedAt":"2026-02-08T15:57:23.000Z","submittedOnDailyAt":"2026-02-11T23:27:34.924Z","title":"MIND: Benchmarking Memory Consistency and Action Control in World Models","submittedOnDailyBy":{"_id":"6345a93afe134dfd7a0cfabd","avatarUrl":"/avatars/65130ce06b1c72ab1066678419731d88.svg","isPro":false,"fullname":"wu weijia","user":"weijiawu","type":"user"},"summary":"World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating their fundamental abilities remains lacking. To address this gap, we introduce MIND, the first open-domain closed-loop revisited benchmark for evaluating Memory consIstency and action coNtrol in worlD models. MIND contains 250 high-quality videos at 1080p and 24 FPS, including 100 (first-person) + 100 (third-person) video clips under a shared action space and 25 + 25 clips across varied action spaces covering eight diverse scenes. We design an efficient evaluation framework to measure two core abilities: memory consistency and action control, capturing temporal stability and contextual coherence across viewpoints. Furthermore, we design various action spaces, including different character movement speeds and camera rotation angles, to evaluate the action generalization capability across different action spaces under shared scenes. To facilitate future performance benchmarking on MIND, we introduce MIND-World, a novel interactive Video-to-World baseline. Extensive experiments demonstrate the completeness of MIND and reveal key challenges in current world models, including the difficulty of maintaining long-term memory consistency and generalizing across action spaces. 
Project page: https://csu-jpg.github.io/MIND.github.io/","upvotes":10,"discussionId":"698c7597eb12ea7453916841","projectPage":"https://csu-jpg.github.io/MIND.github.io/","githubRepo":"https://github.com/CSU-JPG/MIND","githubRepoAddedBy":"user","ai_summary":"MIND is introduced as the first open-domain closed-loop benchmark for evaluating memory consistency and action control in world models, featuring high-quality videos and diverse action spaces to assess temporal stability and contextual coherence.","ai_keywords":["world models","memory consistency","action control","closed-loop benchmark","interactive Video-to-World baseline","temporal stability","contextual coherence","action generalization"],"githubStars":35,"organization":{"_id":"63f63303b29015adc33aeaa8","name":"NUS-CS3213","fullname":"National University of Singapore"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"6345a93afe134dfd7a0cfabd","avatarUrl":"/avatars/65130ce06b1c72ab1066678419731d88.svg","isPro":false,"fullname":"wu weijia","user":"weijiawu","type":"user"},{"_id":"68fc3ddcdc9e5cbf49cbc716","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/68fc3ddcdc9e5cbf49cbc716/gzvksq-XgWnekB6Xl25pw.jpeg","isPro":false,"fullname":"EasonYe","user":"EasonUwU","type":"user"},{"_id":"64b22e6b0a54158d66f18688","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64b22e6b0a54158d66f18688/Cbl3oMMMANbnCMoSUYenI.png","isPro":true,"fullname":"Benhao Huang","user":"HuskyDoge","type":"user"},{"_id":"67ac8d04191779fd50db564c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/XFEoCaS13V88l91UHb9-m.png","isPro":false,"fullname":"Erik","user":"Poemcourt","type":"user"},{"_id":"652b83b73b5997ed71a310f2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/652b83b73b5997ed71a310f2/ipCpdeHUp4-0OmRz5z8IW.png","isPro":false,"fullname":"Rui Zhao","user":"ruizhaocv","type":"user"},{"_id":"64e84d40d50f3979be9afcbb","avatarUrl":"/avatars/6a706a4916132c1f1cda63d11dc46b87.svg","isPro":false,"fullname":"Jiang Yuxin","user":"YuxinJ","type":"user"},{"_id":"68636f086da1f3f0ef1f473d","avatarUrl":"/avatars/d43bfd84e0f2791c5f10694f90d81774.svg","isPro":false,"fullname":"JayceonHo","user":"JayceonHo","type":"user"},{"_id":"684d57f26e04c265777ead3f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/cuOj-bQqukSZreXgUJlfm.png","isPro":false,"fullname":"Joakim Lee","user":"Reinforcement4All","type":"user"},{"_id":"67ecd8993714b0f817afcecf","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/_QQxfBf3ymHqCUzvWOFV8.png","isPro":false,"fullname":"park","user":"young-soo","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"63f63303b29015adc33aeaa8","name":"NUS-CS3213","fullname":"National University of Singapore"}}">
AI-generated summary

MIND is introduced as the first open-domain closed-loop benchmark for evaluating memory consistency and action control in world models, featuring high-quality videos and diverse action spaces to assess temporal stability and contextual coherence.

Abstract
World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating these fundamental abilities remains lacking. To address this gap, we introduce MIND, the first open-domain, closed-loop revisit benchmark for evaluating Memory consIstency and action coNtrol in worlD models. MIND contains 250 high-quality videos at 1080p and 24 FPS: 100 first-person and 100 third-person clips under a shared action space, plus 25 + 25 clips across varied action spaces, covering eight diverse scenes. We design an efficient evaluation framework to measure two core abilities, memory consistency and action control, capturing temporal stability and contextual coherence across viewpoints. Furthermore, we vary the action spaces, including character movement speeds and camera rotation angles, to evaluate action generalization across action spaces within shared scenes. To facilitate future benchmarking on MIND, we introduce MIND-World, a novel interactive Video-to-World baseline. Extensive experiments demonstrate the completeness of MIND and reveal key challenges for current world models, including maintaining long-term memory consistency and generalizing across action spaces.

Project page: https://csu-jpg.github.io/MIND.github.io/
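As a concrete illustration of the composition described in the abstract, here is a minimal sketch in Python. The manifest layout, field names, and the `ActionSpace` class are hypothetical, written from the numbers reported above rather than from the MIND codebase; the example speed and rotation values are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionSpace:
    """One action-space configuration varied by the benchmark (hypothetical)."""
    movement_speed: float        # character movement speed; units unspecified in the abstract
    camera_rotation_deg: float   # camera rotation angle per action step

# Example configurations; the concrete values are placeholders, not MIND's.
SHARED_SPACE = ActionSpace(movement_speed=1.0, camera_rotation_deg=30.0)
FAST_SPACE = ActionSpace(movement_speed=2.0, camera_rotation_deg=45.0)

# Benchmark composition as reported in the abstract:
# 250 videos, 1080p, 24 FPS, eight scenes.
MIND_MANIFEST = {
    "resolution": (1920, 1080),
    "fps": 24,
    "num_scenes": 8,
    "splits": {
        # 100 first-person + 100 third-person clips under one shared action space
        "shared_action_space": {"first_person": 100, "third_person": 100},
        # 25 + 25 clips across varied action spaces (speeds / rotation angles differ)
        "varied_action_spaces": {"first_person": 25, "third_person": 25},
    },
}

def total_clips(manifest: dict) -> int:
    """Sum clip counts across all splits and viewpoints."""
    return sum(n for split in manifest["splits"].values() for n in split.values())

assert total_clips(MIND_MANIFEST) == 250
```

Similarly, the closed-loop revisit evaluation of memory consistency can be sketched as: roll the world model forward along an action trajectory that returns the camera to a previously visited pose, then compare the regenerated frame against the earlier one. The function below is an assumption-laden stand-in; the abstract does not name the actual metric, so PSNR is used purely as a placeholder.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two frames (placeholder metric)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def revisit_consistency(rollout, actions, revisit_pairs) -> float:
    """Score memory consistency over a closed-loop rollout (hypothetical API).

    rollout(actions) -> list of generated frames, each an (H, W, 3) uint8 array
    revisit_pairs    -> (i, j) index pairs where the trajectory returns the
                        camera to a visited pose, so frames i and j should
                        depict the same content if memory is consistent
    """
    frames = rollout(actions)
    return float(np.mean([psnr(frames[i], frames[j]) for i, j in revisit_pairs]))
```

Higher scores indicate that content regenerated at a revisited pose matches what the model produced earlier, which is one plausible way to operationalize the long-term memory consistency the benchmark targets.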