Paper page - From Word to World: Can Large Language Models be Implicit Text-based World Models?
\n","updatedAt":"2025-12-26T22:39:32.998Z","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7420381903648376},"editors":["avahal"],"editorAvatarUrls":["/avatars/743a009681d5d554c27e04300db9f267.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2512.18832","authors":[{"_id":"6949fc14335742716e9321d5","user":{"_id":"6645bdf6621ded608be9c37e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6645bdf6621ded608be9c37e/lHFUPBkqCyKc6KCaBUaCs.jpeg","isPro":false,"fullname":"Yixia Li","user":"X1AOX1A","type":"user"},"name":"Yixia Li","status":"claimed_verified","statusLastChangedAt":"2025-12-25T20:50:07.465Z","hidden":false},{"_id":"6949fc14335742716e9321d6","user":{"_id":"65f906e5c3dbdcae83ff7aac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65f906e5c3dbdcae83ff7aac/mdjiVkLDJgJcGLwv0rMe4.jpeg","isPro":false,"fullname":"Hongru Wang","user":"Merlin-Hongru","type":"user"},"name":"Hongru Wang","status":"claimed_verified","statusLastChangedAt":"2025-12-25T20:50:04.919Z","hidden":false},{"_id":"6949fc14335742716e9321d7","name":"Jiahao Qiu","hidden":false},{"_id":"6949fc14335742716e9321d8","name":"Zhenfei Yin","hidden":false},{"_id":"6949fc14335742716e9321d9","name":"Dongdong Zhang","hidden":false},{"_id":"6949fc14335742716e9321da","name":"Cheng Qian","hidden":false},{"_id":"6949fc14335742716e9321db","name":"Zeping Li","hidden":false},{"_id":"6949fc14335742716e9321dc","name":"Pony Ma","hidden":false},{"_id":"6949fc14335742716e9321dd","name":"Guanhua Chen","hidden":false},{"_id":"6949fc14335742716e9321de","name":"Heng Ji","hidden":false},{"_id":"6949fc14335742716e9321df","name":"Mengdi Wang","hidden":false}],"publishedAt":"2025-12-21T17:28:42.000Z","submittedOnDailyAt":"2025-12-25T18:21:22.462Z","title":"From Word to World: Can Large Language Models be Implicit Text-based World Models?","submittedOnDailyBy":{"_id":"65f906e5c3dbdcae83ff7aac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65f906e5c3dbdcae83ff7aac/mdjiVkLDJgJcGLwv0rMe4.jpeg","isPro":false,"fullname":"Hongru Wang","user":"Merlin-Hongru","type":"user"},"summary":"Agentic reinforcement learning increasingly relies on experience-driven scaling, yet real-world environments remain non-adaptive, limited in coverage, and difficult to scale. World models offer a potential way to improve learning efficiency through simulated experience, but it remains unclear whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents. We study these questions in text-based environments, which provide a controlled setting to reinterpret language modeling as next-state prediction under interaction. We introduce a three-level framework for evaluating LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we find that sufficiently trained world models maintain coherent latent state, scale predictably with data and model size, and improve agent performance via action verification, synthetic trajectory generation, and warm-starting reinforcement learning. 
Meanwhile, these gains depend critically on behavioral coverage and environment complexity, delineating clear boundaries on when world modeling effectively supports agent learning.

Keywords: agentic reinforcement learning, experience-driven scaling, real-world environments, world models, large language models, next-state prediction, three-level framework, fidelity, consistency, scalability, robustness, agent utility, action verification, synthetic trajectory generation, warm-starting reinforcement learning
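To make "next-state prediction under interaction" concrete, the sketch below queries a causal LLM as an implicit text-based world model: condition on the current textual observation and the agent's action, then decode the predicted next observation. The checkpoint name, prompt format, and the `predict_next_state` helper are illustrative assumptions, not the paper's actual interface; see the Word2World repository for the authors' implementation.

```python
# Minimal sketch: an LLM as an implicit text-based world model.
# Hypothetical interface; the paper's prompt format and trained
# checkpoints may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def predict_next_state(state: str, action: str, max_new_tokens: int = 128) -> str:
    """Next-state prediction under interaction: condition on the current
    textual observation plus the agent's action, decode the next observation."""
    prompt = f"Current observation:\n{state}\n\nAction: {action}\n\nNext observation:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding keeps rollouts reproducible for consistency checks
    )
    # Strip the prompt tokens; return only the generated next observation.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
```

Repeatedly feeding the predicted observation back in as the next `state` yields a multi-step imagined rollout, which is where the paper's fidelity and consistency criteria (level one of the framework) become measurable.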
Lee","user":"Reinforcement4All","type":"user"},{"_id":"65e856f7fd33d243640f72a9","avatarUrl":"/avatars/869f239ce513350a5ef856739dc215c0.svg","isPro":false,"fullname":"tonywang","user":"wzm206","type":"user"},{"_id":"67c27673e5911f17dbdded18","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/hp1VNuQe9Lf8QH0-km5dQ.png","isPro":false,"fullname":"LChen","user":"Charlie-LChen","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"652e72b5fd5e3a357cf6f844","name":"EdinburghNLP","fullname":"EdinburghNLP - Natural Language Processing Group at the University of Edinburgh","avatar":"https://cdn-uploads.huggingface.co/production/uploads/5fbfd09ee366524fe8e97cd3/KBva4SboTuDXRdYqWZsCX.png"}}">
AI-generated summary: LLM-based world models enhance agent performance in text-based environments through action verification, synthetic trajectory generation, and warm-starting reinforcement learning, but their effectiveness is contingent on behavioral coverage and environment complexity.
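Of the three uses named above, action verification is the simplest to sketch: simulate each candidate action inside the learned world model and keep only those whose predicted outcome passes a check, before touching the real environment. Everything below (`verify_actions`, the `is_plausible` filter, the stub world model) is a hypothetical illustration of that idea, not the paper's implementation.

```python
# Sketch of action verification with a learned world model. `world_model`
# is any (state, action) -> next_state function, e.g. predict_next_state
# above; `is_plausible` stands in for whatever acceptance criterion the
# agent uses (the paper's criterion may differ).
from typing import Callable, List

def verify_actions(
    state: str,
    candidates: List[str],
    world_model: Callable[[str, str], str],
    is_plausible: Callable[[str], bool],
) -> List[str]:
    """Return the candidate actions whose simulated outcome passes the filter."""
    verified = []
    for action in candidates:
        predicted = world_model(state, action)  # imagined transition, no real env step
        if is_plausible(predicted):
            verified.append(action)
    return verified

# Usage example with a stub world model: reject actions whose simulated
# outcome signals a no-op.
safe = verify_actions(
    "You are in a kitchen. A locked drawer is in front of you.",
    ["open drawer", "take key from counter", "eat drawer"],
    world_model=lambda s, a: f"(simulated outcome of '{a}')",  # stub
    is_plausible=lambda outcome: "nothing happens" not in outcome.lower(),
)
```

The same simulated transitions can also be logged as synthetic trajectories or used to warm-start reinforcement learning, the other two agent-utility mechanisms the paper evaluates.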