arxiv:2502.13092

Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

Published on Feb 18, 2025 · Submitted by Mengkang Hu on Feb 19, 2025
Authors: Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Hongyuan Zhang, Wenqi Shao, Ping Luo
Project page: https://text-to-world.github.io/ · Code: https://github.com/Aaron617/text2world

Abstract

AI-generated summary: Text2World, a novel PDDL-based benchmark, evaluates world modeling capabilities of LLMs using diverse domains and multi-criteria metrics, highlighting the need for enhanced strategies.

Recently, there has been growing interest in leveraging large language models (LLMs) to generate symbolic world models from textual descriptions. Although LLMs have been extensively explored in the context of world modeling, prior studies encountered several challenges, including evaluation randomness, dependence on indirect metrics, and a limited domain scope. To address these limitations, we introduce a novel benchmark, Text2World, based on planning domain definition language (PDDL), featuring hundreds of diverse domains and employing multi-criteria, execution-based metrics for a more robust evaluation. We benchmark current LLMs using Text2World and find that reasoning models trained with large-scale reinforcement learning outperform others. However, even the best-performing model still demonstrates limited capabilities in world modeling. Building on these insights, we examine several promising strategies to enhance the world modeling capabilities of LLMs, including test-time scaling, agent training, and more. We hope that Text2World can serve as a crucial resource, laying the groundwork for future research in leveraging LLMs as world models. The project page is available at https://text-to-world.github.io/.
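To make the benchmark's task concrete, the sketch below shows the kind of PDDL domain an LLM would be asked to produce from a short textual description, together with a toy structural sanity check. This is illustrative only: the domain is hand-written (not drawn from Text2World), the `toy_structural_check` helper is hypothetical, and it is far weaker than the execution-based, multi-criteria evaluation the paper actually uses.

```python
# Illustrative sketch only: shows what a generated symbolic world model
# (a PDDL domain) looks like and runs a trivial well-formedness check.
# The real Text2World metrics are execution-based and multi-criteria;
# nothing here reproduces them. All names are hypothetical.

# The kind of PDDL an LLM might generate from a description such as:
# "A robot can pick up a block if its gripper is empty, and put it down again."
GENERATED_DOMAIN = """
(define (domain simple-blocks)
  (:requirements :strips)
  (:predicates (holding ?x) (on-table ?x) (gripper-empty))
  (:action pick-up
    :parameters (?x)
    :precondition (and (on-table ?x) (gripper-empty))
    :effect (and (holding ?x) (not (on-table ?x)) (not (gripper-empty))))
  (:action put-down
    :parameters (?x)
    :precondition (holding ?x)
    :effect (and (on-table ?x) (gripper-empty) (not (holding ?x)))))
"""

def toy_structural_check(domain: str) -> bool:
    """Rough well-formedness check: balanced parentheses plus the presence
    of the basic sections a PDDL domain needs."""
    depth = 0
    for ch in domain:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing paren with no matching opener
    if depth != 0:
        return False  # unbalanced parentheses overall
    required = ("(define", "(domain", ":predicates", ":action")
    return all(token in domain for token in required)

if __name__ == "__main__":
    print("passes toy check:", toy_structural_check(GENERATED_DOMAIN))
```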

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.13092 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 2