Paper page - DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
Authors: Yinger Zhang, Shutong Jiang, Renhao Li, Jianhong Tu, Yang Su, Lianghao Deng, Xudong Guo, Chenxu Lv, Junyang Lin
Organization: Qwen
Published: 2026-01-26 (arXiv 2601.18137)
AI-generated summary
The DeepPlanning benchmark addresses limitations of current LLM planning assessments by introducing complex, real-world tasks that require both global optimization and local constraint reasoning.
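The summary distinguishes global optimization (for example, a total budget across a whole trip) from local constraint reasoning (for example, whether a single activity fits a venue's hours). Because both kinds of constraint are checkable, a plan can be verified programmatically. The following is a minimal sketch under stated assumptions: the `Activity` and `Plan` classes, the opening-hours rule, and the budget and no-overlap rules are hypothetical illustrations, not DeepPlanning's actual data format or evaluation code.

```python
from dataclasses import dataclass, field


# Hypothetical itinerary schema, for illustration only (not DeepPlanning's format).
@dataclass
class Activity:
    name: str
    day: int            # 1-based day index within the trip
    start_hour: float   # 24h clock
    end_hour: float
    cost: float
    opens: float        # venue opening hour
    closes: float       # venue closing hour


@dataclass
class Plan:
    activities: list[Activity] = field(default_factory=list)


def local_violations(plan: Plan) -> list[str]:
    """Local constraint (hypothetical): each activity must fit within its venue's opening hours."""
    errors = []
    for a in plan.activities:
        if a.start_hour < a.opens or a.end_hour > a.closes:
            errors.append(f"{a.name}: outside opening hours")
    return errors


def global_violations(plan: Plan, budget: float, num_days: int) -> list[str]:
    """Global constraints (hypothetical): total cost within budget, no overlaps within a day."""
    errors = []
    total = sum(a.cost for a in plan.activities)
    if total > budget:
        errors.append(f"total cost {total:.2f} exceeds budget {budget:.2f}")
    for day in range(1, num_days + 1):
        # Sort the day's activities by start time and compare each adjacent pair.
        todays = sorted((a for a in plan.activities if a.day == day), key=lambda a: a.start_hour)
        for prev, nxt in zip(todays, todays[1:]):
            if nxt.start_hour < prev.end_hour:
                errors.append(f"day {day}: {prev.name} overlaps {nxt.name}")
    return errors


if __name__ == "__main__":
    plan = Plan([
        Activity("Museum", 1, 9.0, 12.0, 25.0, 10.0, 18.0),
        Activity("Lunch", 1, 11.5, 13.0, 30.0, 11.0, 22.0),
        Activity("Tower", 2, 14.0, 16.0, 60.0, 9.0, 20.0),
    ])
    print(local_violations(plan))
    # ['Museum: outside opening hours']
    print(global_violations(plan, budget=100.0, num_days=2))
    # ['total cost 115.00 exceeds budget 100.00', 'day 1: Museum overlaps Lunch']
```

Keeping the two checkers separate mirrors the distinction in the summary: a plan can satisfy every local rule and still fail globally (here, by exceeding the budget), which is the kind of failure that step-level checks alone would miss.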
While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving better effectiveness-efficiency trade-offs. Error analysis further points to promising directions for improving agentic LLMs over long planning horizons. We open-source the code and data to support future research.
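The abstract credits parallel tool use with better effectiveness-efficiency trade-offs. The intuition is that independent lookups (flights, hotels, weather) need not wait on each other. The following is a minimal sketch, assuming hypothetical tool stubs that simulate latency with `asyncio.sleep`; the tool names, queries, and `ToolRunner` class are invented for illustration and do not reflect DeepPlanning's tool interface.

```python
import asyncio
import time


class ToolRunner:
    """Hypothetical tool executor; tracks how many calls are in flight at once."""

    def __init__(self, latency: float = 0.1):
        self.latency = latency
        self.in_flight = 0
        self.peak = 0

    async def call(self, tool: str, query: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(self.latency)  # simulated network latency
            return f"{tool}({query})"
        finally:
            self.in_flight -= 1


async def sequential(runner: ToolRunner, calls: list[tuple[str, str]]) -> list[str]:
    # One tool call per step: total latency grows linearly with the number of calls.
    return [await runner.call(t, q) for t, q in calls]


async def parallel(runner: ToolRunner, calls: list[tuple[str, str]]) -> list[str]:
    # Independent calls issued together; asyncio.gather returns results in input order.
    return list(await asyncio.gather(*(runner.call(t, q) for t, q in calls)))


if __name__ == "__main__":
    calls = [("search_flights", "PEK-SHA"), ("search_hotels", "Shanghai"), ("get_weather", "Shanghai")]
    for mode in (sequential, parallel):
        runner = ToolRunner()
        start = time.perf_counter()
        results = asyncio.run(mode(runner, calls))
        elapsed = time.perf_counter() - start
        # sequential reports peak=1, parallel reports peak=3
        print(mode.__name__, f"peak={runner.peak}", f"{elapsed:.2f}s", results)
```

Both modes return the same results in the same order; only the number of concurrent calls (and hence wall-clock time) differs. Parallelism is only safe for calls that do not depend on each other's outputs, so an agent still has to decide which lookups are independent.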