arxiv:2504.11536

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Published on Apr 15, 2025 · Submitted by AK on Apr 17, 2025 · #2 Paper of the day
Authors: Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, Wanjun Zhong
Abstract

AI-generated summary

ReTool, a tool-integrated learning framework, enhances reasoning models with real-time code execution and reinforcement learning, significantly improving performance in structured problem-solving tasks like mathematical reasoning.

While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL) excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving, areas where computational tools like code interpreters (CI) demonstrate distinct advantages. To bridge this gap, we propose ReTool, which enhances long-form reasoning with tool-integrated learning and includes two key features: (1) dynamic interleaving of real-time code execution within natural language reasoning processes, and (2) an automated RL paradigm that allows policy rollouts with multi-turn real-time code execution and teaches the model when and how to invoke tools based on outcome feedback. ReTool employs a systematic training framework, beginning with synthetic cold-start data generation to produce code-augmented long-form reasoning traces for fine-tuning base models. Subsequent RL training leverages task outcomes as rewards to iteratively refine the model's tool-use strategy, enabling autonomous discovery of optimal tool invocation patterns without human priors. Experiments on the challenging MATH Olympiad benchmark AIME demonstrate ReTool's superiority: our 32B model achieves 67% accuracy with 400 training steps, outperforming the text-based RL baseline (40% accuracy, 1080 steps) in both efficiency and performance. Remarkably, ReTool-32B attains 72.5% accuracy in extended settings, surpassing OpenAI's o1-preview by 27.9%. Further analysis reveals emergent behaviors such as code self-correction, signaling an "aha moment" in which the model autonomously masters adaptive tool use. These findings highlight the promise of outcome-driven tool integration for advancing complex mathematical reasoning and offer new insights into hybrid neuro-symbolic systems.
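As a rough illustration of the two mechanisms the abstract describes, here is a minimal Python sketch of an interleaved rollout and an outcome-only reward. It assumes a hypothetical `model.generate(..., stop=...)` API that returns text including the stop sequence, `<code>`/`<interpreter>` tags, a sandboxed `interpreter.run`, and a ±1 reward scheme; none of these names are taken from the paper's released code.

```python
import re

# Illustrative tag convention for tool calls inside the reasoning trace
# (an assumption, not necessarily the paper's exact format).
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def rollout(model, prompt, interpreter, max_turns=8):
    """Generate a reasoning trace, pausing whenever the model closes a code
    block, executing that code in a sandbox, and appending the interpreter's
    output so generation resumes with the real execution result in context."""
    trace = prompt
    for _ in range(max_turns):
        # Assumed API: generation halts at the stop sequence and the returned
        # chunk includes it, so the regex below can match a complete block.
        chunk = model.generate(trace, stop=["</code>"])
        trace += chunk
        match = CODE_BLOCK.search(chunk)
        if match is None:  # no tool call in this chunk: the trace is complete
            break
        result = interpreter.run(match.group(1))  # sandboxed code execution
        trace += f"<interpreter>{result}</interpreter>"  # feedback in-context
    return trace

def extract_final_answer(trace):
    """Pull the content of the last \\boxed{...} in the trace (a common
    convention for math answers); returns None if absent."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1].strip() if matches else None

def outcome_reward(trace, gold_answer):
    """Outcome-only reward: +1 for a correct final answer, -1 otherwise.
    There is deliberately no shaping on how or when tools were used."""
    answer = extract_final_answer(trace)
    return 1.0 if answer is not None and answer == gold_answer else -1.0
```

Because the reward depends only on the final answer, when and how to call the interpreter is left entirely for the policy to discover during RL, which is what the abstract credits for emergent behaviors such as code self-correction.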

Community

Paper submitter

[screenshot attached]

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- ToRL: Scaling Tool-Integrated RL (https://huggingface.co/papers/2503.23383) (2025)
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (https://huggingface.co/papers/2503.06749) (2025)
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning (https://huggingface.co/papers/2503.09516) (2025)
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't (https://huggingface.co/papers/2503.16219) (2025)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (https://huggingface.co/papers/2503.05592) (2025)
- Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning (https://huggingface.co/papers/2504.11354) (2025)
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning (https://huggingface.co/papers/2503.19470) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

The model and code links are 404.

Paper author

Thanks! We fixed it yesterday. The model, data, and code are now available at https://retool-rl.github.io/


Models citing this paper: 3

Datasets citing this paper: 3

Spaces citing this paper: 1

Collections including this paper: 21