Learning from Failures in Multi-Attempt Reinforcement Learning
\n","updatedAt":"2025-03-10T03:29:35.522Z","author":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","fullname":"AK","name":"akhaliq","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":9175,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.18245919048786163},"editors":["akhaliq"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg"],"reactions":[],"isReport":false}},{"id":"67cf933b7ab8e507df06b109","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false},"createdAt":"2025-03-11T01:34:51.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [R1-Zero's\"Aha Moment\"in Visual Reasoning on a 2B Non-SFT Model](https://huggingface.co/papers/2503.05132) (2025)\n* [Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling](https://huggingface.co/papers/2501.11651) (2025)\n* [R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning](https://huggingface.co/papers/2503.05592) (2025)\n* [Self-rewarding correction for mathematical reasoning](https://huggingface.co/papers/2502.19613) (2025)\n* [Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search](https://huggingface.co/papers/2502.02508) (2025)\n* [Visual-RFT: Visual Reinforcement Fine-Tuning](https://huggingface.co/papers/2503.01785) (2025)\n* [MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning](https://huggingface.co/papers/2502.18439) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
\n
The following papers were recommended by the Semantic Scholar API
Please give a thumbs up to this comment if you found it helpful!
\n
If you want recommendations for any Paper on Hugging Face checkout this Space
\n
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend
\n","updatedAt":"2025-03-11T01:34:51.976Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7263253331184387},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.04808","authors":[{"_id":"67ce5c7065b141ae6b0d3957","user":{"_id":"67cef8b7d9f3ce4930069e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/67cef8b7d9f3ce4930069e10/1o2rYNrmLPozUDaUo_nVE.png","isPro":false,"fullname":"Sephen Chung","user":"stephenchungmh","type":"user"},"name":"Stephen Chung","status":"claimed_verified","statusLastChangedAt":"2025-03-10T17:39:13.472Z","hidden":false},{"_id":"67ce5c7065b141ae6b0d3958","user":{"_id":"624c3d2ca19f20b197761ba9","avatarUrl":"/avatars/7a64b81c29f4f6700fa18effc5616865.svg","isPro":false,"fullname":"Wenyu Du","user":"wydu","type":"user"},"name":"Wenyu Du","status":"admin_assigned","statusLastChangedAt":"2025-03-10T10:27:00.543Z","hidden":false},{"_id":"67ce5c7065b141ae6b0d3959","user":{"_id":"641a6895fb5ffff5ac79d593","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/641a6895fb5ffff5ac79d593/vxvwsto3llOEWGqQKGMYx.jpeg","isPro":false,"fullname":"Jie Fu","user":"bigaidream","type":"user"},"name":"Jie Fu","status":"claimed_verified","statusLastChangedAt":"2025-03-10T12:46:28.886Z","hidden":false}],"publishedAt":"2025-03-04T02:53:39.000Z","submittedOnDailyAt":"2025-03-10T01:59:35.505Z","title":"Learning from Failures in Multi-Attempt Reinforcement Learning","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Recent advancements in reinforcement learning (RL) for large language models\n(LLMs), exemplified by DeepSeek R1, have shown that even a simple\nquestion-answering task can substantially improve an LLM's reasoning\ncapabilities. In this work, we extend this approach by modifying the task into\na multi-attempt setting. Instead of generating a single response per question,\nthe model is given multiple attempts, with feedback provided after incorrect\nresponses. The multi-attempt task encourages the model to refine its previous\nattempts and improve search efficiency. Experimental results show that even a\nsmall LLM trained on a multi-attempt task achieves significantly higher\naccuracy when evaluated with more attempts, improving from 45.6% with 1 attempt\nto 52.5% with 2 attempts on the math benchmark. In contrast, the same LLM\ntrained on a standard single-turn task exhibits only a marginal improvement,\nincreasing from 42.3% to 43.2% when given more attempts during evaluation. The\nresults indicate that, compared to the standard single-turn task, an LLM\ntrained on a multi-attempt task achieves slightly better performance on math\nbenchmarks while also learning to refine its responses more effectively based\non user feedback. 
Full code is available at\nhttps://github.com/DualityRL/multi-attempt","upvotes":18,"discussionId":"67ce5c7165b141ae6b0d39c6","projectPage":"https://gossamer-bookcase-8a7.notion.site/Learning-From-Failures-in-Multi-Attempt-Reinforcement-Learning-1a6215521f3a80df9b14d48306a9f7a2","githubRepo":"https://github.com/DualityRL/multi-attempt","githubRepoAddedBy":"user","ai_summary":"Training large language models on a multi-attempt task with feedback improves their accuracy and response refinement compared to standard single-turn tasks.","ai_keywords":["reinforcement learning","large language models","multi-attempt setting","feedback","search efficiency","math benchmark"],"githubStars":19},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"66f612b934b8ac9ffa44f084","avatarUrl":"/avatars/6836c122e19c66c90f1673f28b30d7f0.svg","isPro":false,"fullname":"Tang","user":"tommysally","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"624c3d2ca19f20b197761ba9","avatarUrl":"/avatars/7a64b81c29f4f6700fa18effc5616865.svg","isPro":false,"fullname":"Wenyu Du","user":"wydu","type":"user"},{"_id":"641a6895fb5ffff5ac79d593","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/641a6895fb5ffff5ac79d593/vxvwsto3llOEWGqQKGMYx.jpeg","isPro":false,"fullname":"Jie Fu","user":"bigaidream","type":"user"},{"_id":"67cef8b7d9f3ce4930069e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/67cef8b7d9f3ce4930069e10/1o2rYNrmLPozUDaUo_nVE.png","isPro":false,"fullname":"Sephen Chung","user":"stephenchungmh","type":"user"},{"_id":"643c7c0586ab6dbe34f1eae5","avatarUrl":"/avatars/aba29ed4a6092658900cf16f32c90f02.svg","isPro":false,"fullname":"Fares Obeid","user":"Fareso","type":"user"},{"_id":"679b02beb898ac90bf4bde73","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/679b02beb898ac90bf4bde73/hI-demi0EGMJIiFizsshU.jpeg","isPro":false,"fullname":"Amy Cosgrove","user":"Aiamy82","type":"user"},{"_id":"63130af7e29fb2e86d5baa7b","avatarUrl":"/avatars/89991d1e7d7632df99492178597106a8.svg","isPro":false,"fullname":"Erkin Alp","user":"erkinalp","type":"user"},{"_id":"651c80a26ba9ab9b9582c273","avatarUrl":"/avatars/e963452eafd21f517d800f2e58e0f918.svg","isPro":false,"fullname":"siyeng feng","user":"siyengfeng","type":"user"},{"_id":"65c20ee58aedd6edd2b89000","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65c20ee58aedd6edd2b89000/LtS4YTbmxiCFqHSGHfdC8.png","isPro":false,"fullname":"Chmielewski","user":"Eryk-Chmielewski","type":"user"},{"_id":"61aa376688c20eebf1e8deb3","avatarUrl":"/avatars/7c11dcb232c73547d7d87834be287822.svg","isPro":false,"fullname":"Hao Zhu","user":"ProKil","type":"user"},{"_id":"65c4063740d617a14238f3df","avatarUrl":"/avatars/726b1470e46ad71c9ec233f3f0f396ec.svg","isPro":false,"fullname":"Zikun Li","user":"zikun-li","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary

Training large language models on a multi-attempt task with feedback improves their accuracy and response refinement compared to standard single-turn tasks.

Abstract
Recent advancements in reinforcement learning (RL) for large language models
(LLMs), exemplified by DeepSeek R1, have shown that even a simple
question-answering task can substantially improve an LLM's reasoning
capabilities. In this work, we extend this approach by modifying the task into
a multi-attempt setting. Instead of generating a single response per question,
the model is given multiple attempts, with feedback provided after incorrect
responses. The multi-attempt task encourages the model to refine its previous
attempts and improve search efficiency. Experimental results show that even a
small LLM trained on a multi-attempt task achieves significantly higher
accuracy when evaluated with more attempts, improving from 45.6% with 1 attempt
to 52.5% with 2 attempts on the math benchmark. In contrast, the same LLM
trained on a standard single-turn task exhibits only a marginal improvement,
increasing from 42.3% to 43.2% when given more attempts during evaluation. The
results indicate that, compared to the standard single-turn task, an LLM
trained on a multi-attempt task achieves slightly better performance on math
benchmarks while also learning to refine its responses more effectively based
on user feedback. Full code is available at
https://github.com/DualityRL/multi-attempt
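
To make the multi-attempt setting concrete, below is a minimal Python sketch of a single rollout under stated assumptions: a generate callable standing in for the LLM, an is_correct checker, and a +1 reward for a correct answer are all illustrative placeholders, not the authors' implementation (the actual training code is in the linked repository). The sketch only shows how feedback is threaded into the dialogue between attempts; in RL training, the resulting reward would be used to update the policy.

```python
# Minimal sketch of a multi-attempt question-answering rollout, as described in
# the abstract: the model gets up to N attempts per question, with feedback
# inserted after each incorrect response. The `generate` and `is_correct`
# callables, the feedback message, and the reward values are assumptions for
# illustration, not the authors' code (see https://github.com/DualityRL/multi-attempt).

from typing import Callable, Dict, List


def multi_attempt_rollout(
    question: str,
    answer: str,
    generate: Callable[[List[Dict[str, str]]], str],   # maps a chat history to a model response
    is_correct: Callable[[str, str], bool],             # checks a response against the ground truth
    max_attempts: int = 2,
) -> Dict[str, object]:
    """Run one multi-attempt episode and return the dialogue plus a scalar reward."""
    dialogue = [{"role": "user", "content": question}]
    reward = 0.0

    for attempt in range(1, max_attempts + 1):
        response = generate(dialogue)
        dialogue.append({"role": "assistant", "content": response})

        if is_correct(response, answer):
            reward = 1.0  # assumed: +1 for a correct answer within the attempt budget
            break

        if attempt < max_attempts:
            # Feedback after an incorrect response, prompting the model to refine it.
            dialogue.append({
                "role": "user",
                "content": "Your answer is incorrect. Please reconsider and try again.",
            })

    return {"dialogue": dialogue, "reward": reward, "attempts_used": attempt}


if __name__ == "__main__":
    # Toy usage with a scripted stand-in "model" that fails once, then succeeds.
    scripted = iter(["41", "42"])
    rollout = multi_attempt_rollout(
        question="What is 6 * 7?",
        answer="42",
        generate=lambda history: next(scripted),
        is_correct=lambda resp, ans: resp.strip() == ans,
        max_attempts=2,
    )
    print(rollout["reward"], rollout["attempts_used"])  # -> 1.0 2
```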