R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
\n","updatedAt":"2025-03-10T03:43:27.158Z","author":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","fullname":"AK","name":"akhaliq","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":9175,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.4382067322731018},"editors":["akhaliq"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg"],"reactions":[],"isReport":false}},{"id":"67cf92efb02154baeeac6b83","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false},"createdAt":"2025-03-11T01:33:35.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation via Tree-based Search](https://huggingface.co/papers/2501.10053) (2025)\n* [Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling](https://huggingface.co/papers/2501.11651) (2025)\n* [DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning](https://huggingface.co/papers/2501.12948) (2025)\n* [Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search](https://huggingface.co/papers/2502.02508) (2025)\n* [Visual-RFT: Visual Reinforcement Fine-Tuning](https://huggingface.co/papers/2503.01785) (2025)\n* [Towards Widening The Distillation Bottleneck for Reasoning Models](https://huggingface.co/papers/2503.01461) (2025)\n* [S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning](https://huggingface.co/papers/2502.12853) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
\n
The following papers were recommended by the Semantic Scholar API
Please give a thumbs up to this comment if you found it helpful!
\n
If you want recommendations for any Paper on Hugging Face checkout this Space
\n
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend
\n","updatedAt":"2025-03-11T01:33:35.642Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7351886630058289},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.05592","authors":[{"_id":"67ce5fd2e5cdfda52b9123a4","user":{"_id":"66163dc8c7f45b3f893ff40b","avatarUrl":"/avatars/801043dac0caae90bbca8c9d3e2e203b.svg","isPro":false,"fullname":"Song Huatong","user":"XXsongLALA","type":"user"},"name":"Huatong Song","status":"admin_assigned","statusLastChangedAt":"2025-03-10T10:03:49.730Z","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123a5","user":{"_id":"61b8405b516a20acdf3b85ff","avatarUrl":"/avatars/3d2eae7c163a80b73260087b05a4230b.svg","isPro":false,"fullname":"Jinhao Jiang","user":"Boru","type":"user"},"name":"Jinhao Jiang","status":"admin_assigned","statusLastChangedAt":"2025-03-10T10:04:22.446Z","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123a6","user":{"_id":"6703ac76ea890f0ca5b225eb","avatarUrl":"/avatars/5f56c49a1940143d47dd484782a4abbf.svg","isPro":false,"fullname":"Yingqian Min","user":"EliverQ","type":"user"},"name":"Yingqian Min","status":"claimed_verified","statusLastChangedAt":"2025-03-10T09:40:54.171Z","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123a7","user":{"_id":"651a29d566e78720a78317ec","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/651a29d566e78720a78317ec/WKPcw6Ziqjl44pkrHtCVa.jpeg","isPro":false,"fullname":"Jie Chen","user":"survivi","type":"user"},"name":"Jie Chen","status":"claimed_verified","statusLastChangedAt":"2025-03-10T12:46:26.714Z","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123a8","user":{"_id":"629b765ce1af194c641fcbc6","avatarUrl":"/avatars/7c53a4c2a1e528c19641a2b601731754.svg","isPro":false,"fullname":"Zhipeng Chen","user":"TimothyCzp","type":"user"},"name":"Zhipeng Chen","status":"claimed_verified","statusLastChangedAt":"2025-03-10T11:08:49.613Z","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123a9","name":"Wayne Xin Zhao","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123aa","name":"Lei Fang","hidden":false},{"_id":"67ce5fd2e5cdfda52b9123ab","user":{"_id":"64b8c89052b7353d8c6a1013","avatarUrl":"/avatars/cd59fffe81f6b07b4519540b8ff3d95f.svg","isPro":false,"fullname":"Ji-Rong Wen","user":"jrwen","type":"user"},"name":"Ji-Rong Wen","status":"admin_assigned","statusLastChangedAt":"2025-03-10T10:04:33.194Z","hidden":false}],"publishedAt":"2025-03-07T17:14:44.000Z","submittedOnDailyAt":"2025-03-10T02:13:27.151Z","title":"R1-Searcher: Incentivizing the Search Capability in LLMs via\n Reinforcement Learning","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Existing Large Reasoning Models (LRMs) have shown the potential of\nreinforcement learning (RL) to enhance the complex reasoning capabilities of\nLarge Language Models~(LLMs). 
While they achieve remarkable performance on\nchallenging tasks such as mathematics and coding, they often rely on their\ninternal knowledge to solve problems, which can be inadequate for\ntime-sensitive or knowledge-intensive questions, leading to inaccuracies and\nhallucinations. To address this, we propose R1-Searcher, a novel\ntwo-stage outcome-based RL approach designed to enhance the search capabilities\nof LLMs. This method allows LLMs to autonomously invoke external search systems\nto access additional knowledge during the reasoning process. Our framework\nrelies exclusively on RL, without requiring process rewards or distillation for\na cold start. % effectively generalizing to out-of-domain datasets and\nsupporting both Base and Instruct models. Our experiments demonstrate that our\nmethod significantly outperforms previous strong RAG methods, even when\ncompared to the closed-source GPT-4o-mini.","upvotes":27,"discussionId":"67ce5fd3e5cdfda52b912436","ai_summary":"R1-Searcher is a reinforcement learning approach that enhances large language models' reasoning by autonomously accessing external knowledge, achieving better performance than existing methods.","ai_keywords":["reinforcement learning","LLMs","LRM","R1-Searcher","two-stage outcome-based RL","external search systems","RL","RAG methods","GPT-4o-mini"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"66f612b934b8ac9ffa44f084","avatarUrl":"/avatars/6836c122e19c66c90f1673f28b30d7f0.svg","isPro":false,"fullname":"Tang","user":"tommysally","type":"user"},{"_id":"67a37810ad1ebf3c241496c2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/azNl-spzRfn4YPsxkZpw7.png","isPro":false,"fullname":"Eric","user":"FightMilk69","type":"user"},{"_id":"662ddffd68000b73ef1e1e0b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/hnW-twD596WA8y7DfH0pK.jpeg","isPro":false,"fullname":"Naphat Permpredanun","user":"MisterOmelet","type":"user"},{"_id":"63c56cc80c24c8b53961728d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1673882813861-noauth.png","isPro":true,"fullname":"Jisoo Kim","user":"kuotient","type":"user"},{"_id":"64d4615cf8082bf19b916492","avatarUrl":"/avatars/8e1b59565ec5e4b31090cf1b911781b9.svg","isPro":false,"fullname":"wongyukim","user":"wongyukim","type":"user"},{"_id":"65c20ee58aedd6edd2b89000","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65c20ee58aedd6edd2b89000/LtS4YTbmxiCFqHSGHfdC8.png","isPro":false,"fullname":"Chmielewski","user":"Eryk-Chmielewski","type":"user"},{"_id":"63b91450060d6595d2af4c76","avatarUrl":"/avatars/b7234c2f5ab22bba6c974b3744a72033.svg","isPro":false,"fullname":"ar tale","user":"Artale","type":"user"},{"_id":"633e570be7d5ce7bfe037a53","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/633e570be7d5ce7bfe037a53/zV8ULv4Mu7YIGZ8D3JtmK.jpeg","isPro":false,"fullname":"Zhaocheng Liu","user":"zhaocheng","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"63e774eedb40d9e67fec89b2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1676113119978-noauth.jpeg","isPro":false,"fullname":"Sarthak 
Thakur","user":"sarthak247","type":"user"},{"_id":"5f43448a79c1ba4c353d0d8f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5f43448a79c1ba4c353d0d8f/DiSygV3dn7A_OjmGVTrHD.jpeg","isPro":true,"fullname":"Sugato Ray","user":"sugatoray","type":"user"},{"_id":"6703ac76ea890f0ca5b225eb","avatarUrl":"/avatars/5f56c49a1940143d47dd484782a4abbf.svg","isPro":false,"fullname":"Yingqian Min","user":"EliverQ","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary

R1-Searcher is a reinforcement learning approach that enhances large language models' reasoning by autonomously accessing external knowledge, achieving better performance than existing methods.
Abstract

Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose R1-Searcher, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start, effectively generalizing to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
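To make the described interaction concrete, below is a minimal Python sketch of a search-augmented rollout scored by an outcome-only reward, in the spirit of the approach above. The tag names (`<search>`, `<answer>`), the `model.generate` and `retriever.retrieve` interfaces, and the reward values are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only: tag names, interfaces, and reward values are
# assumptions for exposition, not taken from the R1-Searcher codebase.

def rollout(model, retriever, question, max_turns=4):
    """Generate a reasoning trace, pausing whenever the model emits a
    search query so retrieved documents can be spliced into the context."""
    context = (
        f"Question: {question}\n"
        "Think step by step. Wrap any search query in <search>...</search> "
        "and your final answer in <answer>...</answer>.\n"
    )
    for _ in range(max_turns):
        # Stop generation at either a search request or a final answer.
        chunk = model.generate(context, stop=["</search>", "</answer>"])
        context += chunk
        if "<search>" in chunk:
            query = chunk.split("<search>")[-1].strip()
            docs = retriever.retrieve(query, top_k=3)  # external knowledge
            context += ("</search>\n<documents>\n"
                        + "\n".join(docs) + "\n</documents>\n")
        else:
            context += "</answer>"
            break  # the model produced a final answer
    return context

def outcome_reward(trace, gold_answer):
    """Outcome-only reward: no per-step (process) supervision.

    The scalar depends solely on final-answer correctness plus a format
    check, which is what lets the framework skip process rewards."""
    if "<answer>" not in trace:
        return -1.0  # format penalty: never produced an answer
    answer = trace.split("<answer>")[-1].split("</answer>")[0].strip()
    return 1.0 if answer.lower() == gold_answer.lower() else 0.0
```

A standard policy-gradient update (e.g., REINFORCE- or PPO-style over such rollouts) would then consume `outcome_reward` as the scalar return; the two-stage schedule the abstract mentions could be approximated by first rewarding well-formed search invocations and only later switching to answer correctness.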