arxiv:2602.08222

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Published on Feb 9 · Submitted by Yikun B on Feb 10
#1 Paper of the day
Authors: Zehao Chen, Gongxun Li, Tianxiang Ai, Yifei Li, Zixuan Huang, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban

AI-generated summary

WMSS is a post-training paradigm that uses weak model checkpoints to identify and fill learning gaps, enabling continued improvement beyond conventional saturation points in large language models.

Abstract

As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve consistent performance improvements while incurring zero additional inference cost.
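
To make the mechanism concrete, here is a minimal sketch of how an entropy-gap signal could feed a compensatory loss. This is our illustration of the idea described in the abstract, not the authors' released code; the weighting scheme and the `gap_threshold` parameter are assumptions.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of each token's predictive distribution.
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)  # shape: (batch, seq)

def compensatory_loss(strong_logits, weak_logits, labels, gap_threshold=0.5):
    # Per-token cross-entropy of the current (strong) model.
    ce = F.cross_entropy(
        strong_logits.flatten(0, 1),  # (batch * seq, vocab)
        labels.flatten(),             # (batch * seq,)
        reduction="none",
    ).view(labels.shape)

    # Entropy gap between the historical weak checkpoint and the strong
    # model: tokens where the weak state was much more uncertain are
    # treated as candidate recoverable learning gaps and upweighted.
    gap = token_entropy(weak_logits) - token_entropy(strong_logits)
    weight = 1.0 + (gap > gap_threshold).float()

    return (weight * ce).mean()
```

Because the weak checkpoint is only consulted during training, an approach of this shape adds nothing at inference time, which is consistent with the zero-additional-inference-cost claim above.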

Community

Paper author · Paper submitter

Weak-Driven Learning refers to a class of post-training paradigms in which the improvement of a strong model is driven by systematic discrepancies between its predictions and those of a weaker reference model (e.g., a historical checkpoint), rather than by imitation of a stronger teacher.
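
As a concrete (and purely illustrative) picture of that discrepancy signal, one could score each token by the KL divergence between the weak reference checkpoint's predictive distribution and the strong model's. The checkpoint paths below are placeholders, not artifacts from the paper:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

STRONG_PATH = "path/to/strong-checkpoint"  # placeholder
WEAK_PATH = "path/to/weak-checkpoint"      # placeholder: an earlier state

tok = AutoTokenizer.from_pretrained(STRONG_PATH)
strong = AutoModelForCausalLM.from_pretrained(STRONG_PATH).eval()
weak = AutoModelForCausalLM.from_pretrained(WEAK_PATH).eval()

batch = tok("Prove that the sum of two even integers is even.",
            return_tensors="pt")

with torch.no_grad():
    s_logits = strong(**batch).logits  # (1, seq, vocab)
    w_logits = weak(**batch).logits

# Per-token KL(weak || strong): large values mark systematic
# disagreements with the weak reference model.
kl = F.kl_div(
    F.log_softmax(s_logits, dim=-1),   # input: strong log-probs
    F.log_softmax(w_logits, dim=-1),   # target: weak log-probs
    reduction="none",
    log_target=True,
).sum(dim=-1)
print(kl.squeeze(0))  # one discrepancy score per token
```

How those scores are turned into an update (filtering, reweighting, or a separate loss term) is the part the paper's WMSS recipe specifies; the snippet only shows where the weak model enters.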

awesome

Paper author · Paper submitter

[Screenshot posted 2026-02-10 at 17.30.48]

Good work!

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization](https://huggingface.co/papers/2601.09233) (2026)
* [TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT](https://huggingface.co/papers/2602.03073) (2026)
* [RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning](https://huggingface.co/papers/2601.09253) (2026)
* [KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning](https://huggingface.co/papers/2602.00400) (2026)
* [Stable On-Policy Distillation through Adaptive Target Reformulation](https://huggingface.co/papers/2601.07155) (2026)
* [Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models](https://huggingface.co/papers/2602.02244) (2026)
* [Training-Trajectory-Aware Token Selection](https://huggingface.co/papers/2601.10348) (2026)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.08222 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.08222 in a Space README.md to link it from this page.

Collections including this paper 7