
arxiv:2510.01037

CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

Published on Oct 1, 2025 · Submitted by RubinSun on Oct 2, 2025
Code: https://github.com/ZexuSun/CurES
Authors: Yongcheng Zeng, Zexu Sun, Bokai Ji, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Haifeng Zhang, Xu Chen, Jun Wang

Abstract

AI-generated summary

CurES, a reinforcement learning-based method, improves the training efficiency of large language models by optimizing prompt selection and rollout allocation, leading to faster convergence and reduced computational overhead.

Curriculum learning plays a crucial role in enhancing the training efficiency of large language models (LLMs) on reasoning tasks. However, existing methods often fail to adequately account for variations in prompt difficulty or rely on simplistic filtering mechanisms to select prompt datasets within a narrow criterion range, resulting in significant computational waste. In this work, we approach the problem from the perspective of reinforcement learning gradient optimization, offering a systematic and theoretical investigation into how to improve the training efficiency of LLMs. We identify two key factors influencing training efficiency: the selection of training prompts and the allocation of rollout quantities across different prompts. Our theoretical analysis reveals that the sampling distribution of prompts dictates the convergence rate of gradient descent, while the allocation of the rollout quantity influences the consistency and stability of overall gradient updates. Based on these insights, we propose CurES, an efficient training method that accelerates convergence and employs Bayesian posterior estimation to minimize computational overhead. Experiments demonstrate that our CurES outperforms Group Relative Policy Optimization (GRPO) by +3.30 points and +4.82 points with 1.5B and 7B models, respectively. Additionally, CurES exhibits faster convergence compared to baselines, including GRPO.
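The abstract names two levers: which prompts to sample, and how many rollouts to allocate to each, with a Bayesian posterior keeping the difficulty estimates cheap. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual algorithm: it keeps a Beta posterior over each prompt's pass rate and weights prompts by p(1 - p), which peaks at medium difficulty. All names (`PromptScheduler`, `allocate_rollouts`) and the specific weighting are assumptions for illustration only.

```python
import numpy as np

class PromptScheduler:
    """Illustrative curriculum scheduler: Beta posterior per prompt,
    with sampling and rollout budgets weighted toward medium-difficulty
    prompts. A sketch of the general idea, not CurES itself."""

    def __init__(self, num_prompts, alpha0=1.0, beta0=1.0, seed=0):
        # Beta(alpha0, beta0) prior on each prompt's pass rate.
        self.alpha = np.full(num_prompts, alpha0, dtype=float)
        self.beta = np.full(num_prompts, beta0, dtype=float)
        self.rng = np.random.default_rng(seed)

    def weights(self):
        # Posterior mean pass rate; p * (1 - p) is largest near 0.5,
        # so prompts the model solves about half the time dominate.
        p = self.alpha / (self.alpha + self.beta)
        w = p * (1.0 - p) + 1e-8  # epsilon keeps every prompt reachable
        return w / w.sum()

    def sample_prompts(self, batch_size):
        # Draw a batch without replacement under the current weights.
        return self.rng.choice(len(self.alpha), size=batch_size,
                               replace=False, p=self.weights())

    def allocate_rollouts(self, prompt_ids, total_rollouts):
        # Split a fixed rollout budget across the batch in proportion
        # to the same weights; rounding means the sum only approximates
        # total_rollouts, and each prompt gets at least one rollout.
        w = self.weights()[prompt_ids]
        w = w / w.sum()
        return np.maximum(1, np.round(w * total_rollouts)).astype(int)

    def update(self, prompt_ids, successes, rollouts):
        # Conjugate Beta-Bernoulli update from observed pass/fail
        # counts (prompt_ids assumed unique within a batch).
        self.alpha[prompt_ids] += successes
        self.beta[prompt_ids] += rollouts - successes

# Example loop skeleton; rollout execution and reward checking are
# stand-ins for whatever RL training code is in use.
sched = PromptScheduler(num_prompts=1000)
ids = sched.sample_prompts(batch_size=32)
budget = sched.allocate_rollouts(ids, total_rollouts=256)
# ...run budget[i] rollouts for prompt ids[i], count successes, then:
# sched.update(ids, successes, budget)
```

The Beta-Bernoulli choice is what makes the bookkeeping cheap: each prompt's difficulty estimate is just two counters updated after every batch, with no extra forward passes.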

Community

Paper author and submitter

This paper proposes a gradient-analysis-based budget allocation method for reinforcement learning in LLM reasoning.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.01037 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.01037 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.01037 in a Space README.md to link it from this page.

Collections including this paper 1