Paper page - Small Models Struggle to Learn from Strong Reasoners
This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

* [SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs](https://huggingface.co/papers/2502.12134) (2025)
* [Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation](https://huggingface.co/papers/2502.12744) (2025)
* [Towards Reasoning Ability of Small Language Models](https://huggingface.co/papers/2502.11569) (2025)
* [Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling](https://huggingface.co/papers/2501.11651) (2025)
* [LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!](https://huggingface.co/papers/2502.07374) (2025)
* [NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions](https://huggingface.co/papers/2502.13124) (2025)
* [Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe](https://huggingface.co/papers/2502.09056) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`

Paper ID: 2502.12143
Authors: Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran
Published: 2025-02-17
Code: https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap
AI-generated summary: Mix Distillation improves small model reasoning by balancing long and short chain-of-thought examples, addressing the Small Model Learnability Gap.
Large language models (LLMs) excel in complex reasoning tasks, and distilling
their reasoning capabilities into smaller models has shown promise. However, we
uncover an interesting phenomenon, which we term the Small Model Learnability
Gap: small models (≤3B parameters) do not consistently benefit from long
chain-of-thought (CoT) reasoning or distillation from larger models. Instead,
they perform better when fine-tuned on shorter, simpler reasoning chains that
better align with their intrinsic learning capacity. To address this, we
propose Mix Distillation, a simple yet effective strategy that balances
reasoning complexity by combining long and short CoT examples or reasoning from
both larger and smaller models. Our experiments demonstrate that Mix
Distillation significantly improves small model reasoning performance compared
to training on either data alone. These findings highlight the limitations of
direct strong model distillation and underscore the importance of adapting
reasoning complexity for effective reasoning capability transfer.
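To make the data-mixing idea in the abstract concrete, here is a minimal Python sketch of building a fine-tuning set from a pool of long CoT examples and a pool of short CoT examples. It is an illustration only, not the authors' implementation: the function name `mix_cot_examples`, the `long_fraction` parameter, its default of 0.2, and the example record layout are all assumptions, since the abstract does not state a mixing ratio or a data format.

```python
import random


def mix_cot_examples(long_cot, short_cot, long_fraction=0.2, size=None, seed=0):
    """Sample a training set that mixes long and short CoT examples.

    long_fraction is a hypothetical mixing weight chosen for illustration;
    it is not a value taken from the paper.
    """
    if not 0.0 <= long_fraction <= 1.0:
        raise ValueError("long_fraction must be in [0, 1]")
    if size is None:
        size = min(len(long_cot), len(short_cot))

    n_long = round(size * long_fraction)
    n_short = size - n_long
    if n_long > len(long_cot) or n_short > len(short_cot):
        raise ValueError("not enough examples for the requested mix")

    rng = random.Random(seed)  # local RNG so the mix is reproducible
    mixed = rng.sample(long_cot, n_long) + rng.sample(short_cot, n_short)
    rng.shuffle(mixed)  # interleave the two styles
    return mixed


# Toy records; the field names are assumptions for this sketch.
long_pool = [{"question": f"q{i}", "style": "long"} for i in range(10)]
short_pool = [{"question": f"q{i}", "style": "short"} for i in range(10)]

mixed = mix_cot_examples(long_pool, short_pool, long_fraction=0.2, size=10)
print(sum(ex["style"] == "long" for ex in mixed), "long examples of", len(mixed))
```

The same sketch would cover the second variant mentioned in the abstract (reasoning from both larger and smaller models) by passing responses from a larger teacher and a smaller teacher as the two pools. The resulting list would then be fed to whatever supervised fine-tuning pipeline is in use.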
Mix Distillation sounds interesting. From the abstract, the general takeaway is to teach the candidate model (here, an SLM) only as much as it has the capacity to learn and hold, rather than bombarding it with a huge amount of knowledge of which it retains only a small percentage.
Yes! We think small models do not possess enough "space" to carry all the complex reasoning knowledge, and thus do not learn well from long CoT data. Mix Distillation is an eclectic way to balance the reasoning complexity. But I think if we want to solve this issue at the root, we need to do continued training or use a math expert model.