arxiv:2511.06221

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Published on Nov 9, 2025 · Submitted by DenseHub on Nov 12, 2025
#1 Paper of the day
Authors: Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang

Abstract

Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This challenges the prevailing approach of scaling model parameters to enhance capabilities, as seen in models like DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.

AI-generated summary

VibeThinker-1.5B, a 1.5B-parameter model using the Spectrum-to-Signal Principle, achieves superior reasoning capabilities compared to larger models at a significantly lower cost.
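To make the RL stage concrete, here is a minimal sketch of the maximum-entropy intuition behind MaxEnt-Guided Policy Optimization, assuming a GRPO-style group of binary-reward rollouts per problem. The entropy weighting below (`maxent_weight`) is an illustrative reading of "MaxEnt-guided", not the paper's exact formulation.

```python
import numpy as np

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style advantage: normalize the binary rewards of one
    rollout group around its own mean."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def maxent_weight(rewards: np.ndarray) -> float:
    """Bernoulli entropy of the group's pass rate. It peaks at a pass
    rate of 0.5, i.e. on problems the current policy solves about half
    the time, which carry the most training signal. Illustrative only;
    not the paper's exact formula."""
    p = float(np.clip(rewards.mean(), 1e-6, 1 - 1e-6))
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Example: 8 rollouts on one problem, 3 of them correct (pass rate 0.375).
rewards = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=float)
print(maxent_weight(rewards) * group_advantages(rewards))
```

Under this reading, problems the policy already always solves (or never solves) get near-zero weight, while medium-difficulty problems dominate the update, which is one plausible way the "correct signal" from the diverse SFT spectrum gets amplified.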

Community

Through the innovative Spectrum-to-Signal Principle (SSP) training methodology, the 1.5B-parameter VibeThinker-1.5B surpasses giant models hundreds of times larger across multiple reasoning benchmarks, demonstrating at an extremely low cost that small models can also achieve top-tier reasoning capabilities.

GitHub: https://github.com/WeiboAI/VibeThinker


Paper author · Paper submitter

An extreme test of whether a 1.5B model can achieve strong reasoning ability.

Paper author · Paper submitter

SimpleTestForVibeThinker
A simple evaluation (I still recommend testing this model with competitive math / Python algorithm tasks).
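For anyone who wants to run a quick check like this locally, below is a minimal sketch using the standard transformers API. The repo id, prompt, and sampling settings are assumptions for illustration; check the model card for the exact values.

```python
# Minimal local test; the repo id and sampling settings are assumptions,
# so consult the model card for the recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A small competition-style math prompt (illustrative).
messages = [{"role": "user", "content": "How many positive divisors does 2024 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models need a generous token budget; these values are illustrative.
outputs = model.generate(
    inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```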


There is no point in testing LLMs, even math-specific ones, with such tasks. LLMs won't be, and are not supposed to be, used like this. Please stop testing them this way; use a calculator or a tool instead.
VibeThinker is an astonishing model. I've already tested it on writing some algorithms and, despite its size, it handles them very well. The code-optimization problem, though, is still unsolvable in any meaningful way.

Nice work! Would you kindly share more details, such as the RL training curves and SFT/RL performance?

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model (https://huggingface.co/papers/2510.18855) (2025)
* Teaching Language Models to Reason with Tools (https://huggingface.co/papers/2510.20342) (2025)
* DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation (https://huggingface.co/papers/2511.06307) (2025)
* BARD: Budget-Aware Reasoning Distillation (https://huggingface.co/papers/2511.01470) (2025)
* NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP Problems (https://huggingface.co/papers/2510.16476) (2025)
* THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning (https://huggingface.co/papers/2509.13761) (2025)
* A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning (https://huggingface.co/papers/2510.12838) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/tiny-model-big-logic-diversity-driven-optimization-elicits-large-model-reasoning-ability-in-vibethinker-15b

Is the end-to-end training code available somewhere? If not, do you plan to release it? Thanks for your work.

Any plans for a similarly capable multimodal model?

This comment has been hidden (marked as Off-Topic)

Any ablation about MGPO vs GRPO?

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/tiny-model-big-logic-diversity-driven-optimization-elicits-large-model-reasoning-ability-in-vibethinker-1-5b-939-18244315
- Executive Summary
- Detailed Breakdown
- Practical Applications


Models citing this paper: 4

Datasets citing this paper: 0


Spaces citing this paper: 11

Collections including this paper: 16