Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

AK (akhaliq), 2025-01-23:

https://github.com/deepseek-ai/DeepSeek-R1

Librarian Bot (librarian-bot), 2025-01-24:

This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024): https://huggingface.co/papers/2412.18279
  • Offline Reinforcement Learning for LLM Multi-Step Reasoning (2024): https://huggingface.co/papers/2412.16145
  • Reasoning Language Models: A Blueprint (2025): https://huggingface.co/papers/2501.11223
  • Learning to Generate Research Idea with Dynamic Control (2024): https://huggingface.co/papers/2412.14626
  • Search-o1: Agentic Search-Enhanced Large Reasoning Models (2025): https://huggingface.co/papers/2501.05366
  • Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations (2024): https://huggingface.co/papers/2412.06207
  • Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback (2024): https://huggingface.co/papers/2412.06827

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

AI Papers Academy (aipapersacademy), 2025-01-25:

A written and video review - https://aipapersacademy.com/deepseek-r1/

Ajith V Prabhakar (ajithprabhakar), 2025-01-26:

Here is the Ajith's AI Pulse article on this paper: https://ajithp.com/2025/01/26/deepseek-r1-ai-reasoning/

DeepNLP (DeepNLP), 2025-02-19:

Bookmark of the GRPO equation LaTeX code and term explanation

[Image: image.png]

Equation: http://www.deepnlp.org/equation/group-relative-policy-optimization-grpo
Equation Search Engine (http://www.deepnlp.org/search/equation) and paper-related AI Agents List (http://www.deepnlp.org/store/ai-agent)

Clem 🤗 (clem), 2025-04-30:

Good resources to understand the paper: https://huggingface.co/learn/llm-course/chapter12/3 & https://www.youtube.com/watch?v=1xDVbu-WaFo

Grant Singleton (grantsing), 2025-08-26:

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning

Avi (avahal), 2025-12-23:

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning-1485-bee5358e

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
Web4app (Seriki), 2026-02-12:

Paperweb [Image: `aura`, https://arxiv.org/pdf/2405.01535]

Title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
arXiv ID: 2501.12948
Published: 2025-01-22
Submitted by AK (akhaliq) on 2025-01-23

Authors: DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang

Abstract: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing.
To address these issues and further enhance reasoning performance, we\nintroduce DeepSeek-R1, which incorporates multi-stage training and cold-start\ndata before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217\non reasoning tasks. To support the research community, we open-source\nDeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B,\n70B) distilled from DeepSeek-R1 based on Qwen and Llama.","upvotes":441,"discussionId":"6791b70c76d05e183a4116bf","githubRepo":"https://github.com/deepseek-ai/deepseek-r1","githubRepoAddedBy":"auto","ai_summary":"DeepSeek-R1-Zero and DeepSeek-R1 utilize reinforcement learning and multi-stage training to enhance reasoning capabilities, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217.","ai_keywords":["reinforcement learning","multi-stage training","cold-start data","Qwen","Llama"],"githubStars":91838,"organization":{"_id":"652faff917096ceb6bf53f3f","name":"deepseek-ai","fullname":"DeepSeek","avatar":"https://cdn-uploads.huggingface.co/production/uploads/6538815d1bdb3c40db94fbfa/xMBly9PUMphrFVMxLX4kq.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"652219bbfe5881ad35d5499b","avatarUrl":"/avatars/c2a4a7b0b2354d4b3307749640e76ef7.svg","isPro":false,"fullname":"wannanfeng","user":"WanNanfeng","type":"user"},{"_id":"63a3511e1a19cbf69e840b53","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1676027302959-63a3511e1a19cbf69e840b53.png","isPro":false,"fullname":"Huu-Thien 
Tran","user":"mathesics","type":"user"},{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"65decc75beffeb39ba679eba","avatarUrl":"/avatars/735b678bd5863a0c1b1bdd3bbf8858fa.svg","isPro":true,"fullname":"r","user":"oceansweep","type":"user"},{"_id":"633e570be7d5ce7bfe037a53","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/633e570be7d5ce7bfe037a53/zV8ULv4Mu7YIGZ8D3JtmK.jpeg","isPro":false,"fullname":"Zhaocheng Liu","user":"zhaocheng","type":"user"},{"_id":"642e97dbc1b0f8e4e76c2b30","avatarUrl":"/avatars/60adf4470baf12d5687d53a6c3299bcd.svg","isPro":false,"fullname":"james curry","user":"ainbo","type":"user"},{"_id":"62471d933da3618636e973b8","avatarUrl":"/avatars/e58bfd7ffc943a6f5a3c6e90fb80d36c.svg","isPro":false,"fullname":"Shangzhi Zhang","user":"Snorlax","type":"user"},{"_id":"62f847d692950415b63c6011","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1660437733795-noauth.png","isPro":false,"fullname":"Yassine Ennaour","user":"Lyte","type":"user"},{"_id":"67919d669bed6c0d6aa573f5","avatarUrl":"/avatars/f115862341a80dec2336f27f8ee8d38f.svg","isPro":false,"fullname":"yang","user":"fengfan933","type":"user"},{"_id":"632a97bf15a8aeac60249b15","avatarUrl":"/avatars/ab0e77850451725928bd990b47ffe97d.svg","isPro":false,"fullname":"NAN","user":"nan1248","type":"user"},{"_id":"64747f7e33192631bacd8831","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64747f7e33192631bacd8831/dstkZJ4sHJSeqLesV5cOC.jpeg","isPro":false,"fullname":"Taufiq Dwi 
Purnomo","user":"taufiqdp","type":"user"},{"_id":"6371ad82f0fe906bdc5b15f6","avatarUrl":"/avatars/ddc61e1edae5bd6b19530e1bc5e15d53.svg","isPro":false,"fullname":"Dotanoob7","user":"Dotanoob","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1,"organization":{"_id":"652faff917096ceb6bf53f3f","name":"deepseek-ai","fullname":"DeepSeek","avatar":"https://cdn-uploads.huggingface.co/production/uploads/6538815d1bdb3c40db94fbfa/xMBly9PUMphrFVMxLX4kq.png"}}">
arxiv:2501.12948

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Published on Jan 22, 2025 · Submitted by AK on Jan 23, 2025 · #1 Paper of the day

Abstract

AI-generated summary: DeepSeek-R1-Zero and DeepSeek-R1 utilize reinforcement learning and multi-stage training to enhance reasoning capabilities, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217.

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Community

Paper submitter (AK): https://github.com/deepseek-ai/DeepSeek-R1

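This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024): https://huggingface.co/papers/2412.18279
  • Offline Reinforcement Learning for LLM Multi-Step Reasoning (2024): https://huggingface.co/papers/2412.16145
  • Reasoning Language Models: A Blueprint (2025): https://huggingface.co/papers/2501.11223
  • Learning to Generate Research Idea with Dynamic Control (2024): https://huggingface.co/papers/2412.14626
  • Search-o1: Agentic Search-Enhanced Large Reasoning Models (2025): https://huggingface.co/papers/2501.05366
  • Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations (2024): https://huggingface.co/papers/2412.06207
  • Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback (2024): https://huggingface.co/papers/2412.06827

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend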

A written and video review - https://aipapersacademy.com/deepseek-r1/

Here is the Ajith's AI Pulse article on this paper: https://ajithp.com/2025/01/26/deepseek-r1-ai-reasoning/

Bookmark of the GRPO equation: LaTeX code and term explanation

Equation: http://www.deepnlp.org/equation/group-relative-policy-optimization-grpo
Equation search engine (http://www.deepnlp.org/search/equation) and list of paper-related AI agents (http://www.deepnlp.org/store/ai-agent)
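
For readers who cannot open the link, here is a reconstruction of the Group Relative Policy Optimization (GRPO) objective as it is commonly written for this paper. Treat it as a sketch of the published formulation rather than a copy of the linked page; the shorthand \rho_i for the probability ratio is introduced here only for compactness and is an assumption of this note.

```latex
\begin{align}
\mathcal{J}_{\mathrm{GRPO}}(\theta)
  &= \mathbb{E}_{\,q \sim P(Q),\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}(O \mid q)}
     \left[ \frac{1}{G} \sum_{i=1}^{G}
       \Big( \min\!\big( \rho_i A_i,\
                         \operatorname{clip}(\rho_i,\, 1-\varepsilon,\, 1+\varepsilon)\, A_i \big)
             - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left( \pi_\theta \,\|\, \pi_{\mathrm{ref}} \right)
       \Big) \right], \\
\rho_i
  &= \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}, \\
\mathbb{D}_{\mathrm{KL}}\!\left( \pi_\theta \,\|\, \pi_{\mathrm{ref}} \right)
  &= \frac{\pi_{\mathrm{ref}}(o_i \mid q)}{\pi_\theta(o_i \mid q)}
     - \log \frac{\pi_{\mathrm{ref}}(o_i \mid q)}{\pi_\theta(o_i \mid q)} - 1, \\
A_i
  &= \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}
          {\operatorname{std}(\{r_1, \dots, r_G\})}.
\end{align}
```

Terms: q is a question sampled from the training distribution; o_1, ..., o_G are the G outputs sampled for q from the old policy; r_i is the reward of output o_i; A_i is the group-relative advantage, which normalizes each reward against the other outputs in the same group and so replaces a learned value model; \varepsilon is the clipping range; \beta weights the KL penalty toward the reference policy \pi_ref.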

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning-1485-bee5358e

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Paperweb `aura`

Models citing this paper 338

Datasets citing this paper 10

Spaces citing this paper 2,731

Collections including this paper 127