Qwen3 Technical Report
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/qwen3-technical-report
Abstract
Qwen3, a unified series of large language models, integrates thinking and non-thinking modes, reduces computational resources, and achieves state-of-the-art performance across various tasks and languages.
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Experts (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models, such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B), and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity. Moreover, by leveraging the knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models while ensuring their highly competitive performance. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks, remaining competitive with larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly accessible under Apache 2.0.
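The thinking budget mechanism described above can be illustrated with a toy decoding loop: reasoning tokens are emitted until the budget is exhausted, after which the model is forced to transition to the final answer. This is a minimal sketch of the idea only; the function name and the exact transition phrase are illustrative assumptions, not Qwen3's actual implementation.

```python
def generate_with_thinking_budget(think_tokens, answer_tokens, budget):
    """Toy illustration of a thinking budget: emit reasoning tokens
    until the budget is exhausted, then force a transition to the
    final answer. Not Qwen3's actual decoding code."""
    output = ["<think>"]
    output.extend(think_tokens[:budget])  # reasoning is truncated at the budget
    if len(think_tokens) > budget:
        # hypothetical stop phrase marking the forced transition
        output.append("...considering the time limit, I must answer now.")
    output.append("</think>")
    output.extend(answer_tokens)  # the answer phase itself is not budgeted
    return output

reasoning = ["step1", "step2", "step3", "step4"]
answer = ["The", "answer", "is", "42."]
out = generate_with_thinking_budget(reasoning, answer, budget=2)
```

Under this sketch, a larger budget trades latency for more reasoning steps, which is the latency/performance balance the abstract refers to.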
Community
Qwen3 technical report
Are you just trying to promote yourself?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Compass-V2 Technical Report (2025)
- Reasoning Beyond Limits: Advances and Open Problems for LLMs (2025)
- AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale (2025)
- Llama-Nemotron: Efficient Reasoning Models (2025)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding (2025)
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models (2025)
- BitNet b1.58 2B4T Technical Report (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
Thank you for the excellent technical report!
I have a question regarding your experience with reasoning generation: have you encountered issues with excessive or endless repetition, particularly when generating reasoning in languages other than English or Chinese? In my case, I attempted to distill reasoning traces from Qwen3-32B to Qwen3-8B in a target language and observed that the smaller model frequently produced repeated reasoning seeds, whereas both of the original 32B/8B models rarely did.
Did you observe similar behavior during the SFT, GRPO, or distillation-to-smaller-models stages? If so, how did you address it?
The Hugging Face model card recommends using a presence penalty to reduce repetition, but this can negatively affect overall performance. I would be very interested to hear if you found alternative approaches or tuning strategies that helped mitigate this issue more effectively.
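For reference, the presence penalty mentioned above can be sketched as a flat logit offset applied to every token that has already appeared in the output. This follows the common sampling-API convention (as in the OpenAI/vLLM `presence_penalty` parameter); it is a minimal illustration, not Qwen3-specific code.

```python
def apply_presence_penalty(logits, generated_ids, penalty):
    """Toy presence penalty: subtract a flat penalty from the logit of
    every vocabulary token that has appeared at least once in the
    generated sequence, discouraging (but not forbidding) repeats."""
    seen = set(generated_ids)
    return [logit - penalty if token_id in seen else logit
            for token_id, logit in enumerate(logits)]

logits = [2.0, 1.0, 0.5, 0.0]           # one logit per vocabulary token
penalized = apply_presence_penalty(logits, generated_ids=[0, 2], penalty=1.5)
```

Because the offset is flat regardless of how often a token repeated, a large penalty can suppress legitimately reusable tokens (function words, code keywords), which is one plausible reason it hurts overall performance.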
Thanks again for sharing your work!
Hi @mohmmaddd, kindly consider removing the comment above as it is not related to the paper or to the AI field at all.
Thanks for your consideration!