Paper page - Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
https://x.com/victormustar/status/2023423300278583727

This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [Step-DeepResearch Technical Report](https://huggingface.co/papers/2512.20491) (2025)
* [O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL](https://huggingface.co/papers/2601.03743) (2026)
* [TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning](https://huggingface.co/papers/2512.20312) (2025)
* [PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning](https://huggingface.co/papers/2601.05593) (2026)
* [AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents](https://huggingface.co/papers/2602.06485) (2026)
* [Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis](https://huggingface.co/papers/2602.03279) (2026)
* [Kimi K2.5: Visual Agentic Intelligence](https://huggingface.co/papers/2602.02276) (2026)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
\n","updatedAt":"2026-02-18T01:39:58.610Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6960902214050293},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.13367","authors":[{"_id":"699431ed50fb2c0be4783e65","name":"Chen Yang","hidden":false},{"_id":"699431ed50fb2c0be4783e66","name":"Guangyue Peng","hidden":false},{"_id":"699431ed50fb2c0be4783e67","name":"Jiaying Zhu","hidden":false},{"_id":"699431ed50fb2c0be4783e68","name":"Ran Le","hidden":false},{"_id":"699431ed50fb2c0be4783e69","name":"Ruixiang Feng","hidden":false},{"_id":"699431ed50fb2c0be4783e6a","name":"Tao Zhang","hidden":false},{"_id":"699431ed50fb2c0be4783e6b","name":"Xiyun Xu","hidden":false},{"_id":"699431ed50fb2c0be4783e6c","name":"Yang Song","hidden":false},{"_id":"699431ed50fb2c0be4783e6d","name":"Yiming Jia","hidden":false},{"_id":"699431ed50fb2c0be4783e6e","name":"Yuntao Wen","hidden":false},{"_id":"699431ed50fb2c0be4783e6f","name":"Yunzhi Xu","hidden":false},{"_id":"699431ed50fb2c0be4783e70","name":"Zekai Wang","hidden":false},{"_id":"699431ed50fb2c0be4783e71","name":"Zhenwei An","hidden":false},{"_id":"699431ed50fb2c0be4783e72","name":"Zhicong Sun","hidden":false},{"_id":"699431ed50fb2c0be4783e73","name":"Zongchao Chen","hidden":false}],"publishedAt":"2026-02-13T13:10:46.000Z","submittedOnDailyAt":"2026-02-17T08:33:06.878Z","title":"Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and 
Acts","submittedOnDailyBy":{"_id":"6947f69751d7ae7c3c7b6908","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/PuIDZB9XDShHohKhYmdmp.png","isPro":true,"fullname":"Ben Kelly","user":"YellowjacketGames","type":"user"},"summary":"We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, even achieving superior performance compared to much larger models, such as Qwen3-30B-A3B. 
Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B parameter models.","upvotes":21,"discussionId":"699431ed50fb2c0be4783e74","projectPage":"https://huggingface.co/Nanbeige/Nanbeige4.1-3B","ai_summary":"Nanbeige4.1-3B is a 3B-parameter unified language model that demonstrates superior performance in agentic behavior, code generation, and reasoning compared to larger models through advanced reward modeling and training techniques.","ai_keywords":["unified generalist language model","reward modeling","reinforcement learning","tool-call turns","deep search","complex data synthesis","turn-level supervision","point-wise reward modeling","pair-wise reward modeling","code generation","general reasoning","agentic behavior","human-aligned responses","model optimization"],"organization":{"_id":"6533c00a9860c1cb37bff25f","name":"Nanbeige","fullname":"Nanbeige LLM Lab","avatar":"https://cdn-uploads.huggingface.co/production/uploads/646f0d118ff94af23bc44aab/GXHCollpMRgvYqUXQ2BQ7.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6947f69751d7ae7c3c7b6908","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/PuIDZB9XDShHohKhYmdmp.png","isPro":true,"fullname":"Ben Kelly","user":"YellowjacketGames","type":"user"},{"_id":"672e0fe92ed6d70d42558967","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/fBkgP7lH1Z5YbZ7m_RkeW.png","isPro":false,"fullname":"osea","user":"Nick13856","type":"user"},{"_id":"646f0d118ff94af23bc44aab","avatarUrl":"/avatars/3ad7d3bf34c29ff324c4d8be1eb06e6e.svg","isPro":false,"fullname":"songyang","user":"magicsongyang","type":"user"},{"_id":"649ab4bdd4ae399f67296904","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/649ab4bdd4ae399f67296904/9czHlG4RwfkYW07sPP4IL.png","isPro":false,"fullname":"Yuntao 
Wen","user":"skysss","type":"user"},{"_id":"661ab1f1fa3b144a381fa454","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/661ab1f1fa3b144a381fa454/IlpZBb9NCjo7ntFwMIH53.png","isPro":false,"fullname":"Urro","user":"urroxyz","type":"user"},{"_id":"69555842ca93f97f4129bf30","avatarUrl":"/avatars/5b0bb5a430d2487f91ded60d66f9c069.svg","isPro":false,"fullname":"Wael Antar","user":"Wael-Antar","type":"user"},{"_id":"63c1699e40a26dd2db32400d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63c1699e40a26dd2db32400d/3N0-Zp8igv8-52mXAdiiq.jpeg","isPro":false,"fullname":"Chroma","user":"Chroma111","type":"user"},{"_id":"6900b7a8ecfa14b7b16368fb","avatarUrl":"/avatars/183456c0fa0d5e21c27813ce48d3bad2.svg","isPro":false,"fullname":"Sentinel","user":"Sentinel7","type":"user"},{"_id":"6270324ebecab9e2dcf245de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6270324ebecab9e2dcf245de/cMbtWSasyNlYc9hvsEEzt.jpeg","isPro":false,"fullname":"Kye Gomez","user":"kye","type":"user"},{"_id":"6460c3811db65f878513bcaf","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6460c3811db65f878513bcaf/CRdJ8lXixDku3k8Rm5Stn.jpeg","isPro":false,"fullname":"Jingwei Zuo","user":"JingweiZuo","type":"user"},{"_id":"64834b399b352597e41816ac","avatarUrl":"/avatars/63d9d123bffa90f43186a0bdc4455cbd.svg","isPro":false,"fullname":"Shaobai Jiang","user":"shaobaij","type":"user"},{"_id":"62e2cb662be89f0bf5d6e8d1","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1659030364276-noauth.jpeg","isPro":false,"fullname":"Daryl Tucker","user":"daryltucker","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"6533c00a9860c1cb37bff25f","name":"Nanbeige","fullname":"Nanbeige LLM Lab","avatar":"https://cdn-uploads.huggingface.co/production/uploads/646f0d118ff94af23bc44aab/GXHCollpMRgvYqUXQ2BQ7.png"}}">
AI-generated summary
Nanbeige4.1-3B is a 3B-parameter unified language model that demonstrates superior performance in agentic behavior, code generation, and reasoning compared to larger models through advanced reward modeling and training techniques.

Abstract
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, even achieving superior performance compared to much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
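The abstract's central alignment idea, combining point-wise and pair-wise reward modeling, can be illustrated with a minimal sketch. The paper does not provide reference code on this page, so the function names, the label convention (quality labels in [0, 1]), and the `alpha` weighting below are illustrative assumptions, not the authors' implementation: a point-wise binary cross-entropy term calibrates each response's absolute score, while a pair-wise Bradley-Terry term enforces the preference ordering between a chosen and a rejected response.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pointwise_loss(scores, labels):
    # Binary cross-entropy between sigmoid(score) and a per-response
    # quality label in [0, 1]: calibrates absolute reward values.
    losses = [
        -(y * math.log(_sigmoid(s)) + (1 - y) * math.log(1 - _sigmoid(s)))
        for s, y in zip(scores, labels)
    ]
    return sum(losses) / len(losses)


def pairwise_loss(chosen, rejected):
    # Bradley-Terry preference loss: pushes the chosen response's score
    # above the rejected response's score for each preference pair.
    losses = [-math.log(_sigmoid(c - r)) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)


def combined_loss(scores, labels, chosen, rejected, alpha=0.5):
    # Weighted mix of the two objectives; alpha trades off absolute
    # calibration (point-wise) against preference ordering (pair-wise).
    return alpha * pointwise_loss(scores, labels) + (1 - alpha) * pairwise_loss(chosen, rejected)
```

In this sketch a larger margin between chosen and rejected scores lowers the pair-wise term, while well-calibrated absolute scores lower the point-wise term; how Nanbeige4.1-3B actually weights or schedules the two signals is not specified in the abstract.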