Paper page - Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper: https://arxiv.org/abs/2512.07461

Code: https://github.com/bigai-nlco/Native-Parallel-Reasoner

Model & Data: https://huggingface.co/bigai-NPR

Website: https://bigai-nlco.github.io/Native-Parallel-Reasoner
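
For quick experimentation, checkpoints under the bigai-NPR organization can presumably be loaded with the standard transformers API. A minimal sketch follows; the repo id "bigai-NPR/NPR-Qwen3-4B" is a hypothetical placeholder, so check https://huggingface.co/bigai-NPR for the actual model name:

```python
# Minimal sketch, assuming a released checkpoint under the bigai-NPR org.
# NOTE: "bigai-NPR/NPR-Qwen3-4B" is a HYPOTHETICAL repo id used for illustration;
# look up the real model name at https://huggingface.co/bigai-NPR.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigai-NPR/NPR-Qwen3-4B"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Solve step by step: what is 17 * 23?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```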

\n","updatedAt":"2025-12-09T06:42:55.967Z","author":{"_id":"63a95a6a7930fa8c7dd63d4e","avatarUrl":"/avatars/d9d0420f7ddfe2f3a7e029fb05f1c89f.svg","fullname":"Zilong Zheng","name":"zlzheng","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":5,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.58427494764328},"editors":["zlzheng"],"editorAvatarUrls":["/avatars/d9d0420f7ddfe2f3a7e029fb05f1c89f.svg"],"reactions":[{"reaction":"🔥","users":["AdinaY","Kiy-K","jacklanda"],"count":3}],"isReport":false}},{"id":"6938cf2716167bb16075dea6","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2025-12-10T01:38:47.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning](https://huggingface.co/papers/2510.25992) (2025)\n* [ORION: Teaching Language Models to Reason Efficiently in the Language of Thought](https://huggingface.co/papers/2511.22891) (2025)\n* [Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning](https://huggingface.co/papers/2510.10974) (2025)\n* [Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards](https://huggingface.co/papers/2511.17473) (2025)\n* [A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning](https://huggingface.co/papers/2510.12838) (2025)\n* [Tailored Primitive Initialization is the Secret Key to Reinforcement Learning](https://huggingface.co/papers/2511.12429) (2025)\n* [SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning](https://huggingface.co/papers/2512.03244) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2025-12-10T01:38:47.605Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7385526299476624},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}},{"id":"694376abb39a0c279c620c92","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false},"createdAt":"2025-12-18T03:36:11.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/native-parallel-reasoner-reasoning-in-parallelism-via-self-distilled-reinforcement-learning-5052-11bb72d9\n- Key Findings\n- Executive Summary \n- Detailed Breakdown\n- Practical Applications","html":"

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/native-parallel-reasoner-reasoning-in-parallelism-via-self-distilled-reinforcement-learning-5052-11bb72d9

\n
    \n
  • Key Findings
  • \n
  • Executive Summary
  • \n
  • Detailed Breakdown
  • \n
  • Practical Applications
  • \n
\n","updatedAt":"2025-12-18T03:36:11.926Z","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7659336924552917},"editors":["avahal"],"editorAvatarUrls":["/avatars/743a009681d5d554c27e04300db9f267.svg"],"reactions":[],"isReport":false},"replies":[{"id":"694386d08a5bb923d6d6d776","author":{"_id":"6191cc9e6d34e827404cebab","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674119843175-6191cc9e6d34e827404cebab.jpeg","fullname":"Yang","name":"jacklanda","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4,"isUserFollowing":false},"createdAt":"2025-12-18T04:45:04.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Thanks so much!","html":"

Thanks so much!

\n","updatedAt":"2025-12-18T04:45:04.451Z","author":{"_id":"6191cc9e6d34e827404cebab","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674119843175-6191cc9e6d34e827404cebab.jpeg","fullname":"Yang","name":"jacklanda","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.29927581548690796},"editors":["jacklanda"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674119843175-6191cc9e6d34e827404cebab.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"694376abb39a0c279c620c92"}}]}],"primaryEmailConfirmed":false,"paper":{"id":"2512.07461","authors":[{"_id":"6937b96219d912300c34a398","user":{"_id":"626b889ff451470f861d8c78","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1651214465695-noauth.jpeg","isPro":false,"fullname":"victor wu","user":"victor-wu","type":"user"},"name":"Tong Wu","status":"claimed_verified","statusLastChangedAt":"2025-12-09T09:22:22.731Z","hidden":false},{"_id":"6937b96219d912300c34a399","user":{"_id":"6191cc9e6d34e827404cebab","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674119843175-6191cc9e6d34e827404cebab.jpeg","isPro":false,"fullname":"Yang","user":"jacklanda","type":"user"},"name":"Yang Liu","status":"claimed_verified","statusLastChangedAt":"2025-12-09T09:22:20.278Z","hidden":false},{"_id":"6937b96219d912300c34a39a","user":{"_id":"624505fcd083d28d314de3dd","avatarUrl":"/avatars/92cf6b6a1d81d7958dbbd21f0bf63f8f.svg","isPro":false,"fullname":"Jun Bai","user":"ba1jun","type":"user"},"name":"Jun Bai","status":"claimed_verified","statusLastChangedAt":"2025-12-09T09:22:17.404Z","hidden":false},{"_id":"6937b96219d912300c34a39b","name":"Zixia Jia","hidden":false},{"_id":"6937b96219d912300c34a39c","name":"Shuyi Zhang","hidden":false},{"_id":"6937b96219d912300c34a39d","name":"Ziyong Lin","hidden":false},{"_id":"6937b96219d912300c34a39e","user":{"_id":"64b119c4372d43407723136b","avatarUrl":"/avatars/d523e181993eea06b7f6a71a592c995e.svg","isPro":false,"fullname":"YANTING WANG","user":"Noane","type":"user"},"name":"Yanting Wang","status":"claimed_verified","statusLastChangedAt":"2025-12-09T09:22:14.418Z","hidden":false},{"_id":"6937b96219d912300c34a39f","name":"Song-Chun Zhu","hidden":false},{"_id":"6937b96219d912300c34a3a0","user":{"_id":"63a95a6a7930fa8c7dd63d4e","avatarUrl":"/avatars/d9d0420f7ddfe2f3a7e029fb05f1c89f.svg","isPro":false,"fullname":"Zilong Zheng","user":"zlzheng","type":"user"},"name":"Zilong Zheng","status":"claimed_verified","statusLastChangedAt":"2025-12-10T09:10:05.315Z","hidden":false}],"publishedAt":"2025-12-08T11:39:43.000Z","submittedOnDailyAt":"2025-12-09T04:12:55.960Z","title":"Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning","submittedOnDailyBy":{"_id":"63a95a6a7930fa8c7dd63d4e","avatarUrl":"/avatars/d9d0420f7ddfe2f3a7e029fb05f1c89f.svg","isPro":false,"fullname":"Zilong Zheng","user":"zlzheng","type":"user"},"summary":"We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. 
NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from ``cold-start'' format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.","upvotes":78,"discussionId":"6937b96219d912300c34a3a1","projectPage":"https://bigai-nlco.github.io/Native-Parallel-Reasoner/","githubRepo":"https://github.com/bigai-nlco/Native-Parallel-Reasoner","githubRepoAddedBy":"user","ai_summary":"NPR, a teacher-free framework, enhances Large Language Models with native parallel reasoning capabilities through self-distilled training, Parallel-Aware Policy Optimization, and a robust NPR Engine, achieving substantial performance and speed improvements.","ai_keywords":["Native Parallel Reasoner","Large Language Models","self-evolve","parallel reasoning","self-distilled progressive training","cold-start format discovery","topological constraints","Parallel-Aware Policy Optimization","branching policies","execution graph","adaptive decomposition","trial and error","NPR Engine","memory management","flow control","parallel RL training","reasoning benchmarks","Qwen3-4B","genuine parallel execution","autoregressive decoding","agentic reasoning"],"githubStars":100,"organization":{"_id":"63a95ac93453852ef5399a77","name":"bigai","fullname":"Beijing Institute for General Artificial Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1672043197974-63a95a6a7930fa8c7dd63d4e.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"64d1ebe52f92537fbc4c8bc2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64d1ebe52f92537fbc4c8bc2/cGaVNHDdi_YtSeJ5IxJ7x.png","isPro":false,"fullname":"Callter","user":"Callter","type":"user"},{"_id":"6191cc9e6d34e827404cebab","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674119843175-6191cc9e6d34e827404cebab.jpeg","isPro":false,"fullname":"Yang","user":"jacklanda","type":"user"},{"_id":"624505fcd083d28d314de3dd","avatarUrl":"/avatars/92cf6b6a1d81d7958dbbd21f0bf63f8f.svg","isPro":false,"fullname":"Jun Bai","user":"ba1jun","type":"user"},{"_id":"64b7ae6cf53ae848e72b997d","avatarUrl":"/avatars/b55dd3d6fcb3ccac2e3880d01a9bdc63.svg","isPro":false,"fullname":"Zixia 
Jia","user":"vickyandkekey","type":"user"},{"_id":"645dd6407b6b366101f17cc0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/MrW2o9sOGWjIw4WjGtDKP.png","isPro":false,"fullname":"CCAE","user":"CCAE","type":"user"},{"_id":"6765437423607315da283746","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/XAHr4ztvvHuP2N02t9xBz.png","isPro":false,"fullname":"LM-Lexicon","user":"LM-Lexicon","type":"user"},{"_id":"667b99f081adc76dc72f14c6","avatarUrl":"/avatars/48f39cffd3a5dee50a707671c5751f09.svg","isPro":false,"fullname":"Shuyi","user":"shuyi-zsy","type":"user"},{"_id":"63a95a6a7930fa8c7dd63d4e","avatarUrl":"/avatars/d9d0420f7ddfe2f3a7e029fb05f1c89f.svg","isPro":false,"fullname":"Zilong Zheng","user":"zlzheng","type":"user"},{"_id":"64c20c59e75fd66a71d2b419","avatarUrl":"/avatars/7171fd26734672a4291595c12060d5df.svg","isPro":false,"fullname":"Hongtao Li","user":"Aurumting","type":"user"},{"_id":"6440656b757aa3c2ad86a67a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6440656b757aa3c2ad86a67a/oVSwa0QfQYJjTBUPwXhO-.jpeg","isPro":false,"fullname":"Wenming Tu","user":"tutu0604","type":"user"},{"_id":"6494e048958eddc720c2c3e0","avatarUrl":"/avatars/f594c4db7ea930376936d9a9bc61ab35.svg","isPro":false,"fullname":"Buwei He","user":"Hermi2023","type":"user"},{"_id":"662efb707bff6a69de0adf3c","avatarUrl":"/avatars/99b6ad3a4a07a670cf5b3bd03020a391.svg","isPro":false,"fullname":"Yipeng","user":"fringsoo","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1,"organization":{"_id":"63a95ac93453852ef5399a77","name":"bigai","fullname":"Beijing Institute for General Artificial Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1672043197974-63a95a6a7930fa8c7dd63d4e.png"}}">
arxiv:2512.07461

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Published on Dec 8, 2025 · Submitted by Zilong Zheng on Dec 9, 2025
#1 Paper of the day
Authors: Tong Wu, Yang Liu, Jun Bai, Zixia Jia, Shuyi Zhang, Ziyong Lin, Yanting Wang, Song-Chun Zhu, Zilong Zheng

Abstract

AI-generated summary: NPR, a teacher-free framework, enhances Large Language Models with native parallel reasoning capabilities through self-distilled training, Parallel-Aware Policy Optimization, and a robust NPR Engine, achieving substantial performance and speed improvements.

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors the memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups of up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.
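
To make the training idea concrete, here is a minimal, illustrative sketch of the general family this belongs to: sampling a group of reasoning branches per prompt and applying a reward-weighted (REINFORCE-style) update with a group-relative baseline. This is not the paper's PAPO algorithm, which optimizes branching policies inside the execution graph; `policy` (a causal LM), `reward_fn`, `optimizer`, and the sampling settings are all assumed placeholders. See the GitHub repo for the real implementation.

```python
# Illustrative sketch ONLY: a toy, group-relative reward-weighted update over
# parallel reasoning branches (REINFORCE-style). This is NOT the paper's PAPO
# algorithm; `policy` (a Hugging Face causal LM), `reward_fn`, and `optimizer`
# are assumed placeholders. See the official repo for the actual method, which
# operates directly on the execution graph inside the NPR Engine.
import torch
import torch.nn.functional as F

def branch_log_prob(policy, prompt_ids, branch_ids):
    """Log-likelihood of one sampled branch under the current policy."""
    full = torch.cat([prompt_ids, branch_ids], dim=1)
    logits = policy(full).logits[:, :-1]                # predict tokens 1..L-1
    logp = F.log_softmax(logits, dim=-1)
    targets = full[:, 1:]
    token_lp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_ids.shape[1] - 1:].sum()  # branch tokens only

def parallel_branch_update(policy, prompt_ids, num_branches, reward_fn, optimizer):
    """Sample a group of branches for one prompt, score each with a verifiable
    reward, and push up the likelihood of above-average branches."""
    with torch.no_grad():  # rollouts would run concurrently in a serving engine
        branches = [
            policy.generate(prompt_ids, do_sample=True, max_new_tokens=256)
            [:, prompt_ids.shape[1]:]
            for _ in range(num_branches)
        ]
    rewards = torch.tensor([reward_fn(b) for b in branches], dtype=torch.float)
    advantages = rewards - rewards.mean()               # group-relative baseline
    log_probs = torch.stack(
        [branch_log_prob(policy, prompt_ids, b) for b in branches]
    )
    loss = -(advantages.to(log_probs.device) * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards
```

In this sketch the rollouts are sampled sequentially for clarity; the paper's claimed speedups come precisely from executing such branches genuinely in parallel in the serving engine rather than emulating parallelism autoregressively.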

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning (https://huggingface.co/papers/2510.25992) (2025)
* ORION: Teaching Language Models to Reason Efficiently in the Language of Thought (https://huggingface.co/papers/2511.22891) (2025)
* Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning (https://huggingface.co/papers/2510.10974) (2025)
* Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards (https://huggingface.co/papers/2511.17473) (2025)
* A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning (https://huggingface.co/papers/2510.12838) (2025)
* Tailored Primitive Initialization is the Secret Key to Reinforcement Learning (https://huggingface.co/papers/2511.12429) (2025)
* SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning (https://huggingface.co/papers/2512.03244) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/native-parallel-reasoner-reasoning-in-parallelism-via-self-distilled-reinforcement-learning-5052-11bb72d9

  • Key Findings
  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
Reply from a paper author:

Thanks so much!


Models citing this paper 2

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2512.07461 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2512.07461 in a Space README.md to link it from this page.

Collections including this paper 9