\n","updatedAt":"2025-04-24T01:36:27.623Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7120380997657776},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2504.15466","authors":[{"_id":"6808480c49c8f78b6a4e492f","user":{"_id":"61568f37272f2d87a99ba884","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/61568f37272f2d87a99ba884/lgvkl5f0rEyiQRVU5FE32.png","isPro":false,"fullname":"Jiayi Pan","user":"Jiayi-Pan","type":"user"},"name":"Jiayi Pan","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:46:37.415Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4930","user":{"_id":"644570ba2d91b15b4c7f6311","avatarUrl":"/avatars/d5e66012066d0c330b8f23718b1499d8.svg","isPro":false,"fullname":"Xiuyu Li","user":"xiuyul","type":"user"},"name":"Xiuyu Li","status":"claimed_verified","statusLastChangedAt":"2025-04-23T08:27:59.248Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4931","user":{"_id":"63797c273f575acc2f6893c0","avatarUrl":"/avatars/32d7a6a8881c8c4d80a097b732ed24b6.svg","isPro":true,"fullname":"Long(Tony) Lian","user":"longlian","type":"user"},"name":"Long Lian","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:46:44.955Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4932","user":{"_id":"605d07fa8a3450814bada877","avatarUrl":"/avatars/eafe986982057fbaba962b99d5543477.svg","isPro":false,"fullname":"Charlie Snell","user":"sea-snell","type":"user"},"name":"Charlie Snell","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:46:52.865Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4933","user":{"_id":"64fd0229e0dc35986bd3c0e5","avatarUrl":"/avatars/94f5698f9104dad7288edb4460026fd8.svg","isPro":false,"fullname":"Yifei Zhou","user":"yifeizhou","type":"user"},"name":"Yifei Zhou","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:47:00.482Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4934","user":{"_id":"6333a9195a032dcd095dda13","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1664329996201-noauth.jpeg","isPro":true,"fullname":"Adam Yala","user":"yala","type":"user"},"name":"Adam Yala","status":"claimed_verified","statusLastChangedAt":"2025-04-23T08:28:02.029Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4935","user":{"_id":"64cbdf02f103036e23d1c7f3","avatarUrl":"/avatars/496069463900dea20929b57381182d39.svg","isPro":false,"fullname":"Trevor Darrell","user":"trevordarrell","type":"user"},"name":"Trevor Darrell","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:47:06.847Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4936","user":{"_id":"6251bf4b183aa4266924ad91","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1678041834400-6251bf4b183aa4266924ad91.jpeg","isPro":true,"fullname":"Kurt Keutzer","user":"kurtkeutzer","type":"user"},"name":"Kurt 
Keutzer","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:47:13.477Z","hidden":false},{"_id":"6808480c49c8f78b6a4e4937","user":{"_id":"6611e6e1188ff298b0dd0b79","avatarUrl":"/avatars/3a495283955ec9e06e1829c7eb2cd9a4.svg","isPro":false,"fullname":"Alane Suhr","user":"alsuhr","type":"user"},"name":"Alane Suhr","status":"admin_assigned","statusLastChangedAt":"2025-04-23T13:47:19.192Z","hidden":false}],"publishedAt":"2025-04-21T22:29:02.000Z","submittedOnDailyAt":"2025-04-23T00:30:52.876Z","title":"Learning Adaptive Parallel Reasoning with Language Models","submittedOnDailyBy":{"_id":"63797c273f575acc2f6893c0","avatarUrl":"/avatars/32d7a6a8881c8c4d80a097b732ed24b6.svg","isPro":true,"fullname":"Long(Tony) Lian","user":"longlian","type":"user"},"summary":"Scaling inference-time computation has substantially improved the reasoning\ncapabilities of language models. However, existing methods have significant\nlimitations: serialized chain-of-thought approaches generate overly long\noutputs, leading to increased latency and exhausted context windows, while\nparallel methods such as self-consistency suffer from insufficient\ncoordination, resulting in redundant computations and limited performance\ngains. To address these shortcomings, we propose Adaptive Parallel Reasoning\n(APR), a novel reasoning framework that enables language models to orchestrate\nboth serialized and parallel computations end-to-end. APR generalizes existing\nreasoning methods by enabling adaptive multi-threaded inference using spawn()\nand join() operations. A key innovation is our end-to-end reinforcement\nlearning strategy, optimizing both parent and child inference threads to\nenhance task success rate without requiring predefined reasoning structures.\nExperiments on the Countdown reasoning task demonstrate significant benefits of\nAPR: (1) higher performance within the same context window (83.4% vs. 60.0% at\n4k context); (2) superior scalability with increased computation (80.1% vs.\n66.6% at 20k total tokens); (3) improved accuracy at equivalent latency (75.2%\nvs. 57.3% at approximately 5,000ms). 
APR represents a step towards enabling\nlanguage models to autonomously optimize their reasoning processes through\nadaptive allocation of computation.","upvotes":44,"discussionId":"6808480c49c8f78b6a4e4968","githubRepo":"https://github.com/Parallel-Reasoning/APR","githubRepoAddedBy":"user","ai_summary":"Adaptive Parallel Reasoning (APR) enhances language model performance by optimally combining serialized and parallel computations through adaptive multi-threading and reinforcement learning.","ai_keywords":["Adaptive Parallel Reasoning","APR","serialized","parallel computations","spawn()","join()","reinforcement learning","Countdown reasoning task","multi-threaded inference","context window","scalability","computation allocation"],"githubStars":141},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"63797c273f575acc2f6893c0","avatarUrl":"/avatars/32d7a6a8881c8c4d80a097b732ed24b6.svg","isPro":true,"fullname":"Long(Tony) Lian","user":"longlian","type":"user"},{"_id":"6333a9195a032dcd095dda13","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1664329996201-noauth.jpeg","isPro":true,"fullname":"Adam Yala","user":"yala","type":"user"},{"_id":"6611e6e1188ff298b0dd0b79","avatarUrl":"/avatars/3a495283955ec9e06e1829c7eb2cd9a4.svg","isPro":false,"fullname":"Alane Suhr","user":"alsuhr","type":"user"},{"_id":"6629dac35e13d8145e3a605e","avatarUrl":"/avatars/95938f20ab9e067838f37aca6ea235ae.svg","isPro":false,"fullname":"Jiaxin Ge","user":"JiaxinGe","type":"user"},{"_id":"62b86b86b2c1f79981888931","avatarUrl":"/avatars/b77ece8943360aa7223b906818f11771.svg","isPro":false,"fullname":"Rodolfo Corona","user":"rcorona","type":"user"},{"_id":"630489e5dae2eb7d083e78b1","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/630489e5dae2eb7d083e78b1/RKjwiquU1JzaHoV4op42F.jpeg","isPro":false,"fullname":"Ritwik Gupta","user":"RitwikGupta","type":"user"},{"_id":"605d07fa8a3450814bada877","avatarUrl":"/avatars/eafe986982057fbaba962b99d5543477.svg","isPro":false,"fullname":"Charlie Snell","user":"sea-snell","type":"user"},{"_id":"61568f37272f2d87a99ba884","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/61568f37272f2d87a99ba884/lgvkl5f0rEyiQRVU5FE32.png","isPro":false,"fullname":"Jiayi Pan","user":"Jiayi-Pan","type":"user"},{"_id":"67daff267f78cc8481c7a87e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/Du5cDMFmxUyTxNtXFdouF.png","isPro":false,"fullname":"Natalia Harguindeguy","user":"nharguindeguy","type":"user"},{"_id":"62f0ecd2700bdc19558360de","avatarUrl":"/avatars/5325b4b763f30c41f30e3aec0d2b59fa.svg","isPro":false,"fullname":"Junyi Zhang","user":"Junyi42","type":"user"},{"_id":"6397bae577384db7e680e914","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1670888141818-noauth.jpeg","isPro":false,"fullname":"Junjie Zhang","user":"junjayz","type":"user"},{"_id":"644570ba2d91b15b4c7f6311","avatarUrl":"/avatars/d5e66012066d0c330b8f23718b1499d8.svg","isPro":false,"fullname":"Xiuyu Li","user":"xiuyul","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
Learning Adaptive Parallel Reasoning with Language Models
Published on Apr 21, 2025
Abstract
Adaptive Parallel Reasoning (APR) enhances language model performance by optimally combining serialized and parallel computations through adaptive multi-threading and reinforcement learning.
Scaling inference-time computation has substantially improved the reasoning capabilities of language models. However, existing methods have significant limitations: serialized chain-of-thought approaches generate overly long outputs, leading to increased latency and exhausted context windows, while parallel methods such as self-consistency suffer from insufficient coordination, resulting in redundant computations and limited performance gains. To address these shortcomings, we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures. Experiments on the Countdown reasoning task demonstrate significant benefits of APR: (1) higher performance within the same context window (83.4% vs. 60.0% at 4k context); (2) superior scalability with increased computation (80.1% vs. 66.6% at 20k total tokens); (3) improved accuracy at equivalent latency (75.2% vs. 57.3% at approximately 5,000 ms). APR represents a step towards enabling language models to autonomously optimize their reasoning processes through adaptive allocation of computation.
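
To make the spawn()/join() control flow described in the abstract more concrete, below is a minimal Python sketch, not the paper's implementation: a parent thread fans out child inference threads over candidate sub-queries, joins their condensed results, and then continues serial decoding conditioned on them. The `llm_generate` function, the decomposition heuristic, and the example sub-query format are hypothetical placeholders; the actual APR policy learns when and what to spawn via reinforcement learning.

```python
# Conceptual sketch of spawn()/join()-style parallel inference.
# NOT the authors' code: `llm_generate` and the task decomposition
# below are hypothetical stand-ins for a learned policy.
from concurrent.futures import ThreadPoolExecutor


def llm_generate(prompt: str) -> str:
    """Hypothetical LM call: returns a (partial) reasoning trace for `prompt`."""
    return f"<trace for: {prompt}>"


def spawn(subqueries):
    """Launch one child inference thread per sub-query."""
    executor = ThreadPoolExecutor(max_workers=len(subqueries))
    return [executor.submit(llm_generate, q) for q in subqueries]


def join(handles):
    """Block until all child threads finish and collect their results."""
    return [h.result() for h in handles]


def parent_reasoner(task: str) -> str:
    # Parent thread decides (here, trivially) how to decompose the task.
    subqueries = [f"{task} | branch {i}" for i in range(3)]
    children = spawn(subqueries)   # fan out exploration to child threads
    summaries = join(children)     # fan in their condensed traces
    # Continue serial reasoning conditioned on the joined summaries.
    return llm_generate(task + "\n" + "\n".join(summaries))


if __name__ == "__main__":
    print(parent_reasoner("Countdown: reach 24 from [3, 5, 7, 9]"))
```

One design point this sketch illustrates: because child threads return only condensed summaries to the parent, the parent's context window grows far more slowly than a single serialized chain of thought, which is the mechanism behind the context-window and latency gains reported above.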