
Papers
arxiv:2601.09667

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Published on Jan 14 · Submitted by Zhiyuan Hu on Jan 16
Authors: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park

Abstract

Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective, and efficient path to distribution-shift-robust multi-agent reasoning without tuning.

AI-generated summary

Multi-Agent Test-Time Reinforcement Learning (MATTRL) enhances multi-agent reasoning through structured textual experience injection and consensus-based decision making at inference time.
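
The abstract describes a concrete pipeline: a multi-expert team deliberates over multiple turns, retrieves structured textual experiences at test time, reaches a consensus answer, and assigns turn-level credit to build the experience pool that later questions draw on. The sketch below shows one way those pieces could fit together. It is an illustration based only on the abstract, not the paper's code: the names (`llm`, `Experience`, `retrieve`, `mattrl_answer`) are invented here, retrieval and credit assignment are deliberately simplified, and the `llm` stub stands in for any chat-completion call. The authors' implementation lives at https://github.com/zhiyuanhubj/MATTRL.

```python
# Minimal sketch of a MATTRL-style test-time loop. Everything here is
# illustrative and based only on the abstract above; the names are
# hypothetical, not the paper's API.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    question: str    # the problem this experience came from
    turn_text: str   # the discussion turn distilled into reusable text
    credit: float    # turn-level credit assigned once an outcome is known


def llm(prompt: str) -> str:
    """Stub so the sketch runs end to end; swap in a real chat-completion call."""
    return "Stub analysis of the question. ANSWER: 42"


def retrieve(pool: list[Experience], question: str, k: int = 3) -> list[Experience]:
    # Stand-in retrieval: take the k highest-credit experiences. A real
    # system would retrieve by relevance to the current question.
    return sorted(pool, key=lambda e: e.credit, reverse=True)[:k]


def mattrl_answer(question: str, roles: list[str], pool: list[Experience],
                  n_turns: int = 2) -> str:
    transcript: list[str] = []
    for _ in range(n_turns):                       # multi-turn deliberation
        for role in roles:                         # multi-expert team
            hints = "\n".join(e.turn_text for e in retrieve(pool, question))
            prompt = (
                f"You are a {role}.\nQuestion: {question}\n"
                f"Relevant past experiences:\n{hints}\n"
                f"Discussion so far:\n" + "\n".join(transcript) +
                "\nGive your analysis, ending with a line 'ANSWER: ...'."
            )
            transcript.append(f"[{role}] {llm(prompt)}")

    # Consensus: majority vote over each turn's final answer line.
    answers = [t.rsplit("ANSWER:", 1)[-1].strip()
               for t in transcript if "ANSWER:" in t]
    consensus = Counter(answers).most_common(1)[0][0]

    # Simplified turn-level credit assignment: reward turns that agreed with
    # the consensus and bank them as textual experience for later questions.
    for t in transcript:
        if "ANSWER:" in t:
            agreed = t.rsplit("ANSWER:", 1)[-1].strip() == consensus
            pool.append(Experience(question, t, credit=1.0 if agreed else -1.0))
    return consensus


pool: list[Experience] = []
print(mattrl_answer("What is 6 * 7?", ["mathematician", "math teacher"], pool))
```

Running the script prints the majority-vote answer and leaves `pool` populated with credited turns, which is the part a test-time RL approach reuses on subsequent questions in place of gradient updates.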

Community

Paper author Paper submitter

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Excellent work!

nice~

Librarian Bot: I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API.

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/collaborative-multi-agent-test-time-reinforcement-learning-for-reasoning

This comment has been hidden


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.09667 in a model README.md to link it from this page.
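
For example, a model README.md that mentions the paper's arXiv URL anywhere in its text should get picked up; a line along these lines would do (the wording is illustrative):

    This model was trained with MATTRL (https://arxiv.org/abs/2601.09667).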

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.09667 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.09667 in a Space README.md to link it from this page.

Collections including this paper 7