ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Jie Xiao, Meng Chen, Qingnan Ren, Jingwei Song, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Ween Yang, Lynn Ai, Eric Yang, Bill Shi (Gradient)

arXiv:2602.02192 · Published 2026-02-02
AI-generated summary

ECHO-2 is a distributed reinforcement learning framework that enables efficient post-training of large language models by overlapping rollout generation, dissemination, and training while managing policy staleness and network latency.

Abstract

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers opportunities to leverage more cost-efficient inference resources, but introduces challenges in wide-area coordination and policy dissemination. We present ECHO-2, a distributed RL framework for post-training with remote inference workers and non-negligible dissemination latency. ECHO-2 combines centralized learning with distributed rollouts and treats bounded policy staleness as a user-controlled parameter, enabling rollout generation, dissemination, and training to overlap. We introduce an overlap-based capacity model that relates training time, dissemination latency, and rollout throughput, yielding a practical provisioning rule for sustaining learner utilization. To mitigate dissemination bottlenecks and lower cost, ECHO-2 employs peer-assisted pipelined broadcast and cost-aware activation of heterogeneous workers. Experiments on GRPO post-training of 4B and 8B models under real wide-area bandwidth regimes show that ECHO-2 significantly improves cost efficiency while preserving RL reward comparable to strong baselines.
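The abstract does not spell out the capacity model, but the intended relationship can be sketched. Below is a minimal, hypothetical Python illustration assuming one simple overlap constraint: weights for a given step take t_diss seconds to reach workers, and the batch generated under them must be ready within a staleness window of s training steps, so generation must fit in s * t_train - t_diss. The function name, symbols, and the formula itself are assumptions for illustration, not ECHO-2's published model.

import math

def min_workers(t_train: float,          # seconds per learner update
                t_diss: float,           # seconds to disseminate new weights
                rollouts_per_step: int,  # rollouts consumed per update
                worker_rate: float,      # rollouts per second per worker
                staleness: int = 1       # max tolerated policy-version lag
                ) -> int:
    """Smallest worker count that keeps the learner saturated, assuming
    the overlap constraint t_diss + t_gen <= staleness * t_train, where
    t_gen = rollouts_per_step / (n * worker_rate)."""
    window = staleness * t_train - t_diss
    if window <= 0:
        raise ValueError("staleness bound too tight: dissemination alone "
                         "exceeds the overlap window")
    return math.ceil(rollouts_per_step / (worker_rate * window))

# Example: 60 s per update, 20 s WAN dissemination, 512 rollouts per step,
# 0.2 rollouts/s per worker, staleness bound of 2 versions -> 26 workers.
print(min_workers(60.0, 20.0, 512, 0.2, staleness=2))

Under this toy model, loosening the staleness bound widens the overlap window and shrinks the required fleet, which matches the abstract's framing of bounded staleness as a user-controlled knob trading policy freshness for cost.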
Current RLHF/RLAIF pipelines are bottlenecked by rollout generation and waste GPU time on idle learners. ECHO-2 changes the cost structure: we decouple RL into three planes, rollout (a global inference swarm), learning (staleness-aware multi-step updates), and data/reward (fully modular), and coordinate them with lightweight versioning and pipelined broadcast. The result is near-continuous learner utilization even under heterogeneous, unreliable WAN workers, enabling RL to scale out across a global fleet rather than up inside a datacenter. We validate on GRPO-based reasoning and code tasks and on a poker sandbox integration.
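As a concrete illustration of the lightweight versioning described above, here is a minimal, hypothetical staleness gate: the learner bumps a policy version on each broadcast, workers tag every rollout with the version it was generated under, and the gate admits a rollout only if it lags the current policy by at most the staleness bound. Class and method names are invented for this sketch and do not come from the ECHO-2 codebase.

from collections import deque

class StalenessGate:
    """Coordinator-side filter for version-tagged rollouts."""

    def __init__(self, max_staleness: int):
        self.max_staleness = max_staleness
        self.current_version = 0
        self.accepted = deque()  # (policy_version, rollout) pairs for the learner

    def publish(self) -> int:
        """Learner finished a step: bump the version it will broadcast."""
        self.current_version += 1
        return self.current_version

    def submit(self, rollout, policy_version: int) -> bool:
        """Admit a worker rollout only if its policy is recent enough."""
        if self.current_version - policy_version <= self.max_staleness:
            self.accepted.append((policy_version, rollout))
            return True
        return False  # too stale: discard (or, in practice, down-weight)

gate = StalenessGate(max_staleness=2)
v1 = gate.publish()                   # broadcast version 1
gate.publish(); gate.publish()        # learner advances to version 3
print(gate.submit("trajectory", v1))  # True: a lag of 2 is within the bound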