Paper page - AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
https://arxivexplained.com/papers/autoenv-automated-environments-for-measuring-cross-environment-agent-learning
\n","updatedAt":"2025-11-27T01:41:14.010Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7364323735237122},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2511.19304","authors":[{"_id":"6925274e16eb3a9f13103998","user":{"_id":"65f40e83653c231cbaf7defe","avatarUrl":"/avatars/afa5ce72324112739e539865c9aee26b.svg","isPro":false,"fullname":"Jiayi Zhang","user":"didiforhugface","type":"user"},"name":"Jiayi Zhang","status":"claimed_verified","statusLastChangedAt":"2025-11-25T09:05:29.887Z","hidden":false},{"_id":"6925274e16eb3a9f13103999","name":"Yiran Peng","hidden":false},{"_id":"6925274e16eb3a9f1310399a","user":{"_id":"6621e02cf34ab6caed18e9c6","avatarUrl":"/avatars/15888b2060d1cc56be9fa55fd4b34005.svg","isPro":false,"fullname":"Fanqi Kong","user":"Fancylalala","type":"user"},"name":"Fanqi Kong","status":"claimed_verified","statusLastChangedAt":"2025-11-25T09:05:25.655Z","hidden":false},{"_id":"6925274e16eb3a9f1310399b","user":{"_id":"67c443afb753bd020f9c97d8","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/xbACBNLSopWmN5G1K8h_Y.png","isPro":false,"fullname":"Cheng","user":"YangC777","type":"user"},"name":"Yang Cheng","status":"claimed_verified","statusLastChangedAt":"2025-11-25T09:05:18.820Z","hidden":false},{"_id":"6925274e16eb3a9f1310399c","name":"Yifan Wu","hidden":false},{"_id":"6925274e16eb3a9f1310399d","name":"Zhaoyang Yu","hidden":false},{"_id":"6925274e16eb3a9f1310399e","user":{"_id":"649ea7106282cb41e77760bc","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/649ea7106282cb41e77760bc/HlWjaqxr03ob93vdKg_LQ.jpeg","isPro":false,"fullname":"Isaac","user":"XiangJinYu","type":"user"},"name":"Jinyu Xiang","status":"claimed_verified","statusLastChangedAt":"2025-11-25T09:05:23.607Z","hidden":false},{"_id":"6925274e16eb3a9f1310399f","user":{"_id":"68a435cc22fdf7356962ccb9","avatarUrl":"/avatars/467f4732ade5f47b42433ff354acdeef.svg","isPro":false,"fullname":"jianhao ruan","user":"Aurorra1123","type":"user"},"name":"Jianhao Ruan","status":"claimed_verified","statusLastChangedAt":"2025-11-25T15:53:17.306Z","hidden":false},{"_id":"6925274e16eb3a9f131039a0","name":"Jinlin Wang","hidden":false},{"_id":"6925274e16eb3a9f131039a1","name":"Maojia Song","hidden":false},{"_id":"6925274e16eb3a9f131039a2","user":{"_id":"6632160088f75d987d1a156f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6632160088f75d987d1a156f/mYlMQfK1BGWeEbOSMmeSb.jpeg","isPro":false,"fullname":"Hongzhang Liu","user":"Alphamasterliu","type":"user"},"name":"HongZhang Liu","status":"claimed_verified","statusLastChangedAt":"2025-11-25T09:05:27.575Z","hidden":false},{"_id":"6925274e16eb3a9f131039a3","name":"Xiangru Tang","hidden":false},{"_id":"6925274e16eb3a9f131039a4","name":"Bang Liu","hidden":false},{"_id":"6925274e16eb3a9f131039a5","name":"Chenglin Wu","hidden":false},{"_id":"6925274e16eb3a9f131039a6","name":"Yuyu 
Luo","hidden":false}],"publishedAt":"2025-11-24T16:54:23.000Z","submittedOnDailyAt":"2025-11-25T01:26:13.029Z","title":"AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning","submittedOnDailyBy":{"_id":"65f40e83653c231cbaf7defe","avatarUrl":"/avatars/afa5ce72324112739e539865c9aee26b.svg","isPro":false,"fullname":"Jiayi Zhang","user":"didiforhugface","type":"user"},"summary":"Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve 12-49% normalized reward, demonstrating the challenge of AutoEnv-36. Second, we formalize agent learning as a component-centric process driven by three stages of Selection, Optimization, and Evaluation applied to an improvable agent component. Using this formulation, we design eight learning methods and evaluate them on AutoEnv-36. Empirically, the gain of any single learning method quickly decrease as the number of environments increases, revealing that fixed learning methods do not scale across heterogeneous environments. Environment-adaptive selection of learning methods substantially improves performance but exhibits diminishing returns as the method space expands. These results highlight both the necessity and the current limitations of agent learning for scalable cross-environment generalization, and position AutoEnv and AutoEnv-36 as a testbed for studying cross-environment agent learning. 
The code is avaiable at https://github.com/FoundationAgents/AutoEnv.","upvotes":91,"discussionId":"6925274f16eb3a9f131039a7","githubRepo":"https://github.com/FoundationAgents/AutoEnv","githubRepoAddedBy":"auto","ai_summary":"AutoEnv and AutoEnv-36 provide a standardized framework and dataset for evaluating cross-environment learning in agents, highlighting the challenges and limitations of existing learning methods.","ai_keywords":["AutoEnv","factorizable distributions","heterogeneous environments","AutoEnv-36","language models","normalized reward","component-centric process","Selection","Optimization","Evaluation","learning methods","environment-adaptive selection"],"githubStars":50},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"65f40e83653c231cbaf7defe","avatarUrl":"/avatars/afa5ce72324112739e539865c9aee26b.svg","isPro":false,"fullname":"Jiayi Zhang","user":"didiforhugface","type":"user"},{"_id":"6621e02cf34ab6caed18e9c6","avatarUrl":"/avatars/15888b2060d1cc56be9fa55fd4b34005.svg","isPro":false,"fullname":"Fanqi Kong","user":"Fancylalala","type":"user"},{"_id":"65685ef7d0a121b8e81afe2a","avatarUrl":"/avatars/bdf540632ff7e27564353ba4d799f9c9.svg","isPro":false,"fullname":"Evan","user":"Evanwu50020","type":"user"},{"_id":"68916bcc7010947276e5d04b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/68916bcc7010947276e5d04b/pBsPjyMswdkyqNYcC8mAn.jpeg","isPro":false,"fullname":"YiRan Peng","user":"amagipeng","type":"user"},{"_id":"67c443afb753bd020f9c97d8","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/xbACBNLSopWmN5G1K8h_Y.png","isPro":false,"fullname":"Cheng","user":"YangC777","type":"user"},{"_id":"68e8e058c2602d5619674d18","avatarUrl":"/avatars/c66942867bd4aabc5e75f1bbce6d3bbe.svg","isPro":false,"fullname":"7","user":"canyon77","type":"user"},{"_id":"68e8df75363c9b3201cf9829","avatarUrl":"/avatars/2f2571e89733d5696c71e0a45595ea52.svg","isPro":false,"fullname":"ilovehzr","user":"ilovehzr","type":"user"},{"_id":"68e8ea07098a95a3b0aca870","avatarUrl":"/avatars/5286dde615b83de47612571099a1c4fc.svg","isPro":false,"fullname":"Jinpengyu","user":"Jinpengyu7","type":"user"},{"_id":"69085e5062d83fda653a7826","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/69085e5062d83fda653a7826/e0NreSRF7YBZchjYcLNhN.jpeg","isPro":false,"fullname":"steamedbun","user":"steam-ed-bun","type":"user"},{"_id":"690068a5f3022902d825a65a","avatarUrl":"/avatars/b86895b24ea2183d123a2c1335cece77.svg","isPro":false,"fullname":"logic9817","user":"Logic9817HF","type":"user"},{"_id":"68e8dee9c2602d5619672e3f","avatarUrl":"/avatars/3b66cbdfd9e99a76ba6f893cb6016fc7.svg","isPro":false,"fullname":"ranhongzhe","user":"ruanhongzhe","type":"user"},{"_id":"6601fe539e1cf5eb4126d253","avatarUrl":"/avatars/5ddfa6e42a34ef3ad985e2e6dc2f6fe8.svg","isPro":false,"fullname":"zhitao","user":"wangbulai","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":2}">
AI-generated summary

AutoEnv and AutoEnv-36 provide a standardized framework and dataset for evaluating cross-environment learning in agents, highlighting the challenges and limitations of existing learning methods.

Abstract
Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve 12-49% normalized reward, demonstrating the difficulty of AutoEnv-36. Second, we formalize agent learning as a component-centric process driven by three stages of Selection, Optimization, and Evaluation applied to an improvable agent component. Using this formulation, we design eight learning methods and evaluate them on AutoEnv-36. Empirically, the gain of any single learning method quickly decreases as the number of environments increases, revealing that fixed learning methods do not scale across heterogeneous environments. Environment-adaptive selection of learning methods substantially improves performance but exhibits diminishing returns as the method space expands. These results highlight both the necessity and the current limitations of agent learning for scalable cross-environment generalization, and position AutoEnv and AutoEnv-36 as a testbed for studying cross-environment agent learning. The code is available at https://github.com/FoundationAgents/AutoEnv.
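The component-centric formulation in the abstract (Selection, Optimization, and Evaluation applied to an improvable agent component) can be pictured as a simple loop. The sketch below is a minimal illustration under assumed names (`AgentComponent`, `propose_variants`, `run_episodes` are all hypothetical); it is not the AutoEnv implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentComponent:
    """An improvable piece of the agent, e.g. a prompt, a memory, or agent code."""
    name: str
    content: str


def learn_component(
    components: List[AgentComponent],
    propose_variants: Callable[[AgentComponent], List[AgentComponent]],
    run_episodes: Callable[[AgentComponent], float],
    rounds: int = 3,
) -> AgentComponent:
    # Selection: choose which component to improve; as a stand-in heuristic,
    # pick the component that currently scores lowest.
    target = min(components, key=run_episodes)

    best, best_reward = target, run_episodes(target)
    for _ in range(rounds):
        # Optimization: generate candidate rewrites of the selected component
        # (LLM-driven in the paper; an injected callable here).
        for candidate in propose_variants(best):
            # Evaluation: score each candidate by normalized reward on the
            # environment's validated levels and keep the best one.
            reward = run_episodes(candidate)
            if reward > best_reward:
                best, best_reward = candidate, reward
    return best
```

Read this way, the paper's eight learning methods can roughly be seen as different choices for what is selected, how variants are proposed, and how they are evaluated.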
AUTOENV: Automated Environments for Measuring Cross-Environment Agent Learning

This paper tackles a gap in current agentic learning work: most "self-improving" agents are evaluated in a single domain, so we don't really know how well they learn across environments with very different dynamics, observations, and rewards.

The authors introduce AUTOENV, a framework that uses LLM-driven code generation plus self-repair to automatically build RL-style environments in three abstraction layers (BaseEnv / ObsEnv / SkinEnv). From 100 themes they obtain AUTOENV-36, a curated set of 36 heterogeneous environments (binary vs. accumulative reward, full vs. partial observation, aligned vs. inverse semantics) where strong LLM agents still only reach 12-49% normalized reward, making it a challenging and discriminative benchmark.

On top of this, they propose a component-centric formulation of agent learning as Selection-Optimization-Evaluation over improvable components like prompts and agent code, instantiate eight concrete learning methods, and define a "learning upper bound" that picks the best method per environment. Experiments show that any single fixed learning strategy quickly breaks down as environment diversity grows, while simple environment-adaptive selection recovers part of the upper bound but still leaves a sizeable gap. The work argues that future progress in agentic learning will require automated design and selection of learning strategies themselves, not just better prompts or models. A sketch of the layered environment abstraction and the upper-bound computation follows below.
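The following is a minimal sketch of how the three abstraction layers and the per-environment "learning upper bound" might fit together, assuming invented class shapes and a toy corridor task. It only illustrates how dynamics/reward, observability, and surface semantics can be factored into separate wrappers; it is not the actual AutoEnv code.

```python
from typing import Dict, Tuple


class BaseEnv:
    """Hidden state, transition dynamics, and reward structure (binary-reward variant here)."""
    def __init__(self, size: int = 5):
        self.size, self.pos = size, 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos = max(0, min(self.size - 1, self.pos + action))
        done = self.pos == self.size - 1
        return self.pos, (1.0 if done else 0.0), done


class ObsEnv:
    """Controls observability: full state vs. a partial view of it."""
    def __init__(self, base: BaseEnv, partial: bool = False):
        self.base, self.partial = base, partial

    def step(self, action: int) -> Tuple[Dict, float, bool]:
        state, reward, done = self.base.step(action)
        obs = {"near_goal": state >= self.base.size - 2} if self.partial else {"pos": state}
        return obs, reward, done


class SkinEnv:
    """Re-skins the interface, e.g. with inverted action semantics."""
    def __init__(self, obs_env: ObsEnv, inverse: bool = False):
        self.obs_env, self.inverse = obs_env, inverse

    def step(self, action: int) -> Tuple[Dict, float, bool]:
        return self.obs_env.step(-action if self.inverse else action)


def learning_upper_bound(rewards: Dict[str, Dict[str, float]]) -> float:
    """rewards[env][method] -> normalized reward; mean of the best method per environment."""
    return sum(max(per_method.values()) for per_method in rewards.values()) / len(rewards)


# Example: a partially observed world with inverse action semantics.
env = SkinEnv(ObsEnv(BaseEnv(size=5), partial=True), inverse=True)
obs, reward, done = env.step(-1)  # surfaces as -1, executed as +1 in the base dynamics
```

Separating the layers this way makes it possible to vary reward structure, observability, and surface semantics independently, which is what gives the benchmark its controlled heterogeneity.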