Paper page - CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
Paper: 2602.10999
Authors: Yusong Lin, Haiyang Wang, Shuzhe Wu, Lue Fan, Feiyang Pan, Sanyuan Zhao, Dandan Tu
Published: 2026-02-11
GitHub repository: https://github.com/LiberCoders/CLI-Gym
AI-generated summary
CLI-Gym enables scalable derivation of environment-intensive tasks by simulating and exploring environment histories, while LiberCoder achieves significant performance improvements on Terminal-Bench through fine-tuning.
Abstract
Agentic coding requires agents to interact effectively with runtime environments such as command-line interfaces (CLIs) in order to complete tasks like resolving dependency issues and fixing system problems. However, how such environment-intensive tasks can be obtained at scale to enhance agents' capabilities remains underexplored. To address this, drawing on an analogy between the Dockerfile and the agentic task, we propose employing agents to simulate and explore environment histories, guided by execution feedback. By tracing the history of a healthy environment, its state can be inverted to an earlier one that exhibits runtime failures, from which a task can be derived by packaging the buggy state together with the corresponding error messages. With our method, named CLI-Gym, a total of 1,655 environment-intensive tasks are derived, forming the largest collection of its kind. Moreover, trained on curated successful trajectories, our fine-tuned model, named LiberCoder, achieves a substantial absolute improvement of +21.1% (to 46.1%) on Terminal-Bench, outperforming various strong baselines. To our knowledge, this is the first public pipeline for the scalable derivation of environment-intensive tasks.
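As a rough illustration of the inversion idea described in the abstract, the sketch below truncates a known-good Dockerfile at an earlier instruction, rebuilds that partial image, and records the runtime failure produced by a probe command, packaging the buggy state and its error message as a task. This is a minimal, hypothetical sketch, not the authors' actual CLI-Gym pipeline; the names `invert_environment`, `CLITask`, `probe_cmd`, and `cut_at` are assumptions introduced for this example, and the cut point is assumed to fall on an instruction boundary.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CLITask:
    """A derived task: an inverted (buggy) environment plus its observed error."""
    image_tag: str       # Docker image capturing the earlier, broken state
    error_message: str   # runtime failure observed when probing that state
    fix_reference: str   # Dockerfile suffix that would restore the healthy state


def invert_environment(dockerfile: Path, probe_cmd: str, cut_at: int) -> CLITask:
    """Build only the first `cut_at` instructions of a healthy Dockerfile,
    probe the resulting earlier state, and pack any runtime failure into a task.

    Hypothetical sketch: assumes `cut_at` lands on an instruction boundary and
    that the Dockerfile's build context is its parent directory.
    """
    lines = dockerfile.read_text().splitlines()
    truncated, dropped = lines[:cut_at], lines[cut_at:]

    tag = f"cli-gym-task:{cut_at}"
    with tempfile.TemporaryDirectory() as tmp:
        partial = Path(tmp) / "Dockerfile"
        partial.write_text("\n".join(truncated) + "\n")
        # Build the truncated image against the original build context so that
        # COPY/ADD instructions in the kept prefix still resolve.
        subprocess.run(
            ["docker", "build", "-t", tag, "-f", str(partial), str(dockerfile.parent)],
            check=True,
        )

    # Probe the inverted state; a non-zero exit code signals a runtime failure.
    probe = subprocess.run(
        ["docker", "run", "--rm", tag, "sh", "-c", probe_cmd],
        capture_output=True,
        text=True,
    )
    if probe.returncode == 0:
        raise ValueError("Truncated environment is still healthy; try an earlier cut point.")

    return CLITask(
        image_tag=tag,
        error_message=(probe.stderr or probe.stdout).strip(),
        fix_reference="\n".join(dropped),
    )
```

For instance, under these assumptions, `invert_environment(Path("envs/example/Dockerfile"), probe_cmd='python -c "import torch"', cut_at=3)` would yield a task whose starting point is the partially built image and whose goal is to repair the failure captured in `error_message`, with the dropped Dockerfile suffix serving as a reference fix.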