arxiv:2601.16206

LLM-in-Sandbox Elicits General Agentic Intelligence

Published on Jan 22 · Submitted by Daixuan Cheng on Jan 23 · #1 Paper of the day
Authors: Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei
Abstract

We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer), to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.

AI-generated summary

LLM-in-Sandbox enables large language models to perform general intelligence tasks across diverse domains by allowing them to explore a code sandbox environment, achieving robust generalization without additional training.
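
To make the abstract's loop concrete, here is a minimal sketch of the pattern it describes: the model alternates between proposing code actions and reading their execution output from an isolated working directory. This is an illustration only, not the package's actual interface; the helper names (query_llm, run_in_sandbox) and the FINAL:/CODE: turn protocol are assumptions made for the example.

```python
# Minimal sketch of the LLM-in-sandbox loop (illustrative only, not the
# package's real API): the model alternates between proposing code actions
# and reading their execution output from an isolated working directory.
import subprocess
import sys
import tempfile

SANDBOX_DIR = tempfile.mkdtemp(prefix="llm_sandbox_")  # throwaway "virtual computer"

def run_in_sandbox(code: str, timeout: int = 30) -> str:
    """Execute a Python snippet inside the sandbox directory and capture its output."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=SANDBOX_DIR,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return (result.stdout + result.stderr).strip()

def solve(task: str, query_llm, max_turns: int = 8) -> str:
    """Let the model act in the sandbox until it emits a final answer.

    `query_llm` is any callable mapping a prompt string to the model's reply;
    the FINAL:/CODE: convention is an assumption made for this sketch.
    """
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        reply = query_llm("\n".join(history))
        if reply.startswith("FINAL:"):            # the model decides it is done
            return reply[len("FINAL:"):].strip()
        if reply.startswith("CODE:"):             # the model acts in the sandbox
            output = run_in_sandbox(reply[len("CODE:"):])
            history.append(f"Execution output:\n{output}")
        else:                                     # plain reasoning, kept as context
            history.append(reply)
    return history[-1]
```

In this framing, the behaviors the paper reports (fetching external resources, staging long inputs on disk, running a formatting script) are simply code actions the model chooses to emit; nothing task-specific is hard-coded into the loop.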

Community

Paper author · Paper submitter

Introducing LLM-in-Sandbox — put your LLM in a virtual computer to unlock general agentic intelligence for non-code tasks!

Significant gains for chemistry, long-context QA, instruction following, and more. No extra training needed.
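
The long-context gains relate to one behavior described in the abstract: rather than keeping a huge document in the prompt, the model can write it to the sandbox file system and query it with ordinary tools. Below is an illustrative sketch of that idea, assuming a Unix-like sandbox with grep available; the paths and function names are made up for the example and are not the package's API.

```python
# Illustrative sketch (not the paper's code) of handling long context through
# the sandbox file system: stage the document on disk once, then retrieve only
# the relevant lines with grep. Paths and names are assumptions for the example.
import pathlib
import subprocess

def offload_long_context(document: str, workdir: str = "/tmp/sandbox_ctx") -> pathlib.Path:
    """Write a long document into the sandbox so it no longer occupies prompt tokens."""
    path = pathlib.Path(workdir)
    path.mkdir(parents=True, exist_ok=True)
    doc_file = path / "context.txt"
    doc_file.write_text(document)
    return doc_file

def search_context(doc_file: pathlib.Path, keyword: str, window: int = 2) -> str:
    """Return only the matching lines plus a small surrounding window."""
    result = subprocess.run(
        ["grep", "-n", "-i", f"-C{window}", keyword, str(doc_file)],
        capture_output=True,
        text=True,
    )
    return result.stdout or "(no matches)"

# Example: store a long report once, then issue targeted lookups from it.
report = "\n".join(f"line {i}: filler text" for i in range(10_000))
report += "\nline 10000: the measured boiling point is 78.37 C"
doc = offload_long_context(report)
print(search_context(doc, "boiling point"))
```

The point of the sketch is that retrieval becomes a code action the model chooses to take, so only the matched lines re-enter the context window.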

🌐 Demo: https://llm-in-sandbox.github.io
šŸ’» Code: https://github.com/llm-in-sandbox/llm-in-sandbox

pip install llm-in-sandbox

Feel free to open issues or discussions šŸ¤—

I created a podcast explaining the key concepts from the paper:
https://researchpod-share.vercel.app/episode/aa22898a-3a1c-406d-b2b0-79d008c522f5

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Training Versatile Coding Agents in Synthetic Environments (2025): https://huggingface.co/papers/2512.12216
* DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models (2025): https://huggingface.co/papers/2512.02556
* ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration (2025): https://huggingface.co/papers/2511.21689
* Learning to Orchestrate Agents in Natural Language with the Conductor (2025): https://huggingface.co/papers/2512.04388
* One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents (2025): https://huggingface.co/papers/2512.20957
* GTM: Simulating the World of Tools for AI Agents (2025): https://huggingface.co/papers/2512.04535
* From Failure to Mastery: Generating Hard Samples for Tool-use Agents (2026): https://huggingface.co/papers/2601.01498

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXiv explained breakdown of this paper šŸ‘‰ https://arxivexplained.com/papers/llm-in-sandbox-elicits-general-agentic-intelligence


Models citing this paper 0


Datasets citing this paper 2

Spaces citing this paper 0


Collections including this paper 15