Paper page - AgentOCR: Reimagining Agent History via Optical Self-Compression
\n","updatedAt":"2026-01-17T13:26:21.350Z","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6177768111228943},"editors":["avahal"],"editorAvatarUrls":["/avatars/743a009681d5d554c27e04300db9f267.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2601.04786","authors":[{"_id":"6964788b138cc47cbd76533c","user":{"_id":"66ba29dd59e8e7a957154c5f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66ba29dd59e8e7a957154c5f/VvVS7IZNPUIB023GAEf5u.png","isPro":false,"fullname":"Lang Feng","user":"langfeng01","type":"user"},"name":"Lang Feng","status":"claimed_verified","statusLastChangedAt":"2026-01-12T10:33:41.004Z","hidden":false},{"_id":"6964788b138cc47cbd76533d","name":"Fuchao Yang","hidden":false},{"_id":"6964788b138cc47cbd76533e","name":"Feng Chen","hidden":false},{"_id":"6964788b138cc47cbd76533f","name":"Xin Cheng","hidden":false},{"_id":"6964788b138cc47cbd765340","user":{"_id":"645b10e80c73ea27d13f7aca","avatarUrl":"/avatars/95e565306472a15067440b5b43e07a6f.svg","isPro":false,"fullname":"xuhaiyang","user":"xhyandwyy","type":"user"},"name":"Haiyang Xu","status":"claimed_verified","statusLastChangedAt":"2026-01-12T10:33:39.142Z","hidden":false},{"_id":"6964788b138cc47cbd765341","name":"Zhenglin Wan","hidden":false},{"_id":"6964788b138cc47cbd765342","name":"Ming Yan","hidden":false},{"_id":"6964788b138cc47cbd765343","name":"Bo An","hidden":false}],"publishedAt":"2026-01-08T10:10:20.000Z","submittedOnDailyAt":"2026-01-12T02:18:55.645Z","title":"AgentOCR: Reimagining Agent History via Optical Self-Compression","submittedOnDailyBy":{"_id":"66ba29dd59e8e7a957154c5f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66ba29dd59e8e7a957154c5f/VvVS7IZNPUIB023GAEf5u.png","isPro":false,"fullname":"Lang Feng","user":"langfeng01","type":"user"},"summary":"Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with compression-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95\\% of text-based agent performance while substantially reducing token consumption (>50\\%), yielding consistent token and memory efficiency. 
Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.","upvotes":30,"discussionId":"6964788b138cc47cbd765344","ai_summary":"AgentOCR reduces token consumption in agentic systems by representing interaction history as visual tokens and employing visual caching and self-compression techniques.","ai_keywords":["large language models","reinforcement learning","multi-turn interaction trajectories","visual tokens","segment optical caching","agentic self-compression","token efficiency","rendering speedup"],"organization":{"_id":"6508b28cf36bb51c50faad98","name":"NanyangTechnologicalUniversity","fullname":"Nanyang Technological University","avatar":"https://cdn-uploads.huggingface.co/production/uploads/630ca0817dacb93b33506ce7/ZPD1fvei0bcIGeDXxeSkn.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6728558fc5e9db9a8c054fff","avatarUrl":"/avatars/83635ae0fbe52331a4560b1350f69c7e.svg","isPro":false,"fullname":"Jack Chen","user":"Chen-Jack","type":"user"},{"_id":"67165da0108f14aaeb19e35c","avatarUrl":"/avatars/aca030c93b29b9cd54f47a0ae94c22de.svg","isPro":false,"fullname":"Fuchao Yang","user":"yangfc","type":"user"},{"_id":"66ba29dd59e8e7a957154c5f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66ba29dd59e8e7a957154c5f/VvVS7IZNPUIB023GAEf5u.png","isPro":false,"fullname":"Lang Feng","user":"langfeng01","type":"user"},{"_id":"6594db22e9d0cd61bc7b7de8","avatarUrl":"/avatars/3ef34c3d3db195b90e438a77a9efed37.svg","isPro":false,"fullname":"Zhenglin Wan","user":"Carlos133386","type":"user"},{"_id":"6660319b253289136b63b219","avatarUrl":"/avatars/19fb26e4d3f8f39487a583b9b887b9a8.svg","isPro":false,"fullname":"XinCheng","user":"ZERO9215","type":"user"},{"_id":"683efe968ca7afd0ec40097e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/BMqnMwl_Ns-Sfc54ZsN-h.png","isPro":false,"fullname":"zhang","user":"Lin-0106","type":"user"},{"_id":"63edd2d1f765928ceeb49057","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1676530369930-noauth.png","isPro":false,"fullname":"Yaorui SHI","user":"yrshi","type":"user"},{"_id":"664395621b88258a527cd7d1","avatarUrl":"/avatars/8489ccebe4fd1262679ba63a5cb50bb8.svg","isPro":false,"fullname":"Kira","user":"Kira-wang","type":"user"},{"_id":"645b10e80c73ea27d13f7aca","avatarUrl":"/avatars/95e565306472a15067440b5b43e07a6f.svg","isPro":false,"fullname":"xuhaiyang","user":"xhyandwyy","type":"user"},{"_id":"671115df5a5e17e587983ca1","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/RnNsCnUDqfauN1ajzHxnC.png","isPro":false,"fullname":"Sun","user":"Agiao123","type":"user"},{"_id":"64fb128552e82dd432682b06","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64fb128552e82dd432682b06/GYcOiwa4R3RrgcM2tSuV_.png","isPro":false,"fullname":"Zhaoyang Chu","user":"chuzy","type":"user"},{"_id":"64771cfdd7cf39f2e9381aa9","avatarUrl":"/avatars/48adf00c3b653df02628f80511639e19.svg","isPro":false,"fullname":"Ming","user":"MingYan123","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"6508b28cf36bb51c50faad98","name":"NanyangTechnologicalUniversity","fullname":"Nanyang Technological University","avatar":"https://cdn-uploads.huggingface.co/production/uploads/630ca0817dacb93b33506ce7/ZPD1fvei0bcIGeDXxeSkn.png"}}">
AI-generated summary

AgentOCR reduces token consumption in agentic systems by representing interaction history as visual tokens and employing visual caching and self-compression techniques.
Abstract

Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with a compression-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks: ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95% of text-based agent performance while substantially reducing token consumption (>50%), yielding consistent token and memory efficiency. Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.
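The paper page doesn't ship code, but the segment optical caching mechanism lends itself to a short sketch: split the accumulated history into per-turn segments, hash each segment's text, and keep a cache of rendered image strips so that only unseen segments are rasterized. The Python below is a minimal illustration assuming a Pillow-based renderer; the class name, layout parameters, and rendering details are placeholders of ours, not the authors' implementation.

```python
import hashlib
from PIL import Image, ImageDraw

class SegmentOpticalCache:
    """Cache one rendered image strip per history segment, keyed by a
    content hash. Illustrative sketch only: segment granularity, fonts,
    and layout are placeholders, not the paper's rendering pipeline."""

    def __init__(self, width: int = 512, line_height: int = 14):
        self.width = width
        self.line_height = line_height
        self._cache: dict[str, Image.Image] = {}

    def _render_segment(self, text: str) -> Image.Image:
        """Rasterize one observation-action segment into an image strip."""
        lines = text.splitlines() or [""]
        img = Image.new("RGB", (self.width, self.line_height * len(lines)), "white")
        draw = ImageDraw.Draw(img)
        for i, line in enumerate(lines):
            draw.text((2, i * self.line_height), line, fill="black")
        return img

    def render_history(self, segments: list[str]) -> Image.Image:
        """Render the full history, rasterizing only segments not seen before."""
        strips = []
        for seg in segments:
            key = hashlib.sha256(seg.encode()).hexdigest()
            if key not in self._cache:  # cache miss: render this segment once
                self._cache[key] = self._render_segment(seg)
            strips.append(self._cache[key])
        # Stack the cached strips vertically into one compact history image.
        total_h = max(sum(s.height for s in strips), 1)
        canvas = Image.new("RGB", (self.width, total_h), "white")
        y = 0
        for s in strips:
            canvas.paste(s, (0, y))
            y += s.height
        return canvas
```

Because each turn appends segments without mutating old ones, only the newest segment misses the cache, so per-turn rendering cost stays roughly constant instead of growing with trajectory length; that is the effect behind the reported 20x rendering speedup.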
We’re introducing AgentOCR, a new way to scale LLM agents by reimagining long interaction histories as compact rendered images, leveraging the higher information density of visual tokens to curb exploding context costs. To make long-horizon rollouts practical, we add segment optical caching: history is split into hashable segments and the rendered visuals are cached, so agents avoid redundant re-rendering as trajectories grow. We go beyond fixed compression with agentic self-compression: the agent actively emits a compression rate and is trained with a compression-aware reward to balance task success against token efficiency. Across ALFWorld and search-based QA, AgentOCR keeps >95% of text-agent performance while cutting token use by >50% on average and ~80% at peak, and our analysis shows up to a 20x rendering speedup from segment optical caching.
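The announcement doesn't spell out the reward, but a compression-aware reward can be pictured as a task-success term traded off against token cost. The sketch below is a hypothetical formulation assuming a simple linear penalty; the weight `lam`, the budget normalization, and the function name are our assumptions, not the paper's definition.

```python
def compression_aware_reward(
    task_success: float,   # 1.0 if the episode's task succeeded, else 0.0
    tokens_used: int,      # visual + text tokens consumed this episode
    token_budget: int,     # reference budget, e.g. an uncompressed run's cost
    lam: float = 0.1,      # trade-off weight (hypothetical value)
) -> float:
    """Hypothetical compression-aware reward: reward task success and
    penalize token usage relative to a budget. Not the paper's exact form."""
    token_cost = tokens_used / max(token_budget, 1)
    return task_success - lam * token_cost
```

Under a reward of this shape, an agent that emits a higher compression rate (a smaller rendered history, hence fewer visual tokens) earns more whenever the task still succeeds, and learns to fall back to lower compression when fidelity matters; that is the strategic balancing the analysis refers to.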