MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
Authors: Yaorui Shi, Shugui Liu, Yu Yang, Wenyu Mao, Yuxin Chen, Qi Gu, Hui Su, Xunliang Cai, Xiang Wang, An Zhang
Published: 2026-01-29
Code: https://github.com/syr-cn/MemOCR
AI-generated summary

MemOCR is a multimodal memory agent that enhances long-horizon reasoning by adaptively compressing interaction histories into visual layouts, enabling efficient context utilization under tight budget constraints.
Long-horizon agentic reasoning requires effectively compressing growing interaction histories into a limited context window. Most existing memory systems serialize history as text, where token-level cost is uniform and scales linearly with length, often spending the scarce budget on low-value details. To address this, we introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets by allocating memory space with adaptive information density through visual layout. Concretely, MemOCR maintains a structured rich-text memory (e.g., headings, highlights) and renders it into an image that the agent consults for memory access, visually prioritizing crucial evidence while aggressively compressing auxiliary details. To ensure robustness across varying memory budgets, we train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels. Across long-context multi-hop and single-hop question-answering benchmarks, MemOCR outperforms strong text-based baselines and achieves more effective context utilization under extreme budgets.
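The core idea of adaptive information density can be illustrated with a minimal sketch (hypothetical, not the authors' code: entry texts, importance scores, and the `compress_memory` helper are all illustrative): given memory entries with importance scores and a fixed character budget, high-importance evidence is kept near-verbatim while auxiliary details are aggressively truncated.

```python
def compress_memory(entries, budget):
    """Illustrative sketch: allocate a fixed character budget across
    memory entries in proportion to their importance scores.

    entries: list of (text, importance) pairs
    budget:  total characters available for the compressed memory
    """
    total = sum(imp for _, imp in entries) or 1
    out = []
    for text, imp in entries:
        # Characters allotted to this entry, proportional to importance.
        share = max(1, int(budget * imp / total))
        out.append(text if len(text) <= share else text[:share - 1] + "…")
    return out

entries = [
    ("Key evidence: the answer is Paris.", 0.9),
    ("Auxiliary detail about intermediate tool calls and retries.", 0.1),
]
compressed = compress_memory(entries, budget=40)
```

MemOCR performs this allocation visually (font size, layout, highlighting) rather than by plain truncation, but the budget-proportional principle is the same: crucial evidence consumes most of the space while low-value detail is squeezed.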