Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
Authors: Ruijie Ye, Jiayi Zhang, Zhuoxin Liu, Zihao Zhu, Siyuan Yang, Li Li, Tianfu Fu, Franck Dernoncourt, Yue Zhao, Jiacheng Zhu, Ryan Rossi, Wenhao Chai, Zhengzhong Tu
Organization: Texas A&M University
Published: 2026-02-09
Paper ID: 2602.09084
Project page: https://agent-banana.github.io/
Code: https://github.com/taco-group/agent-banana
AI-generated summary: Agent Banana addresses challenges in instruction-based image editing through a hierarchical framework with context folding and image layer decomposition for high-fidelity, multi-turn editing at ultra-high resolution.
We study instruction-based image editing under professional workflows and identify three persistent challenges: (i) editors often over-edit, modifying content beyond the user's intent; (ii) existing models are largely single-turn, while multi-turn edits can alter object faithfulness; and (iii) evaluation at around 1K resolution is misaligned with real workflows, which often operate on ultra-high-definition images (e.g., 4K). We propose Agent Banana, a hierarchical agentic planner-executor framework for high-fidelity, object-aware, deliberative editing. Agent Banana introduces two key mechanisms: (1) Context Folding, which compresses long interaction histories into structured memory for stable long-horizon control; and (2) Image Layer Decomposition, which performs localized layer-based edits to preserve non-target regions while enabling native-resolution outputs. To support rigorous evaluation, we build HDD-Bench, a high-definition, dialogue-based benchmark featuring verifiable stepwise targets and native 4K images (11.8M pixels) for diagnosing long-horizon failures. On HDD-Bench, Agent Banana achieves the best multi-turn consistency and background fidelity (e.g., IC 0.871, SSIM-OM 0.84, LPIPS-OM 0.12) while remaining competitive on instruction following, and also attains strong performance on standard single-turn editing benchmarks. We hope this work advances reliable, professional-grade agentic image editing and its integration into real workflows.
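To make the idea of a localized, layer-based edit more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `composite` and `outside_mask_mae`, the tiny grayscale images, and the binary mask are all hypothetical, and a plain mean absolute error stands in for the perceptual metrics reported in the abstract. It only illustrates why pasting an edited layer back through a mask leaves non-target pixels untouched, while an unconstrained edit may not.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1] (illustrative only)
Mask = List[List[int]]     # 1 = target region to edit, 0 = region to preserve


def composite(original: Image, edited: Image, mask: Mask) -> Image:
    """Take edited pixels inside the mask and original pixels everywhere else."""
    return [
        [e if m else o for o, e, m in zip(row_o, row_e, row_m)]
        for row_o, row_e, row_m in zip(original, edited, mask)
    ]


def outside_mask_mae(reference: Image, candidate: Image, mask: Mask) -> float:
    """Mean absolute error over pixels where mask == 0 (a simple stand-in metric)."""
    diffs = [
        abs(r - c)
        for row_r, row_c, row_m in zip(reference, candidate, mask)
        for r, c, m in zip(row_r, row_c, row_m)
        if not m
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Hypothetical 3x3 example: the editor repaints the whole image ("over-editing"),
# but the user only asked for the centre pixel to change.
original: Image = [[0.2, 0.2, 0.2] for _ in range(3)]
edited: Image = [[0.9, 0.9, 0.9] for _ in range(3)]
mask: Mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

layered = composite(original, edited, mask)

raw_error = outside_mask_mae(original, edited, mask)       # non-zero: background changed
layered_error = outside_mask_mae(original, layered, mask)  # 0.0: background preserved

print("raw edit background error:", raw_error)
print("layered edit background error:", layered_error)
```

In this sketch the composited result differs from the original only at the masked centre pixel, so its background error is exactly zero by construction. A real system would also need mask estimation, soft blending at layer boundaries, and perceptual metrics, none of which are shown here.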