Paper page - OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
Authors: Bowen Yang, Kaiming Jin, Zhenyu Wu, Zhaoyang Liu, Qiushi Sun, Zehao Li, JingJing Xie, Zhoumianze Liu, Fangzhi Xu, Kanzhi Cheng, Qingyun Li, Yian Wang, Yu Qiao, Zun Wang, Zichen Ding
Organization: University of Science and Technology of China
Published: 2026-01-12
arXiv: 2601.07779
Project page: https://os-copilot.github.io/OS-Symphony
Code: https://github.com/OS-Copilot/OS-Symphony

Abstract: While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in novel domains. These limitations stem from a lack of granular control over historical visual context curation and the absence of visual-aware tutorial retrieval. To bridge these gaps, we introduce OS-Symphony, a holistic framework that comprises an Orchestrator coordinating two key innovations for robust automation: (1) a Reflection-Memory Agent that utilizes milestone-driven long-term memory to enable trajectory-level self-correction, effectively mitigating visual context loss in long-horizon tasks; (2) Versatile Tool Agents featuring a Multimodal Searcher that adopts a SeeAct paradigm to navigate a browser-based sandbox to synthesize live, visually aligned tutorials, thereby resolving fidelity issues in unseen scenarios. Experimental results demonstrate that OS-Symphony delivers substantial performance gains across varying model scales, establishing new state-of-the-art results on three online benchmarks, notably achieving 65.84% on OSWorld.
OS-Symphony presents a comprehensive framework for computer-using agents that enhances robustness in long-horizon tasks through reflection-memory and multimodal search capabilities.
Despite advances in VLMs, current CUA frameworks remain brittle in long-horizon workflows and generalize poorly to novel domains, because they manage historical visual context only coarsely and lack visual-aware tutorial retrieval. OS-Symphony addresses both gaps with an orchestrated framework that combines milestone-driven reflection memory for trajectory-level self-correction with a SeeAct-style multimodal searcher that synthesizes live, visually aligned tutorials. It sets new state-of-the-art results on three online benchmarks, including 65.84% on OSWorld.
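To make the idea of milestone-driven memory more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names (`Step`, `MilestoneMemory`), the `recent_window` parameter, and the repeated-action check are all hypothetical, chosen only to illustrate one plausible way to keep a long trajectory's visual context small while still exposing enough history to notice that the agent is stuck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One agent step. `screenshot` is a string stand-in for image data."""
    index: int
    action: str
    screenshot: str
    is_milestone: bool = False  # hypothetical flag set by a reflection step


@dataclass
class MilestoneMemory:
    """Hypothetical memory: keeps every step, but exposes only milestones
    plus the last `recent_window` steps as visual context."""
    recent_window: int = 2
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def visual_context(self) -> List[Step]:
        # Steps at or after `cutoff` count as recent and are always kept.
        cutoff = len(self.steps) - self.recent_window
        return [
            s for i, s in enumerate(self.steps)
            if s.is_milestone or i >= cutoff
        ]

    def looks_stuck(self, repeat: int = 3) -> bool:
        # Crude trajectory-level signal (an assumption of this sketch):
        # the same action was issued `repeat` times in a row.
        tail = [s.action for s in self.steps[-repeat:]]
        return len(tail) == repeat and len(set(tail)) == 1


if __name__ == "__main__":
    memory = MilestoneMemory(recent_window=2)
    actions = ["open app", "open file", "click Save", "click Save", "click Save"]
    for i, action in enumerate(actions):
        memory.add(Step(i, action, f"shot_{i}.png", is_milestone=(i == 1)))

    print([s.index for s in memory.visual_context()])
    print(memory.looks_stuck())
```

In this sketch the context passed to the model grows with the number of milestones rather than the number of steps, and `looks_stuck` stands in for whatever signal triggers self-correction; the paper's actual criteria for milestones and reflection are not specified in the text above.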