arxiv:2601.20380

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Published on Jan 28

Submitted by taesiri on Jan 29

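Authors: Le Zhang, Yixiong Xiao, Xinjiang Lu, Jingjia Cao, Yusai Zhao, Jingbo Zhou, Lang An, Zikan Feng, Wanxiang Sha, Yu Shi, Congxi Xiao, Jian Xiong, Yankai Zhang, Hua Wu, Haifeng Wang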

AI-generated summary

OmegaUse is a general-purpose GUI agent model that achieves state-of-the-art performance on mobile and desktop platforms through a combination of high-quality data construction, decoupled training methods, and a Mixture-of-Experts architecture.

Abstract

Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms, supporting computer-use and phone-use scenarios. Building an effective GUI agent model relies on two factors: (1) high-quality data and (2) effective training methods. To address these, we introduce a carefully engineered data-construction pipeline and a decoupled training paradigm. For data construction, we leverage rigorously curated open-source datasets and introduce a novel automated synthesis framework that integrates bottom-up autonomous exploration with top-down taxonomy-guided generation to create high-fidelity synthetic data. For training, to better leverage these data, we adopt a two-stage strategy: Supervised Fine-Tuning (SFT) to establish fundamental interaction syntax, followed by Group Relative Policy Optimization (GRPO) to improve spatial grounding and sequential planning. To balance computational efficiency with agentic reasoning capacity, OmegaUse is built on a Mixture-of-Experts (MoE) backbone. To evaluate cross-terminal capabilities in an offline setting, we introduce OS-Nav, a benchmark suite spanning multiple operating systems: ChiM-Nav, targeting Chinese Android mobile environments, and Ubu-Nav, focusing on routine desktop interactions on Ubuntu. Extensive experiments show that OmegaUse is highly competitive across established GUI benchmarks, achieving a state-of-the-art (SOTA) score of 96.3% on ScreenSpot-V2 and a leading 79.1% step success rate on AndroidControl. OmegaUse also performs strongly on OS-Nav, reaching 74.24% step success on ChiM-Nav and 55.9% average success on Ubu-Nav.
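The abstract names Group Relative Policy Optimization (GRPO) as the second training stage but does not describe the reward design or implementation. As a rough illustration only, the group-relative advantage that GRPO is commonly described as using can be sketched as below: several rollouts are sampled for the same task, and each reward is normalized against the mean and standard deviation of its own group. The function name, the reward values, and the choice of population standard deviation are assumptions made for this sketch, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: the name and the eps value are illustrative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps guards against division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled rollouts of one GUI task
# (1.0 = the predicted action was judged correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to provide a baseline.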

Community

taesiri

Paper submitter 23 days ago

OmegaUse is a general-purpose GUI agent for autonomous task execution on desktop and mobile. It is trained on curated data with SFT followed by GRPO, is built on an MoE backbone, and is evaluated on the OS-Nav benchmarks.

librarian-bot

22 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
