GEBench: Benchmarking Image Generation Models as GUI Environments
https://github.com/stepfun-ai/GEBench

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* MMGR: Multi-Modal Generative Reasoning (https://huggingface.co/papers/2512.14691) (2025)
* RISE-Video: Can Video Generators Decode Implicit World Rules? (https://huggingface.co/papers/2602.05986) (2026)
* VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks (https://huggingface.co/papers/2512.16501) (2025)
* Generative Visual Code Mobile World Models (https://huggingface.co/papers/2602.01576) (2026)
* Rethinking Video Generation Model for the Embodied World (https://huggingface.co/papers/2601.15282) (2026)
* TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering (https://huggingface.co/papers/2512.16270) (2025)
* Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation (https://huggingface.co/papers/2602.01756) (2026)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
\n","updatedAt":"2026-02-11T01:42:20.041Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7018976211547852},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.09007","authors":[{"_id":"698ad8ac1b2dc6b37d61b275","user":{"_id":"65ddea8b2d26e59a5a33330f","avatarUrl":"/avatars/3104ddafd6dda3c05ea9a771dbf2deeb.svg","isPro":false,"fullname":"li haodong","user":"mickyhimself","type":"user"},"name":"Haodong Li","status":"claimed_verified","statusLastChangedAt":"2026-02-10T09:28:30.775Z","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b276","name":"Jingwei Wu","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b277","name":"Quan Sun","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b278","name":"Guopeng Li","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b279","user":{"_id":"670880950e79a8b46f7ff9dd","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/670880950e79a8b46f7ff9dd/hA1TLhwlQblkFsq8wLrkB.jpeg","isPro":false,"fullname":"Juanxi Tian","user":"Juanxi","type":"user"},"name":"Juanxi Tian","status":"claimed_verified","statusLastChangedAt":"2026-02-10T09:04:13.048Z","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27a","name":"Huanyu Zhang","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27b","name":"Yanlin Lai","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27c","name":"Ruichuan An","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27d","name":"Hongbo 
Peng","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27e","user":{"_id":"65d70e775e971572da16c05b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d70e775e971572da16c05b/8Cv71Clfk_C7k6U4yI6ln.jpeg","isPro":false,"fullname":"YuHong Dai","user":"BroAlanTaps","type":"user"},"name":"Yuhong Dai","status":"claimed_verified","statusLastChangedAt":"2026-02-10T09:04:30.092Z","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b27f","name":"Chenxi Li","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b280","name":"Chunmei Qing","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b281","name":"Jia Wang","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b282","name":"Ziyang Meng","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b283","name":"Zheng Ge","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b284","name":"Xiangyu Zhang","hidden":false},{"_id":"698ad8ac1b2dc6b37d61b285","name":"Daxin Jiang","hidden":false}],"publishedAt":"2026-02-09T18:52:02.000Z","submittedOnDailyAt":"2026-02-10T04:37:26.012Z","title":"GEBench: Benchmarking Image Generation Models as GUI Environments","submittedOnDailyBy":{"_id":"65ddea8b2d26e59a5a33330f","avatarUrl":"/avatars/3104ddafd6dda3c05ea9a771dbf2deeb.svg","isPro":false,"fullname":"li haodong","user":"mickyhimself","type":"user"},"summary":"Recent advancements in image generation models have enabled the prediction of future Graphical User Interface (GUI) states based on user instructions. However, existing benchmarks primarily focus on general domain visual fidelity, leaving the evaluation of state transitions and temporal coherence in GUI-specific contexts underexplored. To address this gap, we introduce GEBench, a comprehensive benchmark for evaluating dynamic interaction and temporal coherence in GUI generation. GEBench comprises 700 carefully curated samples spanning five task categories, covering both single-step interactions and multi-step trajectories across real-world and fictional scenarios, as well as grounding point localization. 
To support systematic evaluation, we propose GE-Score, a novel five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality. Extensive evaluations on current models indicate that while they perform well on single-step transitions, they struggle significantly with maintaining temporal coherence and spatial grounding over longer interaction sequences. Our findings identify icon interpretation, text rendering, and localization precision as critical bottlenecks. This work provides a foundation for systematic assessment and suggests promising directions for future research toward building high-fidelity generative GUI environments. The code is available at: https://github.com/stepfun-ai/GEBench.","upvotes":38,"discussionId":"698ad8ad1b2dc6b37d61b286","ai_summary":"A new benchmark and evaluation metric are introduced for assessing temporal coherence and dynamic interaction in GUI generation models, revealing significant challenges in maintaining consistency over extended interaction sequences.","ai_keywords":["GUI generation","temporal coherence","dynamic interaction","visual fidelity","GUI-specific contexts","GEBench","GE-Score","goal achievement","interaction logic","content consistency","UI plausibility","visual quality"],"organization":{"_id":"66e43eae9d477f566f937935","name":"stepfun-ai","fullname":"StepFun","avatar":"https://cdn-uploads.huggingface.co/production/uploads/66935cee39002fc0569c2943/Qv8QPbkgoKE3wR4jTzHiy.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"65ddea8b2d26e59a5a33330f","avatarUrl":"/avatars/3104ddafd6dda3c05ea9a771dbf2deeb.svg","isPro":false,"fullname":"li haodong","user":"mickyhimself","type":"user"},{"_id":"67543820c3af453d7b3e1d5e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/67543820c3af453d7b3e1d5e/Shz6JAOZeUlSzY4RoY7DG.jpeg","isPro":false,"fullname":"Dingming 
Li","user":"lidingm","type":"user"},{"_id":"65e816bbcfd12cd15b052a0e","avatarUrl":"/avatars/4d92da469afdba8cd7dc645b98236011.svg","isPro":false,"fullname":"Huanyu_Zhang","user":"huanyu112","type":"user"},{"_id":"6447e88ce21484883404854c","avatarUrl":"/avatars/a56903a248de3eb36a2d16c2b7643495.svg","isPro":false,"fullname":"AoqiWu","user":"wswaq","type":"user"},{"_id":"670880950e79a8b46f7ff9dd","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/670880950e79a8b46f7ff9dd/hA1TLhwlQblkFsq8wLrkB.jpeg","isPro":false,"fullname":"Juanxi Tian","user":"Juanxi","type":"user"},{"_id":"653614073f4248157d60ccdc","avatarUrl":"/avatars/c9298bab1cdc1d0b6ffe4c7c5ef18bd5.svg","isPro":false,"fullname":"mengziyang","user":"zylate","type":"user"},{"_id":"676e13d5940dd17d669ddb5e","avatarUrl":"/avatars/aa3c3eaffad47a4ece774580536f4e5a.svg","isPro":false,"fullname":"llm_gen","user":"llm-gen","type":"user"},{"_id":"65d70e775e971572da16c05b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d70e775e971572da16c05b/8Cv71Clfk_C7k6U4yI6ln.jpeg","isPro":false,"fullname":"YuHong Dai","user":"BroAlanTaps","type":"user"},{"_id":"630d7a8f81ef9b1772b67f4c","avatarUrl":"/avatars/00757abd6e548ccebb5bfb233be129a2.svg","isPro":false,"fullname":"Quan Sun","user":"QuanSun","type":"user"},{"_id":"688c72c011ef3399b561dee7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/688c72c011ef3399b561dee7/puhgnTOAfZYetsC46hqGm.jpeg","isPro":false,"fullname":"BoxueYang","user":"Boxue","type":"user"},{"_id":"65ab85b968139e3c42c6c50d","avatarUrl":"/avatars/fe35f055ecc49412b086a9a5513a11a8.svg","isPro":false,"fullname":"Jingwei 
Wu","user":"jingwwu","type":"user"},{"_id":"686b7549995e3cddf707d21d","avatarUrl":"/avatars/c670cc3e107d25d51f7ccabc5f359ced.svg","isPro":false,"fullname":"fantastic","user":"daluobo6","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"66e43eae9d477f566f937935","name":"stepfun-ai","fullname":"StepFun","avatar":"https://cdn-uploads.huggingface.co/production/uploads/66935cee39002fc0569c2943/Qv8QPbkgoKE3wR4jTzHiy.png"}}">
AI-generated summary

A new benchmark and evaluation metric are introduced for assessing temporal coherence and dynamic interaction in GUI generation models, revealing significant challenges in maintaining consistency over extended interaction sequences.
Recent advancements in image generation models have enabled the prediction of future Graphical User Interface (GUI) states based on user instructions. However, existing benchmarks primarily focus on general domain visual fidelity, leaving the evaluation of state transitions and temporal coherence in GUI-specific contexts underexplored. To address this gap, we introduce GEBench, a comprehensive benchmark for evaluating dynamic interaction and temporal coherence in GUI generation. GEBench comprises 700 carefully curated samples spanning five task categories, covering both single-step interactions and multi-step trajectories across real-world and fictional scenarios, as well as grounding point localization. To support systematic evaluation, we propose GE-Score, a novel five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality. Extensive evaluations on current models indicate that while they perform well on single-step transitions, they struggle significantly with maintaining temporal coherence and spatial grounding over longer interaction sequences. Our findings identify icon interpretation, text rendering, and localization precision as critical bottlenecks. This work provides a foundation for systematic assessment and suggests promising directions for future research toward building high-fidelity generative GUI environments. The code is available at: https://github.com/stepfun-ai/GEBench.
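To make the five GE-Score dimensions concrete, the sketch below aggregates per-dimension scores for one generated GUI sample. The `GEScore` class, the 1-5 scale, and the unweighted mean are hypothetical illustrations; the abstract names the five dimensions but does not specify the scale or how they are combined.

```python
from dataclasses import dataclass

# Hypothetical sketch of the five GE-Score dimensions. The 1-5 scale
# and the unweighted mean below are assumptions, not the paper's method.

@dataclass
class GEScore:
    goal_achievement: float     # did the generated state fulfill the instruction?
    interaction_logic: float    # is the state transition causally plausible?
    content_consistency: float  # is unrelated content preserved across steps?
    ui_plausibility: float      # does the layout resemble a real GUI?
    visual_quality: float       # overall image fidelity

    def overall(self) -> float:
        dims = (self.goal_achievement, self.interaction_logic,
                self.content_consistency, self.ui_plausibility,
                self.visual_quality)
        for d in dims:
            if not 1.0 <= d <= 5.0:
                raise ValueError("each dimension is scored on a 1-5 scale")
        # Simple average; a real evaluator might weight dimensions differently.
        return sum(dims) / len(dims)

sample = GEScore(goal_achievement=5, interaction_logic=4,
                 content_consistency=3, ui_plausibility=4, visual_quality=4)
print(sample.overall())  # → 4.0
```

A multi-step trajectory would produce one such score per transition, which makes degradation in content consistency over long sequences directly visible.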