Paper page - Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Hugo Laurençon, Léo Tronchon, Victor Sanh
AdinaY:
https://huggingface.co/datasets/HuggingFaceM4/WebSight
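For anyone who wants a quick look at what is behind that link, here is a minimal sketch using the `datasets` library. The column names `image` and `text` are assumptions about the schema; check the dataset card if it differs.

```python
# Minimal sketch: peek at WebSight with the Hugging Face `datasets` library.
# Assumption: each row pairs a rendered screenshot ("image") with its HTML
# source ("text"); verify against the dataset card before relying on this.
from datasets import load_dataset

# Stream the split to avoid downloading all ~2 million pairs up front.
ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)

example = next(iter(ds))
print(example.keys())         # e.g. dict_keys(['image', 'text', ...])
print(example["text"][:300])  # first characters of the paired HTML code
```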
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [Design2Code: How Far Are We From Automating Front-End Engineering?](https://huggingface.co/papers/2403.03163) (2024)
* [DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence](https://huggingface.co/papers/2401.14196) (2024)
* [Code Needs Comments: Enhancing Code LLMs with Comment Augmentation](https://huggingface.co/papers/2402.13013) (2024)
* [OMPGPT: A Generative Pre-trained Transformer Model for OpenMP](https://huggingface.co/papers/2401.16445) (2024)
* [Enhancing Vision-Language Pre-training with Rich Supervisions](https://huggingface.co/papers/2403.03346) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
Tianyi Zhou (zhoutianyi):
Congrats on the great work! Our arXiv paper https://arxiv.org/abs/2305.14637 is one of the earliest works addressing the same problem, from one year ago. Looking forward to more work on the topic!
Hugo Laurençon (HugoLaurencon):
Thanks @zhoutianyi for the reference, we indeed missed your paper. We'll put it in the related work section if we edit this technical report after the next iteration!
AI-generated summary
A synthetic dataset of HTML code and screenshots is introduced to enhance the ability of vision-language models to convert screenshots into HTML code.

Abstract
Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs on various tasks, the specific challenge of converting a screenshot into its corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show its proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
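As a rough illustration of the screenshot-to-HTML workflow the abstract describes, here is a hedged sketch using `transformers`. The checkpoint name is a placeholder rather than the model released with the paper, and the prompt format is an assumption; a real fine-tuned VLM will document its own processor and prompting conventions.

```python
# Hypothetical sketch of screenshot-to-HTML inference with a fine-tuned VLM.
# "my-org/vlm-finetuned-on-websight" is a placeholder checkpoint name.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

checkpoint = "my-org/vlm-finetuned-on-websight"  # placeholder, not a real repo
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForVision2Seq.from_pretrained(checkpoint)

screenshot = Image.open("webpage_screenshot.png").convert("RGB")
inputs = processor(
    images=screenshot,
    text="Generate the HTML code for this screenshot.",  # assumed prompt format
    return_tensors="pt",
)

# HTML pages are long, so leave a generous generation budget.
output_ids = model.generate(**inputs, max_new_tokens=2048)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```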