Paper page - Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Hugo Laurençon, Léo Tronchon, Victor Sanh
AdinaY:
https://huggingface.co/datasets/HuggingFaceM4/WebSight
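For anyone who wants a quick look at what is behind that link, here is a minimal sketch using the `datasets` library. The column names `image` and `text` are assumptions about the schema; check the dataset card if it differs.

```python
# Minimal sketch: peek at WebSight with the Hugging Face `datasets` library.
# Assumption: each row pairs a rendered screenshot ("image") with its HTML
# source ("text"); verify against the dataset card before relying on this.
from datasets import load_dataset

# Stream the split to avoid downloading all ~2 million pairs up front.
ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)

example = next(iter(ds))
print(example.keys())         # e.g. dict_keys(['image', 'text', ...])
print(example["text"][:300])  # first characters of the paired HTML code
```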
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [Design2Code: How Far Are We From Automating Front-End Engineering?](https://huggingface.co/papers/2403.03163) (2024)
* [DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence](https://huggingface.co/papers/2401.14196) (2024)
* [Code Needs Comments: Enhancing Code LLMs with Comment Augmentation](https://huggingface.co/papers/2402.13013) (2024)
* [OMPGPT: A Generative Pre-trained Transformer Model for OpenMP](https://huggingface.co/papers/2401.16445) (2024)
* [Enhancing Vision-Language Pre-training with Rich Supervisions](https://huggingface.co/papers/2403.03346) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
Tianyi Zhou (zhoutianyi):
Congrats on the great work! Our arXiv paper https://arxiv.org/abs/2305.14637 is one of the earliest works addressing the same problem, from one year ago. Looking forward to more work on the topic!
Hugo Laurençon (HugoLaurencon):
Thanks @zhoutianyi for the reference, we indeed missed your paper. We'll put it in the related work section if we edit this technical report after the next iteration!
AI-generated summary
A synthetic dataset of HTML code and screenshots is introduced to enhance the ability of vision-language models to convert screenshots into HTML code.

Abstract
Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs on various tasks, the specific challenge of converting a screenshot into its corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show its proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
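As a rough illustration of the screenshot-to-HTML workflow the abstract describes, here is a hedged sketch using `transformers`. The checkpoint name is a placeholder rather than the model released with the paper, and the prompt format is an assumption; a real fine-tuned VLM will document its own processor and prompting conventions.

```python
# Hypothetical sketch of screenshot-to-HTML inference with a fine-tuned VLM.
# "my-org/vlm-finetuned-on-websight" is a placeholder checkpoint name.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

checkpoint = "my-org/vlm-finetuned-on-websight"  # placeholder, not a real repo
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForVision2Seq.from_pretrained(checkpoint)

screenshot = Image.open("webpage_screenshot.png").convert("RGB")
inputs = processor(
    images=screenshot,
    text="Generate the HTML code for this screenshot.",  # assumed prompt format
    return_tensors="pt",
)

# HTML pages are long, so leave a generous generation budget.
output_ids = model.generate(**inputs, max_new_tokens=2048)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```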