arXiv: 2402.05930

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Authors: Xing Han Lù, Zdeněk Kasner, Siva Reddy

Published on Feb 8, 2024 · Submitted by AK on Feb 9, 2024 · #2 Paper of the day

Code: https://github.com/McGill-NLP/weblinx

Abstract

AI-generated summary: A benchmark and a retrieval-inspired model for efficient conversational web navigation by digital agents across diverse real-world websites are introduced, highlighting challenges in generalization to unseen sites.

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx
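The retrieval-inspired pruning step is the abstract's key technical move, and the idea can be sketched concretely: embed the user's latest instruction, embed each candidate HTML element, and keep only the top-ranked elements for the model's context. The sketch below is a minimal illustration using off-the-shelf libraries (beautifulsoup4, sentence-transformers), not the authors' actual model; the interactive-element filter and the all-MiniLM-L6-v2 encoder are assumptions made for the example.

```python
# Minimal sketch of retrieval-style HTML pruning (not the paper's model):
# rank interactive elements by similarity to the user's instruction and
# keep only the top-k, so a downstream LLM never sees the full page.
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer, util


def prune_html(html: str, instruction: str, top_k: int = 10) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    # Candidate set restricted to clickable/editable tags -- an assumption;
    # the paper ranks elements drawn from the full page.
    candidates = soup.find_all(["a", "button", "input", "select", "textarea"])
    if not candidates:
        return []
    texts = [el.get_text(" ", strip=True) or str(el.attrs) for el in candidates]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # encoder choice is arbitrary here
    query_emb = model.encode(instruction, convert_to_tensor=True)
    cand_embs = model.encode(texts, convert_to_tensor=True)

    # Cosine similarity between the instruction and every candidate element,
    # then keep the indices of the top_k highest-scoring elements.
    scores = util.cos_sim(query_emb, cand_embs)[0]
    keep = scores.argsort(descending=True)[:top_k].tolist()
    return [str(candidates[i]) for i in keep]
```

Feeding only the surviving elements (together with screenshots and action history, per the abstract) keeps the prompt within an LLM's context budget. Since the data is released for research, one plausible way to explore it is through the Hugging Face datasets library; the repo id and split name below are assumptions inferred from the project page rather than confirmed details, so check https://mcgill-nlp.github.io/weblinx for the canonical loading instructions.

```python
# Hedged example of loading the released demonstrations. The repo id and
# split name are assumptions, not confirmed details -- see the project page.
from datasets import load_dataset

ds = load_dataset("McGill-NLP/WebLINX", split="validation")  # assumed repo id/split
print(ds[0])  # one turn of a demonstration, if the assumed layout holds
```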

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models (2401.13919)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents (2401.10935)
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks (2401.13649)
- GPT-4V(ision) is a Generalist Web Agent, if Grounded (2401.01614)
- Dual-View Visual Contextualization for Web Navigation (2402.04476)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


How to fetch the Google website?


Models citing this paper: 11

Datasets citing this paper: 4

Spaces citing this paper: 6

Collections including this paper: 11