arxiv:2306.16410

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Published on Jun 28, 2023 · Submitted by AK on Jun 29, 2023

Abstract

AI-generated summary: LENS uses language models to reason over outputs from vision modules, achieving competitive performance in vision and vision-language tasks without multimodal training.

We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.
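To make the modular design concrete, below is a minimal sketch of a LENS-style pipeline built from off-the-shelf Hugging Face pipelines. The specific module choices (CLIP zero-shot tags and a BLIP caption), the toy tag vocabulary, and the prompt template are illustrative assumptions, not the paper's exact configuration; the authors' implementation lives in the repository linked above.

```python
# Minimal sketch of the LENS idea: frozen vision modules turn an image into
# text, and a frozen, text-only LLM reasons over that text. Model choices,
# the toy tag vocabulary, and the prompt template are illustrative
# assumptions -- see https://github.com/ContextualAI/lens for the official code.
from PIL import Image
from transformers import pipeline

# "Vision modules": zero-shot CLIP tags and a BLIP caption stand in for the
# paper's tag / attribute / caption modules. Nothing here is fine-tuned.
tagger = pipeline("zero-shot-image-classification",
                  model="openai/clip-vit-base-patch32")
captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")

# "Reasoning module": any off-the-shelf LLM works; Flan-T5 is used here only
# because it is small enough to run locally.
llm = pipeline("text2text-generation", model="google/flan-t5-base")

# Toy vocabulary for the tag module; the paper uses a far larger one.
VOCAB = ["dog", "cat", "car", "bicycle", "tree", "person"]

def lens_answer(image_path: str, question: str) -> str:
    image = Image.open(image_path)
    tags = [r["label"] for r in tagger(image, candidate_labels=VOCAB)[:3]]
    caption = captioner(image)[0]["generated_text"]
    # The LLM never sees pixels -- only this textual description of the image.
    prompt = (f"Tags: {', '.join(tags)}\n"
              f"Caption: {caption}\n"
              f"Question: {question}\nShort answer:")
    return llm(prompt, max_new_tokens=20)[0]["generated_text"]

print(lens_answer("photo.jpg", "What animal is in the picture?"))
```

Because the LLM only ever consumes text, a stronger off-the-shelf model can be dropped in for the reasoning step without any multimodal training, which is the core claim of the paper.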

Community

Hey, I'm reviewing deep learning papers daily on Twitter in Hebrew via the hashtag #ShortHebrewPaperReviews (https://twitter.com/hashtag/shorthebrewpapereviews?src=hashtag_click). So far I've posted short reviews of deep learning papers. You're invited to follow and comment.

This paper review can be found at: https://twitter.com/MikeE_3_14/status/1675088525237051394?s=20

There are actually a lot of large language vision models out there already.

NVM this is actually pretty novel.


LENS: The Future of Computer Vision with Language Models!

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 1