Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Abstract
LENS uses language models to reason over outputs from vision modules, achieving competitive performance in vision and vision-language tasks without multimodal training.
We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.
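The modular design described in the abstract can be sketched roughly as follows: independent vision modules turn an image into text, and a text-only language model answers a question from that text. This is an illustration only, not the code in the ContextualAI/lens repository. The names `build_prompt`, `answer`, `stub_tagger`, `stub_captioner`, and `stub_llm`, as well as the prompt layout, are assumptions made for this example, and the stubs return fixed values instead of calling real models.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a vision module maps an image (here just a path)
# to a list of short textual descriptions.
VisionModule = Callable[[str], List[str]]
# Hypothetical interface: a language model maps a prompt string to a reply.
LanguageModel = Callable[[str], str]


def build_prompt(descriptions: Dict[str, List[str]], question: str) -> str:
    """Join the vision modules' textual outputs and the question into one prompt."""
    lines = [f"{name}: {', '.join(items)}" for name, items in descriptions.items()]
    lines.append(f"Question: {question}")
    lines.append("Short answer:")
    return "\n".join(lines)


def answer(image: str, modules: Dict[str, VisionModule],
           llm: LanguageModel, question: str) -> str:
    """Run every vision module on the image, then let the LLM reason over the text."""
    descriptions = {name: module(image) for name, module in modules.items()}
    return llm(build_prompt(descriptions, question))


# Stand-ins for real models (assumed for illustration; they ignore the image).
def stub_tagger(image: str) -> List[str]:
    return ["dog", "frisbee", "grass"]


def stub_captioner(image: str) -> List[str]:
    return ["a dog catching a frisbee in a park"]


def stub_llm(prompt: str) -> str:
    # A real LLM would generate text; this stub only checks for a keyword.
    return "dog" if "dog" in prompt else "unknown"


modules = {"Tags": stub_tagger, "Captions": stub_captioner}
question = "What animal is in the image?"
prompt = build_prompt({n: m("photo.jpg") for n, m in modules.items()}, question)
result = answer("photo.jpg", modules, stub_llm, question)
print(prompt)
print(result)
```

Because the language model only ever sees text, swapping in a different off-the-shelf LLM in this sketch means replacing `stub_llm` with another callable; no multimodal training step appears anywhere in the pipeline.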
Community
Hey, I'm reviewing deep learning papers on Twitter daily in Hebrew under the hashtag #shorthebrewpapereviews (https://twitter.com/hashtag/shorthebrewpapereviews?src=hashtag_click). So far I've posted short reviews of deep learning papers. You are invited to follow and comment.
This paper review can be found at: https://twitter.com/MikeE_3_14/status/1675088525237051394?s=20
There are actually a lot of large language vision models out there already.
NVM this is actually pretty novel.
LENS: The Future of Computer Vision with Language Models!
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/