arxiv:2306.16410

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Published on Jun 28, 2023 · Submitted by AK on Jun 29, 2023

Abstract

AI-generated summary: LENS uses language models to reason over outputs from vision modules, achieving competitive performance in vision and vision-language tasks without multimodal training.

We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.
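To make the modular design concrete, below is a minimal sketch of a LENS-style pipeline built from off-the-shelf Hugging Face pipelines. The specific module choices (CLIP zero-shot tags and a BLIP caption), the toy tag vocabulary, and the prompt template are illustrative assumptions, not the paper's exact configuration; the authors' implementation lives in the repository linked above.

```python
# Minimal sketch of the LENS idea: frozen vision modules turn an image into
# text, and a frozen, text-only LLM reasons over that text. Model choices,
# the toy tag vocabulary, and the prompt template are illustrative
# assumptions -- see https://github.com/ContextualAI/lens for the official code.
from PIL import Image
from transformers import pipeline

# "Vision modules": zero-shot CLIP tags and a BLIP caption stand in for the
# paper's tag / attribute / caption modules. Nothing here is fine-tuned.
tagger = pipeline("zero-shot-image-classification",
                  model="openai/clip-vit-base-patch32")
captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")

# "Reasoning module": any off-the-shelf LLM works; Flan-T5 is used here only
# because it is small enough to run locally.
llm = pipeline("text2text-generation", model="google/flan-t5-base")

# Toy vocabulary for the tag module; the paper uses a far larger one.
VOCAB = ["dog", "cat", "car", "bicycle", "tree", "person"]

def lens_answer(image_path: str, question: str) -> str:
    image = Image.open(image_path)
    tags = [r["label"] for r in tagger(image, candidate_labels=VOCAB)[:3]]
    caption = captioner(image)[0]["generated_text"]
    # The LLM never sees pixels -- only this textual description of the image.
    prompt = (f"Tags: {', '.join(tags)}\n"
              f"Caption: {caption}\n"
              f"Question: {question}\nShort answer:")
    return llm(prompt, max_new_tokens=20)[0]["generated_text"]

print(lens_answer("photo.jpg", "What animal is in the picture?"))
```

Because the LLM only ever consumes text, a stronger off-the-shelf model can be dropped in for the reasoning step without any multimodal training, which is the core claim of the paper.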

Community

Hey, I'm reviewing deep learning papers daily on Twitter in Hebrew via the hashtag #ShortHebrewPaperReviews (https://twitter.com/hashtag/shorthebrewpapereviews?src=hashtag_click). So far I've posted short reviews of deep learning papers. You're invited to follow and comment.

This paper review can be found at: https://twitter.com/MikeE_3_14/status/1675088525237051394?s=20

There are actually a lot of large language vision models out there already.

NVM this is actually pretty novel.


LENS: The Future of Computer Vision with Language Models!

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 1