arxiv:2408.13402

LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!

Published on Aug 23, 2024 · Submitted by AK on Aug 27, 2024
Authors: Jainaveen Sundaram, Ravishankar Iyer
Abstract

Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and be able to run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is fully open-sourced along with training scripts to encourage further research in this space. This accompanying technical report highlights the training process, evaluation details, challenges associated with ternary models, and future opportunities. Link to the model: https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B

AI-generated summary

LLaVaOLMoBitnet1B is the first open-source ternary multimodal LLM that processes images and text to generate coherent responses, designed to run efficiently on small computational resources.
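
The defining claim in the abstract is that the language backbone uses ternary weights. As a rough, assumption-labeled illustration of what "ternary" means here, the sketch below implements a generic BitNet-b1.58-style absmean quantizer that maps full-precision weights to {-1, 0, +1} with a per-tensor scale; the exact scheme used by LLaVaOLMoBitnet1B may differ, and none of the names below come from its codebase.

```python
# Illustrative sketch of BitNet-style "absmean" ternary weight quantization.
# This is a generic example of what "ternary" means for LLM weights
# ({-1, 0, +1} times a per-tensor scale); it is NOT taken from the
# LLaVaOLMoBitnet1B training code and may differ from the exact scheme used.
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Map full-precision weights to {-1, 0, +1} plus a per-tensor scale."""
    gamma = np.abs(w).mean() + eps                      # absmean scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)     # values in {-1, 0, +1}
    return w_ternary.astype(np.int8), float(gamma)

def ternary_dequantize(w_ternary: np.ndarray, gamma: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return w_ternary.astype(np.float32) * gamma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(4, 8)).astype(np.float32)
    q, gamma = ternary_quantize(w)
    print("unique quantized values:", np.unique(q))     # subset of [-1, 0, 1]
    print("mean abs reconstruction error:",
          np.abs(w - ternary_dequantize(q, gamma)).mean())
```

Restricting each weight to three values (roughly 1.58 bits) is what allows a model of this kind to trade a small accuracy gap for a much smaller memory footprint and largely addition-based matrix multiplies, which is the efficiency argument the abstract is making.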

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* CROME: Cross-Modal Adapters for Efficient Multimodal LLM (https://huggingface.co/papers/2408.06610) (2024)
* LLAVADI: What Matters For Multimodal Large Language Models Distillation (https://huggingface.co/papers/2407.19409) (2024)
* SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs (https://huggingface.co/papers/2408.11813) (2024)
* IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities (https://huggingface.co/papers/2408.12902) (2024)
* Instruction Tuning-free Visual Token Complement for Multimodal LLMs (https://huggingface.co/papers/2408.05019) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1
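
For readers who want to try the checkpoint linked from the abstract (https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B), here is a minimal, hypothetical loading sketch. It assumes the repository loads through transformers' generic Auto classes with trust_remote_code, and it only exercises a text-only prompt; the model card's own usage instructions take precedence over this sketch.

```python
# Hypothetical loading sketch -- the repo id comes from the abstract; whether
# this checkpoint loads through the generic Auto classes is an assumption, so
# defer to the model card if it prescribes a different entry point.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IntelLabs/LlavaOLMoBitnet1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Text-only smoke test; image+text inference follows whatever processor the
# model card documents, which is not reproduced here.
inputs = tokenizer("Describe what a ternary LLM is.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```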

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2408.13402 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.13402 in a Space README.md to link it from this page.

Collections including this paper 2