Paper page - LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:

* [CROME: Cross-Modal Adapters for Efficient Multimodal LLM](https://huggingface.co/papers/2408.06610) (2024)
* [LLAVADI: What Matters For Multimodal Large Language Models Distillation](https://huggingface.co/papers/2407.19409) (2024)
* [SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs](https://huggingface.co/papers/2408.11813) (2024)
* [IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities](https://huggingface.co/papers/2408.12902) (2024)
* [Instruction Tuning-free Visual Token Complement for Multimodal LLMs](https://huggingface.co/papers/2408.05019) (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
\n","updatedAt":"2024-08-28T01:33:36.108Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.73140949010849},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2408.13402","authors":[{"_id":"66cd46147ab136deb3862f5e","user":{"_id":"615f72d3205e10d5c3903df3","avatarUrl":"/avatars/c85352ad1b858fd6c1629ddea06d0dea.svg","isPro":false,"fullname":"Naveen Sp","user":"naveensp","type":"user"},"name":"Jainaveen Sundaram","status":"claimed_verified","statusLastChangedAt":"2024-08-27T20:42:26.586Z","hidden":false},{"_id":"66cd46147ab136deb3862f5f","name":"Ravishankar Iyer","hidden":false}],"publishedAt":"2024-08-23T23:00:19.000Z","submittedOnDailyAt":"2024-08-27T01:51:24.426Z","title":"LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Multimodal Large Language Models (MM-LLMs) have seen significant advancements\nin the last year, demonstrating impressive performance across tasks. However,\nto truly democratize AI, models must exhibit strong capabilities and be able to\nrun efficiently on small compute footprints accessible by most. Part of this\nquest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM\ncapable of accepting Image(s)+Text inputs to produce coherent textual\nresponses. The model is fully open-sourced along with training scripts to\nencourage further research in this space. This accompanying technical report\nhighlights the training process, evaluation details, challenges associated with\nternary models and future opportunities. 
AI-generated summary

LLaVaOLMoBitnet1B is the first open-source ternary multimodal LLM that processes images and text to generate coherent responses, designed to run efficiently on small computational resources.
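For readers new to the term, "ternary" means the model's weights are constrained to the set {-1, 0, +1}. The sketch below illustrates BitNet-b1.58-style absmean quantization, which the model's name suggests it builds on; this is a generic illustration of the technique, not the authors' training code, and the function name is hypothetical.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Quantize a weight tensor to {-1, 0, +1} (BitNet-b1.58-style absmean).

    Hypothetical illustration, not the paper's exact recipe: scale by the
    mean absolute value, then round and clip to the ternary set.
    """
    gamma = w.abs().mean()                 # per-tensor scale
    w_scaled = w / (gamma + eps)           # normalize so most values land near [-1, 1]
    return w_scaled.round().clamp_(-1, 1)  # ternary codes; gamma would be kept for rescaling

# Example: quantize a random linear layer's weights
w = torch.randn(4, 8)
w_q = absmean_ternary_quantize(w)
print(torch.unique(w_q))  # typically tensor([-1., 0., 1.])
```

In practice the scale gamma is retained and re-applied after the matrix multiply, so each weight needs under 1.6 bits of storage while activations stay in higher precision.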
Abstract

Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B, the first ternary multimodal LLM capable of accepting image(s) + text inputs and producing coherent textual responses. The model is fully open-sourced along with training scripts to encourage further research in this space. This accompanying technical report highlights the training process, evaluation details, challenges associated with ternary models, and future opportunities.

Link to the model: https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B
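As a rough usage sketch, LLaVA-style checkpoints on the Hub can often be driven through the standard transformers Auto classes. The snippet below is an assumption-laden illustration: the prompt template, the use of `trust_remote_code`, and the image URL are all placeholders, and the model card linked above is the authoritative source for the actual loading code.

```python
# Hypothetical usage sketch; consult the model card at
# https://huggingface.co/IntelLabs/LlavaOLMoBitnet1B for the official snippet.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "IntelLabs/LlavaOLMoBitnet1B"

# trust_remote_code is an assumption: LLaVA-style repos often ship custom modeling code
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image URL and LLaVA-convention prompt template (both assumptions)
image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "USER: <image>\nWhat is in this picture?\nASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```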