Paper page - LLaVA-Critic: Learning to Evaluate Multimodal Models


\n","updatedAt":"2024-10-04T02:25:11.527Z","author":{"_id":"64c039128e2612254356bba5","avatarUrl":"/avatars/06cc76feebba0cc80ebb8f4ff86f6d9b.svg","fullname":"Quanquan Gu","name":"thughost","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":26,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.254848450422287},"editors":["thughost"],"editorAvatarUrls":["/avatars/06cc76feebba0cc80ebb8f4ff86f6d9b.svg"],"reactions":[{"reaction":"🔥","users":["AdinaY","cataluna84","russwang"],"count":3}],"isReport":false}},{"id":"66ffaa8b2eced37b1c82cf2b","author":{"_id":"63a369d98c0c89dcae3b8329","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63a369d98c0c89dcae3b8329/AiH2zjy1cnt9OADAAZMLD.jpeg","fullname":"Adina Yakefu","name":"AdinaY","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":1145,"isUserFollowing":false},"createdAt":"2024-10-04T08:42:51.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Interesting paper! Thanks for sharing @thughost. ","html":"

Interesting paper! Thanks for sharing \n\n@thughost\n\t.

\n","updatedAt":"2024-10-04T08:42:51.886Z","author":{"_id":"63a369d98c0c89dcae3b8329","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63a369d98c0c89dcae3b8329/AiH2zjy1cnt9OADAAZMLD.jpeg","fullname":"Adina Yakefu","name":"AdinaY","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":1145,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7883646488189697},"editors":["AdinaY"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63a369d98c0c89dcae3b8329/AiH2zjy1cnt9OADAAZMLD.jpeg"],"reactions":[],"isReport":false}},{"id":"670097a53c2e742ab8ea495b","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2024-10-05T01:34:29.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!](https://huggingface.co/papers/2408.13402) (2024)\n* [A Survey on Multimodal Benchmarks: In the Era of Large AI Models](https://huggingface.co/papers/2409.18142) (2024)\n* [xGen-MM (BLIP-3): A Family of Open Large Multimodal Models](https://huggingface.co/papers/2408.08872) (2024)\n* [LLaVA-OneVision: Easy Visual Task Transfer](https://huggingface.co/papers/2408.03326) (2024)\n* [Visual Question Decomposition on Multimodal Large Language Models](https://huggingface.co/papers/2409.19339) (2024)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2024-10-05T01:34:29.228Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2410.02712","authors":[{"_id":"66ff51ed9e1143bff207d587","user":{"_id":"6570977f87a92b76922c9950","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6570977f87a92b76922c9950/AQGto1w6ugBvH2yCV46YU.jpeg","isPro":false,"fullname":"Tianyi Xiong","user":"txiong23","type":"user"},"name":"Tianyi Xiong","status":"admin_assigned","statusLastChangedAt":"2024-10-04T08:41:36.661Z","hidden":false},{"_id":"66ff51ed9e1143bff207d588","user":{"_id":"655fed9fdef5905d38b84af3","avatarUrl":"/avatars/2cda4182dfd11a1e94743639e62328ea.svg","isPro":false,"fullname":"Xiyao Wang","user":"russwang","type":"user"},"name":"Xiyao Wang","status":"admin_assigned","statusLastChangedAt":"2024-10-04T08:44:04.812Z","hidden":false},{"_id":"66ff51ed9e1143bff207d589","name":"Dong Guo","hidden":false},{"_id":"66ff51ed9e1143bff207d58a","user":{"_id":"64530fc01a57e1179c1fe4c0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/0lncTpIHXn6suB0p-oSma.jpeg","isPro":false,"fullname":"QinghaoYe","user":"MAGAer13","type":"user"},"name":"Qinghao Ye","status":"claimed_verified","statusLastChangedAt":"2025-05-22T07:19:34.463Z","hidden":false},{"_id":"66ff51ed9e1143bff207d58b","name":"Haoqi Fan","hidden":false},{"_id":"66ff51ed9e1143bff207d58c","user":{"_id":"64c039128e2612254356bba5","avatarUrl":"/avatars/06cc76feebba0cc80ebb8f4ff86f6d9b.svg","isPro":false,"fullname":"Quanquan Gu","user":"thughost","type":"user"},"name":"Quanquan Gu","status":"admin_assigned","statusLastChangedAt":"2024-10-04T08:41:11.820Z","hidden":false},{"_id":"66ff51ed9e1143bff207d58d","user":{"_id":"6527bc6b34bf5ece73da426d","avatarUrl":"/avatars/120739a9ac84e7319d9ea157a63dc547.svg","isPro":false,"fullname":"henghuang","user":"henghuang","type":"user"},"name":"Heng Huang","status":"admin_assigned","statusLastChangedAt":"2024-10-04T08:43:29.377Z","hidden":false},{"_id":"66ff51ed9e1143bff207d58e","user":{"_id":"62aba526cae4462c0c6caa0f","avatarUrl":"/avatars/430560ec2c2547f819225769ab432f30.svg","isPro":false,"fullname":"Chunyuan Li","user":"Chunyuan24","type":"user"},"name":"Chunyuan Li","status":"admin_assigned","statusLastChangedAt":"2024-10-04T08:43:01.990Z","hidden":false}],"publishedAt":"2024-10-03T17:36:33.000Z","submittedOnDailyAt":"2024-10-04T00:55:11.521Z","title":"LLaVA-Critic: Learning to Evaluate Multimodal Models","submittedOnDailyBy":{"_id":"64c039128e2612254356bba5","avatarUrl":"/avatars/06cc76feebba0cc80ebb8f4ff86f6d9b.svg","isPro":false,"fullname":"Quanquan Gu","user":"thughost","type":"user"},"summary":"We introduce LLaVA-Critic, the first open-source large multimodal model (LMM)\ndesigned as a generalist evaluator to assess performance across a wide range of\nmultimodal tasks. LLaVA-Critic is trained using a high-quality critic\ninstruction-following dataset that incorporates diverse evaluation criteria and\nscenarios. 
Our experiments demonstrate the model's effectiveness in two key\nareas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation\nscores, performing on par with or surpassing GPT models on multiple evaluation\nbenchmarks; and (2) Preference Learning, where it generates reward signals for\npreference learning, enhancing model alignment capabilities. This work\nunderscores the potential of open-source LMMs in self-critique and evaluation,\nsetting the stage for future research into scalable, superhuman alignment\nfeedback mechanisms for LMMs.","upvotes":37,"discussionId":"66ff51ee9e1143bff207d5d8","ai_summary":"LLaVA-Critic, an open-source large multimodal model, effectively evaluates multimodal tasks and provides reliable scores, surpassing GPT models, and enhances preference learning for model alignment.","ai_keywords":["large multimodal model","LMM","instruction-following dataset","evaluation criteria","LMM-as-a-Judge","GPT models","evaluation benchmarks","Preference Learning","reward signals","model alignment","superhuman alignment feedback mechanisms"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"64c039128e2612254356bba5","avatarUrl":"/avatars/06cc76feebba0cc80ebb8f4ff86f6d9b.svg","isPro":false,"fullname":"Quanquan Gu","user":"thughost","type":"user"},{"_id":"62a993d80472c0b7f94027df","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62a993d80472c0b7f94027df/j5vp-IwLA2YBexylUHiQU.png","isPro":false,"fullname":"Zhang Yuanhan","user":"ZhangYuanhan","type":"user"},{"_id":"647bf082aba7062fe5c51ca9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/647bf082aba7062fe5c51ca9/VvKAhQC_LxBcBuy3XROSX.jpeg","isPro":false,"fullname":"Yifan Zhang","user":"yifAI","type":"user"},{"_id":"655fed9fdef5905d38b84af3","avatarUrl":"/avatars/2cda4182dfd11a1e94743639e62328ea.svg","isPro":false,"fullname":"Xiyao Wang","user":"russwang","type":"user"},{"_id":"6570977f87a92b76922c9950","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6570977f87a92b76922c9950/AQGto1w6ugBvH2yCV46YU.jpeg","isPro":false,"fullname":"Tianyi Xiong","user":"txiong23","type":"user"},{"_id":"64b762568c632fbca942a405","avatarUrl":"/avatars/1eb737ec169967872f1ebf5ff29f1e6b.svg","isPro":false,"fullname":"Yinfei Yang","user":"yinfeiy","type":"user"},{"_id":"62aba526cae4462c0c6caa0f","avatarUrl":"/avatars/430560ec2c2547f819225769ab432f30.svg","isPro":false,"fullname":"Chunyuan Li","user":"Chunyuan24","type":"user"},{"_id":"63916d6c239695d2240858a1","avatarUrl":"/avatars/d58cab782c176024f59a602ba83aa0c7.svg","isPro":false,"fullname":"Dong Guo","user":"dguo-explore","type":"user"},{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"668cd4bbe990292e5f6974d3","avatarUrl":"/avatars/d1747b2372e94500ecb5fb56809b482d.svg","isPro":false,"fullname":"Jinyeong Kim","user":"rubatoyeong","type":"user"},{"_id":"6270324ebecab9e2dcf245de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6270324ebecab9e2dcf245de/cMbtWSasyNlYc9hvsEEzt.jpeg","isPro":false,"fullname":"Kye 
Gomez","user":"kye","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
Papers
arxiv:2410.02712

LLaVA-Critic: Learning to Evaluate Multimodal Models

Published on Oct 3, 2024
Submitted by Quanquan Gu on Oct 4, 2024
Authors: Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li

Abstract

AI-generated summary

LLaVA-Critic, an open-source large multimodal model, serves as a generalist evaluator for multimodal tasks: it provides reliable evaluation scores on par with or surpassing GPT models, and it generates reward signals that improve preference learning and model alignment.

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.
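To make the two usage modes in the abstract concrete, below is a minimal, hypothetical sketch of the LMM-as-a-Judge pattern: a critic model is prompted with an image, a question, and a candidate response, asked to return a score with a short justification, and the scores are then reused as reward signals to pick the preferred of two candidates. The `lmm_generate` callable, the prompt wording, and the 1-10 scale are illustrative assumptions for this sketch, not the paper's actual prompt template or training setup.

```python
import re
from typing import Callable, Tuple

# Hypothetical interface: any multimodal model wrapper that takes an image path
# and a text prompt and returns the model's text output (e.g., a LLaVA-Critic-style
# checkpoint served behind whatever inference API you use).
LmmGenerate = Callable[[str, str], str]

JUDGE_PROMPT = (
    "You are an impartial judge. Given the image and the question below, "
    "rate the quality of the candidate response on a scale of 1 to 10 and "
    "briefly justify your rating.\n\n"
    "Question: {question}\n"
    "Candidate response: {response}\n\n"
    "Reply in the form: 'Score: <1-10>. Reason: <one sentence>.'"
)


def judge_response(lmm_generate: LmmGenerate, image: str, question: str,
                   response: str) -> Tuple[float, str]:
    """LMM-as-a-Judge: ask the critic model to score one candidate response."""
    critique = lmm_generate(image, JUDGE_PROMPT.format(question=question, response=response))
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
    score = float(match.group(1)) if match else 0.0  # fall back to 0 if unparsable
    return score, critique


def pick_preferred(lmm_generate: LmmGenerate, image: str, question: str,
                   response_a: str, response_b: str) -> str:
    """Reuse judge scores as reward signals to rank two candidate responses.

    The resulting (chosen, rejected) pair could feed a DPO-style
    preference-optimization step for the policy model.
    """
    score_a, _ = judge_response(lmm_generate, image, question, response_a)
    score_b, _ = judge_response(lmm_generate, image, question, response_b)
    return response_a if score_a >= score_b else response_b
```

In the paper's terms, `judge_response` corresponds to the LMM-as-a-Judge setting, while `pick_preferred` illustrates how the same scores can act as reward signals when building preference pairs for alignment training.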

Community

Paper author Paper submitter

https://llava-vl.github.io/blog/2024-10-03-llava-critic/

Interesting paper! Thanks for sharing @thughost.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal! (2024) https://huggingface.co/papers/2408.13402
* A Survey on Multimodal Benchmarks: In the Era of Large AI Models (2024) https://huggingface.co/papers/2409.18142
* xGen-MM (BLIP-3): A Family of Open Large Multimodal Models (2024) https://huggingface.co/papers/2408.08872
* LLaVA-OneVision: Easy Visual Task Transfer (2024) https://huggingface.co/papers/2408.03326
* Visual Question Decomposition on Multimodal Large Language Models (2024) https://huggingface.co/papers/2409.19339

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper: 4

Datasets citing this paper: 1

Spaces citing this paper: 2

Collections including this paper: 14