CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
\n","updatedAt":"2024-10-26T01:33:16.543Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7163602113723755},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2410.18976","authors":[{"_id":"671b3cc85f7cde5ac31f34cf","user":{"_id":"66d559103c5bc37ee0dfa61b","avatarUrl":"/avatars/8310fc0b01e6d8873aec37ba9ef27c5b.svg","isPro":false,"fullname":"SaraG","user":"SLMLAH","type":"user"},"name":"Sara Ghaboura","status":"claimed_verified","statusLastChangedAt":"2025-02-06T16:06:28.185Z","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d0","user":{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},"name":"Ahmed Heakl","status":"claimed_verified","statusLastChangedAt":"2024-10-25T09:28:34.074Z","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d1","user":{"_id":"64b7d2ad8c632fbca9507431","avatarUrl":"/avatars/76c31ea218108cf6c3715269f7605404.svg","isPro":false,"fullname":"Omkar Thawakar","user":"omkarthawakar","type":"user"},"name":"Omkar Thawakar","status":"claimed_verified","statusLastChangedAt":"2025-11-03T21:05:43.826Z","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d2","name":"Ali Alharthi","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d3","name":"Ines Riahi","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d4","name":"Abduljalil Saif","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d5","name":"Jorma Laaksonen","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d6","name":"Fahad S. Khan","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d7","name":"Salman Khan","hidden":false},{"_id":"671b3cc85f7cde5ac31f34d8","name":"Rao M. Anwer","hidden":false}],"publishedAt":"2024-10-24T17:59:38.000Z","submittedOnDailyAt":"2024-10-25T05:21:52.868Z","title":"CAMEL-Bench: A Comprehensive Arabic LMM Benchmark","submittedOnDailyBy":{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},"summary":"Recent years have witnessed a significant interest in developing large\nmultimodal models (LMMs) capable of performing various visual reasoning and\nunderstanding tasks. This has led to the introduction of multiple LMM\nbenchmarks to evaluate LMMs on different tasks. However, most existing LMM\nevaluation benchmarks are predominantly English-centric. In this work, we\ndevelop a comprehensive LMM evaluation benchmark for the Arabic language to\nrepresent a large population of over 400 million speakers. The proposed\nbenchmark, named CAMEL-Bench, comprises eight diverse domains and 38\nsub-domains including, multi-image understanding, complex visual perception,\nhandwritten document understanding, video understanding, medical imaging, plant\ndiseases, and remote sensing-based land use understanding to evaluate broad\nscenario generalizability. 
Our CAMEL-Bench comprises around 29,036 questions\nthat are filtered from a larger pool of samples, where the quality is manually\nverified by native speakers to ensure reliable model assessment. We conduct\nevaluations of both closed-source, including GPT-4 series, and open-source\nLMMs. Our analysis reveals the need for substantial improvement, especially\namong the best open-source models, with even the closed-source GPT-4o achieving\nan overall score of 62%. Our benchmark and evaluation scripts are open-sourced.","upvotes":13,"discussionId":"671b3ccc5f7cde5ac31f367e","githubRepo":"https://github.com/mbzuai-oryx/CAMEL-Bench","githubRepoAddedBy":"auto","ai_summary":"Researchers developed CAMEL-Bench, a comprehensive Arabic multimodal evaluation benchmark, to assess visual reasoning and understanding models across diverse domains and scenarios.","ai_keywords":["large multimodal models","LMM benchmarks","multi-image understanding","complex visual perception","handwritten document understanding","video understanding","medical imaging","plant diseases","remote sensing","generalizability","model assessment","closed-source models","open-source models","GPT-4 series"],"githubStars":36,"organization":{"_id":"61fb9e24dc607a42af5f193f","name":"MBZUAI","fullname":"Mohamed Bin Zayed University of Artificial Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1643879908583-603ab5664a944b99e81476e8.jpeg"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"656864e12d73834278a8dea7","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656864e12d73834278a8dea7/sfAWS2eyPtFHb_2GZIypp.jpeg","isPro":true,"fullname":"Ahmed Heakl","user":"ahmedheakl","type":"user"},{"_id":"5f32b2367e583543386214d9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1635314457124-5f32b2367e583543386214d9.jpeg","isPro":false,"fullname":"Sergei Averkiev","user":"averoo","type":"user"},{"_id":"62676a94dacab364889bb36c","avatarUrl":"/avatars/0ead41b44957eb30564ea685ed22781a.svg","isPro":false,"fullname":"SARIM HASHMI","user":"Sarim-Hash","type":"user"},{"_id":"66c89ac6b5528a503be49876","avatarUrl":"/avatars/1e9b4b35a62af2205ff8f1deb1726b43.svg","isPro":false,"fullname":"Sarim Hashmi","user":"sarimhash","type":"user"},{"_id":"641b754d1911d3be6745cce9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/641b754d1911d3be6745cce9/Ydjcjd4VuNUGj5Cd4QHdB.png","isPro":false,"fullname":"atayloraerospace","user":"Taylor658","type":"user"},{"_id":"66a788dc34ae1d4c78c8cdbb","avatarUrl":"/avatars/01aeeff26873c4cfd977e8be1b4b04de.svg","isPro":false,"fullname":"Roba Al Majzoub","user":"musk007","type":"user"},{"_id":"64e567c9ddbefb63095a9662","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/F2BwrOU0XpzVI5nd-TL54.png","isPro":false,"fullname":"Bullard ","user":"Charletta1","type":"user"},{"_id":"63b2a92e18e5cf2cdd333492","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63b2a92e18e5cf2cdd333492/GxnngJG0u7d0jYTEFOrfe.png","isPro":false,"fullname":"Jaehyun Jun","user":"btjhjeon","type":"user"},{"_id":"65262a396b41932089fd7bae","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65262a396b41932089fd7bae/6YIEoAfJojuTW1UOKlwZT.png","isPro":false,"fullname":"Mukul 
Ranjan","user":"mukul54","type":"user"},{"_id":"66d559103c5bc37ee0dfa61b","avatarUrl":"/avatars/8310fc0b01e6d8873aec37ba9ef27c5b.svg","isPro":false,"fullname":"SaraG","user":"SLMLAH","type":"user"},{"_id":"626237d9bbcbd1c34f1bb231","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/626237d9bbcbd1c34f1bb231/EJrOjvAL-68qMCYdnvOrq.png","isPro":true,"fullname":"Ali El Filali","user":"alielfilali01","type":"user"},{"_id":"67c159c74e5a4421af01f728","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/67c159c74e5a4421af01f728/1LiWX042ngiQo2xqCbJ1x.jpeg","isPro":false,"fullname":"YiChenZhang","user":"YiYiChen","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"61fb9e24dc607a42af5f193f","name":"MBZUAI","fullname":"Mohamed Bin Zayed University of Artificial Intelligence","avatar":"https://cdn-uploads.huggingface.co/production/uploads/1643879908583-603ab5664a944b99e81476e8.jpeg"}}">
AI-generated summary
Researchers developed CAMEL-Bench, a comprehensive Arabic multimodal evaluation benchmark, to assess visual reasoning and understanding models across diverse domains and scenarios.

Abstract
Recent years have witnessed a significant interest in developing large
multimodal models (LMMs) capable of performing various visual reasoning and
understanding tasks. This has led to the introduction of multiple LMM
benchmarks to evaluate LMMs on different tasks. However, most existing LMM
evaluation benchmarks are predominantly English-centric. In this work, we
develop a comprehensive LMM evaluation benchmark for the Arabic language to
represent a large population of over 400 million speakers. The proposed
benchmark, named CAMEL-Bench, comprises eight diverse domains and 38
sub-domains, including multi-image understanding, complex visual perception,
handwritten document understanding, video understanding, medical imaging, plant
diseases, and remote sensing-based land use understanding to evaluate broad
scenario generalizability. CAMEL-Bench comprises 29,036 questions filtered
from a larger pool of samples, with quality manually verified by native
speakers to ensure reliable model assessment. We evaluate both closed-source
models, including the GPT-4 series, and open-source LMMs. Our analysis reveals
the need for substantial improvement, especially among the best open-source
models, with even the closed-source GPT-4o achieving an overall score of 62%.
Our benchmark and evaluation scripts are open-sourced.
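As a rough illustration of the evaluation setup described above, here is a minimal per-domain scoring sketch. The dataset ID, split name, and field names below are placeholders rather than the project's actual layout, and exact-match scoring is a simplification of task-appropriate metrics; the official evaluation scripts are in the GitHub repository linked above.

```python
# Minimal per-domain scoring sketch. Assumptions (not from the paper page):
# the dataset ID, split name, and field names are placeholders, and
# exact-match scoring stands in for the benchmark's task-specific metrics.
from collections import defaultdict

from datasets import load_dataset  # pip install datasets


def evaluate(model_fn, dataset_id="mbzuai-oryx/CAMEL-Bench"):  # hypothetical ID
    """Return (per-domain accuracy, pooled overall accuracy) for model_fn."""
    data = load_dataset(dataset_id, split="test")  # assumed split name
    correct, total = defaultdict(int), defaultdict(int)
    for sample in data:
        # model_fn takes an image and an Arabic question, returns a string.
        pred = model_fn(sample["image"], sample["question"])  # assumed schema
        domain = sample["domain"]
        total[domain] += 1
        correct[domain] += int(pred.strip() == sample["answer"].strip())
    per_domain = {d: correct[d] / total[d] for d in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_domain, overall
```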
The proposed CAMEL-Bench covers eight diverse and challenging domains: multimodal understanding and reasoning, OCR and document understanding, chart and diagram understanding, video understanding, cultural-specific understanding, medical imaging understanding, agricultural image understanding, and remote sensing understanding, all in Arabic. These domains span 38 sub-domains with over 29K questions carefully curated by native Arabic speakers to rigorously evaluate the essential skills desired in Arabic LMMs.
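The eight domain names above can be kept as a simple constant for grouping and aggregating results. The sketch below is illustrative only: the 38 sub-domains are omitted, and the macro_average helper is an assumption, since this summary does not state whether the reported overall score pools questions or averages domains.

```python
# The eight CAMEL-Bench domains as named above; the 38 sub-domains are
# omitted here (see the paper and repository for the full taxonomy).
CAMEL_BENCH_DOMAINS = [
    "multimodal understanding and reasoning",
    "OCR and document understanding",
    "chart and diagram understanding",
    "video understanding",
    "cultural-specific understanding",
    "medical imaging understanding",
    "agricultural image understanding",
    "remote sensing understanding",
]


def macro_average(per_domain_scores: dict[str, float]) -> float:
    """Equal-weight average over domains (illustrative aggregation choice)."""
    return sum(per_domain_scores.values()) / len(per_domain_scores)


# Usage with hypothetical per-domain accuracies, e.g. from evaluate() above:
scores = {d: 0.5 for d in CAMEL_BENCH_DOMAINS}
print(f"Macro-averaged overall score: {macro_average(scores):.0%}")
```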