Paper page - SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

arxiv:2510.08559

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

Published on Oct 9, 2025
· Submitted by
taesiri
on Oct 10, 2025
Authors:
Andong Deng, Taojiannan Yang, Shoubin Yu, Lincoln Spencer, Mohit Bansal, Chen Chen, Serena Yeung-Levy, Xiaohan Wang

Abstract

SciVideoBench is a benchmark designed to evaluate advanced video reasoning in scientific contexts, challenging models with sophisticated domain-specific knowledge and logical reasoning.

AI-generated summary

Large Multimodal Models (LMMs) have achieved remarkable progress across various capabilities; however, complex video reasoning in the scientific domain remains a significant and challenging frontier. Current video benchmarks predominantly target general scenarios that rely heavily on perception/recognition and involve relatively simple reasoning tasks, leading to saturation and thus failing to effectively evaluate advanced multimodal cognitive skills. To address this critical gap, we introduce SciVideoBench, a rigorous benchmark specifically designed to assess advanced video reasoning in scientific contexts. SciVideoBench consists of 1,000 carefully crafted multiple-choice questions derived from cutting-edge scientific experimental videos spanning over 25 specialized academic subjects, verified by a semi-automatic system. Each question demands sophisticated domain-specific knowledge, precise spatiotemporal perception, and intricate logical reasoning, effectively challenging models' higher-order cognitive abilities. Our evaluation highlights significant performance deficits in state-of-the-art proprietary and open-source LMMs, including Gemini 2.5 Pro and Qwen2.5-VL, indicating substantial room for advancement in video reasoning capabilities. Detailed analyses of critical factors such as reasoning complexity and visual grounding provide valuable insights and clear direction for future developments in LMMs, driving the evolution of truly capable multimodal AI co-scientists. We hope SciVideoBench will fit the interests of the community and help push the boundary of cutting-edge AI for broader science.

Community

Paper submitter

great work 👍👍

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.08559 in a model README.md to link it from this page.

Datasets citing this paper 2

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.08559 in a Space README.md to link it from this page.

Collections including this paper 2