Paper page - P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Project page: https://prime-rl.github.io/P1-VL
GitHub: https://github.com/PRIME-RL/P1-VL

\n","updatedAt":"2026-02-11T06:31:19.630Z","author":{"_id":"6086838b19137b3a6ba760e7","avatarUrl":"/avatars/d63eea3e39b22c6e65b82c28192696f1.svg","fullname":"Jianhao Yan","name":"Elliott","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":6,"isUserFollowing":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.8808199763298035},"editors":["Elliott"],"editorAvatarUrls":["/avatars/d63eea3e39b22c6e65b82c28192696f1.svg"],"reactions":[],"isReport":false}},{"id":"698d2f4dc98ef9ef0a79fc0e","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2026-02-12T01:39:25.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods](https://huggingface.co/papers/2601.21821) (2026)\n* [CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving](https://huggingface.co/papers/2601.01874) (2026)\n* [Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy](https://huggingface.co/papers/2601.06801) (2026)\n* [Figure It Out: Improve the Frontier of Reasoning with Executable Visual States](https://huggingface.co/papers/2512.24297) (2025)\n* [V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation](https://huggingface.co/papers/2601.10094) (2026)\n* [Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision](https://huggingface.co/papers/2602.04290) (2026)\n* [SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning](https://huggingface.co/papers/2512.24330) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2026-02-12T01:39:25.720Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7154006958007812},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.09443","authors":[{"_id":"698c019f6052d3bed9630b1c","name":"Yun Luo","hidden":false},{"_id":"698c019f6052d3bed9630b1d","user":{"_id":"6418228b83957c4eaaad4d01","avatarUrl":"/avatars/b6af01d09bba5d5ba7bf4a62914ca468.svg","isPro":false,"fullname":"wang","user":"astrid01052","type":"user"},"name":"Futing Wang","status":"claimed_verified","statusLastChangedAt":"2026-02-13T09:37:44.168Z","hidden":false},{"_id":"698c019f6052d3bed9630b1e","name":"Qianjia Cheng","hidden":false},{"_id":"698c019f6052d3bed9630b1f","name":"Fangchen Yu","hidden":false},{"_id":"698c019f6052d3bed9630b20","name":"Haodi Lei","hidden":false},{"_id":"698c019f6052d3bed9630b21","name":"Jianhao Yan","hidden":false},{"_id":"698c019f6052d3bed9630b22","name":"Chenxi Li","hidden":false},{"_id":"698c019f6052d3bed9630b23","name":"Jiacheng Chen","hidden":false},{"_id":"698c019f6052d3bed9630b24","name":"Yufeng Zhao","hidden":false},{"_id":"698c019f6052d3bed9630b25","user":{"_id":"691b0f528411a45dc9ee9de8","avatarUrl":"/avatars/261c28f7e616a8482970f50c1f8919fd.svg","isPro":false,"fullname":"Haiyuan Wan","user":"HY-Wan","type":"user"},"name":"Haiyuan Wan","status":"claimed_verified","statusLastChangedAt":"2026-02-13T09:37:35.614Z","hidden":false},{"_id":"698c019f6052d3bed9630b26","name":"Yuchen Zhang","hidden":false},{"_id":"698c019f6052d3bed9630b27","name":"Shenghe Zheng","hidden":false},{"_id":"698c019f6052d3bed9630b28","name":"Junchi Yao","hidden":false},{"_id":"698c019f6052d3bed9630b29","name":"Qingyang Zhang","hidden":false},{"_id":"698c019f6052d3bed9630b2a","name":"Haonan He","hidden":false},{"_id":"698c019f6052d3bed9630b2b","name":"Wenxuan Zeng","hidden":false},{"_id":"698c019f6052d3bed9630b2c","name":"Li Sheng","hidden":false},{"_id":"698c019f6052d3bed9630b2d","name":"Chengxing Xie","hidden":false},{"_id":"698c019f6052d3bed9630b2e","name":"Yuxin Zuo","hidden":false},{"_id":"698c019f6052d3bed9630b2f","name":"Yizhuo Li","hidden":false},{"_id":"698c019f6052d3bed9630b30","name":"Yulun Wu","hidden":false},{"_id":"698c019f6052d3bed9630b31","name":"Rui Huang","hidden":false},{"_id":"698c019f6052d3bed9630b32","name":"Dongzhan Zhou","hidden":false},{"_id":"698c019f6052d3bed9630b33","name":"Kai Chen","hidden":false},{"_id":"698c019f6052d3bed9630b34","name":"Yu Qiao","hidden":false},{"_id":"698c019f6052d3bed9630b35","name":"Lei Bai","hidden":false},{"_id":"698c019f6052d3bed9630b36","name":"Yu Cheng","hidden":false},{"_id":"698c019f6052d3bed9630b37","name":"Ning Ding","hidden":false},{"_id":"698c019f6052d3bed9630b38","name":"Bowen Zhou","hidden":false},{"_id":"698c019f6052d3bed9630b39","name":"Peng Ye","hidden":false},{"_id":"698c019f6052d3bed9630b3a","name":"Ganqu Cui","hidden":false}],"publishedAt":"2026-02-10T06:28:08.000Z","submittedOnDailyAt":"2026-02-11T04:00:48.041Z","title":"P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics 
Olympiads","submittedOnDailyBy":{"_id":"6086838b19137b3a6ba760e7","avatarUrl":"/avatars/d63eea3e39b22c6e65b82c28192696f1.svg","isPro":false,"fullname":"Jianhao Yan","user":"Elliott","type":"user"},"summary":"The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative, containing essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training, with Agentic Augmentation, enabling iterative self-verification at inference. Evaluated on HiPhO, a rigorous benchmark of 13 exams from 2024-2025, our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model (VLM) to secure 12 gold medals and achieves the state-of-the-art performance in the open-source models. Our agent-augmented system achieves the No.2 overall rank globally, trailing only Gemini-3-Pro. Beyond physics, P1-VL demonstrates remarkable scientific reasoning capacity and generalizability, establishing significant leads over base models in STEM benchmarks. By open-sourcing P1-VL, we provide a foundational step toward general-purpose physical intelligence to better align visual perceptions with abstract physical laws for machine scientific discovery.","upvotes":57,"discussionId":"698c019f6052d3bed9630b3b","projectPage":"https://prime-rl.github.io/P1-VL","githubRepo":"https://github.com/PRIME-RL/P1-VL","githubRepoAddedBy":"user","ai_summary":"Physics-oriented vision-language models leverage curriculum reinforcement learning and agentic augmentation to achieve state-of-the-art scientific reasoning performance while maintaining physical consistency through multimodal perception.","ai_keywords":["vision-language models","curriculum reinforcement learning","agentic augmentation","multimodal perception","scientific reasoning","physical consistency","HiPhO benchmark","P1-VL-235B-A22B","Gemini-3-Pro"],"githubStars":14,"organization":{"_id":"6747ee5decec679eafb90450","name":"ShanghaiAiLab","fullname":"shanghai ailab "}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6086838b19137b3a6ba760e7","avatarUrl":"/avatars/d63eea3e39b22c6e65b82c28192696f1.svg","isPro":false,"fullname":"Jianhao Yan","user":"Elliott","type":"user"},{"_id":"644915c5e87a77e872e61350","avatarUrl":"/avatars/46ba7bdf04ad4c1b0ad79155010dc684.svg","isPro":false,"fullname":"Luo","user":"ramiroluo","type":"user"},{"_id":"68ad9cb3bcaa8d84217a8bdf","avatarUrl":"/avatars/dbb3199cf5bfc2acdbd38069c823c027.svg","isPro":false,"fullname":"Fangchen Yu","user":"SciYu","type":"user"},{"_id":"666ea165f98a92bc8997cbfb","avatarUrl":"/avatars/be1fce19192fd4f220fe82f373859b52.svg","isPro":false,"fullname":"Qianjia 
Cheng","user":"CajZella","type":"user"},{"_id":"65352acb7139c5dd8d9a8590","avatarUrl":"/avatars/e2ff22b596aee45cdfb8f68dc15572f9.svg","isPro":false,"fullname":"JiachengChen","user":"JC-Chen","type":"user"},{"_id":"66ea643899af9ac3463639b1","avatarUrl":"/avatars/252d470e761a57834dee3dbc60dfefed.svg","isPro":false,"fullname":"Disen Lan","user":"landisen","type":"user"},{"_id":"629454301ae2138079f7ff31","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/629454301ae2138079f7ff31/rVtbF-j06gDiYzomTeVTc.jpeg","isPro":false,"fullname":"Tong Zhu","user":"Spico","type":"user"},{"_id":"667cf204268f6622dac71961","avatarUrl":"/avatars/90e1928beb2a685e82e19758e4a6b7ae.svg","isPro":false,"fullname":"shiyang","user":"sY713","type":"user"},{"_id":"64cb54da1af278541d663708","avatarUrl":"/avatars/c44507cc92bb2e83154bad31b90ce6dd.svg","isPro":false,"fullname":"Xiaoye Qu","user":"Xiaoye08","type":"user"},{"_id":"67247adb73d1eb17b6bfd27c","avatarUrl":"/avatars/57bdbb7362f9854c87dd0a71ae071652.svg","isPro":false,"fullname":"Zefeng He","user":"yhx12","type":"user"},{"_id":"691b0f528411a45dc9ee9de8","avatarUrl":"/avatars/261c28f7e616a8482970f50c1f8919fd.svg","isPro":false,"fullname":"Haiyuan Wan","user":"HY-Wan","type":"user"},{"_id":"62495cb96ee7ee6b646db130","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62495cb96ee7ee6b646db130/UwBXmvcMq7LMvBWUw0xo3.jpeg","isPro":false,"fullname":"Runzhe Zhan","user":"rzzhan","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"6747ee5decec679eafb90450","name":"ShanghaiAiLab","fullname":"shanghai ailab "}}">
arxiv:2602.09443

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Published on Feb 10 · Submitted by Jianhao Yan on Feb 11
Authors: Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui

Abstract

AI-generated summary: Physics-oriented vision-language models leverage curriculum reinforcement learning and agentic augmentation to achieve state-of-the-art scientific reasoning performance while maintaining physical consistency through multimodal perception.

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative, containing essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training, with Agentic Augmentation, which enables iterative self-verification at inference. Evaluated on HiPhO, a rigorous benchmark of 13 exams from 2024-2025, our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model (VLM) to secure 12 gold medals and achieves state-of-the-art performance among open-source models. Our agent-augmented system ranks No. 2 overall globally, trailing only Gemini-3-Pro. Beyond physics, P1-VL demonstrates remarkable scientific reasoning capacity and generalizability, establishing significant leads over base models on STEM benchmarks. By open-sourcing P1-VL, we provide a foundational step toward general-purpose physical intelligence that better aligns visual perception with abstract physical laws for machine scientific discovery.
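The abstract names two components: Curriculum Reinforcement Learning, which expands problem difficulty progressively during post-training, and Agentic Augmentation, which lets the model iteratively verify its own answers at inference. The Python sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together; the difficulty schedule, the generate and verify callables, and all thresholds are assumptions for exposition, not the P1-VL implementation described in the paper.

```python
"""Illustrative sketch only: a toy curriculum scheduler and an iterative
self-verification loop in the spirit of the abstract. The difficulty
field, the expansion rule, and the generate/verify callables are
hypothetical stand-ins, not the P1-VL training or inference code."""

import random
from typing import Callable, Dict, List


def curriculum_batch(problems: List[Dict], step: int, total_steps: int,
                     batch_size: int = 8) -> List[Dict]:
    """Progressive difficulty expansion: start with easy problems and
    gradually widen the admissible difficulty range as training advances."""
    # Each problem is assumed to carry a precomputed 'difficulty' in [0, 1].
    max_difficulty = min(1.0, 0.3 + 0.7 * step / total_steps)
    pool = [p for p in problems if p["difficulty"] <= max_difficulty]
    return random.sample(pool, min(batch_size, len(pool)))


def answer_with_self_verification(question: str,
                                  generate: Callable[[str], str],
                                  verify: Callable[[str, str], bool],
                                  max_rounds: int = 3) -> str:
    """Agentic augmentation sketch: generate an answer, check it, and
    retry with the failure fed back into the prompt until it passes."""
    prompt = question
    answer = generate(prompt)
    for _ in range(max_rounds):
        if verify(question, answer):  # e.g. a model- or tool-based check
            return answer
        prompt = (f"{question}\n\nYour previous answer was:\n{answer}\n"
                  "It failed a consistency check (units, limiting cases, "
                  "symmetry). Revise it.")
        answer = generate(prompt)
    return answer  # best effort after max_rounds
```

Under these assumptions, curriculum_batch would feed an RL trainer progressively harder physics problems, while answer_with_self_verification wraps the trained model with a critique-and-revise loop at inference time.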

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods (2026) - https://huggingface.co/papers/2601.21821
* CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving (2026) - https://huggingface.co/papers/2601.01874
* Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy (2026) - https://huggingface.co/papers/2601.06801
* Figure It Out: Improve the Frontier of Reasoning with Executable Visual States (2025) - https://huggingface.co/papers/2512.24297
* V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation (2026) - https://huggingface.co/papers/2601.10094
* Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision (2026) - https://huggingface.co/papers/2602.04290
* SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning (2025) - https://huggingface.co/papers/2512.24330

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper: 2

Datasets citing this paper: 0

No dataset linking this paper

Cite arxiv.org/abs/2602.09443 in a dataset README.md to link it from this page.

Spaces citing this paper: 0

No Space linking this paper

Cite arxiv.org/abs/2602.09443 in a Space README.md to link it from this page.

Collections including this paper: 1