URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Project page: https://ursa-math.github.io/
\n","updatedAt":"2025-01-10T01:34:22.136Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":317,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7108842730522156},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2501.04686","authors":[{"_id":"677f62ca407edfda4408dd07","user":{"_id":"6548956fe49bd8d58e8adf0e","avatarUrl":"/avatars/24c54b14c253c87d4b7438193f16ce28.svg","isPro":false,"fullname":"Ruilin","user":"Antimage01","type":"user"},"name":"Ruilin Luo","status":"claimed_verified","statusLastChangedAt":"2025-01-09T10:06:31.575Z","hidden":false},{"_id":"677f62ca407edfda4408dd08","user":{"_id":"64ae0f825d48838462023c9b","avatarUrl":"/avatars/d3f348c1428376aea490339e94d4c239.svg","isPro":false,"fullname":"Zheng Zhuofan","user":"fun6668","type":"user"},"name":"Zhuofan Zheng","status":"admin_assigned","statusLastChangedAt":"2025-01-09T11:09:45.251Z","hidden":false},{"_id":"677f62ca407edfda4408dd09","name":"Yifan Wang","hidden":false},{"_id":"677f62ca407edfda4408dd0a","user":{"_id":"66b1eb17652012ddfb59e41e","avatarUrl":"/avatars/b1c8ce44e7a7d789a3eb2e29e2ddfebe.svg","isPro":false,"fullname":"Yiyao Yu","user":"yiyaoyu","type":"user"},"name":"Yiyao Yu","status":"admin_assigned","statusLastChangedAt":"2025-01-09T20:19:06.927Z","hidden":false},{"_id":"677f62ca407edfda4408dd0b","name":"Xinzhe Ni","hidden":false},{"_id":"677f62ca407edfda4408dd0c","user":{"_id":"64292eb375bcc24c5e52c011","avatarUrl":"/avatars/c8cb03ca35ca12d8831be5f4e8547d54.svg","isPro":false,"fullname":"czl","user":"Lin1557","type":"user"},"name":"Zicheng Lin","status":"claimed_verified","statusLastChangedAt":"2025-01-13T09:09:56.183Z","hidden":false},{"_id":"677f62ca407edfda4408dd0d","name":"Jin Zeng","hidden":false},{"_id":"677f62ca407edfda4408dd0e","user":{"_id":"64ca1fe838837b12d5e529b7","avatarUrl":"/avatars/44a3ad9e59318784ac531993b5f69f6b.svg","isPro":false,"fullname":"Yujiu Yang","user":"Thu-redrobot","type":"user"},"name":"Yujiu Yang","status":"admin_assigned","statusLastChangedAt":"2025-01-09T20:02:25.883Z","hidden":false}],"publishedAt":"2025-01-08T18:49:41.000Z","submittedOnDailyAt":"2025-01-09T03:59:54.445Z","title":"URSA: Understanding and Verifying Chain-of-thought Reasoning in\n Multimodal Mathematics","submittedOnDailyBy":{"_id":"64292eb375bcc24c5e52c011","avatarUrl":"/avatars/c8cb03ca35ca12d8831be5f4e8547d54.svg","isPro":false,"fullname":"czl","user":"Lin1557","type":"user"},"summary":"Chain-of-thought (CoT) reasoning has been widely applied in the mathematical\nreasoning of Large Language Models (LLMs). Recently, the introduction of\nderivative process supervision on CoT trajectories has sparked discussions on\nenhancing scaling capabilities during test time, thereby boosting the potential\nof these models. However, in multimodal mathematical reasoning, the scarcity of\nhigh-quality CoT training data has hindered existing models from achieving\nhigh-precision CoT reasoning and has limited the realization of reasoning\npotential during test time. 
In this work, we propose a three-module synthesis\nstrategy that integrates CoT distillation, trajectory-format rewriting, and\nformat unification. It results in a high-quality CoT reasoning instruction\nfine-tuning dataset in multimodal mathematics, MMathCoT-1M. We comprehensively\nvalidate the state-of-the-art (SOTA) performance of the trained URSA-7B model\non multiple multimodal mathematical benchmarks. For test-time scaling, we\nintroduce a data synthesis strategy that automatically generates process\nannotation datasets, known as DualMath-1.1M, focusing on both interpretation\nand logic. By further training URSA-7B on DualMath-1.1M, we transition from CoT\nreasoning capabilities to robust supervision abilities. The trained URSA-RM-7B\nacts as a verifier, effectively enhancing the performance of URSA-7B at test\ntime. URSA-RM-7B also demonstrates excellent out-of-distribution (OOD)\nverifying capabilities, showcasing its generalization. Model weights, training\ndata and code will be open-sourced.","upvotes":53,"discussionId":"677f62cb407edfda4408dd5c","projectPage":"https://ursa-math.github.io/","githubRepo":"https://github.com/URSA-MATH/URSA-MATH","githubRepoAddedBy":"user","ai_summary":"A three-module synthesis strategy and data generation approach improve Chain-of-Thought reasoning in multimodal mathematical tasks, boosting model performance and out-of-distribution verification capabilities.","ai_keywords":["Chain-of-Thought","reasoning","Large Language Models","derivative process supervision","multimodal mathematical reasoning","CoT distillation","trajectory-format rewriting","format unification","multimodal mathematics","fine-tuning dataset","state-of-the-art","benchmarks","process annotation datasets","robust supervision","verification capabilities","out-of-distribution","generalization"],"githubStars":129},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"64292eb375bcc24c5e52c011","avatarUrl":"/avatars/c8cb03ca35ca12d8831be5f4e8547d54.svg","isPro":false,"fullname":"czl","user":"Lin1557","type":"user"},{"_id":"650be23ec4e52db6a4db63ef","avatarUrl":"/avatars/03af548029b38bee49ec295fefe74f9a.svg","isPro":false,"fullname":"Haoling Li","user":"Ringo1110","type":"user"},{"_id":"66f6627e08be8ab9ab0833f2","avatarUrl":"/avatars/871a0862c22e0e4f8829144953f81d85.svg","isPro":false,"fullname":"mi","user":"mxy123","type":"user"},{"_id":"66b1eb17652012ddfb59e41e","avatarUrl":"/avatars/b1c8ce44e7a7d789a3eb2e29e2ddfebe.svg","isPro":false,"fullname":"Yiyao Yu","user":"yiyaoyu","type":"user"},{"_id":"6535e56ef551a245bbf7e4ed","avatarUrl":"/avatars/540bde030bc61362f4f2ad507a063596.svg","isPro":false,"fullname":"zhr","user":"finduzzz","type":"user"},{"_id":"643d49cf482011f5f2be28fb","avatarUrl":"/avatars/1b2e0213bd53a8260889cded82de47d8.svg","isPro":false,"fullname":"jin","user":"zengjin","type":"user"},{"_id":"64b75db417570fdff9b43fd9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64b75db417570fdff9b43fd9/-NcNL_QzuiAAvsBIQG3xo.png","isPro":false,"fullname":"Cheng 
YANG","user":"Cheng-YANG","type":"user"},{"_id":"677f6e9ce3ea3b483ac870f8","avatarUrl":"/avatars/3ad2f744f706e4a29f0cf4b0f7cae662.svg","isPro":false,"fullname":"chan","user":"Lemonade11","type":"user"},{"_id":"666fea4dcb82a3b014edbeeb","avatarUrl":"/avatars/2d17bf2c4cabd8e968f159cf983f490e.svg","isPro":false,"fullname":"wong","user":"efanwong","type":"user"},{"_id":"659bb6ab6d20ab21b0bf2456","avatarUrl":"/avatars/b97bcfafcf961797faa23d5ca55b9261.svg","isPro":false,"fullname":"Xinzhe","user":"Thmars","type":"user"},{"_id":"6683a05e74fb1736a4b7c934","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6683a05e74fb1736a4b7c934/eiz6qlqIUjAWGy5zfg8Cs.jpeg","isPro":false,"fullname":"QRQ","user":"RichardQRQ","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
A three-module synthesis strategy and data generation approach improve Chain-of-Thought reasoning in multimodal mathematical tasks, boosting model performance and out-of-distribution verification capabilities.
Chain-of-thought (CoT) reasoning has been widely applied in the mathematical
reasoning of Large Language Models (LLMs). Recently, the introduction of
derivative process supervision on CoT trajectories has sparked discussions on
enhancing scaling capabilities during test time, thereby boosting the potential
of these models. However, in multimodal mathematical reasoning, the scarcity of
high-quality CoT training data has hindered existing models from achieving
high-precision CoT reasoning and has limited the realization of reasoning
potential during test time. In this work, we propose a three-module synthesis
strategy that integrates CoT distillation, trajectory-format rewriting, and
format unification. It results in a high-quality CoT reasoning instruction
fine-tuning dataset in multimodal mathematics, MMathCoT-1M. We comprehensively
validate the state-of-the-art (SOTA) performance of the trained URSA-7B model
on multiple multimodal mathematical benchmarks. For test-time scaling, we
introduce a data synthesis strategy that automatically generates a process
annotation dataset, DualMath-1.1M, focusing on both interpretation
and logic. By further training URSA-7B on DualMath-1.1M, we transition from CoT
reasoning capabilities to robust supervision abilities. The trained URSA-RM-7B
acts as a verifier, effectively enhancing the performance of URSA-7B at test
time. URSA-RM-7B also demonstrates excellent out-of-distribution (OOD)
verification capabilities, showcasing its generalization. Model weights, training
data, and code will be open-sourced.
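The abstract does not spell out exactly how the verifier is applied at test time; a common instantiation of "reward model as verifier" is best-of-N reranking, where the policy model samples several CoT solutions and the reward model keeps the highest-scoring one. The sketch below is a minimal illustration under that assumption only; the function names and scoring rule are placeholders, not the released URSA-7B / URSA-RM-7B interface.

```python
from typing import Callable, List, Tuple

def best_of_n(
    problem: str,
    generate_candidates: Callable[[str, int], List[str]],  # e.g. sampled CoT outputs from a policy model
    score_trajectory: Callable[[str, str], float],          # e.g. a reward-model verifier score
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n chain-of-thought solutions and keep the one the verifier ranks highest."""
    candidates = generate_candidates(problem, n)
    scored = [(cot, score_trajectory(problem, cot)) for cot in candidates]
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    # Trivial stand-ins so the sketch runs end to end without any model weights.
    fake_generate = lambda q, k: [f"Draft {i}: reasoning about '{q}' ... Answer: 5" for i in range(k)]
    fake_score = lambda q, cot: 1.0 if "Answer: 5" in cot else 0.0
    best_cot, best_score = best_of_n("What is 2 + 3?", fake_generate, fake_score, n=4)
    print(best_score, best_cot)
```

Under this reading, spending more test-time compute (a larger n) trades extra sampling for accuracy, which is the test-time scaling behavior the abstract refers to.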
TL;DR: Our work focuses on multimodal mathematical reasoning. We contribute the MMathCoT-1M and DualMath-1.1M datasets through a three-module high-quality CoT data synthesis pipeline and dual-view process label automation. URSA-7B, fine-tuned on MMathCoT-1M, achieves SOTA performance among models of comparable size on multiple multimodal mathematics benchmarks. Furthermore, URSA-RM-7B, trained on DualMath-1.1M, is the first small-sized reward model contributed in the domain of multimodal mathematics, and its effectiveness for test-time scaling is extensively validated.
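For a concrete picture of the three modules named in the abstract (CoT distillation, trajectory-format rewriting, format unification), here is a minimal data-pipeline skeleton. The module internals below are illustrative assumptions only; the actual MMathCoT-1M construction (teacher models, prompts, filtering) is described in the paper and released code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    question: str
    image_path: str
    answer: str
    cot: str = ""

def distill_cot(sample: Sample) -> Sample:
    """Module 1 (assumed): a stronger teacher model writes a step-by-step solution."""
    sample.cot = (
        "Step 1: interpret the figure.\n"
        "Step 2: set up the computation.\n"
        f"Final answer: {sample.answer}"
    )
    return sample

def rewrite_trajectory(sample: Sample) -> Sample:
    """Module 2 (assumed): answer-only or terse solutions are rewritten into step-wise CoT form."""
    if not sample.cot.startswith("Step 1"):
        sample.cot = "Step 1: " + sample.cot
    return sample

def unify_format(sample: Sample) -> dict:
    """Module 3: emit every record in one shared instruction-tuning schema."""
    return {
        "question": sample.question,
        "image": sample.image_path,
        "response": sample.cot,
        "answer": sample.answer,
    }

def build_dataset(raw_samples: List[Sample]) -> List[dict]:
    return [unify_format(rewrite_trajectory(distill_cot(s))) for s in raw_samples]

if __name__ == "__main__":
    demo = [Sample("Find the area of the shaded region.", "figures/q1.png", "12")]
    print(build_dataset(demo)[0]["response"])
```

The point of the final module is simply that every record, regardless of its source, lands in one schema suitable for instruction fine-tuning; the specific field names here are hypothetical.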