Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Authors: Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao, Ruohao Guo, Yuan He, Zhuangzhuang He, Xianglong Hu, Neil Johnson, Bowen Li, Fangru Lin, Siyu Lin, Tong Liu, Yunpu Ma, Hao Shen, Hao Sun, Beibei Wang, Fangyijie Wang, Hao Wang, Haoran Wang, Yang Wang, Yifeng Wang, Zhaowei Wang, Ziyang Wang, Yifan Wu, Zikai Xiao, Chengxing Xie, Fan Yang, Junxiao Yang, Qianshuo Ye, Ziyu Ye, Guangtao Zeng, Yuwen Ebony Zhang, Zeyu Zhang, Zihao Zhu, Bernard Ghanem, Philip Torr, Guohao Li
Published: 2025-09-03 (arXiv: 2509.03059)
AI-generated summary: The Loong Project introduces a framework for generating and verifying synthetic data to improve reasoning capabilities in Large Language Models through Reinforcement Learning with Verifiable Reward.
Recent advances in Large Language Models (LLMs) have shown that their
reasoning capabilities can be significantly improved through Reinforcement
Learning with Verifiable Reward (RLVR), particularly in domains like
mathematics and programming, where ground-truth correctness can be
automatically evaluated. However, extending this success to other
reasoning-intensive domains remains challenging due to the scarcity of
high-quality, verifiable datasets and the high cost of human supervision. In
this work, we introduce the Loong Project: an open-source framework for
scalable synthetic data generation and verification across a diverse range of
reasoning-intensive domains. The framework consists of two key components: (1)
LoongBench, a curated seed dataset containing 8,729 human-vetted examples
across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired
with executable code and rich metadata; and (2) LoongEnv, a modular synthetic
data generation environment that supports multiple prompting strategies to
produce new question-answer-code triples. Together, these components form an
agent-environment loop that enables reinforcement learning, where an LLM-based
agent is rewarded for generating Chain-of-Thought (CoT) solutions that align
with code-executed answers. Empirically, we benchmark LoongBench on a broad
suite of both open-source and proprietary LLMs to evaluate domain coverage and
reveal performance bottlenecks. In addition, we conduct a comprehensive
analysis of synthetic data generated by LoongEnv, examining correctness,
difficulty, and diversity. Code and documentation are available at
https://github.com/camel-ai/loong.
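
The reward signal described in the abstract, where a Chain-of-Thought solution is rewarded when it agrees with the code-executed answer, can be illustrated with a minimal sketch. Everything named here (`SeedExample`, `execute_code`, `extract_final_answer`, `verifiable_reward`) is hypothetical and is not the Loong API. The sketch also assumes that the example's code prints its answer, that a CoT ends with a line of the form `Answer: <value>`, and that exact string comparison is an acceptable stand-in for a real verifier.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class SeedExample:
    """Hypothetical record for one question-answer-code triple."""
    question: str
    code: str    # Python source assumed to print the reference answer
    domain: str


def execute_code(source: str) -> str:
    """Run the example's code and return what it printed, stripped.

    Illustration only: exec() on untrusted code is unsafe, so a real
    pipeline would run this inside a sandbox.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().strip()


def extract_final_answer(cot: str) -> str:
    """Return the value on the last 'Answer: <value>' line (assumed format)."""
    for line in reversed(cot.strip().splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith("answer:"):
            return stripped.split(":", 1)[1].strip()
    return ""


def verifiable_reward(example: SeedExample, cot: str) -> float:
    """Give 1.0 when the CoT's final answer equals the code-executed answer."""
    reference = execute_code(example.code)
    predicted = extract_final_answer(cot)
    return 1.0 if reference and predicted == reference else 0.0


# A toy example, not taken from LoongBench.
example = SeedExample(
    question="What is 12 * 7 + 5?",
    code="print(12 * 7 + 5)",
    domain="Advanced Mathematics",
)
correct_cot = "12 * 7 = 84.\n84 + 5 = 89.\nAnswer: 89"
wrong_cot = "12 * 7 = 74.\n74 + 5 = 79.\nAnswer: 79"

reward_correct = verifiable_reward(example, correct_cot)
reward_wrong = verifiable_reward(example, wrong_cot)
```

Exact string matching is the simplest possible check. Answers that are numerically or symbolically equivalent but formatted differently would need a domain-specific comparison, which this sketch does not attempt.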
We introduce Project Loong, which focuses on scaling up synthetic data generation with verifiers across a broad range of domains. We believe that synthetic data generation is essential, not only for addressing gaps in data-scarce domains but also for enhancing reasoning capabilities in areas like math and programming by expanding dataset availability.