Paper page - SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
https://supergpqa.github.io/

Papers
arxiv:2502.14739

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Published on Feb 20, 2025 · Submitted by AK on Feb 21, 2025 · #3 Paper of the day
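Authors: M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, et al.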

Abstract

AI-generated summary: SuperGPQA, a benchmark evaluating LLMs across 285 disciplines, reveals performance gaps and offers insights into the collaborative filtering process with human experts.

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
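The abstract does not spell out the filtering rules, so the following is only a minimal sketch of what one pass of LLM-response-based triage might look like, not the authors' pipeline. The `Question` class, the `triage` and `accuracy` functions, and the rules themselves (a question every model answers correctly is treated as trivial, a question no model answers correctly is sent to expert review as possibly ambiguous) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    # All fields are hypothetical; the paper's actual data schema may differ.
    qid: str
    answer: str                      # gold option label, e.g. "B"
    model_answers: dict = field(default_factory=dict)  # model name -> predicted label


def accuracy(questions, model):
    """Fraction of questions the given model answered with the gold label."""
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if q.model_answers.get(model) == q.answer)
    return correct / len(questions)


def triage(questions):
    """Split questions into (keep, trivial, review) using assumed rules.

    - review:  no model responses, or no model matched the gold label
               (possibly ambiguous or mislabeled; needs expert feedback)
    - trivial: every model matched the gold label
    - keep:    models disagree, so the question discriminates between them
    """
    keep, trivial, review = [], [], []
    for q in questions:
        n_correct = sum(1 for a in q.model_answers.values() if a == q.answer)
        if not q.model_answers or n_correct == 0:
            review.append(q)
        elif n_correct == len(q.model_answers):
            trivial.append(q)
        else:
            keep.append(q)
    return keep, trivial, review
```

In an iterative setup of the kind the abstract describes, the `review` bucket would go back to expert annotators for revision and the revised questions would be re-run through the models; how many rounds are used and which models are queried is not stated on this page.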

Community

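AK (Paper submitter)

https://supergpqa.github.io/

Chujie Zheng (Paper author)

🐂🍺

Tianhao Cheng

🐮

dma2077

🐮

Yizhi Li (Paper author)

🐂!

Wenhao Huang

🐮

lin (Paper author)

good👍🏻

Librarian Bot

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

* Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning (2025), https://huggingface.co/papers/2502.02871
* IOLBENCH: Benchmarking LLMs on Linguistic Reasoning (2025), https://huggingface.co/papers/2501.04249
* Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges (2025), https://huggingface.co/papers/2502.08680
* Text2World: Benchmarking Large Language Models for Symbolic World Model Generation (2025), https://huggingface.co/papers/2502.13092
* HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning (2025), https://huggingface.co/papers/2502.11393
* MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark (2025), https://huggingface.co/papers/2501.16688
* UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models (2025), https://huggingface.co/papers/2501.13766

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

ShiwenNi

🐂

deleted

🐂🍺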

Models citing this paper 0

Datasets citing this paper 4

Spaces citing this paper 2

Collections including this paper 10