opencompass (OpenCompass)

OpenCompass

community

https://opencompass.org.cn/


AI & ML interests

None defined yet.

Recent Activity

dongsheng submitted a paper 1 day ago

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

dongsheng authored a paper 4 days ago

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

ZwwWayne authored a paper 5 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency


Papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM


Organization Card


OpenCompass Website (HOT) | OpenCompass Toolkit (TRY IT OUT)

👋 Join us on Discord and WeChat

Follow us on GitHub

OpenCompass is a platform focused on the evaluation of AGI, including large language models and multimodal models. We aim to:

develop high-quality libraries that reduce the difficulty of evaluation
provide convincing leaderboards that improve the understanding of large models
create powerful toolchains targeting a variety of abilities and tasks
build solid benchmarks to support large model research

Collections 3


Spaces 18

Open VLM Leaderboard

VLMEvalKit Evaluation Results Collection

ATLAS Benchmark

ATLAS for Frontier Scientific Benchmark

RISEBench Gallery

A Gallery of Generation Results on RISEBench

Open LMM Spatial Leaderboard

A Leaderboard for LMM spatial understanding capabilities

Open LMM Subjective Leaderboard

VLMEvalKit Subjective Benchmark Results

CompassAcademic Leaderboard Full Version

Compass Academic Leaderboard Full Version

Models 13

opencompass/CompassVerifier-3B

3B • Updated Jan 4 • 420 • 7

opencompass/CompassVerifier-32B

33B • Updated Jan 4 • 17 • 7

opencompass/CompassVerifier-7B

8B • Updated Nov 26, 2025 • 26 • 4

opencompass/CompassJudger-2-7B-Instruct

Text Ranking • 8B • Updated Jul 22, 2025 • 158 • 3

opencompass/CompassJudger-2-32B-Instruct

Text Ranking • 33B • Updated Jul 22, 2025 • 1.42k • 3

opencompass/anah-7b

Text Classification • 8B • Updated Mar 8, 2025 • 8

opencompass/anah-20b

Text Classification • 20B • Updated Mar 8, 2025 • 10

opencompass/anah-v2

Text Classification • 8B • Updated Mar 8, 2025 • 15 • 4

opencompass/CompassJudger-1-14B-Instruct

Text Generation • 15B • Updated Oct 30, 2024 • 11 • 2

opencompass/CompassJudger-1-32B-Instruct

Text Generation • 33B • Updated Oct 30, 2024 • 23 • 18

Datasets 17

opencompass/NeedleBench

Viewer • Updated 7 days ago • 6.8k • 20k • 5

opencompass/TextEdit

Viewer • Updated Mar 15 • 2.15k • 479 • 9

opencompass/ATLAS

Viewer • Updated Nov 20, 2025 • 798 • 140

opencompass/ReasonZoo

Updated Aug 27, 2025 • 100

opencompass/VerifierBench

Viewer • Updated Aug 26, 2025 • 2.82k • 238 • 4

opencompass/LiveMathBench

Viewer • Updated Aug 5, 2025 • 483 • 513 • 12

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1, 2025 • 5.57k • 143 • 1

opencompass/CodeCompass

Updated Aug 1, 2025 • 213 • 1

opencompass/compass_academic_predictions

Viewer • Updated Apr 7, 2025 • 4.42M • 7

opencompass/Creation-MMBench

Viewer • Updated Mar 19, 2025 • 765 • 286 • 3
