evoeval (EvoEval)

Organization Card

EvoEval: Evolving Coding Benchmarks via LLM

EvoEval¹ is a holistic benchmark suite created by evolving HumanEval problems:

  • 🔥 Contains 828 new problems across 5 🌠 semantic-altering and 2 ⭐ semantic-preserving benchmarks
  • 🔮 Allows evaluation and comparison across different dimensions and problem types (e.g., Difficult, Creative, or Tool Use problems). See our visualization tool for ready-to-use comparisons
  • 🏆 Complete with a leaderboard, ground-truth solutions, robust test cases, and evaluation scripts that fit easily into your evaluation pipeline
  • 🤖 Includes LLM-generated code samples from >50 different models to save you time in running experiments

¹ Coincidentally similar in pronunciation to 😈 EvilEval