ReasoningTrap (ReasoningTrap)


Fine-grained evaluation of Large Reasoning Models that fail due to reasoning rigidity.
ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines


📜 Why ReasoningTrap?

Current RL-tuned Reasoning LLMs excel at producing answers but often ignore explicit user constraints.
ReasoningTrap surfaces these failure modes with carefully crafted, conditioned problems.

  • Modified from well-known math reasoning benchmarks – AIME & MATH500 problems altered with minimal added constraints that divert the usual reasoning path.
  • Puzzles trivialized by subtle modifications – well-known puzzles where a small change turns a challenging problem into a trivial one.
  • Plug-and-play – evaluate any 🤗 Transformers model with vLLM by following a few simple instructions.
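
To make the idea of a conditioned problem concrete, here is a minimal sketch of how such an evaluation could be scored. It is not the ReasoningTrap pipeline: the names `ConditionedProblem`, `classify`, and `evaluate` are hypothetical, the toy problem is made up for illustration and is not taken from the datasets, and `generate` stands in for whatever model call is used (for example, a wrapper around vLLM).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ConditionedProblem:
    """Hypothetical record for one modified problem (not the dataset schema)."""
    prompt: str              # problem text including the added constraint
    conditioned_answer: str  # correct answer when the constraint is respected
    original_answer: str     # answer to the unmodified, familiar problem


def normalize(answer: str) -> str:
    """Strip whitespace and case so trivially different strings compare equal."""
    return answer.strip().lower()


def classify(problem: ConditionedProblem, model_answer: str) -> str:
    """Label a final answer by which version of the problem it matches."""
    answer = normalize(model_answer)
    if answer == normalize(problem.conditioned_answer):
        return "followed_condition"
    if answer == normalize(problem.original_answer):
        # The model solved the familiar problem and ignored the constraint.
        return "reverted_to_original"
    return "other"


def evaluate(problems: List[ConditionedProblem],
             generate: Callable[[str], str]) -> Dict[str, int]:
    """Count outcomes over a problem set; `generate` maps a prompt to a final answer."""
    counts = {"followed_condition": 0, "reverted_to_original": 0, "other": 0}
    for problem in problems:
        counts[classify(problem, generate(problem.prompt))] += 1
    return counts


# Made-up toy problem, for illustration only.
toy = ConditionedProblem(
    prompt="Compute 3 + 4, where '+' in this problem denotes multiplication.",
    conditioned_answer="12",
    original_answer="7",
)

# A stub "model" that ignores the constraint and answers the familiar question.
rigid_model = lambda prompt: "7"
result = evaluate([toy], rigid_model)
```

Separating "reverted_to_original" from "other" is the point of the sketch: an answer that matches the unmodified problem suggests the model ignored the constraint rather than simply making an arithmetic error.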
