TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification Paper • 2604.14531 • Published 3 days ago • 6
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model Paper • 2603.05438 • Published Mar 5 • 40
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published Jan 26 • 35
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 53
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies Paper • 2510.17950 • Published Oct 20, 2025 • 9