You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Paper • 2604.10966 • Published 12 days ago • 11
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published about 1 month ago • 18
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26 • 9
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL Paper • 2601.09876 • Published Jan 14 • 7
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification Paper • 2506.15569 • Published Jun 18, 2025 • 12
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL Paper • 2601.09876 • Published Jan 14 • 7
LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation Paper • 2412.15188 • Published Dec 19, 2024 • 2
MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation Paper • 2506.07999 • Published Jun 9, 2025 • 2
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published Dec 4, 2025 • 20
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1, 2025 • 28
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29, 2025 • 10
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29, 2025 • 10