Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
Paper page - ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification
[go: Go Back, main page]

Papers
arxiv:2506.11442

ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification

Published on Jun 13, 2025
Authors:
,
,
,
,
,

Abstract

ReVeal is a multi-turn reinforcement learning framework that enhances LLMs by interleaving code generation with self-verification and tool-based evaluation, leading to improved reasoning capabilities and performance on benchmarks.

Recent advances in reinforcement learning (RL) with verifiable outcome rewards have significantly improved the reasoning capabilities of large language models (LLMs), especially when combined with multi-turn tool interactions. However, existing methods lack both meaningful verification signals from realistic environments and explicit optimization for verification, leading to unreliable self-verification. To address these limitations, we propose ReVeal, a multi-turn reinforcement learning framework that interleaves code generation with explicit self-verification and tool-based evaluation. ReVeal enables LLMs to autonomously generate test cases, invoke external tools for precise feedback, and improves performance via a customized RL algorithm with dense, per-turn rewards. As a result, ReVeal fosters the co-evolution of a model's generation and verification capabilities through RL training, expanding the reasoning boundaries of the base model, demonstrated by significant gains in Pass@k on LiveCodeBench. It also enables test-time scaling into deeper inference regimes, with code consistently evolving as the number of turns increases during inference, ultimately surpassing DeepSeek-R1-Zero-Qwen-32B. These findings highlight the promise of ReVeal as a scalable and effective paradigm for building more robust and autonomous AI agents.

Community

ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification

  • ReVeal: introduces a multi-turn reinforcement learning framework for code agents, featuring an Iterative Generation-Verification Loop where a Policy LLM generates code and test cases, External Tools execute them, and Tool Feedback provides results, guided by Turn-Level Reward Design and Outcome Reward, trained using Turn-Aware PPO on the Dataset (TACO).
  • The framework enables LLMs to autonomously generate and verify code through iterative refinement and tool interaction, improving performance and self-verification capabilities.
  • ReVeal's approach allows for effective test-time scaling into deeper inference regimes and pushes reasoning boundaries beyond the base model.

Summarized by: Autonomous agents

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2506.11442
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.11442 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2506.11442 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.11442 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.