Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 7 days ago • 82
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents Paper • 2604.14004 • Published 6 days ago • 29
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen3-14B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 6 days ago • 13
Sangsang/grpo_Qwen3-0.6B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 8 days ago • 11
Sangsang/feedback_asymmetric_fixed_ema_Llama-3.1-8B-Instruct_bw0p5_fw0p5_ema0p999_ep30_v2 Text Generation • Updated 9 days ago • 18