Blog

Learning with Verbal Feedback

May 31, 2026

After scores and checkmarks, the next reward is a sentence. An opinionated tour through the arc of feedback in RL for LLMs, from scalar rewards in RLHF to verifiable rewards in RLVR to the rise of verbal feedback, culminating in Ditto.

Thinking in RL

Apr 13, 2026

An opinionated tour through the algorithm tree of modern LLM RL — PPO, GRPO, REINFORCE, REINFORCE++, DPO, and the theoretical ideas that tie them together.

The Quest of User-Effective AI Agents

Nov 2, 2025

Exploring what makes AI agents truly effective for users, beyond benchmark performance.

The overlooked "bad" word list ☠️

Dec 15, 2024

Stop using outdated bad-word lists. Use ToxicTrig instead for better toxic-language analysis.