I lead the alignment team at Anthropic, where I'm hoping to reduce existential risks from AI systems. I led the team that developed Constitutional Classifiers, the first approach capable of robustly preventing most bad actors from obtaining harmful information from AI systems. Constitutional Classifiers enabled Anthropic to deploy Claude Opus 4 and subsequent models despite their ability to assist in advanced weapons development. I helped develop Retrieval-Augmented Generation (RAG), a widely used approach for augmenting large language models with other sources of information. I also introduced Automated Red Teaming, which is used across major frontier AI labs for pre-deployment model testing. I received a best paper award at ICML 2024 for my work showing that debating with more persuasive models leads to more truthful answers.
I received my PhD from NYU under the supervision of Kyunghyun Cho and Douwe Kiela, funded by the National Science Foundation and Open Philanthropy. Previously, I've spent time at DeepMind, Facebook AI Research, University of Montreal, Uber, and Google. I was also named one of Forbes's 30 Under 30 in AI.
Ethan's Research
We find that language models can self-correct their own biases against different demographic groups.
Ethan Perez
Head of Alignment · Anthropic
I lead the alignment team at Anthropic, where I’m working to reduce existential risks from AI systems. I led the team that developed Constitutional Classifiers, the first approach capable of robustly preventing most bad actors from obtaining harmful information from AI systems. I also helped develop Retrieval-Augmented Generation (RAG) and introduced Automated Red Teaming, both now widely used across major AI labs.
I received my PhD from NYU under Kyunghyun Cho and Douwe Kiela, funded by NSF and Open Philanthropy. I’ve previously spent time at DeepMind, Meta AI Research, University of Montreal, Uber, and Google. I was named one of Forbes’s 30 Under 30 in AI.
Selected publications
- Constitutional Classifiers: Defending Against Universal Jailbreaks
First approach capable of robustly preventing most bad actors from obtaining harmful information from AI systems.
- Debating with More Persuasive LLMs Leads to More Truthful Answers
Multi-turn AI debate with stronger models elicits more truthful responses, supporting scalable oversight.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Combines parametric and non-parametric memory for open-domain question answering and knowledge-intensive tasks.
Writing
- Inverse Scaling Prize Ideas
A curated list of tasks where large language models may perform worse as they scale, submitted for the Inverse Scaling Prize.
2022
- Personal Research Statement for Ph.D. Programs in Machine Learning
My personal research statement for Ph.D. programs in machine learning, shared for those applying to similar programs.
2022
- Easy Paper Writing Tips
Tips that improve the clarity of research papers while being fairly easy to implement.
2022
- Open Philanthropy AI Fellowship Statement
My personal statement for the Open Philanthropy AI Fellowship, shared for those applying to similar programs.
2020


