Hugh Zhang
I recently decided to take a leave of absence from Harvard to join Scale AI and kickstart our open source AI research efforts.
Outside of research, I am a lifelong Go player (in fact, seeing AlphaGo beat Lee Sedol was the origin of my interest in AI). I also co-founded The Gradient, a digital magazine focusing on AI.
My current work focuses on evals, test-time compute, and post-training for LLMs. Previously, I also worked on multi-agent reinforcement learning and game theory. * denotes equal or alphabetical ordering.
We demonstrate that multi-turn human jailbreaks can achieve >70% success rates against LLM defenses that report single-digit success rates for automated single-turn attacks.
Training on chains of thought that lead to a correct answer can help an LLM self-improve and generalize far beyond its original capabilities in the toy environment of addition.
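The self-improvement loop above can be sketched in miniature. The snippet below is a toy illustration, not the paper's training pipeline: a hypothetical `sample_cot` function stands in for an LLM sampler, and the loop keeps only chains of thought whose final answer is correct, which would then form the next round's training data.

```python
import random

def sample_cot(a, b, rng):
    """Mock stand-in for an LLM sampling a chain of thought for a + b.
    Purely hypothetical; it makes an arithmetic error 30% of the time."""
    err = 1 if rng.random() < 0.3 else 0
    answer = a + b + err
    trace = f"{a} + {b}: add ones, add tens -> {answer}"
    return trace, answer

def self_improvement_round(problems, rng, k=8):
    """Keep only chains of thought whose final answer is correct;
    correctness of the answer is the only filter applied."""
    kept = []
    for a, b in problems:
        for _ in range(k):
            trace, ans = sample_cot(a, b, rng)
            if ans == a + b:  # filter: discard traces with wrong answers
                kept.append((a, b, trace))
                break  # one correct trace per problem suffices here
    return kept

rng = random.Random(0)
data = self_improvement_round([(12, 34), (56, 78), (9, 9)], rng)
```

Each kept trace ends in the correct sum by construction; in the actual method, the model is then fine-tuned on these filtered traces and the loop repeats.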
A unified algorithm for both reinforcement learning and game theory: it can solve MDPs as fast as RL methods and imperfect-information games as fast as CFR, using a single set of hyperparameters.
A novel no-regret learning procedure that converges to correlated and coarse-correlated equilibria several orders of magnitude faster than previous methods in randomly generated normal-form games.
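For context, the classic baseline such a procedure is measured against is regret matching (Hart and Mas-Colell), under which the empirical distribution of joint play converges to the set of coarse correlated equilibria. The sketch below implements that standard baseline on a two-player normal-form game; it is not the paper's faster procedure.

```python
import random

def regret_matching(payoffs_a, payoffs_b, iters=2000, seed=0):
    """Classic regret matching on a two-player normal-form game.
    The time-averaged joint play converges to a coarse correlated
    equilibrium; this is the baseline, not the paper's own method."""
    rng = random.Random(seed)
    n, m = len(payoffs_a), len(payoffs_a[0])
    reg_a, reg_b = [0.0] * n, [0.0] * m

    def strategy(regrets):
        # Play actions in proportion to positive regret; uniform if none.
        pos = [max(r, 0.0) for r in regrets]
        s = sum(pos)
        return [p / s for p in pos] if s > 0 else [1.0 / len(pos)] * len(pos)

    def sample(probs):
        x, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if x < acc:
                return i
        return len(probs) - 1

    joint_counts = {}
    for _ in range(iters):
        i, j = sample(strategy(reg_a)), sample(strategy(reg_b))
        joint_counts[(i, j)] = joint_counts.get((i, j), 0) + 1
        # Accumulate regret for not having played each alternative action.
        for i2 in range(n):
            reg_a[i2] += payoffs_a[i2][j] - payoffs_a[i][j]
        for j2 in range(m):
            reg_b[j2] += payoffs_b[i][j2] - payoffs_b[i][j]
    return joint_counts

# Matching pennies: the unique equilibrium is uniform play for both players.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
counts = regret_matching(A, B)
```

On matching pennies, both players' empirical marginals drift toward 50/50 as the average regret shrinks at a rate of roughly O(1/sqrt(T)).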
Existing language models can generate either high quality or diverse utterances, but not both simultaneously. How can we measure that in a single metric?