Projects
Accelerate video diffusion model generation
- First open distillation recipes for video DiTs, supporting distillation and finetuning of state-of-the-art open video DiTs (including with Sliding Tile Attention).
- Scalable training with FSDP, sequence parallelism, and selective activation checkpointing, with near-linear scaling to 64 GPUs.
- Memory-efficient finetuning with LoRA, precomputed latents, and precomputed text embeddings.
Fast tile-level sparse attention for video diffusion · ICML 2025
- The first hardware-efficient 2D/3D sliding-window attention implementation, achieving 58.79% MFU.
- Up to 17× attention speedup over FlashAttention-2 on video generation.
- Deployed in production by Hunyuan Video 1.5 and Meituan LongCat Video.
Block-sparse video diffusion with Attention Tile · ICML 2025 @ ES-FoMo
- Discovered the Attention Tile pattern in 3D-DiT and built an efficient video diffusion pipeline.
- Achieves 7.8× speedup on a single GPU through block-sparse kernels and consistency distillation.
Evaluating LLMs as agents · ICLR 2024
- Classifies real-world browsing options and designs a framework for automatically collecting browsing traces, used to build a more efficient language-model-driven web navigation agent.
Miscellaneous
- Guitar. Lead guitarist of Prime (素数), a math-rock band active in Beijing. Recent live performance at Prime Lab.
- Ping-pong. Member and referee of our department's ping-pong team.