Yifan Qiao

About Me

I am a Founding Member of Technical Staff at Inferact, building vLLM to make AI accessible to everyone with cheaper and faster inference.

I recently completed a postdoc at UC Berkeley, where I worked with Ion Stoica and Joseph E. Gonzalez in the Sky Computing Lab. Before that, I received my Ph.D. in Computer Science from UCLA in 2024, advised by Harry Xu and Miryung Kim.

My research lies at the intersection of systems and machine learning. I build systems to make AI faster and more efficient.

I received the Amazon & UCLA Science Hub Fellowship (2021) and UCLA's Outstanding Graduate Student Research Award (2024), and I was a Jane Street Graduate Research Fellowship Finalist (2023).

Updates

Open Source Projects

Other than vLLM, I am actively working on the Open Virtual GPU project (ovg-project), building open-source infrastructure for GPU virtualization and efficient GPU sharing in datacenters. Our vision is to create a "GPU OS" that makes GPU resources as manageable and shareable as CPU resources today. Read our first blog post on solving the GPU cost crisis.

Elastic KV cache sharing across multiple co-located LLMs through GPU virtual memory. Integrates with SGLang and vLLM.

Co-leading with Jiarong Xing and Shan Yu

GVM (New)

An OS-level GPU virtualization layer, for sharing a GPU with hardware-like performance isolation and full flexibility.

Co-leading with Yicheng Liu

Publications

  1. ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving

    Yifan Qiao, Shan Yu, Shu Anzai, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, Harry Xu

    ICML 2026

  2. BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

    Yilong Zhao, Shuo Yang, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yifan Qiao, Yang Zhou, Jiarong Xing, Ion Stoica

    ASPLOS 2026

  3. UniCache: Unifying Prefix Cache Eviction for Heterogeneous LLM Serving Workloads

    Bingyang Ouyang, Yifan Qiao, Jiarong Xing

    SIGMETRICS 2026

  4. Lost in Translation: The Search for Meaning in Network-Attached AI Accelerator Disaggregation

    Jaewan Hong, Yifan Qiao, Soujanya Ponnapalli, Shu Liu, Marcos K. Aguilera, Vincent Liu, Christopher J. Rossbach, Ion Stoica

    HotNets 2025

  5. PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

    Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang

    SOSP 2025

  6. Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation

    Chenxiao Liu, Zhenting Zhu, Quanxi Li, Yanwen Xia, Yifan Qiao, Xiangyun Deng, Youyou Lu, Tao Xie, Huimin Cui, Zidong Du, Harry Xu, Chenxi Wang

    SOSP 2025

  7. Idleness is Relative: Exploiting Tool-Call Idle Windows for Offloading in Agentic Systems with MORI

    Tian Xia, Hanchen Li, Zhifei Li, Xiaokun Chen, Hao Kang, Yifan Qiao, Yi Xu, Ion Stoica

    arXiv 2026

  8. SkyNomad: On Using Multi-Region Spot Instances to Minimize AI Batch Job Cost

    Zhifei Li, Tian Xia, Ziming Mao, Zihan Zhou, Ethan J. Jackson, Jamison Kerney, Zhanghao Wu, Pratik Mishra, Yi Xu, Yifan Qiao, Scott Shenker, Ion Stoica

    arXiv 2026

  9. Towards Efficient and Practical GPU Multitasking in the Era of LLM

    Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica

    arXiv 2025

  10. Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

    Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng

    arXiv 2025 code

  11. Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models

    Yongji Wu, Wenjie Qu, Xueshen Liu, Tianyang Tao, Yifan Qiao, Zhuang Wang, Wei Bai, Yuan Tian, Jiaheng Zhang, Z. Morley Mao, Matthew Lentz, Danyang Zhuo, Ion Stoica

    arXiv 2024

  12. DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Chenxi Wang, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, and Harry Xu.

    OSDI 2024 full version, code

  13. A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Lei Chen*, Shi Liu*, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, and Harry Xu.

    OSDI 2024 full version, code

  14. Harvesting Idle Memory for Application-managed Soft State with Midas

    Yifan Qiao, Zhenyuan Ruan, Haoran Ma, Adam Belay, Miryung Kim, and Harry Xu.

    NSDI 2024 code, slides

  15. Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony

    Yifan Qiao, Chenxi Wang, Zhenyuan Ruan, Adam Belay, Qingda Lu, Yiying Zhang, Miryung Kim, and Guoqing Harry Xu.

    NSDI 2023 code, slides

  16. Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory

    Chenxi Wang*, Yifan Qiao*, Haoran Ma, Shi Liu, Yiying Zhang, Wenguang Chen, Ravi Netravali, Miryung Kim, Guoqing Harry Xu. (*contributed equally)

    NSDI 2023 code, slides

  17. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

    John Thorpe*, Pengzhan Zhao*, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu.

    NSDI 2023 full version, code

  18. MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime

    Chenxi Wang*, Haoran Ma*, Shi Liu, Yifan Qiao, Jonathan Eyolfson, Christian Navasca, Shan Lu, Guoqing Harry Xu.

    OSDI 2022 (Awarded Jay Lepreau Best Paper) code

  19. Mako: A Low-Pause, High-Throughput Evacuating Collector for Memory-Disaggregated Datacenters

    Haoran Ma, Shi Liu, Chenxi Wang, Yifan Qiao, Michael D. Bond, Stephen M. Blackburn, Miryung Kim, Guoqing Harry Xu.

    PLDI 2022 code

  20. Dorylus: Affordable, Scalable, and Accurate GNN Training over Billion-Edge Graphs

    John Thorpe*, Yifan Qiao*, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu. (*contributed equally)

    OSDI 2021 full version, code

  21. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

    Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, Jidong Zhai.

    CLUSTER 2017

Experience

  1. Visiting Student at MIT PDOS Group, hosted by Adam Belay.

    Worked on an elastic LLM serving system.

    Jun. 2023 - Sept. 2023

  2. Visiting Student at MIT PDOS Group, hosted by Adam Belay.

    Worked on Midas, a new OS memory abstraction for soft state.

    Jun. 2022 - Sept. 2022

  3. Research Intern at Alibaba Bellevue, Cloud Storage Team, hosted by Qingda Lu.

    Worked on Hermit, a high-performance and transparent remote memory system.

    Jun. 2021 - Sept. 2021

Service

  • MLSys 2026, Program Committee
  • ASPLOS 2026, Program Committee
  • ATC 2024, External Review Committee
  • SOSP 2023, Artifact Evaluation Committee
  • OSDI 2023, Artifact Evaluation Committee
  • ATC 2023, Artifact Evaluation Committee
  • WORDS 2022, Session Chair

Awards

  • 2024 Outstanding Graduate Student Research Award, UCLA
  • 2023 Jane Street Graduate Research Fellowship Finalist
  • 2021 Amazon Ph.D. Fellow
  • 2019 Magna Cum Laude in Beijing (8/140)
  • 2019 Magna Cum Laude at Department of Computer Science and Technology, Tsinghua University
  • 2019 Cum Laude at Tsinghua University (14/140)
  • 2018 CNPC Scholarship for Comprehensive Excellence (8/140)
  • 2018 Qualcomm Scholarship (Top 6%)
  • 2017 National Scholarship (6/140)

Teaching

UC Berkeley

  • Guest Lecture for CS 262A Advanced Topics in Computer Systems (Spring 2025)
  • Guest Lecture for CS 162 Operating Systems (Fall 2025)

UCLA

  • Teaching Assistant for CS 130 Software Engineering (Winter 2024)
  • Teaching Assistant for CS 130 Software Engineering (Spring 2024)