Education
- [2017-2021] 🎉 I received my B.E. degree from Peking University, awarded Outstanding Graduate (Top 5%).
- [2020-2021] I worked as a visiting student at the University of Pennsylvania, supervised by Prof. Jianbo Shi.
- [2021-Now] 🎓 I am pursuing my Ph.D. at MMLab, CUHK, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.
- [2021-2024] I worked as a research intern at Shanghai AI Lab, supervised by Dr. Peng Gao.
- [2024-2025] I worked as a research intern on the LLaVA team at ByteDance, Seattle, supervised by Dr. Chunyuan Li.
- [2025-Now] 💪 I joined SEED (Multimodal Interaction & World Model), ByteDance, San Jose.
Biography
📌 My research interests include Large Multimodal Models, Vision-language Learning, Embodied AI, and 3D Vision.
✉️ I am looking for undergraduate and graduate students for academic collaboration. Discussions are welcome!
News
- [2025-05] 🔥 We release "T2I-R1", introducing R1 into the image generation domain for CoT reasoning.
- [2025-05] One paper accepted by ICML 2025.
- [2025-03] 🔥 We release "HybridVLA", the first work unifying autoregression and diffusion in VLA models.
- [2025-02] Three papers accepted by CVPR 2025, one Highlight 🎉
- [2025-01] Five papers accepted by ICLR 2025, two Spotlight 🎉
- [2025-01] 🔥 We release "Image Generation with CoT", the first work investigating CoT strategies (e.g., Test-time Scaling, RL, and Reflection) in autoregressive text-to-image generation.
- [2025-01] 🎉 "Video-MME" is honored to be selected as one of the 14 Groundbreaking Studies of 2024.
- [2024-08] 🔥 We release "LLaVA-OneVision", the latest LLaVA model for image, video, and image-text interleaved scenarios with superior performance.
- [2024-07] Four papers accepted by ECCV 2024.
- [2024-07] 🔥 We release "LLaVA-NeXT-Interleave" for multi-image instruction tuning and "MAVIS" for multimodal mathematical reasoning.
- [2024-05] Three papers accepted by ICML 2024.
- [2024-03] Seven papers accepted by CVPR 2024, two Highlight 🎉
- [2024-03] 🔥 We release "MathVerse", a novel mathematical benchmark with the first CoT evaluation strategy.
- [2024-01] Four papers accepted by ICLR 2024.
Selected Projects
* Equal Contribution   # Project Lead
♠ o1/R1-like Chain-of-Thought (CoT) Reasoning |
♠ Large Language & Multimodal Models (LLMs & LMMs) |
♠ Large Vision Models |
♠ Embodied AI & Robotics |
♠ Vision-language Learning |
♠ 3D Vision & Autonomous Driving |