Education
Biography
I am a final-year Ph.D. candidate at MMLab, CUHK, and received my B.E. degree from Peking University.
📌 My research interests include Large Multimodal Models, Vision-language Learning, Embodied AI, and 3D Vision.
✉️ I'm looking for undergraduate and graduate students for academic collaboration. Discussions are welcome!
News
- [2025-03] 🔥 We release "HybridVLA", the first work unifying Autoregression and Diffusion in VLA models.
- [2025-02] Three papers accepted by CVPR 2025
- [2025-01] Five papers accepted by ICLR 2025, two Spotlight 🎉
- [2025-01] 🔥 We release "Image Generation with CoT", the first work investigating CoT strategies (e.g., Test-time Scaling, RL, and Reflection) in autoregressive text-to-image generation.
- [2025-01] 🎉 "Video-MME" was selected as one of the 14 Groundbreaking Studies in 2024.
- [2024-12] Two papers accepted by AAAI 2025
- [2024-08] 🔥 We release "LLaVA-OneVision", the latest LLaVA model for image, video, and image-text interleaved scenarios with superior performance.
- [2024-07] Four papers accepted by ECCV 2024
- [2024-07] 🔥 We release "LLaVA-NeXT-Interleave" for multi-image instruction tuning and "MAVIS" for multimodal mathematical reasoning.
- [2024-05] Three papers accepted by ICML 2024
- [2024-03] Seven papers accepted by CVPR 2024, two Highlight 🎉
- [2024-03] 🔥 We release "MathVerse", a novel mathematical benchmark with the first CoT evaluation strategy.
- [2024-02] One paper accepted by ICRA 2024
- [2024-01] Four papers accepted by ICLR 2024
Selected Projects
* Equal Contribution   # Project Lead
♠ o1/R1-like Chain-of-Thought (CoT) Reasoning
♠ Large Language & Multimodal Models (LLMs & LMMs)
♠ Large Vision Models
♠ Embodied AI & Robotics
♠ Vision-language Learning
♠ 3D Vision & Autonomous Driving