I am a fifth-year Ph.D. candidate (since 2020) at HAN LAB of MIT EECS, advised by Prof. Song Han. My research interests lie in systems and machine learning (SysML). In the first half of my Ph.D., I worked on full-stack efficient 3D deep learning for autonomous driving; more recently, I have become interested in efficient foundation models for multimodal generation. I received the MLSys 2024 Best Paper Award as the system co-lead of the AWQ project.

During my Ph.D., I interned at NVIDIA, Alphabet (Waymo/Google), and OmniML (now part of NVIDIA). I received my Master of Science in EECS from MIT in 2022. Before that, I graduated with highest honors from the Department of Computer Science and Engineering at Shanghai Jiao Tong University (SJTU) in 2020, where I was fortunate to be advised by Prof. Hongtao Lu. I was also affiliated with the IEEE Honor Class at SJTU.

News

May 14, 2024 I delivered the oral presentation for our MLSys 2024 Best Paper, AWQ!
May 7, 2024 We announced the QServe inference engine for cloud LLM serving with W4A8KV4 precision. Running on L40S GPUs that are 3x cheaper, it matches the throughput of TRT-LLM on A100 GPUs. I led the system design with Shang Yang.
Jan 16, 2024 Our new paper, LongLoRA, has been accepted to ICLR 2024 as an oral presentation.
Jul 24, 2023 Our new paper, TorchSparse++, has been accepted to MICRO 2023. This is my first paper as lead author at a computer architecture conference.
May 30, 2023 I started my internship at Waymo Research with Kan Chen, Rami Al-Rfou, Charles R. Qi, Kratarth Goel. I also work closely with Mingxing Tan, Yin Zhou and their teams.

Recent Publications

  1. QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving. Yujun Lin*, Haotian Tang*, Shang Yang*, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, and Song Han. arXiv 2024.
  2. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. Ji Lin*, Jiaming Tang*, Haotian Tang+, Shang Yang+, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. MLSys 2024 (Best Paper Award).
  3. LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, and Jiaya Jia. ICLR 2024 (Oral).
  4. TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs. Haotian Tang*, Shang Yang*, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, and Song Han. IEEE/ACM International Symposium on Microarchitecture (MICRO) 2023.
  5. BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. Zhijian Liu*, Haotian Tang*, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. ICRA 2023.
  6. TorchSparse: Efficient Point Cloud Inference Engine. Haotian Tang*, Zhijian Liu*, Xiuyu Li*, Yujun Lin, and Song Han. MLSys 2022.
  7. Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications. Han Cai*, Ji Lin*, Yujun Lin*, Zhijian Liu*, Haotian Tang*, Hanrui Wang*, Ligeng Zhu*, and Song Han. ACM Transactions on Design Automation of Electronic Systems (TODAES) 2022.
  8. PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution. Zhijian Liu*, Haotian Tang*, Shengyu Zhao, Kevin Shao, and Song Han. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021.
  9. PointAcc: Efficient Point Cloud Accelerator. Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, and Song Han. MICRO 2021.
  10. SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss. Zhijian Liu*, Haotian Tang*, Sibo Zhu*, and Song Han. IROS 2021.
  11. Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. Haotian Tang*, Zhijian Liu*, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, and Song Han. ECCV 2020.
  12. Point-Voxel CNN for Efficient 3D Deep Learning. Zhijian Liu*, Haotian Tang*, Yujun Lin, and Song Han. NeurIPS 2019 (Spotlight).

Service

I regularly serve as a reviewer for ICML (outstanding reviewer, 2022), NeurIPS (top reviewer, 2022 and 2023), ICLR (highlighted reviewer, 2022), TPAMI, IJCV, CVPR, and ICCV.