Pengxiang Ding 丁鹏翔

Hi! I am Pengxiang Ding (丁鹏翔 in Chinese). I am a first-year Ph.D. student at Zhejiang University, advised by Prof. Donglin Wang. I am also enrolled in a joint program with Westlake University, where I am a member of the Machine Intelligence Laboratory (MiLAB). Before starting my Ph.D., I received my M.Sc. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, in 2022.

Research Interests

Currently, my research centers on multi-modal large models (mainly vision-language models), including:

  • Multi-modal large models: multimodal large language models (MLLM), vision-language pre-trained models (VLM)
  • Embodied AI: foundation models for robotics
  • AIGC: human motion analysis

I am always open to collaborations in these areas, some of which have led to publications at top venues. If you would like to get in touch, feel free to drop me an email.

News

  • [March 21, 2024] A new paper on Cobra, an efficient multi-modal large language model, has been released. The project page and demo are now available, and the paper was featured in Hugging Face Daily Papers!
  • [December 9, 2023] One paper on whole-body human motion prediction was accepted to AAAI 2024.

Hiring: We are looking for postdocs, research assistants, and visiting students for MiLAB at Westlake University. More information about the requirements can be found here; students who are still enrolled are welcome to apply as visiting students. If you are interested, please email your CV to mi_lab[AT]westlake.edu.cn. In particular, if you are interested in my research direction and would like to collaborate with me after joining, please say so in the email and send a copy to me.

Publications

Google Scholar †: Equal contribution

Peer-reviewed Conference & Journal

Pengxiang Ding, Qiongjie Cui, Min Zhang, Mengyuan Liu, Haofan Wang, Donglin Wang, "Expressive Forecasting of 3D Whole-body Human Motions". In Proceedings of the 38th AAAI Conference on Artificial Intelligence. [paper]

Chao Qi, Jianqin Yin, Jinghang Xu, Pengxiang Ding, "Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows". In IEEE Transactions on Circuits and Systems for Video Technology. [paper]

Pengxiang Ding, Jianqin Yin, "Towards more realistic human motion prediction with attention to motion coordination". In IEEE Transactions on Circuits and Systems for Video Technology. [paper]

Xiaoli Liu, Jianqin Yin, Jin Liu, Pengxiang Ding, Jun Liu, Huaping Liu, "TrajectoryCNN: a new spatio-temporal feature learning network for human motion prediction". In IEEE Transactions on Circuits and Systems for Video Technology. [paper]

Preprints & Under Submission

Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, Yaning Fan, Donglin Wang, "GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot". arXiv preprint arXiv:2403.13358. [paper]

Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang, "Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference". arXiv preprint arXiv:2403.14520. [paper] [project page] [Chinese intro] [github] [demo] [Twitter@AK]

Pengxiang Ding, Han Zhao, Zhitao Wang, Zhenyu Wei, Shangke Lyu, Donglin Wang, "QUAR-VLA: Vision-Language-Action Model for Quadruped Robots". arXiv preprint arXiv:2312.14457. [paper]

Pengxiang Ding, Jianqin Yin, "Uncertainty-aware Human Motion Prediction". arXiv preprint arXiv:2107.03575. [paper]

Experience

  • Research Intern - RedBook, Intelligent Creation Group (小红书/智能创作组)
  • Research Intern - SenseTime, Smart City Group (商汤/智慧城市事业群)

Services

Journal Reviewer

  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

Conference Reviewer

  • IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • IEEE International Conference on Multimedia and Expo (ICME)
  • Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Misc

Feel free to follow me on Zhihu.