Pengxiang Ding 丁鹏翔
Hi! I am Pengxiang Ding (丁鹏翔 in Chinese). I am a third-year Ph.D. student at Zhejiang University, advised by Prof. Donglin Wang. I am also part of a joint program with Westlake University, where I am a member of the Machine Intelligence Laboratory (MiLAB). Before starting my Ph.D., I received my M.Sc. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, in 2022, advised by Prof. Jianqin Yin.
Research Interests
My research centers on embodied AI, including vision-language-action (VLA) models, vision-language models (VLMs), and world models. During my Ph.D. studies, I have focused primarily on VLA. As first author, co-first author, or project leader, I have published 15 papers; representative works include QUAR-VLA, Humanoid-VLA, OpenHelix, and CARP.
Major Publications
†: Equal contribution or Project Leader
Yiguo Fan†, Pengxiang Ding†, Shuanghao Bai†, Xinyang Tong†, Yuyang Zhu, Hongchao Lu, Fengqi Dai, Wei Zhao, Yang Liu, Siteng Huang, Zhaoxin Fan, Badong Chen, Donglin Wang. "Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation".
Zhefei Gong†, Pengxiang Ding†, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang. "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction". [paper] [webpage] [code]
Wenxuan Song†, Jiayi Chen†, Pengxiang Ding†, Yuxin Huang, Han Zhao, Donglin Wang, Haoang Li. "CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding". [paper] [webpage] [code]
Yang Liu†, Ming Ma†, Xiaomin Yu†, Pengxiang Ding†, Han Zhao, Mingyang Sun, Siteng Huang, Donglin Wang. "SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning". [paper] [webpage] [code]
Xiaomin Yu†, Pengxiang Ding†, Wenjie Zhang, Siteng Huang, Songyang Gao, Chengwei Qin, Kejian Wu, Zhaoxin Fan, Ziyue Qiao, Donglin Wang. "Unicorn: Text-Only Data Synthesis for Vision Language Model Training". [paper] [code]
Can Cui†, Pengxiang Ding†, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, Wanqi Zhou, Yang Liu, Bofang Jia, Han Zhao, Siteng Huang, Donglin Wang. "OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation". [paper] [webpage] [code]
Pengxiang Ding†, Jianfei Ma†, Xinyang Tong†, Binghong Zou, Xinxin Luo, Yiguo Fan, Ting Wang, Hongchao Lu, Panzhong Mo, Jinxin Liu, Yuefan Wang, Huaicheng Zhou, Wenshuo Feng, Jiacheng Liu, Siteng Huang, Donglin Wang. "Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration". [paper]
Wenxuan Song†, Jiayi Chen†, Pengxiang Ding†, Han Zhao, Wei Zhao, Zhide Zhong, Zongyuan Ge, Jun Ma, Haoang Li. "Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding". [paper]
Shuanghao Bai†, Wanqi Zhou†, Pengxiang Ding†, Wei Zhao, Donglin Wang, Badong Chen. "Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation". [paper] [webpage] [code]
Xinyang Tong†, Pengxiang Ding†, Donglin Wang, Wenjie Zhang, Can Cui, Mingyang Sun, Yiguo Fan, Han Zhao, Hongyin Zhang, Yonghao Dang, Siteng Huang, Shangke Lyu. "QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning". [paper] [webpage]
Hongyin Zhang, Pengxiang Ding†, Shangke Lyu, Ying Peng, Donglin Wang, "GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation". The Thirteenth International Conference on Learning Representations (ICLR2025). [paper]
Wei Zhao, Pengxiang Ding†, Zhang Min, Zhefei Gong, Shuanghao Bai, Han Zhao, Donglin Wang, "VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation". The Thirteenth International Conference on Learning Representations (ICLR2025). [paper]
Pengxiang Ding, Han Zhao, Wenxuan Song, Wenjie Zhang, Min Zhang, Siteng Huang, Ningxi Yang, Donglin Wang, "QUAR-VLA: Vision-Language-Action Model for Quadruped Robots". The 18th European Conference on Computer Vision (ECCV2024). [paper] [Project]
Yang Liu†, Pengxiang Ding†, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang, "PiTe: Pixel-Temporal Alignment for Large Video-Language Model". The 18th European Conference on Computer Vision (ECCV2024). [paper] [code]
Pengxiang Ding, Qiongjie Cui, Min Zhang, Mengyuan Liu, Haofan Wang, Donglin Wang, "Expressive Forecasting of 3D Whole-body Human Motions". In Proceedings of the 38th AAAI Conference on Artificial Intelligence. [paper] [code]
Pengxiang Ding, Jianqin Yin, "Towards more realistic human motion prediction with attention to motion coordination". In IEEE Transactions on Circuits and Systems for Video Technology. [paper][code]
Other Publications
Hongyin Zhang, Zifeng Zhuang, Han Zhao, Pengxiang Ding, Hongchao Lu, Donglin Wang. "ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning."
Mingyang Sun, Pengxiang Ding, Weinan Zhang, Donglin Wang. "Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport." [paper]
Han Zhao, Wenxuan Song, Donglin Wang, Xinyang Tong, Pengxiang Ding, Xuelian Cheng, Zongyuan Ge. "MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models". [paper]
Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang, "Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference". AAAI2025. [paper] [project page] [Chinese intro] [github] [demo] [Twitter@AK]
Yonghao Dang, Jianqin Yin, Liyuan Liu, Pengxiang Ding, Yuan Sun, Yanzhu Hu, "DHRNet: A Dual-path Hierarchical Relation Network for multi-person pose estimation". Knowledge-Based Systems 2024 (KBS2024). [paper][code]
Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang, "ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification". ACM Multimedia 2024 (ACMMM24). [paper][code]
Wenxuan Song, Han Zhao, Pengxiang Ding, Can Cui, Shangke Lyu, Yaning Fan, Donglin Wang, "GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot". IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2024). [paper]
Shangke Lyu, Xin Lang, Han Zhao, Hongyin Zhang, Pengxiang Ding, Donglin Wang, "RL2AC: Reinforcement Learning-based Rapid Online Adaptive Control for Legged Robot Robust Locomotion". Robotics: Science and Systems 2024 (RSS24).
Chao Qi, Jianqin Yin, Jinghang Xu, Pengxiang Ding, "Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows". In IEEE Transactions on Circuits and Systems for Video Technology. [paper][code]
Xiaoli Liu, Jianqin Yin, Jin Liu, Pengxiang Ding, Jun Liu, Huaping Liu, "TrajectoryCNN: A New Spatio-Temporal Feature Learning Network for Human Motion Prediction". In IEEE Transactions on Circuits and Systems for Video Technology. [paper][code]
Experience
- Research Intern - DAMO Academy, Machine Intelligence Laboratory (达摩院/机器智能实验室)
  - Advisor: Xin Li
  - Time: Jan 2025 - June 2025.
- Research Intern - RedBook, Intelligent Creation Group (小红书/智能创作组)
  - Advisor: Haofan Wang
  - Time: Sep 2022 - March 2023.
- Research Intern - SenseTime, Smart City Group (商汤/智慧城市事业群)
  - Advisor: Dongliang Wang
  - Time: Sep 2021 - March 2022.
Services
Journal/Conference Reviewer
- ICML, ICLR, NeurIPS, CVPR, ICCV, ACMMM, AAAI, ICRA, IROS, CoRL, TNNLS, TASE, TCSVT
Talks
- Vision-Language-Action Models (Hao Dong, Peking University)
- MiLAB VLA Special Session (3D 视觉工坊)
- End-to-End Quadruped Robot Large Model (深蓝学院)
Misc
Feel free to follow me on RedBook.