Pengxiang Ding 丁鹏翔

Hi! I am Pengxiang Ding (丁鹏翔 in Chinese). I am a third-year Ph.D. student at Zhejiang University, advised by Prof. Donglin Wang. I am also part of a joint program with Westlake University as a member of the Machine Intelligence Laboratory (MiLAB). Before my Ph.D., I received my M.Sc. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications in 2022, advised by Prof. Jianqin Yin.

News

  • 2 papers have been accepted by CVPR26.
  • 3 papers have been accepted by ICRA26.
  • 2 papers have been accepted by ICLR26.
  • 4 papers have been accepted by AAAI26.
  • 1 paper has been accepted by CoRL25.
  • 1 paper has been accepted by NeurIPS25.
  • 1 paper has been accepted by ICCV25.
  • 3 papers have been accepted by ICML25.

Selected Publications (First Author or Project Leader) × 18

Google Scholar ∗: Equal contribution †: Project Leader

Minghui Lin, Pengxiang Ding†, Shu Wang, Zifeng Zhuang, Yang Liu, Xinyang Tong, Wenxuan Song, Shangke Lyu, Siteng Huang†, Donglin Wang. "HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models".

Xingyu Wang*, Pengxiang Ding*†, Jingkai Xu, Donglin Wang, Zhaoxin Fan. "CUBic: Coordinated Unified Bimanual Perception and Control Framework".

Jiayi Chen, Wenxuan Song, Pengxiang Ding†, Ziyang Zhou, Han Zhao, Feilong Tang, Donglin Wang, Haoang Li. "Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process".

Jiacheng Liu*, Pengxiang Ding†*, Qihang Zhou, Yuxuan Wu, Da Huang, Zimian Peng, Wei Xiao, Weinan Zhang, Lixin Yang, Cewu Lu, Donglin Wang. "TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning".

Yihao Wang*, Pengxiang Ding*†, Lingxiao Li, Can Cui, Zirui Ge, Xinyang Tong, Wenxuan Song, Han Zhao, Wei Zhao, Pengxu Hou, Siteng Huang, Yifan Tang, Wenhui Wang, Ru Zhang, Jianyi Liu, Donglin Wang. "VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model".

Yiguo Fan†, Pengxiang Ding†, Shuanghao Bai†, Xinyang Tong†, Yuyang Zhu, Hongchao Lu, Fengqi Dai, Wei Zhao, Yang Liu, Siteng Huang, Zhaoxin Fan, Badong Chen, Donglin Wang. "Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation".

Zhefei Gong*, Pengxiang Ding†, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang. "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction". [paper] [webpage] [code]

Yang Liu†, Ming Ma†, Xiaomin Yu†, Pengxiang Ding†, Han Zhao, Mingyang Sun, Siteng Huang, Donglin Wang. "SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning". [paper] [webpage] [code]

Pengxiang Ding†, Jianfei Ma†, Xinyang Tong†, Binghong Zou, Xinxin Luo, Yiguo Fan, Ting Wang, Hongchao Lu, Panzhong Mo, Jinxin Liu, Yuefan Wang, Huaicheng Zhou, Wenshuo Feng, Jiacheng Liu, Siteng Huang, Donglin Wang. "Humanoid-vla: Towards universal humanoid control with visual integration". [paper]

Wenxuan Song†, Jiayi Chen†, Pengxiang Ding†, Han Zhao, Wei Zhao, Zhide Zhong, Zongyuan Ge, Jun Ma, Haoang Li. "Accelerating vision-language-action model integrated with action chunking via parallel decoding". [paper]

Shuanghao Bai, Wanqi Zhou, Pengxiang Ding†, Wei Zhao, Donglin Wang, Badong Chen. "Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation". [paper] [webpage] [code]

Xinyang Tong†, Pengxiang Ding†, Donglin Wang, Wenjie Zhang, Can Cui, Mingyang Sun, Yiguo Fan, Han Zhao, Hongyin Zhang, Yonghao Dang, Siteng Huang, Shangke Lyu. "QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning". [paper] [webpage]

Hongyin Zhang, Pengxiang Ding†, Shangke Lyu, Ying Peng, Donglin Wang, "GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation". The Thirteenth International Conference on Learning Representations (ICLR2025). [paper]

Wei Zhao, Pengxiang Ding†, Min Zhang, Zhefei Gong, Shuanghao Bai, Han Zhao, Donglin Wang, "VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation". The Thirteenth International Conference on Learning Representations (ICLR2025). [paper]

Pengxiang Ding, Han Zhao, Wenxuan Song, Wenjie Zhang, Min Zhang, Siteng Huang, Ningxi Yang, Donglin Wang, "QUAR-VLA: Vision-Language-Action Model for Quadruped Robots". The 18th European Conference on Computer Vision (ECCV2024). [paper] [Project]

Yang Liu†, Pengxiang Ding†, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang, "PiTe: Pixel-Temporal Alignment for Large Video-Language Model". The 18th European Conference on Computer Vision (ECCV2024). [paper] [code]

Pengxiang Ding, Qiongjie Cui, Min Zhang, Mengyuan Liu, Haofan Wang, Donglin Wang, "Expressive Forecasting of 3D Whole-body Human Motions". In Proceedings of the 38th AAAI Conference on Artificial Intelligence. [paper] [code]

Pengxiang Ding, Jianqin Yin, "Towards more realistic human motion prediction with attention to motion coordination". In IEEE Transactions on Circuits and Systems for Video Technology. [paper][code]

Experience

  • Research Intern - DAMO Academy, Machine Intelligence Laboratory (达摩院/机器智能实验室)
    • Advisor: Xin Li
    • Time: Jan 2025 - June 2025.
  • Research Intern - RedBook, Intelligent Creation Group (小红书/智能创作组)
  • Research Intern - SenseTime, Smart City Group (商汤/智慧城市事业群)

Services

Journal/Conference Reviewer

  • ICML, ICLR, NeurIPS, CVPR, ICCV, ACMMM, AAAI, ICRA, IROS, CoRL, TNNLS, TASE, TSCVT

Talk

  1. Vision-Language-Action Models (Hao Dong, Peking University)
  2. MiLAB VLA Session (3D视觉工坊 / 3D Vision Workshop)
  3. End-to-End Quadruped Robot Large Models (深蓝学院 / Shenlan Academy)

Misc

Welcome to follow me on RedBook.