Generalist robot policies are those capable of performing complex manipulation tasks across a wide range of environments. Recent years have seen significant progress toward this goal, driven by advances in large-scale teleoperated robot datasets, structured representations for policy learning, hierarchical planning with vision-language models (VLMs), and online learning for adaptation to novel tasks and environments. Although multiple promising approaches are emerging, it is essential to understand the tradeoffs inherent in each to develop methods that not only generalize to new scenarios but also execute tasks with high precision and reliability.
An ideal framework for learning such generalizable policies should: (1) scale beyond simple pick-and-place tasks and continually improve as more task data becomes available, (2) learn effectively from diverse data sources, including robot teleoperation, simulated environments, and human demonstration videos, and (3) utilize representations of the world that are applicable across tasks requiring varying levels of precision and dexterity. This workshop will focus on a central question: What are the right priors for generalizable policy learning, and how can we best incorporate these priors into policy learning frameworks?
Our speakers and panelists are leading researchers in robotics and machine learning, working at the forefront of topics including end-to-end control, sim-to-real transfer, learning from human videos, and large-scale robotic data collection. We invite the community to submit their latest work and ideas for discussion.
We aim to investigate the following topics and research questions:
Session 1

| Time | Event |
| --- | --- |
| 9:30 AM - 9:40 AM | Opening Remarks |
| 9:40 AM - 10:05 AM | Talk - Prof. Yang Gao. Title: Scaling Robot Manipulation with VLMs and Human Videos: Lessons Learned |
| 10:05 AM - 10:30 AM | Talk - Prof. Harold Soh |
| 10:30 AM - 11:00 AM | Poster Session, Coffee Break |

Session 2

| Time | Event |
| --- | --- |
| 11:00 AM - 11:25 AM | Talk - Dr. Ajay Mandlekar. Title: Scaling Synthetic Data Generation for Robotics with Point-Based Representations |
| 11:25 AM - 11:50 AM | Talk - Prof. Georgia Chalvatzaki. Title: Structured Priors for Efficient Robot Learning |
| 11:50 AM - 12:15 PM | Talk - Prof. Edward Johns. Title: The Priors Needed for In-Context Imitation Learning |
| 12:15 PM - 12:30 PM | Spotlight Talks |
| 12:30 PM - 1:30 PM | Lunch Break |

Session 3

| Time | Event |
| --- | --- |
| 1:30 PM - 2:00 PM | Talk - Prof. Jeannette Bohg. Title: Three Layers of Priors for Generalizable Robot Manipulation: From Control to Human Data to Physics |
| 2:00 PM - 2:30 PM | Talk - Prof. Xiaolong Wang. Title: Going Beyond Teleoperation for Humanoid Manipulation |
| 2:30 PM - 3:00 PM | Coffee Break, Poster Session |

Session 4

| Time | Event |
| --- | --- |
| 3:00 PM - 3:25 PM | Talk - Jiafei Duan. Title: Grounding Vision and Language Models for robotics manipulation |
| 3:25 PM - 4:20 PM | Panel Discussion |
| 4:20 PM - 4:30 PM | Closing Remarks and Awards |

Each poster panel will be shared between two papers (arranged side-by-side, vertically). Please make sure your poster does not exceed 0.92 m (H) × 0.94 m (W) (close to A0 portrait, but slightly scaled down).
Each oral presentation will be a 5-minute spotlight (4 minutes for the talk and 1 minute for Q&A).
All accepted papers will be presented as posters.