Chelsea Finn: Building Robots That Can Do Anything
Developing general-purpose robots requires a shift from specialized, single-task systems to broad foundation models. Two ingredients drive that shift: large-scale, diverse, real-world data collection, and a two-stage training recipe that pre-trains on all available data and then fine-tunes on a curated, high-quality subset of demonstrations. Combined with architectural innovations that preserve the capabilities of Vision-Language Model (VLM) backbones, this recipe enables robots to perform complex, long-horizon tasks, generalize to unseen environments, and respond to open-ended human instructions.
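The two-stage recipe can be illustrated with a minimal Python sketch. It is not the training code behind the work summarized here: the `Episode` type, the `quality` score, the `0.8` threshold, the task names, and the `train_step` callback are all hypothetical stand-ins chosen for illustration. The sketch only shows the data flow, in which every episode is used for pre-training and only the curated subset is used for fine-tuning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Episode:
    task: str
    quality: float  # hypothetical score in [0, 1]; higher means a cleaner demonstration


def curated_subset(episodes: List[Episode], min_quality: float) -> List[Episode]:
    """Keep only demonstrations at or above the (assumed) quality threshold."""
    return [e for e in episodes if e.quality >= min_quality]


def run_recipe(
    episodes: List[Episode],
    train_step: Callable[[Episode], None],
    min_quality: float = 0.8,  # assumed threshold, not taken from the source
) -> List[Tuple[str, str]]:
    """Pre-train on all episodes, then fine-tune on the curated subset.

    Returns a log of (phase, task) pairs in the order they were trained on.
    """
    log: List[Tuple[str, str]] = []

    # Stage 1: pre-training sees everything, including lower-quality data.
    for episode in episodes:
        train_step(episode)
        log.append(("pretrain", episode.task))

    # Stage 2: fine-tuning sees only the high-quality demonstrations.
    for episode in curated_subset(episodes, min_quality):
        train_step(episode)
        log.append(("finetune", episode.task))

    return log


# Illustrative data only; the task names and scores are made up.
episodes = [
    Episode("fold_shirt", 0.95),
    Episode("fold_shirt", 0.40),
    Episode("bus_table", 0.85),
]

# A no-op stands in for a real gradient update on the model.
log = run_recipe(episodes, train_step=lambda episode: None)
```

In a real system `train_step` would update the model's weights, and curation would more likely rely on human review or task-success labels than on a single scalar score.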