Foundation Models

Foundation models in robotics are large, general-purpose models trained on broad datasets spanning multiple robots, tasks, and modalities. When applied to humanoids, they often operate end-to-end: mapping raw inputs like vision, proprioception, or language directly to control actions, without task-specific modules or manually engineered pipelines.

This architecture enables flexible behavior across diverse environments, with a single model handling perception, planning, and actuation. NVIDIA's GR00T N1, introduced in 2025, and Google DeepMind's RT‑2, released in 2023, are examples of vision-language-action foundation models trained end-to-end. These systems map sensory inputs to robot actions, enabling generalist performance across tasks and platforms, with GR00T N1 pushing this approach toward humanoid-scale deployment.
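To make the end-to-end interface concrete, the sketch below shows a toy stand-in for a vision-language-action policy: a single callable that maps a raw observation (image, proprioception, language instruction) directly to a joint-action vector, with no separate perception, planning, or control modules. Everything here — the `Observation` container, the `ToyVLAPolicy` class, and its placeholder "weights" — is hypothetical and illustrative only; a real system such as GR00T N1 or RT‑2 would replace the body with a large pretrained network.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    image: np.ndarray           # RGB camera frame, shape (H, W, 3), values in [0, 1]
    proprioception: np.ndarray  # joint positions, shape (action_dim,)
    instruction: str            # natural-language task command

class ToyVLAPolicy:
    """Hypothetical stand-in for a vision-language-action model:
    one callable maps the full observation directly to a bounded
    joint-action vector (the end-to-end interface described above)."""

    def __init__(self, action_dim: int = 7, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.action_dim = action_dim
        # Placeholder "weights"; a real model would be a large pretrained network.
        self.w_img = rng.normal(size=action_dim)
        self.w_prop = rng.normal(size=(action_dim, action_dim))
        self.w_text = rng.normal(size=action_dim)

    def _embed_text(self, text: str) -> float:
        # Crude deterministic scalar "embedding" of the instruction.
        return (sum(ord(c) for c in text) % 997) / 997.0

    def __call__(self, obs: Observation) -> np.ndarray:
        img_feat = obs.image.mean()                 # global image statistic
        text_feat = self._embed_text(obs.instruction)
        action = (self.w_img * img_feat
                  + self.w_prop @ obs.proprioception
                  + self.w_text * text_feat)
        return np.tanh(action)                      # bounded joint commands

policy = ToyVLAPolicy(action_dim=7)
obs = Observation(
    image=np.zeros((64, 64, 3)),
    proprioception=np.zeros(7),
    instruction="pick up the red cup",
)
action = policy(obs)
print(action.shape)  # (7,)
```

The point of the sketch is the signature, not the arithmetic: one model consumes heterogeneous raw inputs and emits control actions directly, which is what distinguishes this architecture from a pipeline of hand-built perception, planning, and control modules.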

In humanoids, foundation models are now a core strategy for scaling capability — compressing what once required dozens of hand-built modules into a single adaptive model that learns from interaction.