Zero-Shot
Zero-shot capability in humanoid robotics refers to a system’s ability to perform a task it has never encountered during training, without task-specific fine-tuning or demonstrations. This relies on generalization: the system maps high-level instructions or observations to effective actions using knowledge acquired during prior training.
Large foundation models, especially those integrating language and vision, are driving this capability. For example, a humanoid robot using a vision-language-action model might receive a prompt like “pick up the blue book on the shelf” and successfully complete the task despite never having seen that exact combination of object, color, and location before.
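As a loose illustration of why an unseen combination can still be handled, the toy sketch below resolves the instruction by matching each concept (color, object, location) independently against a scene. It does not describe how a vision-language-action model works internally; the vocabularies, the `SceneObject` type, and the `parse_instruction` and `ground` helpers are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass

# Hypothetical concept vocabularies, assumed to be known from prior training.
KNOWN_COLORS = {"red", "blue", "green"}
KNOWN_OBJECTS = {"book", "cup", "ball"}
KNOWN_LOCATIONS = {"shelf", "table", "floor"}


@dataclass(frozen=True)
class SceneObject:
    """One perceived object (a stand-in for the output of a vision module)."""
    name: str
    color: str
    location: str


def parse_instruction(text):
    """Extract the first known word of each concept category from the text."""
    words = text.lower().replace(".", "").split()

    def first_match(vocab):
        return next((w for w in words if w in vocab), None)

    return {
        "color": first_match(KNOWN_COLORS),
        "name": first_match(KNOWN_OBJECTS),
        "location": first_match(KNOWN_LOCATIONS),
    }


def ground(text, scene):
    """Return the scene objects that match every attribute the text names."""
    query = parse_instruction(text)
    return [
        obj for obj in scene
        if all(value is None or getattr(obj, key) == value
               for key, value in query.items())
    ]


scene = [
    SceneObject("book", "red", "shelf"),
    SceneObject("book", "blue", "shelf"),
    SceneObject("cup", "blue", "table"),
]

# No rule was ever written for "blue book on the shelf" as a whole; the
# target is found by composing three independently known concepts.
targets = ground("pick up the blue book on the shelf", scene)
print(targets)
```

Because each concept is matched separately, any combination of known colors, objects, and locations can be resolved without a dedicated rule. Real systems learn such compositional structure from data rather than from hand-written vocabularies.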
Zero-shot performance marks a shift away from rigid, pre-programmed routines toward robots that can handle open-ended, previously unseen tasks.