Large Language Models (LLMs)
Large language models give humanoid robots the ability to understand, generalize, and respond to natural-language instructions. Instead of relying on hand-coded scripts, they generate executable plans and even write control policies on the fly. Paired with perception systems, LLMs help robots resolve ambiguous instructions and carry tasks through to completion.
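To make the plan-generation step concrete, here is a minimal sketch of how an instruction could be turned into a sequence of skill calls. The skill names, prompt format, and the call_llm() stub are illustrative assumptions, not the interface of any particular robot stack or LLM provider; a real system would query a hosted or on-robot model and validate each step before execution.

```python
# Sketch: turning a natural-language instruction into an executable skill plan.
# SKILLS, the prompt, and call_llm() are illustrative assumptions, not a real robot API.

SKILLS = {"walk_to", "pick", "place", "open_gripper", "close_gripper"}

PROMPT_TEMPLATE = """You control a humanoid robot with these skills:
walk_to(location), pick(object), place(object, location), open_gripper(), close_gripper()

Instruction: {instruction}
Respond with one skill call per line, nothing else."""


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM chat call; a real system would query a hosted or local model."""
    return "walk_to(kitchen_counter)\npick(mug)\nwalk_to(table)\nplace(mug, table)"


def parse_plan(text: str) -> list[tuple[str, list[str]]]:
    """Parse 'skill(arg1, arg2)' lines into (skill, args) pairs, dropping unknown skills."""
    plan = []
    for line in text.strip().splitlines():
        name, _, rest = line.partition("(")
        name = name.strip()
        args = [a.strip() for a in rest.rstrip(")").split(",") if a.strip()]
        if name in SKILLS:
            plan.append((name, args))
    return plan


if __name__ == "__main__":
    raw = call_llm(PROMPT_TEMPLATE.format(instruction="Bring the mug to the table"))
    for skill, args in parse_plan(raw):
        print(f"execute {skill}({', '.join(args)})")
```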
The frontier is grounding: linking language to physical action. Projects such as SayCan, RT-2, and PaLM-E laid the groundwork, combining LLMs with sensor data, learned affordances, and planning stacks to turn abstract prompts into embodied behavior.
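SayCan's grounding recipe is simple enough to show directly: the LLM scores how useful each candidate skill is for the instruction, an affordance model scores how likely that skill is to succeed from the current state, and the robot picks the skill with the highest combined score. In the sketch below both sets of scores are hardcoded purely for illustration; a real system would derive them from the LLM's token probabilities and a learned value function.

```python
# SayCan-style skill selection sketch: combine an LLM's task-relevance score with a
# learned affordance (success probability) score and execute the best candidate skill.
# The numbers below are made up for illustration only.

llm_scores = {          # p(skill is a useful next step | instruction), from the LLM
    "pick(sponge)": 0.62,
    "pick(apple)": 0.05,
    "walk_to(sink)": 0.30,
}

affordance_scores = {   # p(skill succeeds | current observation), from a value function
    "pick(sponge)": 0.10,   # sponge is out of reach right now
    "pick(apple)": 0.85,
    "walk_to(sink)": 0.90,
}

combined = {skill: llm_scores[skill] * affordance_scores[skill] for skill in llm_scores}
best_skill = max(combined, key=combined.get)
print(best_skill)  # walk_to(sink): useful for the task *and* currently feasible
```

The combination rule is the whole point: the LLM alone would prefer pick(sponge), but the affordance term vetoes skills the robot cannot currently perform.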
Recent models have pushed this further: GR00T N1 (NVIDIA) enables end-to-end control on humanoids; Gemini Robotics (DeepMind) supports language-conditioned planning and on-device inference; Pi-Zero (Physical Intelligence) generates continuous actions at 50 Hz across multiple robot embodiments.
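At the controller level, models like these are typically wrapped in a fixed-rate loop: read the latest camera frame and the instruction, query the policy for an action (or short action chunk), and send it to the joints. Below is a minimal 50 Hz loop sketch in which policy(), get_camera_image(), and send_joint_command() are hypothetical placeholders, not calls from any real SDK.

```python
import time

CONTROL_HZ = 50
DT = 1.0 / CONTROL_HZ


def get_camera_image():
    """Placeholder: return the latest RGB frame from the robot's camera."""
    return None


def policy(image, instruction):
    """Placeholder: a vision-language-action model mapping (image, text) to joint targets."""
    return [0.0] * 7  # e.g., a 7-DoF arm joint command


def send_joint_command(command):
    """Placeholder: forward the command to the low-level joint controller."""
    pass


def control_loop(instruction: str, duration_s: float = 5.0):
    """Run the language-conditioned policy at a fixed 50 Hz rate."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        t0 = time.monotonic()
        action = policy(get_camera_image(), instruction)
        send_joint_command(action)
        # Sleep off the remainder of the control period to hold the 50 Hz rate.
        time.sleep(max(0.0, DT - (time.monotonic() - t0)))


if __name__ == "__main__":
    control_loop("wipe the table")
```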
Together, these systems define the emerging class of vision-language-action (VLA) models that bring LLMs into physical space, turning generalist reasoning into real-world behavior.