
Generative AI can also produce robot actions—the core idea behind DeepMind’s Gemini Robotics project. The team has unveiled two new models that work together to let robots “think” before they act. Simulated reasoning has improved language models, and that advance may soon reach robotics.
DeepMind argues that generative AI is critical for robots because it enables broad, flexible functionality. Unlike today’s robots, which must be painstakingly trained for narrow tasks, generative systems could handle entirely new environments without reprogramming. DeepMind’s Carolina Parada noted that most robots are custom-built and take months to set up for a single task. Gemini Robotics instead uses a two-model approach: one for reasoning and one for execution.
Gemini Robotics 1.5 vs. Gemini Robotics-ER 1.5
The two models are called Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. Gemini Robotics 1.5 is a vision-language-action (VLA) model that interprets visual and text input to produce robotic actions. Gemini Robotics-ER 1.5, where “ER” stands for embodied reasoning, is a vision-language model (VLM) that processes the same types of input but outputs step-by-step plans for completing complex tasks.
Gemini Robotics-ER 1.5 is the first robotics AI with simulated reasoning, scoring highly on tests for decision-making in physical environments. However, it doesn’t perform actions itself—that role is handled by Gemini Robotics 1.5.
For example, if you asked a robot to separate laundry into whites and colors, Gemini Robotics-ER 1.5 would analyze the request along with images of the clothing pile. It can also use external tools like Google Search to collect additional information. Based on this, the ER model produces natural language instructions—step-by-step directions the robot should follow to carry out the task.
Turning Instructions into Actions
Gemini Robotics 1.5, the action model, takes the step-by-step instructions from the ER model and translates them into robot movements, using visual input for guidance. It also runs its own reasoning process to decide how to carry out each step. As DeepMind's Kanishka Rao explained, humans rely on intuitive thought when completing tasks, and robots lack that intuition, so a key breakthrough in the Gemini Robotics 1.5 VLA is its ability to "think before it acts."
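The handoff between the two models can be pictured as a plan-then-act loop: the ER model turns a request and camera images into natural-language steps, and the action model carries out each step with fresh visual input. The sketch below is purely illustrative; DeepMind has not published a programming interface for Gemini Robotics 1.5, so the EmbodiedReasoner and ActionModel classes and their methods here are hypothetical stand-ins for the roles described above.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two models described in the article.
# DeepMind has not released an API for the VLA model, so these interfaces
# are illustrative only.

@dataclass
class PlanStep:
    description: str  # a natural-language instruction, e.g. "pick up the red shirt"

class EmbodiedReasoner:
    """Plays the role of Gemini Robotics-ER 1.5: request + images -> plan."""
    def plan(self, request: str, camera_images: list[bytes]) -> list[PlanStep]:
        # In the real system, this is where the VLM reasons about the scene
        # (and may call tools such as Google Search) before emitting steps.
        raise NotImplementedError

class ActionModel:
    """Plays the role of Gemini Robotics 1.5: one plan step + vision -> motion."""
    def execute(self, step: PlanStep, camera_images: list[bytes]) -> None:
        # The VLA model does its own short reasoning about *how* to carry out
        # the step, then drives the robot's actuators.
        raise NotImplementedError

def run_task(request: str, reasoner: EmbodiedReasoner,
             actor: ActionModel, get_images) -> None:
    """Plan once, then execute each step with up-to-date camera input."""
    steps = reasoner.plan(request, get_images())
    for step in steps:
        actor.execute(step, get_images())

# Example call (with concrete implementations supplied):
# run_task("Separate the laundry into whites and colors", er, vla, capture_frames)
```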
DeepMind built both new robotics AIs on the Gemini foundation models and fine-tuned them with data for physical interaction. This design enables robots to handle more complex, multi-stage tasks, effectively giving them agent-like capabilities.
To test this system, DeepMind has deployed it on machines like the two-armed Aloha 2 and the humanoid Apollo. Unlike earlier approaches that required custom models for each robot, Gemini Robotics 1.5 can generalize across different embodiments—for example, transferring skills from Aloha 2’s grippers to Apollo’s more dexterous hands without special adjustments.
That said, practical household robots are still a distant goal. For now, only trusted testers can access Gemini Robotics 1.5, the model that controls physical machines. The ER model, however, is already available in Google AI Studio, giving developers the ability to generate robotic instructions for real-world experiments.
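As a rough illustration of what that developer access looks like, the sketch below uses Google's google-genai Python SDK to ask an ER-style model for a step-by-step plan from a scene photo. The model ID string is an assumption and may not match what Google AI Studio actually exposes; check the model list there before trying it.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # key created in Google AI Studio

# A photo of the scene the robot is looking at (e.g. a laundry pile).
with open("laundry_pile.jpg", "rb") as f:
    scene = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    # Assumed model ID; confirm the exact ER model name in AI Studio.
    model="gemini-robotics-er-1.5-preview",
    contents=[
        scene,
        "Give numbered, step-by-step instructions a robot could follow "
        "to separate this laundry into whites and colors.",
    ],
)
print(response.text)  # natural-language plan, one step per line
```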
Read the original article on: Ars Technica
Read more: Bird-Like Robot with Novel Wings Achieves Self-Takeoff and Slow Flight
