Google Introduces a New Generation of AI Reasoning Models

On Tuesday, Google introduced Gemini 2.5, a new family of AI reasoning models designed to pause and “think” before responding to questions.

The first in this series, Gemini 2.5 Pro Experimental, is a multimodal AI model that Google describes as its most advanced yet. Starting Tuesday, it will be available in Google AI Studio, as well as in the Gemini app for subscribers to the company’s $20-per-month AI plan, Gemini Advanced.

Google says it plans to build reasoning capabilities into all of its future AI models.

Tech Giants Compete to Advance AI Reasoning Models Following OpenAI’s o1 Release

Since OpenAI released the first AI reasoning model, o1, in September 2024, tech companies have been competing to develop comparable or superior models. Today, firms like Anthropic, DeepSeek, Google, and xAI have introduced AI reasoning models that leverage additional computing power and processing time to verify facts and analyze problems before delivering responses.

AI reasoning techniques have significantly improved performance in math and coding tasks. Many in the tech industry see these models as essential for AI agents, autonomous systems capable of completing tasks with minimal human input. However, the additional computing power and processing time they require also make these models more expensive to run.

Google has previously explored AI reasoning models, introducing a “thinking” version of Gemini in December. However, Gemini 2.5 marks the company’s most ambitious effort to surpass OpenAI’s o-series models.

Google: Gemini 2.5 Pro Tops Rivals in Web App & Agentic Coding

According to Google, Gemini 2.5 Pro outperforms both the company’s earlier frontier AI models and some leading competing models on several benchmarks. The model is specifically designed to excel at building visually rich web apps and agentic coding applications.

On the Aider Polyglot evaluation, which measures code editing performance, Google reports that Gemini 2.5 Pro achieved a 68.6% score, surpassing leading AI models from OpenAI, Anthropic, and DeepSeek.

However, in the SWE-bench Verified test, which assesses software development capabilities, Gemini 2.5 Pro scored 63.8%. While it outperformed OpenAI’s o3-mini and DeepSeek’s R1, it fell short of Anthropic’s Claude 3.7 Sonnet, which led with 70.3%.

In Humanity’s Last Exam—a multimodal assessment covering mathematics, humanities, and natural sciences—Gemini 2.5 Pro scored 18.8%, outperforming most competing flagship models.

At launch, Gemini 2.5 Pro features a 1-million-token context window, allowing it to process approximately 750,000 words in a single session—exceeding the length of The Lord of the Rings series. Google also plans to expand this capacity to 2 million tokens soon.
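The word-count figure above follows from simple arithmetic, sketched here in Python. The ratio of 0.75 words per token is an assumption taken from the article's own numbers (1 million tokens to roughly 750,000 words). It is not a property of any particular tokenizer, and the function name `estimate_words` is purely illustrative.

```python
# Assumption: about 0.75 English words per token, the ratio implied by
# "1 million tokens is approximately 750,000 words". Real tokenizers vary
# with the text and the language, so treat the result as a rough estimate.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Return a rough count of words that fit in a context window of the given size."""
    return round(context_tokens * words_per_token)


if __name__ == "__main__":
    # The launch window and the planned expansion mentioned in the article.
    for tokens in (1_000_000, 2_000_000):
        print(f"{tokens:,} tokens is roughly {estimate_words(tokens):,} words")
```

Under the same assumed ratio, the planned 2-million-token window would hold roughly 1.5 million words.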

The company has yet to release API pricing for Gemini 2.5 Pro but promises more details in the coming weeks.


Read the original article on: TechCrunch
