Claude 4 Tested: Why It Might Outperform ChatGPT in the AI Race

By Samilton Santos Artificial Intelligence, Tech ChatGPT, Claude 4 0 Comments

Claude 4 from Anthropic sets new records: 72.5% on SWE-bench and 7 hours of autonomous work. We may have the best AI model for coding. And that’s not all.

Alex Albert from Anthropic didn’t hold back: “Claude 4 is the best programming model in the world.” A bold claim — but the data seems to support it. Scoring 72.5% on the SWE-bench Verified benchmark, running autonomously for hours on end, and handling thousands of sequential steps, Claude 4 isn’t just another upgrade. It’s a shift in what we expect AI to do. And the best part? You can try it yourself right now.

When AI Pulls an All-Nighter

Anthropic made waves today with the release of Claude 4 Opus and Claude 4 Sonnet — marking its return to full-scale models after months refining Sonnet variants. What truly stands out is Claude 4’s endurance: it can stay operational for 24 hours straight without losing accuracy or context.

That’s right. While your fellow developer might start grumbling after eight hours of debugging, Claude 4 Opus has been shown to play Pokémon non-stop or refactor code for seven hours straight. Earlier models typically hit their limit after a couple of hours. As Alex Albert put it, “There’s a massive demand for agent-based applications, and Claude 4 fits that role perfectly.”

**The email I received a few hours ago. From then on, it was just test, test, test.**

The Numbers That Are Making Rivals Nervous

Think of benchmarks like poker — and Claude 4 just laid down a royal flush. With 72.5% on the SWE-bench Verified test, it leaves previous models far behind. For context, scoring over 50% was once seen as a major win. It also pulled in a solid 43.2% on Terminal-bench.

The impact was immediate: GitHub has adopted Claude 4 Sonnet as the foundation for its latest Copilot coding agent. That kind of switch doesn’t happen without good reason. Sourcegraph called it “a major leap in software development,” and Augment Code reported “higher success rates and cleaner, more precise code changes.” In short, everyone wants a ticket on the Claude 4 express.

Claude 4: Powerful, But Built With Safety in Mind

Anthropic activated its Level 3 AI safety protocol for the first time — typically reserved for models with the potential to aid in developing chemical, biological, or nuclear weapons. Claude 4 Opus is that powerful, requiring strict safeguards to prevent misuse.

Introducing ‘Deep Thought’ Mode

One standout feature of Claude 4 is its ability to toggle between rapid responses and deeper, more deliberate reasoning. Activate its extended thinking mode, and the model takes a moment to “think” — even showing a preview of what it’s processing. It’s like having a colleague who walks you through their logic instead of just giving you the answer.

Claude Code integration is also now widely available, complete with GitHub Actions and built-in support for VS Code and JetBrains. Suggested code changes show up directly in your files — no more tedious copying and pasting. It’s seamless.

A Multi-Billion Dollar Bet That’s Paying Off

Anthropic is now generating over $2 billion in annualized revenue, doubling its previous results. Chief Product Officer Mike Krieger — who also co-founded Instagram — openly says, “I used to do most of the writing myself, using Claude to bounce ideas. Now Claude 4 writes most of it.”

And he’s not the only one impressed. Cursor describes Claude 4 as “cutting-edge for coding,” while Replit highlights “massive gains in handling multi-file edits.” When the top developer tools are building around it, you know you’ve got something game-changing.

Claude 4’s Defining Moment

As seen with Claude 2.0, the generative AI space is more competitive than ever. But this time, I took a different approach — one that makes this review unlike any other.

Over the past several hours, I pushed Claude 4 to its limits. Web research, source validation, structured writing, creative storytelling — even humor and satire. And the results? Genuinely impressive. In fact, the 700 words you’ve just read? They’re the outcome of those tests.

So here’s the real question: can you tell which parts I wrote and which Claude 4 did? Because honestly, after this experiment, I’m not sure I can anymore.

The future of AI is no longer a far-off promise. It’s already here — and it may have just told you that story itself.

Read the original article on: Futuro Prossimo

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Claude 4 Tested: Why It Might Outperform ChatGPT in the AI Race