Claude 4 Tested: Why It Might Outperform ChatGPT in the AI Race

Alex Albert from Anthropic didn’t hold back: “Claude 4 is the best programming model in the world.” A bold claim — but the data seems to support it. Scoring 72.5% on the SWE-bench Verified benchmark, running autonomously for hours on end, and handling thousands of sequential steps, Claude 4 isn’t just another upgrade. It’s a shift in what we expect AI to do. And the best part? You can try it yourself right now.
When AI Pulls an All-Nighter
Anthropic made waves today with the release of Claude 4 Opus and Claude 4 Sonnet — marking its return to full-scale models after months refining Sonnet variants. What truly stands out is Claude 4’s endurance: it can stay operational for 24 hours straight without losing accuracy or context.
That’s right. While your fellow developer might start grumbling after eight hours of debugging, Claude 4 Opus has been shown to play Pokémon non-stop or refactor code for seven hours straight. Earlier models typically hit their limit after a couple of hours. As Alex Albert put it, “There’s a massive demand for agent-based applications, and Claude 4 fits that role perfectly.”

The Numbers That Are Making Rivals Nervous
Think of benchmarks like poker — and Claude 4 just laid down a royal flush. With 72.5% on the SWE-bench Verified test, it leaves previous models far behind. For context, scoring over 50% was once seen as a major win. It also pulled in a solid 43.2% on Terminal-bench.
The impact was immediate: GitHub has adopted Claude 4 Sonnet as the foundation for its latest Copilot coding agent. That kind of switch doesn’t happen without good reason. Sourcegraph called it “a major leap in software development,” and Augment Code reported “higher success rates and cleaner, more precise code changes.” In short, everyone wants a ticket on the Claude 4 express.
Claude 4: Powerful, But Built With Safety in Mind
Anthropic activated its Level 3 AI safety protocol for the first time — typically reserved for models with the potential to aid in developing chemical, biological, or nuclear weapons. Claude 4 Opus is that powerful, requiring strict safeguards to prevent misuse.
Introducing ‘Deep Thought’ Mode
One standout feature of Claude 4 is its ability to toggle between rapid responses and deeper, more deliberate reasoning. Activate its extended thinking mode, and the model takes a moment to “think” — even showing a preview of what it’s processing. It’s like having a colleague who walks you through their logic instead of just giving you the answer.
Claude Code integration is also now widely available, complete with GitHub Actions and built-in support for VS Code and JetBrains. Suggested code changes show up directly in your files — no more tedious copying and pasting. It’s seamless.
A Multi-Billion Dollar Bet That’s Paying Off
Anthropic is now generating over $2 billion in annualized revenue, doubling its previous results. Chief Product Officer Mike Krieger — who also co-founded Instagram — openly says, “I used to do most of the writing myself, using Claude to bounce ideas. Now Claude 4 writes most of it.”
And he’s not the only one impressed. Cursor describes Claude 4 as “cutting-edge for coding,” while Replit highlights “massive gains in handling multi-file edits.” When the top developer tools are building around it, you know you’ve got something game-changing.

Claude 4’s Defining Moment
As seen with Claude 2.0, the generative AI space is more competitive than ever. But this time, I took a different approach — one that makes this review unlike any other.
Over the past several hours, I pushed Claude 4 to its limits. Web research, source validation, structured writing, creative storytelling — even humor and satire. And the results? Genuinely impressive. In fact, the 700 words you’ve just read? They’re the outcome of those tests.
So here’s the real question: can you tell which parts I wrote and which Claude 4 did? Because honestly, after this experiment, I’m not sure I can anymore.
The future of AI is no longer a far-off promise. It’s already here — and it may have just told you that story itself.
Read the original article on: Futuro Prossimo
Read more: Study Finds AI Chatbots Still Easy to Manipulate into Giving Harmful Advice
Leave a Reply