Anthropic’s CEO Aims to Make AI Models More Transparent by 2027

On Thursday, Anthropic CEO Dario Amodei published an essay underscoring how little is known about the inner workings of today’s most advanced AI models. To tackle this, he set a bold target for Anthropic: by 2027, the company aims to reliably detect and address most issues within AI systems.
In his essay, “The Urgency of Interpretability,” Amodei admits the road ahead won’t be easy. While Anthropic has made early progress in tracking how models generate their outputs, he stresses that much deeper research is necessary to truly understand these increasingly complex systems.
“I’m deeply concerned about deploying these models without a clearer understanding of how they operate,” Amodei wrote. “They’ll be central to our economy, technology, and national security, and so autonomous that it’s simply unacceptable for us to remain in the dark about their decision-making.”
Anthropic Leads the Charge in Decoding AI Decision-Making
Anthropic is at the forefront of mechanistic interpretability—a field focused on unraveling the “black box” of AI models to understand the reasoning behind their decisions. Despite rapid advances in AI capabilities, researchers still know relatively little about how these systems reach their conclusions.
For instance, OpenAI recently introduced new reasoning models, o3 and o4-mini, which outperform earlier versions on some tasks—but they also tend to hallucinate more frequently. The cause remains unclear, even to their creators.
In the essay, Amodei points out a major limitation of today’s generative AI systems: when a model summarizes something like a financial report, researchers cannot explain, at any detailed level, why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate.
He highlights a comment by Anthropic co-founder Chris Olah, who said AI models are “grown more than they are built,” meaning researchers have found ways to improve model performance without fully understanding why these improvements work.
Amodei warns that approaching artificial general intelligence (AGI), which he describes as “a country of geniuses in a data center,” without truly grasping how these models function could be risky. Although he has previously said the industry could reach that milestone as soon as 2026 or 2027, he believes a full understanding of how these models actually work is much further away.
Amodei Proposes “Brain Scans” for AI to Ensure Safer Deployment
Looking ahead, Amodei envisions conducting deep diagnostic tests—like “brain scans” or “MRIs” for AI—to uncover a range of potential issues, such as tendencies toward dishonesty or power-seeking behavior. He estimates this kind of interpretability could take five to ten years to achieve, but sees it as essential for safely deploying future AI models.
Anthropic has already made progress in this area. The company has begun mapping “circuits” within its models, pathways that trace how the AI processes information. One such circuit helps the model work out which U.S. cities are located in which states. Only a handful of circuits have been identified so far, but Amodei estimates large models may contain millions of them.
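Anthropic has not published code alongside the essay, so the sketch below is only a loose illustration of what this kind of work involves, using a simpler, related technique: training a linear probe on a small open model’s hidden activations to test whether a concept (here, whether a city is paired with its correct state) can be read out of them. The model (gpt2), the example sentences, and the probe itself are assumptions for demonstration, not Anthropic’s circuit-tracing method.

```python
# Illustrative sketch only: a linear probe over hidden activations, a common
# interpretability technique. This is NOT Anthropic's circuit-tracing method;
# the model, sentences, and probed concept are assumptions for demonstration.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Tiny labeled set: is each city paired with its correct state?
texts = [
    ("Sacramento is the capital of California", 1),
    ("Austin is the capital of Texas", 1),
    ("Sacramento is the capital of Texas", 0),
    ("Austin is the capital of California", 0),
]

def last_token_activation(sentence: str) -> torch.Tensor:
    """Return the final-layer hidden state of the last token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]  # shape: [hidden_dim]

X = torch.stack([last_token_activation(t) for t, _ in texts]).numpy()
y = [label for _, label in texts]

# If a simple linear classifier can separate true from false pairings using
# only the activations, that knowledge is (linearly) encoded in the model.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy on its own tiny training set:", probe.score(X, y))
```

A real circuit analysis goes much further, tracing how specific components interact across layers to produce an answer, but the probe captures the basic question Amodei describes: can a particular piece of the model’s internal knowledge be located and checked?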
The company has also started investing in external startups focused on interpretability, reinforcing its commitment to this research. While currently viewed as part of AI safety, Amodei believes understanding how models reach conclusions could eventually become a business advantage as well.
In his essay, Amodei urged major players like OpenAI and Google DeepMind to ramp up their efforts in interpretability research. He also called on governments to adopt “light-touch” regulations that promote transparency—such as requiring companies to disclose their safety practices—and advocated for export controls on advanced AI chips to China to prevent a global AI arms race.
Anthropic has long distinguished itself from rivals by prioritizing AI safety. While other tech firms resisted California’s proposed AI safety bill (SB 1047), Anthropic offered cautious support and suggestions, aligning with its broader call for a more responsible, industry-wide approach to understanding—and not just advancing—AI capabilities.
Read the original article on: TechCrunch
Read more: OpenAI’s New AI Models Are Hallucinating More Than Expected