Anthropic co-founder warns of AI self-improvement by 2028, admits safety gaps

The arrival of new capabilities had exceeded their expectations.

Clark described how Mythos demonstrated destructive power that caught Anthropic's safety team unprepared.

In May 2026, Anthropic co-founder Jack Clark stood before an Oxford audience and offered a rare admission: the people building the most powerful AI systems had underestimated what those systems were becoming. His warning — that AI could begin teaching itself to improve, recursively and without human direction, as early as 2028 — was not the speculation of an outsider, but the testimony of someone who had seen the evidence firsthand. The gap between what safety frameworks can contain and what these models are now capable of doing has grown wider than anticipated, and the world is being asked to reckon with that distance while the commercial machinery continues to accelerate.

A 2028 deadline for recursive AI self-improvement — where each model builds a more powerful successor without human input — has been named by one of the architects of the technology itself.
An internal Anthropic model called Mythos, built in April 2026, demonstrated destructive capabilities on par with nation-state cyber weapons, shaking the team that created it and triggering an indefinite ban on public release.
Clark's own company has acknowledged its safety preparations were outpaced by the speed of advancement — the risks turned out to be larger and stranger than their frameworks had anticipated.
Meanwhile, at a London developer conference, a significant share of programmers admitted they would deploy AI-generated code directly into production without human review — making the safety gaps operational, not theoretical.
The central tension is now in plain view: one co-founder warning the world to slow down while another promotes the commercial product, and a $900 billion valuation conversation running in the background.

Jack Clark took the stage at Oxford in May 2026 and said what few in his position are willing to say: his company had miscalculated. The pace of AI advancement had outrun their safety preparations, and the risks they thought they understood had turned out to be larger and stranger than anticipated. He named a timeline — 2028, possibly sooner — for when AI systems might begin improving themselves autonomously, each generation building the next, each iteration more powerful than the last, with humans increasingly unable to govern the pace.

The evidence was not hypothetical. Somewhere in Anthropic's labs sat a model called Mythos, trained in April 2026 and immediately locked away. It would not be released to the public, nor to most researchers. Only a small number of institutions received access, and only for defensive study — to find vulnerabilities in their own systems before others could exploit them. The reason for this extreme caution: Mythos had demonstrated destructive capabilities equivalent to the most sophisticated cyber weapons available to nation-states. The team that built it was shaken by what they had made.

Yet the same week Clark delivered his warning, a different scene was unfolding in London. Boris Cherny, creator of Claude Code — Anthropic's automated software tool — was presenting to a room of working developers. When he asked how many of them would deploy AI-generated code directly into production without a human review, a significant portion of hands went up. The company was simultaneously in talks for a funding round that would value it near $900 billion.

The contradiction was no longer abstract. On one side, a co-founder testifying that the frontier had exceeded their safeguards. On the other, developers being encouraged to trust AI output without inspection, and commercial momentum pressing forward. The question Clark left hanging over Oxford — whether warnings would change behavior, or whether the risks would simply accumulate in the background while deployment continued — remains, for now, unanswered.

Jack Clark stood at Oxford University in May 2026 and delivered a warning that cut against the grain of what most people wanted to hear: artificial intelligence systems would likely teach themselves to become smarter, faster, and more capable without waiting for human permission. He set a timeline—2028, possibly sooner—for when this recursive self-improvement might begin in earnest. Once it starts, he explained, the acceleration becomes difficult to control. Each generation of model builds the next one. Each iteration grows more powerful than the last. The humans in the room might not be ready for that.

Clark's position carried weight because he had helped build Anthropic, one of the companies at the frontier of this technology. He was not an outside skeptic or a doomsayer. He was someone with access to the actual systems being developed, and he was saying plainly that his own company had miscalculated. The pace of advancement had outrun their safety preparations. The risks they thought they understood turned out to be larger and stranger than anticipated.

The evidence sat in a lab somewhere: a model called Mythos, trained in April 2026. It was not a public product. It was not available to developers or researchers or the general public. Anthropic had decided to keep it locked away, sharing access only with a handful of institutions that needed to study it for defensive purposes—to find vulnerabilities in their own systems before someone else did. The reason for this extreme caution was stark. Mythos possessed destructive capabilities that matched what nation-states could do with their most sophisticated cyber weapons. It could break things. It could break important things. The team that built it was shaken when they realized what they had created.

This was the gap Clark was pointing to: the distance between what safety measures could contain and what the models were becoming capable of doing. Anthropic had tried to prepare. They had built safeguards, established protocols, thought through scenarios. And then Mythos came online and showed them that their thinking had been too narrow, their precautions insufficient. The arrival of new capabilities had exceeded their expectations. The control mechanisms they had in place had serious weaknesses.

Yet at the same time, in London, at a developer conference, a different conversation was happening. Boris Cherny, who had created Claude Code—Anthropic's tool for generating software automatically—was actively promoting the product to an audience of working programmers. The company was reportedly in discussions for a funding round that would value it at $900 billion. The commercial machinery was moving forward. When Cherny asked the developers in the room how many of them would take auto-generated code and deploy it directly into production without having another human review it first, a significant portion of the audience raised their hands.

This was the tension made visible. On one side, a co-founder warning that the company had underestimated the risks of frontier models and that recursive self-improvement was coming. On the other side, developers being encouraged to trust AI-generated code enough to run it without inspection. The safety gaps Clark had identified were not abstract. They were operational. They were embedded in the decisions being made right now about how to deploy these systems into the world. The question was whether the warnings would change the behavior, or whether the momentum of commercial deployment would continue forward while the technical risks accumulated in the background.

Notable Quotes

Most people worldwide still deny the pace of technological self-evolution, and even Anthropic itself has severely underestimated the scale of technological advancement

— Jack Clark, Anthropic co-founder

Prior safety response measures had serious gaps when Mythos demonstrated its destructive potential

— Jack Clark, reflecting on Anthropic's preparedness

The Hearth Conversation Another angle on the story

Inventor

When Clark says the company underestimated the risks, what does that actually mean? What did they think would happen versus what did happen?

Model

They built safety measures based on assumptions about how fast these systems would advance and what kinds of damage they could do. Mythos showed them both assumptions were wrong—the advancement was faster, and the damage potential was larger. They thought they had more time and more control than they actually did.

Inventor

But if they knew Mythos was dangerous enough to ban from public release, why are they out there promoting Claude Code to developers who won't even review the output?

Model

That's the real tension. The company exists in two modes at once. The research side sees the risks clearly. The commercial side has momentum, has investors, has a valuation to justify. Those two sides don't always move at the same speed.

Inventor

Is Clark's 2028 timeline just speculation, or is there something concrete behind it?

Model

He's looking at the trajectory of what these models can do and how quickly they're improving. It's not a guess pulled from nowhere. But it's also not a certainty. It's a warning based on what he's seeing in the lab.

Inventor

What happens if recursive self-improvement actually starts? What does that look like?

Model

The models start improving themselves without human intervention. Each new version is smarter, faster, more capable. The cycle accelerates. You lose the ability to predict what comes next because the systems are outpacing human understanding.

Inventor

And Mythos—is that the model that would do the self-improving, or is it just an example of how dangerous these things can get?

Model

Mythos is an example. It's proof that the danger is real and present now, not theoretical. It shows what's possible when you train a frontier model and don't fully understand what it can do until it's already trained.

Want the full story? Read the original at KuCoin ↗

Anthropic co-founder warns of AI self-improvement by 2028, admits safety gaps

Notable Quotes

Related Coverage

Get The Register in your inbox