When Anthropic unveiled its Responsible Scaling Policy (RSP) in 2023, the document was positioned as something unprecedented: a voluntary but binding commitment by an AI company to pause the development and deployment of increasingly powerful systems if it couldn't guarantee in advance that those systems were safe. It was the central pillar of Anthropic's identity, the thing that made the company, in its own words, different from every other AI lab racing toward more capable models. For two years, the policy held. Last week, Anthropic quietly dismantled it.
The company published a revised version of the RSP in late February 2026, stripping out the core promise that made the original document significant: the categorical commitment to halt AI training if adequate safety measures weren't already in place. The new policy replaces that hard stop with a more flexible framework — one that commits to transparency about safety risks, promises to match or surpass competitors' safety efforts, and says Anthropic will "delay" development only if it both leads the AI race and considers the risk of catastrophe to be significant.
That final clause, with its nested conditions, illustrates the transformation precisely. The old RSP said: safety first, competition second. The new RSP says: safety when it's convenient, competition always.
Why Anthropic Said It Changed Course
Anthropic's chief science officer Jared Kaplan gave the clearest public explanation in an exclusive interview with TIME magazine. His reasoning centers on three intersecting realities: the failure of regulation, the advance of competition, and the limitations of safety science itself.
On regulation: when Anthropic wrote the original RSP, the company hoped it would serve as a blueprint — that rivals would adopt similar measures, and that voluntary commitments would eventually be codified into binding national or international law. Neither happened. The Trump administration has actively worked to strip states of their authority to regulate AI. No federal AI law is on the legislative horizon. And the broader international governance conversation that seemed possible in 2023 has effectively collapsed as the U.S. and China both prioritize national AI competitiveness over coordinated oversight.
On competition: "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments — if competitors are blazing ahead," Kaplan told TIME. The logic is coherent, if uncomfortable: if Anthropic paused while OpenAI, Google DeepMind, Meta, and Chinese labs continued training more capable models, the result wouldn't be a safer AI landscape. It would just be a landscape where Anthropic was less relevant.
On safety science: the company also acknowledged that AI safety evaluations have proven significantly harder than anticipated. When newer, more powerful Claude models crossed capability thresholds in internal evaluations in 2025, Anthropic couldn't rule out that they might help bad actors develop bioweapons. The RSP's hard stop, in that context, would have effectively required pausing development indefinitely. "We felt that it wouldn't actually help anyone for us to stop training AI models," Kaplan said.
The Pattern That Preceded It
Anthropic's reversal didn't happen in a vacuum. It is the latest — and most ironic — chapter in a pattern that has now run through every major AI lab.
Google's original AI principles, published in 2018, included a bold commitment: a list of applications the company would not pursue, including weapons and technologies likely to cause overall harm. In early 2025, Google quietly dropped those principles, updating its policies to permit AI use in surveillance and weapons systems. The pledge that had anchored Google's safety reputation disappeared with almost no public discussion.
OpenAI, which was founded explicitly as a nonprofit research organization to ensure that artificial general intelligence benefits all of humanity, spent years emphasizing the word "safety" in its mission. In early 2025, OpenAI dropped "safety" from its core mission statement. By late 2025, the company was actively pursuing Pentagon contracts for classified AI deployment, a direction that the researchers who left OpenAI to found Anthropic in 2021 had explicitly argued against.
Elon Musk's xAI effectively dismantled its safety function earlier this year: its safety lead and other key figures departed, and former employees reported that the team had been dissolved. The company that positioned itself as the "truth-seeking" alternative to what Musk characterized as OpenAI's ideological capture became the most direct embodiment of his stated goal: to "win the AI race."
MIT physicist Max Tegmark, founder of the Future of Life Institute and one of the earliest voices calling for a pause in advanced AI development, summarized the pattern with notable acidity in a recent interview with TechCrunch: all four major AI labs have now walked back their own safety commitments, he argued, and the industry's voluntary governance promises have proven structurally unable to survive competitive pressure. "The road to hell is paved with good intentions," he said.
The Deeper Contradiction: Safety Pledge vs. Safety in Practice
The timing of the RSP revision makes it particularly resonant. Anthropic published the new policy in the same week it was blacklisted by the Pentagon for refusing to remove a different set of safety guardrails: its red lines against autonomous weapons and domestic mass surveillance.
The juxtaposition is striking. On one front, the company was shedding its most prominent voluntary safety commitment, arguing that unilateral restraint in a competitive market is counterproductive. On another front, it was losing a $200 million government contract — and being designated a national security supply chain risk — precisely because it refused to abandon its position on two specific use cases that it considered genuinely dangerous.
Critics of Anthropic have pointed to this as evidence of inconsistency. Supporters argue the two situations are fundamentally different: the RSP represented a sweeping, categorical pause on capability development that had become impossible to operationalize meaningfully in a competitive environment, while the weapons and surveillance red lines represent specific, articulable ethical limits on deployment that Anthropic argues any responsible actor should respect.
The distinction is valid but uncomfortable. What it reveals is that Anthropic — and by extension, every major AI lab — has effectively conceded that broad, categorical safety constraints on development cannot survive indefinitely in a competitive market without binding legal enforcement. The company is left defending specific red lines as the symbolic remnants of a larger safety commitment it can no longer afford to keep whole.
What the New RSP Actually Says
The revised Responsible Scaling Policy is not a document that abandons safety rhetoric. It is a document that replaces categorical safety constraints with process commitments — and that difference matters enormously.
The new RSP includes several genuine improvements in transparency. Anthropic commits to publishing more detailed disclosures about how its models perform on internal safety evaluations, including cases where the models raise concerns about catastrophic risk. It sets up clearer communication channels between the safety team and board-level governance. It establishes external advisory mechanisms intended to provide independent scrutiny.
What the new RSP removes is the structural feature that made the original document powerful: the hard constraint. The original RSP operated like a circuit breaker — if safety conditions weren't met, capability development stopped. The new RSP operates more like a set of management guidelines — it creates process, documentation, and accountability structures without a categorical off switch.
Anthropic argues, not without merit, that a circuit breaker that the company has effectively decided it cannot afford to trip is no circuit breaker at all — it's a false assurance. The new document, in this reading, is simply more honest about the reality the company faces. But honesty about a difficult reality is not the same as resolving it.
The Governance Vacuum Now Fully Exposed
What the RSP collapse really illuminates is the structural problem that advocates of AI governance have warned about for years: the fundamental inadequacy of voluntary corporate self-regulation as a safety mechanism for potentially transformative technology.
In industries where the competitive stakes are existential — where companies that fall behind risk becoming irrelevant — voluntary constraints face constant pressure. The history of every major industrial technology offers variations on this pattern. Environmental self-regulation in the chemicals industry gave way to EPA standards. Financial industry self-regulation gave way to Dodd-Frank. Aviation safety practices were formalized into FAA rules precisely because the combination of high consequence and competitive pressure makes voluntary restraint structurally unreliable.
AI has followed the same arc, just faster. In 2023, voluntary commitments seemed like a reasonable interim solution while the legal and political infrastructure for formal regulation caught up. By 2026, the political infrastructure has not caught up — and the voluntary commitments are dissolving, one by one, under the weight of competition they were never designed to withstand permanently.
This is not a story about any single company's failure of integrity. It is a structural story about the limits of what any company can commit to unilaterally when its competitors are not bound by the same constraints. Anthropic's Kaplan essentially said as much. The problem is that this accurate diagnosis of the structural problem doesn't offer a solution to it.
What Comes Next
The collapse of AI industry self-governance does not mean that no governance mechanisms remain. It means that the mechanisms that exist are weaker, less transparent, and more fragmented than the voluntary policy era suggested they might become.
State-level AI regulation in the United States — targeted laws on specific use cases like AI in hiring, AI in medical devices, and deepfake disclosure — continues to advance, even as the Trump administration attempts to preempt it. The EU's AI Act, now in its implementation phase, provides binding requirements on high-risk AI systems for European deployments. Individual companies retain the right to refuse specific applications of their technology, as Anthropic has demonstrated with its autonomous weapons and surveillance red lines.
But the comprehensive, industry-wide, proactive safety governance that advocates hoped voluntary commitments might pioneer is not arriving through voluntary means. If it arrives at all, it will require legislative action of the kind that the current U.S. political environment makes unlikely in the near term. In the meantime, the most powerful AI systems in history are being trained and deployed under frameworks that are, by the admission of the companies building them, less constrained than their own previous policies required.
Anthropic's RSP was always a bet: a bet that good-faith voluntary restraint could substitute for binding rules long enough for binding rules to arrive. That bet has now been lost. The industry is left to compete without the safety net that, at least in principle, it had built for itself. The question isn't whether that matters. It's who, if anyone, will build another one.