Anthropic’s Fable 5 Restored Globally After Two-Week Government Jailbreak Ban

After a tense two-week ban, the US government has officially cleared Anthropic to redeploy Fable 5, its most powerful AI model, worldwide. Starting today, users can access Fable 5 through the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. Subscribers on Pro, Max, Team, and select Enterprise plans can use the model for up to 50% of their weekly usage limits until July 7, after which billing will transition to usage credits. Access via AWS, Google Cloud, and Microsoft Foundry is expected to be restored shortly.

Meanwhile, Mythos 5, the less restricted version of the same base model, remains limited to a select group of US organizations that received government clearance on June 26. Anthropic is working with federal authorities to expand access to more partners under the so-called Glasswing program, though EU participation remains uncertain.

Anthropic confirmed that the sudden ban was triggered by a security flaw flagged by researchers at Amazon. The researchers bypassed Fable 5's safety guardrails, enabling the model to identify several critical software vulnerabilities and generate functional exploit code demonstrating how to exploit them.

Over a two-week joint investigation, Anthropic and the US government found that the issue was not specific to Fable. Many less capable models, including Claude Opus 4.8, GPT-5.5, and Kimi K2.7, could identify the same flaws. For the specific exploit demonstration, every tested model, including lightweight options like Claude Haiku 4.5, produced identical risky outputs.

Anthropic described the exploit as an edge case encountered during routine defensive cybersecurity tasks. In response, the company trained an upgraded safety classifier designed to block the Amazon-reported technique in over 99% of cases. When a prompt is flagged, the user is notified and the workflow is automatically routed to the older Opus 4.8 model.
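
The flag-notify-reroute flow described above can be illustrated with a minimal sketch. Anthropic has not published how its classifier or routing works, so everything here is an assumption made for illustration: the model identifiers, the `route_prompt` function, the 0.5 threshold, and the keyword-based `toy_classifier` are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical identifiers; the article names the models but not their API strings.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str              # model that will handle the prompt
    flagged: bool           # True when the classifier score met the threshold
    notice: Optional[str]   # message shown to the user when flagged


def route_prompt(
    prompt: str,
    classifier: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not a published value
) -> RoutingDecision:
    """Send flagged prompts to the fallback model and attach a user notice."""
    score = classifier(prompt)
    if score >= threshold:
        notice = f"This request was flagged and routed to {FALLBACK_MODEL}."
        return RoutingDecision(FALLBACK_MODEL, True, notice)
    return RoutingDecision(PRIMARY_MODEL, False, None)


def toy_classifier(prompt: str) -> float:
    """Stand-in scorer: a real classifier would be a trained model, not a keyword match."""
    return 1.0 if "exploit" in prompt.lower() else 0.0


if __name__ == "__main__":
    for text in ("Refactor this function", "Fix the duplication exploit in my game"):
        decision = route_prompt(text, toy_classifier)
        print(text, "->", decision.model, decision.notice)
```

In this toy version the second prompt is a harmless debugging request, yet it is rerouted because of a single keyword, which is the kind of false positive the reported trade-off concerns.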

This enhanced safety net comes with trade-offs. The aggressive classifier is prone to false positives, often flagging benign coding and debugging queries, a restriction users already complained about during Fable's initial release. Anthropic's data confirms Fable 5's safety margin is significantly wider than standard guardrails, trading user convenience for extreme risk mitigation.

Anthropic admitted that building a completely jailbreak-proof AI model is "probably impossible." To address this systemic vulnerability, the company is pushing for industry-wide standards for jailbreak classification, collaborating with tech giants like Amazon, Microsoft, and Google under the Glasswing umbrella. Anthropic has also deployed a 24/7 security monitoring team and launched a HackerOne bug bounty program for Fable 5.

[AgentUpdate Depth Analysis] The Fable 5 suspension underscores a critical vulnerability in the evolution of the AI Agent ecosystem: the trade-off between capability and autonomous safety. As AI Agents transition from static generators to active executors with tool-use and code-running capabilities, jailbreaks cease to be mere text bypasses; they become automated zero-day vectors. Anthropic's mitigation, routing blocked queries back to Opus 4.8, is a pragmatic but clumsy stopgap that degrades Agent autonomy because of its high false-positive rate. For AI Agents to achieve mainstream enterprise adoption, the industry must move away from relying solely on rigid model-level guardrails. Instead, it must pioneer runtime sandbox isolation and multi-agent consensus validation, so that autonomous agents can operate safely in complex environments without sacrificing productivity.