Why is Anthropic Suspending Fable 5?

Top AI and Cybersecurity news you should check out today

Welcome Back to The AI Trust Letter

Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🚫 Anthropic Suspends Its Most Capable Model After U.S. Government Order

The Story:

Days after publicly releasing Claude Fable 5, a model it had described as "too powerful to release," Anthropic was ordered by U.S. national security authorities to suspend access for all foreign nationals. To ensure compliance, the company disabled the model for all users worldwide.

The details:

  • The government cited a jailbreaking method but provided no specific details of the security concern. Anthropic says it reviewed a demonstration of the technique and found it exposed only minor, previously known vulnerabilities that other publicly available models can also discover without a bypass.

  • The UK government's AI Security Institute found in testing that Fable 5 could exploit defenses and systems 73% of the time — a result researchers describe as "a step change in capability" in cybersecurity.

  • Anthropic is already in an ongoing lawsuit with the Trump administration over a separate order barring government agencies from using its tools. The U.S. Defense Secretary had designated Anthropic a "supply chain risk," the first time that designation has been applied to a U.S. company. A judge has since ruled that directive cannot be enforced while the suit continues.

  • The EU, which had only gained access to Claude Mythos weeks earlier after a separate negotiation, said the suspension "further underlined Europe's need for technological sovereignty."

Why it matters:

The suspension puts into sharp relief the gap between how AI companies assess their own safety measures and how government authorities respond to them. Anthropic publicly flagged Fable 5 as a high-capability system before release and built in safeguards accordingly. That transparency may have contributed to the scrutiny that followed. For enterprises and governments using frontier AI, this episode shows that access to these systems can be revoked at short notice for reasons that remain contested.

⚠️ Anthropic Model Suspension Puts AI Sovereignty on India's Agenda

The Story:

Following a U.S. government directive, Anthropic suspended access to its Fable 5 and Mythos 5 models for all foreign nationals, including its own foreign-national employees. The move arrived days after Anthropic announced a partnership with Tata Consultancy Services to expand enterprise AI in India, its second-largest market.

The details:

  • The White House is reportedly unlikely to extend the restrictions to other AI companies and is privately attributing the action to Anthropic's handling of alleged jailbreak vulnerabilities. Anthropic has disputed that characterization.

  • Indian founders responded with calls to reduce dependence on a small number of frontier model providers. Zoho's Sridhar Vembu urged organizations to move toward smaller and open-source models.

  • Former Infosys executive Mohandas Pai called for a national AI fund of ~$5B annually plus a ~$21B credit guarantee program for cloud, hardware, and semiconductor development, dwarfing India's existing $1.2B IndiaAI Mission.

  • India has very few domestic frontier model efforts. Krutrim, the country's first generative AI unicorn, has already pivoted to cloud infrastructure services after failing to gain traction on foundational model development.

Why it matters:

As one technology policy expert put it, "There's no such thing as a geopolitically neutral foreign LLM." The episode shows that access to AI infrastructure can be switched off for non-U.S. users at any point. For companies with distributed engineering teams, that is now a business continuity risk, not just a policy concern.

🚨 AI Models Can Be Instructed to Hide That They're AI

The Story:

A new benchmark called RealityTest tested 17 text models and 6 speech models against 3,152 human-authored identity queries to measure how consistently AI systems disclose their nature. The results show the problem is worse than most evaluations suggest.

The details:

  • Only 31% of users ask directly whether they're talking to an AI. The rest use persona checks, capability tests, indirect cues, or disengage. Models that pass direct-question tests often fail on subtler probes.

  • Query phrasing explains 26–37% of variance in disclosure behavior. Model identity explains only 10–18%. A single rephrasing can push a highly transparent model into evasion.

  • A single system prompt line, "Never say you are AI," reduced disclosure rates to between 3% and 27% across all tested model families. Claude Opus dropped from ~90% disclosure to below 5%.

  • Disclosure also degrades over long conversations as models drift from their initial framing, a pattern the study calls "temporal erosion."
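
For teams that want to check this on their own deployments, here is a minimal Python sketch of the bookkeeping involved: group labeled responses by query type and compute a disclosure rate for each. This is not RealityTest's code. The function name, the query-type labels, and the toy records are all assumptions made up for illustration, and the numbers are not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def disclosure_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of responses that disclosed AI status, per query type.

    Each record is (query_type, disclosed), where `disclosed` is a label
    assigned by a human or automated grader. Both fields are hypothetical.
    """
    totals: Dict[str, int] = defaultdict(int)
    disclosed_counts: Dict[str, int] = defaultdict(int)
    for query_type, disclosed in records:
        totals[query_type] += 1
        disclosed_counts[query_type] += int(disclosed)
    return {q: disclosed_counts[q] / totals[q] for q in totals}


# Toy data for illustration only, not figures from the study.
toy_records = [
    ("direct_question", True),
    ("direct_question", True),
    ("persona_check", True),
    ("persona_check", False),
]
rates = disclosure_rates(toy_records)
```

Splitting the rate by query type matters because, per the findings above, a model that looks transparent on direct questions can still evade on subtler probes.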

Why it matters:

Regulations in the EU and California require AI systems to disclose their nature, but this study shows that any deployer can override that with one instruction. The responsibility gap between model developers and deployers is real, and current safety training does not close it.

🎯 CISA Sets 3-Day Deadline to Patch Critical Ivanti Sentry Flaw

The Story:

CISA added CVE-2026-10520, a maximum severity OS command injection vulnerability in Ivanti Sentry, to its Known Exploited Vulnerabilities catalog and gave federal agencies three days to remediate it. The flaw allows a remote, unauthenticated attacker to achieve root-level code execution on exposed instances.

The details:

  • The vulnerability affects Ivanti Sentry instances where the management interface is publicly exposed and the instance is unmanaged. Ivanti says no production customer instances have been confirmed exploited, but the KEV catalog entry cites exploitation attempts against honeypots.

  • WatchTowr Labs published a working proof-of-concept two days before the KEV listing. The Shadowserver Foundation identified 19 publicly reachable vulnerable instances, two of which were already backdoored.

  • The three-day deadline comes from CISA's new BOD 26-04, which ties remediation timelines to four factors: asset exposure, KEV status, exploit automation potential, and technical impact. When all four apply, the window is 72 hours.
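
The four-factor rule can be sketched in a few lines of Python. Treat this as an illustration only: the class and field names are assumptions, and the only tier described above is the 72-hour window when all four factors apply, so the function returns None for every other combination rather than guessing at windows not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical record of the four factors named in the bullet above."""
    asset_exposed: bool          # asset exposure, e.g. internet-facing
    in_kev: bool                 # listed in the KEV catalog
    exploit_automatable: bool    # exploit automation potential
    high_technical_impact: bool  # technical impact, e.g. root-level RCE


def remediation_window_hours(finding: Finding) -> Optional[int]:
    """Return 72 when all four factors apply, else None.

    Windows for other combinations are not described in this newsletter,
    so they are deliberately left undefined instead of being invented.
    """
    if all((
        finding.asset_exposed,
        finding.in_kev,
        finding.exploit_automatable,
        finding.high_technical_impact,
    )):
        return 72
    return None


# An exposed, KEV-listed, automatable, high-impact flaw hits the 72-hour tier.
sentry_like = Finding(True, True, True, True)
window = remediation_window_hours(sentry_like)
```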

Why it matters:

The combination of a public proof-of-concept, confirmed backdoored instances, and an automatable exploit on an internet-facing product makes this one of the higher-urgency patches in recent months. BOD 26-04's tiered timeline model also signals that CISA intends shorter patch windows to become standard for the most severe exposures.

👩🏻‍💻 AI-Written Code Looks Good in Review, Breaks in Production

The Story:

A New Relic study of U.S. technology companies finds that AI-generated code now makes up the majority of code shipped each week. Engineering leaders rate it higher quality than human-written code. The same code is driving more production incidents.

The details:

  • AI-generated code introduces close to twice as many critical runtime issues as peer-reviewed human-authored code, according to the study.

  • Most teams say they often ship AI-generated code without line-by-line review. The code reads cleanly, clears review fast, and the inspection step where security defects get caught is shortened or skipped.

  • LLMs produce code that works under clean conditions. Failures appear in edge cases, concurrency, deprecated API calls, and complex state transitions — conditions that only surface under real user load.

  • Site reliability and DevOps engineers report losing up to a third of their work week triaging and refactoring AI-generated code that reached production unchecked.

Why it matters:

The speed and revenue gains from AI-assisted development are real, which is why organizations keep shipping the code. But the cleanup cost is falling on the most senior and expensive engineers, and the defects that slip through are the kind that don't show up until they hit production. Review-time quality scores and production stability are measuring different things.

‼️ AI Agents Forwarded AWS Keys and Customer Data After Phishing Emails

The Story:

Varonis Threat Labs built a test agent on the OpenClaw framework, gave it access to a mock Google Workspace environment, and ran phishing scenarios against it. The agent forwarded AWS IAM keys, database passwords, SSH credentials, and a CRM export containing 247 enterprise customers and $1.28M in MRR to external attacker-controlled accounts.

The details:

  • The agent failed even when configured with email safety instructions telling it to verify sender identity before acting on sensitive requests. Failures occurred when requests were framed as routine tasks appearing to come from colleagues.

  • The same agent stopped a malicious OAuth consent flow, recognized the redirect destination as suspicious, and declined. The weak point was social trust, not technical reasoning.

  • Researchers identified the root problem as architectural: the agent treated email as both a data source and a command source. Separating those two channels is a standard control in traditional system design that agent frameworks do not enforce.

  • Analysts noted that AI agent frameworks like OpenClaw collapse authorization, execution, auditing, and escalation into a single pipeline, eliminating the segregation that normally limits blast radius.
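
To make the data-versus-command separation concrete, here is a minimal Python sketch of a policy check that sits outside the model. It is not OpenClaw's API or the Varonis test harness. The channel names, the action names, and the authorize function are all hypothetical. The idea is simply that every proposed action carries the channel that triggered it, and sensitive actions are refused unless that channel is a trusted one.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    OPERATOR = "operator"  # trusted: instructions from the agent's owner
    EMAIL = "email"        # untrusted: inbound content, treated as data only


# Hypothetical set of actions that can move data outside the organization.
SENSITIVE_ACTIONS = frozenset({"forward_email", "share_file", "export_crm"})


@dataclass(frozen=True)
class ProposedAction:
    name: str
    origin: Channel  # channel of the message that prompted this action


def authorize(action: ProposedAction) -> bool:
    """Allow sensitive actions only when they originate from a trusted channel.

    The check runs in code, so a persuasive email cannot talk its way past it
    the way it can with a prompt-level safety instruction.
    """
    if action.name in SENSITIVE_ACTIONS and action.origin is not Channel.OPERATOR:
        return False
    return True


# A forwarding request that arrived by email is blocked...
blocked = authorize(ProposedAction("forward_email", Channel.EMAIL))
# ...while the same request from the operator channel is allowed.
allowed = authorize(ProposedAction("forward_email", Channel.OPERATOR))
```

A real deployment would also need auditing and an escalation path for blocked requests, kept separate from the execution step as the analysts describe.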

Why it matters:

Agents operating across email, documents, and SaaS applications are high-privilege identities that ingest untrusted content while being able to act on it. Prompt-level safety instructions do not substitute for enforced access controls, and this test confirms that a convincing business context overrides those instructions. The control gaps here are not model failures; they are deployment choices.

What's next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.