Claude Mythos / Capybara: The Leaked Anthropic Model

Top AI and Cybersecurity news you should check out today

Welcome Back to The AI Trust Letter

Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🔓 Anthropic Leaks Details on Next-Gen Claude Model

The Story:

A configuration error in Anthropic's content management system made roughly 3,000 internal assets publicly accessible by default. Among them: a draft blog post describing their next frontier model, internally called "Claude Mythos" or "Capybara."

The details:

  • The CMS was set to make uploaded assets, including images, PDFs, and audio files, public by default unless explicitly marked private. No malicious actor was involved.

  • The leaked documents describe the model as "a step change" in AI performance and their most capable to date, with notably higher benchmark scores across coding, academic reasoning, and cybersecurity tasks.

  • Anthropic's own assessment states the model is "currently far ahead of any other AI model in cyber capabilities" and warns it could enable exploits that outpace defenders.

  • Anthropic plans to release the model first to cyber defenders, giving them a head start before wider availability.

  • OpenAI has made similar disclosures, classifying GPT-5.3-Codex as its first model with "high capability" for cybersecurity tasks under its Preparedness Framework.

Why it matters:

The leak itself is a textbook case of misconfiguration risk, the kind of basic operational failure that affects organizations at every level. But the contents are what demand attention. Both Anthropic and OpenAI are now openly acknowledging that their frontier models have crossed into territory where offensive cyber capabilities are a first-order concern, not a theoretical one. The question of who gets access to these models, and in what order, is becoming a security decision as much as a product one.

🤖 Vulnerabilities from AI-Generated Code Are Accelerating Fast

The Story:

Researchers have been tracking security flaws directly introduced by AI coding tools since May 2025. The numbers are climbing fast: 6 confirmed cases in January, 15 in February, at least 35 in March.

The details:

  • The project, called Vibe Security Radar, monitors public vulnerability databases and traces each security flaw back to the code commit that introduced it. When an AI tool's fingerprint is present, the case gets flagged and counted.

  • Across 50 AI coding tools tracked, 74 confirmed security flaws have been attributed to AI-generated code. Claude Code appears most often, but researchers say this is largely because it consistently leaves a traceable signature in the code it writes. Other tools like GitHub Copilot leave no such trace, making attribution harder and the real numbers likely much higher.

  • The team estimates the actual count is five to ten times what they can detect, somewhere between 400 and 700 cases across publicly available software, since many developers strip AI metadata before publishing their code.

  • Claude Code alone accounted for over 4% of all public commits on GitHub last month, a share that is still growing.

Why it matters:

More developers are using AI tools to write code and shipping it directly to production with little or no review. The security flaws being introduced are not exotic or new. They are well-known vulnerability types that an experienced developer would likely have caught.

The problem is scale: when AI tools are generating a large share of a codebase at speed, traditional code review cannot keep up. Security teams and engineering leaders need to rethink how they check AI-generated code before it reaches users.

⚖️ What to Know About California’s Executive Order on AI

The Story:

California Governor Gavin Newsom signed an executive order on March 30 requiring AI companies that want to do business with the state to meet stricter safety and privacy standards. It is a direct counterpoint to the Trump administration's push to limit state-level AI regulation.

The details:

  • Companies seeking state contracts must now demonstrate safeguards against AI misuse, including the generation of illegal content, harmful bias, and civil rights violations.

  • State agencies have 120 days to propose new AI procurement and governance measures, with the Department of General Services and the California Department of Technology central to that work.

  • If the federal government labels an AI company as a supply chain risk, California will run its own independent assessment and may continue working with that company if it finds no risk. This follows the Pentagon's recent move to bar government contractors from using Anthropic's technology.

  • The order also requires state agencies to watermark AI-generated images and video, the first such mandate of its kind in the US.

  • The order builds on the Transparency in Frontier AI Act passed last year, which already requires large AI developers to publish safety frameworks, release transparency reports, and report critical incidents to the state.

Why it matters:

California's position matters because it tends to set precedents that other states and trading partners eventually follow. What starts as a procurement requirement for state contracts often becomes the baseline expectation for enterprise buyers and regulators elsewhere. The immediate pressure is on AI vendors: they will need to document how their systems work, what risks they carry, and who is accountable. That is a higher bar than most have had to clear before.

🧠 Prompt Caching: How AI Agents Are Finally Getting Working Memory

The Story:

Every time an AI agent takes a step, it resends its full context to the model, including system instructions, tool definitions, and conversation history. Without caching, the model reprocesses all of it from scratch on every turn. Prompt caching fixes this by storing the processed state of static prompt sections so only new content needs to be computed.

The details:

  • LLMs store processed tokens in a Key-Value (KV) cache. In a stateless setup, that cache is discarded after each response. Prompt caching keeps it on the provider's servers and reuses it when subsequent requests share the same prefix.

  • The cost impact is significant. Anthropic and OpenAI offer up to 90% discounts on cached tokens, and latency drops by roughly 80%. A 10,000-token system prompt running across a five-step task goes from 50,000 tokens of repeated input to a one-time cost.

  • This differs from semantic caching, which stores final answers to repeated questions. Prompt caching stores the model's understanding of the prompt prefix, making it useful for dynamic agents where the base instructions stay fixed but the conversation evolves.

  • Caching introduces security risks: cache poisoning, multi-tenant data leakage, and the "confused deputy" problem, where a cached agent state is manipulated by malicious user input. Mitigations include per-organization cryptographic hashing, input validation, and short time-to-live settings.
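The cost math above can be sketched in a few lines. This is a back-of-envelope model, not any provider's actual billing logic: it assumes (per the figures in this story) a 90% discount on cached-token reads, and that the first request pays full price to write the cache.

```python
SYSTEM_PROMPT_TOKENS = 10_000   # static prefix resent on every agent step
STEPS = 5                       # turns in the multi-step task

def input_tokens_without_cache(prefix_tokens: int, steps: int) -> int:
    """Stateless setup: every step reprocesses the full static prefix."""
    return prefix_tokens * steps

def input_tokens_with_cache(prefix_tokens: int, steps: int,
                            cached_discount: float = 0.90) -> float:
    """First step writes the cache at full price; later steps read it at the
    discounted rate (expressed here in full-price-equivalent tokens)."""
    cache_write = prefix_tokens                                  # step 1
    cache_reads = prefix_tokens * (steps - 1) * (1 - cached_discount)
    return cache_write + cache_reads

print(input_tokens_without_cache(SYSTEM_PROMPT_TOKENS, STEPS))  # 50000
print(input_tokens_with_cache(SYSTEM_PROMPT_TOKENS, STEPS))     # 14000.0
```

The 50,000-token figure matches the story's example; with caching, the same five-step task costs the equivalent of roughly 14,000 full-price input tokens. Real savings depend on each provider's cache-write surcharges and time-to-live rules.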

Why it matters:

Most enterprise agent deployments hit a wall on cost and latency before reaching production scale. Prompt caching is the infrastructure change that makes high-context, multi-step agents viable. It also raises the stakes on security: cached agent state is as sensitive as any database, and needs to be treated that way.

🎬 OpenAI Shuts Down Sora Six Months After Launch

The Story:

OpenAI killed Sora, its AI video generation tool, just six months after public release. Initial theories pointed to a data grab gone wrong, but the real reason was simpler: the product was losing roughly $1 million per day and barely anyone was using it.

The details:

  • After a high-profile launch, Sora's global user count peaked around one million, then fell below 500,000. Video generation is compute-heavy, and each session consumed expensive GPU capacity.

  • While an internal team was dedicated to keeping Sora alive, Anthropic was steadily winning over the developers and enterprise customers that drive meaningful revenue. Claude Code, in particular, was pulling OpenAI's core audience.

  • CEO Sam Altman made the decision to shut Sora down, free up compute, and redirect resources. Disney, which had committed $1 billion to a partnership built around Sora, learned the product was being discontinued less than an hour before the public announcement. The deal collapsed with it.

Why it matters:

Sora was one of OpenAI's most visible bets on consumer AI. Its failure is a signal that flashy demos do not translate into retention or revenue, and that compute allocation is now a strategic constraint that shapes which products survive. The fact that Claude Code was a direct factor in the decision is also worth noting: the developer tooling race is becoming a key front in the broader competition between labs.

What's next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.