GPT-5.5 is here, but is it safe?

Top AI and Cybersecurity news you should check out today

Welcome Back to The AI Trust Letter

Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🤖 Is OpenAI’s GPT-5.5 its ‘smartest and most intuitive’ model yet?

The Story: 

OpenAI has released GPT-5.5, its latest model, designed to better understand user intent and complete complex, multi-step tasks with minimal guidance. It marks a shift toward more autonomous systems that plan, execute, and refine their own work.

The details:

  • GPT-5.5 can handle tasks like coding, data analysis, and document creation with fewer step-by-step prompts, planning its approach and following through until completion  

  • The model shows strong performance in software-related tasks, including real-world debugging and command-line workflows  

  • It is being rolled out across ChatGPT and API access for enterprise and developer use cases  

  • OpenAI positions it as a system that can take on more “agent-like” work across tools, reducing the need for constant human supervision

Why it matters: 

More capable models reduce friction for users, but they also expand the attack surface. Systems that can plan and execute tasks autonomously introduce new risks. Prompt injection, data leakage, and unintended actions become harder to detect when the model is handling multi-step workflows.

Security is no longer just about filtering outputs. It is about monitoring behavior across the full lifecycle of an AI task. As models like GPT-5.5 move closer to acting on behalf of users, visibility and control become critical.
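One way to make that lifecycle monitoring concrete: instead of only filtering final outputs, gate and log every action an agent proposes before it executes. The sketch below is a generic illustration under assumptions of ours; the tool names and allowlist policy are hypothetical, not OpenAI's actual API.

```python
# Minimal sketch: gate and audit every tool call an agent proposes.
# Tool names and the allowlist policy are illustrative assumptions,
# not any vendor's actual API.
import json
import time

ALLOWED_TOOLS = {"search_docs", "read_file"}  # what this agent may do
AUDIT_LOG = []                                # full lifecycle trail

def execute_tool_call(tool: str, args: dict) -> str:
    """Run a proposed agent action only if policy allows it; log either way."""
    entry = {"ts": time.time(), "tool": tool, "args": args}
    if tool not in ALLOWED_TOOLS:
        entry["decision"] = "blocked"
        AUDIT_LOG.append(entry)
        return f"BLOCKED: '{tool}' is outside this agent's scope"
    entry["decision"] = "allowed"
    AUDIT_LOG.append(entry)
    # Dispatch to the real implementation here; stubbed for the sketch.
    return f"OK: ran {tool} with {json.dumps(args)}"

# A multi-step plan is checked step by step, not just at the final output.
print(execute_tool_call("read_file", {"path": "report.csv"}))
print(execute_tool_call("delete_records", {"table": "users"}))
```

The point is where the check lives: every step of a multi-step workflow passes through policy and leaves an audit record, so an injected or unintended action is visible even when the final output looks benign.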

🍔 McDonald's, Alcampo, Chipotle: When Customer Service Bots Start Writing Code

The Story: 

McDonald's support chatbot recently went off its intended rails when a user prompted it to perform a coding task, and it complied. It is the third time in recent months that a major food brand's AI chatbot has escaped its operational boundaries.

The details:

  • The McDonald's bot, designed strictly for order support, completed a coding request with no resistance. The same pattern was documented at Alcampo and Chipotle, both of which had customer service chatbots manipulated into answering technical questions entirely outside their domain.

  • The root cause is the same in each case: most deployed chatbots are general-purpose language models with a branded interface on top. Without architectural constraints, the underlying model will do what it was trained to do, which is answer questions, any questions.

  • Prompt-level guardrails are not sufficient. Scope needs to be enforced at the product architecture level, meaning the system is built from inception to refuse or redirect anything outside its defined function.

  • Red-teaming before deployment would likely have caught all three incidents. Each was a straightforward out-of-scope prompt with no sophisticated attack behind it.
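The architecture-level scope enforcement described above can be as simple as a gate in front of the model that refuses anything outside the product's defined function. This is a toy sketch of the pattern, not any of these companies' implementations; the keyword classifier is a deliberately crude stand-in for a real intent model.

```python
# Toy sketch of architecture-level scope enforcement: the request is
# classified and refused *before* the underlying LLM ever sees it.
# The keyword match is a crude stand-in for a real intent classifier;
# the point is where the check lives, not how it works.

IN_SCOPE_TOPICS = ("order", "refund", "delivery", "menu")

def classify_in_scope(user_message: str) -> bool:
    msg = user_message.lower()
    return any(topic in msg for topic in IN_SCOPE_TOPICS)

def handle_request(user_message: str) -> str:
    if not classify_in_scope(user_message):
        # Refuse or redirect: the model is never invoked out of scope.
        return "I can only help with orders. Anything else I can do there?"
    return call_llm(user_message)  # only reachable for in-scope requests

def call_llm(user_message: str) -> str:
    # Stub standing in for the branded chatbot's underlying model.
    return f"[model answers order question: {user_message!r}]"

print(handle_request("Where is my delivery?"))
print(handle_request("Write me a Python script to parse JSON"))
```

Because the refusal happens in product code rather than in the prompt, no amount of prompt-level manipulation can route a coding request to the model.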

Why it matters: 

These are consumer-facing bots with limited access and low stakes. The same architectural failure in an enterprise agent with tool access, write permissions, and connections to internal systems is a different class of problem entirely. A chatbot that writes code when asked to is embarrassing. An enterprise agent that executes unauthorized actions when prompted is a security incident. The lesson is the same: branded interfaces do not constrain model behavior. Only architecture does.

🔑 Someone Got Into Mythos Without Permission

The Story: 

Anthropic is investigating a report that a small group of unauthorized users gained access to Claude Mythos Preview, the model the company considers too dangerous to release publicly, through a third-party vendor environment.

The details:

  • Access appears to have come through a contractor who already had legitimate permissions to view Anthropic models through their work. According to Bloomberg, the group has been using the model since gaining access, though reportedly not for hacking, apparently to avoid detection.

  • Anthropic says it has no evidence its own systems were directly compromised. The working assessment is that this was misuse of existing access rather than a classic breach.

  • The incident exposes a structural problem: Anthropic has released Mythos Preview to a limited set of tech and financial companies to help them secure their systems. Controlling what those companies do with that access, and what their contractors can reach, is a different problem entirely.

  • At the UK's CyberUK conference this week, NCSC head Richard Horne acknowledged the incident indirectly, warning that frontier AI is enabling discovery and exploitation of existing vulnerabilities at scale.

  • Security Minister Dan Jarvis used the same conference to urge AI firms to work with government on keeping advanced models out of the wrong hands. The UK has no control over how Mythos is built, trained, or released.

Why it matters: 

Anthropic restricted Mythos specifically because of what it can do in the wrong hands. The question the unauthorized access raises is not whether the model was misused this time. It is whether the access control model built around a privately released frontier capability is robust enough to hold as the circle of authorized users widens. Third-party vendor access is where enterprise security regularly fails. It is also, apparently, where AI model access control fails too.

🐳 DeepSeek V4 Is Also Here

The Story: 

DeepSeek released a preview of V4, its first flagship model since R1 stunned the industry in January 2025. It is open source, frontier-competitive on benchmarks, and built to run on Chinese chips.

The details:

  • V4 comes in two versions: V4-Pro, built for coding and complex agent tasks, and V4-Flash, a faster and cheaper option. V4-Pro is priced at $1.74 per million input tokens, a fraction of comparable models from OpenAI and Anthropic.

  • On major benchmarks, V4-Pro matches Claude Opus 4.6, GPT-5.4, and Gemini-3.1, and exceeds all other open-source models on coding, math, and STEM tasks.

  • V4 supports a 1-million-token context window using a new attention mechanism that compresses older information selectively, using only 27% of the computing power of its predecessor at that context length while cutting memory use to 10%.

  • V4 is the first DeepSeek model optimized for domestic Chinese chips, specifically Huawei's Ascend series. DeepSeek did not give US chipmakers like Nvidia early access ahead of launch, a reversal of standard practice.

  • DeepSeek appears to be using Chinese chips for inference but may still rely on Nvidia for parts of training. Prices could fall further once Huawei's Ascend 950 supernodes begin shipping at scale later this year.
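The selective-compression idea in the context-window bullet can be illustrated generically: keep recent tokens at full resolution and pool older entries into coarser blocks, trading fidelity for memory. This sketch shows the general technique family only; it is not DeepSeek's actual attention mechanism, and the window and block sizes are arbitrary.

```python
# Generic sketch of selectively compressing older context: recent entries
# stay at full resolution, older ones are mean-pooled into blocks.
# Illustrates the idea only; NOT DeepSeek's actual mechanism.
from statistics import mean

def compress_context(entries, recent_window=4, block_size=4):
    """entries: list of per-token vectors (here, plain lists of floats)."""
    old, recent = entries[:-recent_window], entries[-recent_window:]
    pooled = []
    for i in range(0, len(old), block_size):
        block = old[i:i + block_size]
        # One averaged vector stands in for the whole block.
        pooled.append([mean(dim) for dim in zip(*block)])
    return pooled + recent

# 16 one-dimensional "tokens": 12 old ones pool down to 3, 4 stay as-is.
ctx = [[float(i)] for i in range(16)]
compressed = compress_context(ctx)
print(len(ctx), "->", len(compressed))  # 16 -> 7
```

The memory savings grow with context length, since everything outside the recent window shrinks by the block factor, which is why this family of techniques pays off most at million-token scales.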

Why it matters: 

An open-source model that matches frontier closed-source performance at a fraction of the cost is exactly the kind of capability that accelerates Shadow AI adoption inside enterprise environments. V4 is one command away from running locally, with no API call, no audit trail, and no visibility for your security team.

The hardware story matters too: a viable Chinese AI stack running on domestic chips, outside US export control reach, changes the geopolitical surface area of AI risk in ways that enterprise security frameworks have not yet caught up with.

💰 Google Is Betting $40 Billion on Anthropic

The Story: 

Google confirmed a deal to invest up to $40 billion in Anthropic, the largest outside investment in the company's history and one of the biggest single bets in the AI industry to date.

The details:

  • Google is putting in $10 billion immediately at Anthropic's current $380 billion valuation, with the remaining $30 billion contingent on performance milestones.

  • The deal comes days after Amazon committed an additional $5 billion to Anthropic, as part of a broader agreement under which Anthropic is expected to spend up to $100 billion on roughly 5 gigawatts of compute capacity over time.

  • Google Cloud will provide Anthropic with 5GW of computing capacity over the next five years, and Anthropic is reportedly considering an IPO within the year at a potential valuation above $800 billion.

  • The investment deepens a relationship that is simultaneously a partnership and a rivalry. Anthropic is one of Google Cloud's largest customers, running its models on Google TPUs, while Gemini competes directly with Claude across enterprise accounts.

Why it matters: 

Two of the world's largest cloud providers are now deeply invested in the same AI security company behind the most restricted frontier model in production. For CISOs evaluating AI vendors, the capital concentration around Anthropic reflects where enterprise AI is heading, and raises a straightforward question: as Google and Amazon both deepen their financial stake in Anthropic, what does vendor lock-in look like when your AI security provider and your cloud provider are the same entity?

What’s next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.