We jailbroke GPT-5 in 24 hours

Top AI and Cybersecurity news you should check out today

What is The AI Trust Letter?

Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🚨 New GPT-5 has major vulnerabilities

The Story:

OpenAI released GPT-5 this week, and the rollout was bumpy. We discovered a jailbreak path for GPT-5-Chat by pairing our Echo Chamber algorithm with narrative steering. Instead of asking for anything explicitly unsafe, we primed a benign story, let the model echo that context across turns, and used continuity pressure to move it toward the target.

The details:

  • We seed low-salience keywords inside harmless text, then ask for small elaborations that keep the conversation “in-story.” The model echoes and strengthens the poisoned context on each turn. 

  • When progress stalls, we change the story’s stakes or switch perspective. This resets momentum without surfacing obvious refusal triggers. 

  • The approach adapts our earlier Echo Chamber work. Here the “story” layer increases stickiness because the model tries to stay consistent with its own narrative. 

  • In a controlled test on a hazardous objective, the model advanced to a stepwise description inside the story frame. We redacted operational details in the post. 

Why it matters:

Conversation-level context poisoning plus storytelling can bypass intent detection without a single overtly malicious prompt, even in a state-of-the-art model like GPT-5.

Teams should monitor for persuasion cycles, track context drift over the whole thread, and gate production apps with policy and anomaly detection at the session level, not only at the prompt level. Regular red teaming against narrative-based attacks belongs in every release cycle.
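The session-level drift tracking suggested above can be sketched with a crude similarity heuristic. Everything here is an illustrative assumption rather than a production detector: the Jaccard word-overlap proxy stands in for a real embedding model, and the window size and threshold are arbitrary.

```python
# Hedged sketch: flag turns whose vocabulary has drifted far from the
# conversation's opening context, a crude proxy for narrative steering.
# The similarity measure, window, and threshold are all assumptions.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def drift_alerts(turns: list[str], window: int = 3, threshold: float = 0.1) -> list[int]:
    """Return indices of turns whose similarity to the seed context
    (the first `window` turns) falls below `threshold`."""
    seed = " ".join(turns[:window])
    return [i for i, turn in enumerate(turns[window:], start=window)
            if jaccard(seed, turn) < threshold]
```

In practice you would swap the Jaccard proxy for embedding distance and alert on sustained low-similarity runs rather than single turns, since one off-topic message is normal in real conversations.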

🔐 Hacker Could Remotely Unlock Cars Worldwide

The Story:

A hacker exploited major vulnerabilities in a carmaker’s web portal, enabling remote, unauthorized vehicle unlocking from anywhere.

The details:

  • Weak authentication and missing security checks let a single user access all vehicles in the system.

  • The researcher notified the manufacturer, which disabled the vulnerable functions but did not publish a detailed disclosure.

  • The incident raises urgent questions for IoT and automotive cybersecurity, with real-world safety risks.
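The first bullet describes what is commonly called broken object-level authorization: the portal authenticated the user but never checked whether that user owned the vehicle being acted on. A minimal sketch of the missing check, with an invented data model and function names (nothing here reflects the actual portal's code):

```python
# Hedged sketch: an object-level authorization check before honoring an
# unlock request. OWNERSHIP, unlock_vehicle, and the VIN format are
# illustrative assumptions, not the real system.

OWNERSHIP = {"vin-123": "alice", "vin-456": "bob"}  # VIN -> owning account

class AuthorizationError(Exception):
    """Raised when a caller requests an action on a vehicle it does not own."""

def unlock_vehicle(account_id: str, vin: str) -> str:
    """Send an unlock command only if `account_id` owns `vin`."""
    if OWNERSHIP.get(vin) != account_id:
        raise AuthorizationError(f"{account_id} may not unlock {vin}")
    return f"unlock command sent to {vin}"
```

The fix is one comparison per request, which is why missing checks like this are less an engineering challenge than a process failure: no test ever asked "can user A act on user B's vehicle?"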

Why it matters:

As more vehicles go online, remote exploitation risks multiply, making security and transparency non-negotiable for product trust.

🧠 Zero-Click AI Prompt Injection Threatens Major AI Assistants

The Story:

At Black Hat, researchers showed that a single “poisoned” input can make an AI agent follow hidden instructions without any extra clicks. A file, email, or ticket can quietly tell the agent to search connected sources and leak what it finds by tucking data into an image link. Vendors shipped some fixes after disclosure, but the core weakness remains.

The details:

  • A hidden prompt inside a document told an agent to look through connected drives for secrets, then include the results inside an image URL. When the agent fetched the image, the URL carried the secrets out. 

  • Similar tricks worked through routine channels like incoming emails or tickets that the agent processes on its own. The payloads asked the agent to reveal what tools it can access and to pull customer information. 

  • Blacklist-style filters helped in some cases, but were easy to phrase around. The researchers demonstrated working exploits, not theory.  
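The image-URL channel in the first bullet can be screened for on the output side. The sketch below scans agent output for markdown image links whose query string or final path segment is suspiciously long; the regex, the length limit, and the heuristic itself are illustrative assumptions, not a vetted defense:

```python
import re

# Hedged sketch: flag markdown image links that could smuggle data out
# through the URL. The pattern and 64-character payload cap are
# arbitrary assumptions for illustration.

IMG_LINK = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)\)")

def suspicious_image_urls(text: str, max_payload: int = 64) -> list[str]:
    """Return image URLs whose query string or last path segment
    exceeds `max_payload` characters (a crude exfiltration heuristic)."""
    hits = []
    for match in IMG_LINK.finditer(text):
        url = match.group("url")
        _, _, query = url.partition("?")
        tail = url.rsplit("/", 1)[-1]
        if len(query) > max_payload or len(tail) > max_payload:
            hits.append(url)
    return hits
```

A length heuristic is trivially evadable (split the payload across many short URLs, for instance), so in practice it belongs alongside an allowlist of image hosts the agent may fetch from, not in place of one.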

Why it matters:

If your agent reads from shared drives, inboxes, calendars, or ticket systems, attackers can steer it with content that looks harmless. The risk is session-level and workflow-level, not just keyword-level. 

🏛️ US federal courts tighten security after cyberattacks

The Story:

The US federal judiciary reported “escalated” and persistent cyberattacks against its case management systems and is adding extra protections while working with law enforcement. Separate reporting says the breach may have touched multiple states and included sensitive, non-public filings. 

The details:

  • Targeted systems include the Case Management/Electronic Case Files platform used by court staff and Public Access to Court Electronic Records for the public. 

  • Sensitive materials may have been accessed, including sealed filings and information tied to ongoing investigations. Discovery was in early July. Scope and attribution are still under review. 

  • The judiciary says it is strengthening protections for sensitive filings and coordinating with national security agencies to reduce impact on litigants.

Why it matters:

Courts are a high-trust source for investigations, corporate litigation, and background checks. Breaches here risk exposure of informants, sealed orders, and case strategy, which can harm individuals and disrupt legal and business processes. 

🎙️ Podcast recommendation of the week

Host Cleo Abram sits down with OpenAI CEO Sam Altman to map the near future of AI. They cover what GPT-5 adds beyond GPT-4, what superintelligence could mean, and how AI may change jobs, scientific discovery, health, and how we decide what is true. Clear questions, concrete timelines, and tradeoffs so you can understand what is coming and how to prepare.

What's next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.