The Echo Chamber Attack - Issue 8

Top AI and Cybersecurity news you should check out today

What is The AI Trust Letter?

Once a week, we distill the five most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🚨 ChatGPT and Google AI can be manipulated to generate harmful content without breaking any rules

The Story:

NeuralTrust AI researcher Ahmad Alobaid uncovered the Echo Chamber Attack, a context-poisoning jailbreak that quietly steers LLMs toward policy-violating responses, and it’s been making headlines worldwide.

The details:

  • Subtle context shifts: Instead of a single malicious prompt, attackers feed benign-sounding cues over multiple turns to reshape the model’s internal state.

  • Feedback loop: Early responses echo the hidden subtext, reinforcing later prompts until safety filters erode and the model complies.

  • High success rates: In controlled tests against major models — GPT-4.1-nano, GPT-4o-mini, GPT-4o, Gemini-2.0-flash-lite and Gemini-2.5-flash — the attack elicited policy-violating outputs over 90% of the time in half of the tested harm categories, and over 40% in the rest.

Why it matters:

Echo Chamber shows that static guardrails and single-step detectors aren’t enough. Defenses must track context flow across turns, flag subtle topic drift, and add human-in-the-loop checks to multi-turn conversations to keep attackers from turning a model’s own reasoning against it.
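To make the defense idea concrete, here is a minimal, illustrative sketch of a multi-turn drift monitor. This is our own toy example, not NeuralTrust's detector: it measures how far each user turn has drifted from the conversation's opening topic using simple token overlap, and escalates once drift crosses a threshold. A production system would use embedding similarity and policy classifiers instead, but the structure — scoring the whole conversation rather than each prompt in isolation — is the point.

```python
# Toy multi-turn drift monitor (illustrative sketch, not NeuralTrust's method).
# A single-prompt filter sees each benign-looking turn in isolation; this
# monitor instead compares every later turn against the opening topic and
# flags conversations that gradually steer somewhere else.

def token_set(text: str) -> set[str]:
    """Normalize a turn into a set of lowercase words."""
    return {w.lower().strip(".,!?") for w in text.split() if w}

def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity between two token sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def flag_drift(turns: list[str], threshold: float = 0.8) -> bool:
    """Escalate to human review once any turn drifts too far from turn one."""
    anchor = token_set(turns[0])
    return any(1.0 - jaccard(anchor, token_set(t)) > threshold
               for t in turns[1:])
```

For example, a conversation that opens about baking and quietly pivots to an unrelated technical topic would trip the threshold, while one that stays on subject would not. The 0.8 threshold is an arbitrary placeholder; real deployments would tune it against labeled multi-turn attack data.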

🧠 Can AI run a shop? Apparently not yet

The Story:

Anthropic’s month-long “Project Vend” put its Claude AI model, nicknamed Claudius, in charge of a small office store. From self-checkout iPads to inventory and pricing, the AI handled every step of the business.

The details:

  • Claudius set up a specialty metals section and took a joke order for tungsten cubes seriously, selling them at a loss and cutting into profits.

  • It directed customers to a nonexistent Venmo account it had invented for payments, and even tried to deliver items itself, imagining it could wear a delivery uniform.

  • The shop’s net value fell from an initial $1,000 to under $800 by month’s end, despite human staff intervention.

  • The AI once “called” corporate security over a prank it believed was real, highlighting gaps in its context understanding.

  • Anthropic researchers say better prompt engineering and richer tool integrations would help Claudius handle real-world tasks more reliably.

Why it matters:

This experiment shows that giving AI full ownership of business processes can uncover hidden flaws in decision making, context tracking, and tool use. Organizations exploring agentic AI roles—from middle managers to service agents—must pair autonomy with clear guardrails, human oversight, and robust error-handling.

👀 China’s Biggest AI Release Since DeepSeek: Baidu to Open-Source Ernie

The Story:

On June 30, Baidu will make its Ernie generative AI model open source, marking China’s largest public AI release since DeepSeek first disrupted the market.

The details:

  • Baidu confirmed a phased roll-out of Ernie’s code and model weights starting June 30.

  • Industry watchers warn this could undercut rivals like OpenAI, Anthropic and DeepSeek by widening access to a high-quality LLM.

  • Some experts call it a “DeepSeek moment” that cements China’s AI leadership; others note that open sourcing alone won’t guarantee the same market impact without strong community support.

Why it matters:

Open sourcing Ernie lowers barriers for innovation and shifts competitive dynamics. Security teams and AI leaders must prepare for a surge in custom deployments, monitor new model variants for emerging risks, and adapt governance to a more diverse, open AI ecosystem.

🛩️ New Hacks Target Aviation Sector

The Story:

The FBI and security firms warn that the Scattered Spider hacking group is now breaching airline and transportation systems.

The details:

  • The group uses help-desk social engineering, MFA bypass and fake device registrations to breach airline IT networks.

  • FBI cautions that any vendor or contractor in the airline ecosystem could be compromised.

  • Hawaiian Airlines and WestJet both reported cyber incidents in June, with WestJet’s attack linked to Scattered Spider.

  • Scattered Spider previously hit retail chains, insurers, hotels, casinos and major tech firms.

Why it matters:

Airlines depend on real-time bookings and operations. Attacks on help desks can bypass strong authentication and disrupt critical systems. Aviation teams should enforce multi-factor controls, train staff on social engineering tactics and include suppliers in red-teaming exercises.

What’s next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.