AI Hallucinations, Cyberattacks & MCPs - Issue 2

Top AI and Cybersecurity news you should check out today

What is The AI Trust Letter?

Once a week, we distill the five most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!

🧠 AI Hallucinations Threaten Cybersecurity

The Story:
AI models can invent vulnerabilities or misinterpret threat data, leading security teams to chase false leads or miss real risks.

The details:

  • AI models may suggest fake software packages, a risk known as “slopsquatting”, that attackers exploit by publishing malicious clones under those names.

  • AI can generate false threat reports that divert teams from genuine incidents unless every output is validated against trusted sources.

  • Junior developers may integrate erroneous code without realizing it, while seniors risk over-relying on AI without auditing its suggestions.

Why it matters:
Unchecked hallucinations waste scarce SecOps resources and expose blind spots. To prevent this, build a structured trust framework that validates inputs and outputs, attach metadata for traceability, ground AI in vetted data through retrieval-augmented generation, incorporate hallucination-detection tools during testing, and require human sign-off before any high-stakes action.

🚀 Open AI Introducing Codex: A Coding Agent

The Story:
OpenAI launched Codex, a cloud-based agent that runs multiple coding tasks in parallel in isolated sandboxes.

The details:

  • Perform tasks like writing features, fixing bugs, answering questions, and proposing pull requests

  • Each task runs in its own environment preloaded with your repository

  • Provides citations of terminal logs and test results for every change

  • Guided by AGENTS.md files so it follows your project’s commands and standards

  • Benchmarks show codex-1 outperforms prior models on real engineering tasks

Why it matters:
As AI joins CI/CD pipelines, teams must review and audit both the code and the agents that write it. Implement checks for agent outputs, verify changes before merge, and adapt workflows to include AI-driven steps safely.

🕵️‍♂️ Agentic AI Turbocharges Cyberattacks

The Story:
Threat actors are using autonomous AI tools to plan and run multi-step attacks with minimal human input.

The details:

  • AI translation tools improve ransomware negotiations by helping criminals ask for higher payments

  • AI assistants sweep compromised networks for credentials, cutting reconnaissance from days to minutes

  • Deepfakes impersonate employees in helpdesk scams to bypass security checks

Why it matters:
Agentic AI makes attacks faster and harder to detect. Defenders can fight fire with fire by using AI for traffic monitoring and threat analysis. Core detection and response practices still win the day when AI tools are treated as force multipliers, not replacements.

💡 Meet MCP: AI’s New Superpower

The Post:
Model Context Protocol (MCP) is an open standard that lets LLMs plug into external APIs, databases, and services, with no custom adapters required. And it’s reshaping the AI landscape.

The details:

  • Universal integration: One MCP connection unlocks access to multiple tools (calendars, CRMs, analytics, etc.)

  • Three roles:

    • Host runs the AI model environment

    • Client wraps model calls into MCP requests

    • Server exposes tool endpoints and handles auth

  • Key use cases: real-time data queries, multi-turn context, autonomous actions (send emails, trigger workflows)

  • Stats: 1,600+ MCP servers available; 16,000+ GitHub stars

Why it matters:
MCP accelerates integrations and fuels AI workflows by giving models on-demand data access, automated actions, and seamless context continuity under one scalable security framework.

But every live connection expands your attack surface, so enforce role-based access, maintain continuous monitoring, and apply strict governance to harness real-time AI safely.

📶 Crescendo Attack: Gradual Prompt Injection Exposed

The Crescendo attack is a sophisticated prompt injection technique that incrementally guides an LLM toward producing restricted or harmful outputs without triggering immediate rejection or safety filters. Instead of asking for a sensitive response directly, the attacker gradually escalates the conversation, exploiting the model’s tendency to comply when prompts are framed benignly.

  • A series of benign questions incrementally shifts to restricted requests, avoiding immediate safety triggers

  • A backtracking loop tweaks refused prompts and retries until the model complies or a retry limit is reached

  • Tested on five LLMs (Mistral, Phi-4-mini, DeepSeek-R1, GPT-4.1-nano, GPT-4o-mini) across eight harmful categories

Why it matters:
Gradual prompt escalation can slip past static safety filters and one-off guards. To defend against adaptive adversaries, implement real-time semantic firewalls, continuous red teaming, and prompt-escalation detection.

😂 Our meme of the week

The peace of mind only a real AI Gateway can deliver:

What´s next?

Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.