- The AI Trust Letter
- Posts
- Governments Also Need AI Security
Governments Also Need AI Security
Top AI and Cybersecurity news you should check out today

Welcome Back to The AI Trust Letter
Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!
🛡️ Google, Microsoft and xAI agree to US Government AI Testing Programme

The Story:
Three of the largest AI companies have signed agreements with the US Department of Commerce to let the Center for AI Standards and Innovation (CAISI) evaluate their frontier models before public release. The move marks a shift from the Trump administration's earlier stance against AI regulation.
The details:
CAISI will assess models for cybersecurity, biosecurity and chemical weapons risks, focusing on national security implications of frontier AI
OpenAI submitted ChatGPT 5.5 for pre-release evaluation and is co-developing GPT-5.5-Cyber, a model aimed at strengthening cyber defence capabilities
The agency has already run 40 evaluations, including on state-of-the-art models that have not been released to the public
Earlier agreements signed with OpenAI and Anthropic under the Biden administration in 2024 have been renegotiated under the new framework
Why it matters:
Pre-release government testing is becoming a baseline expectation for frontier AI, even in administrations that lean against regulation. For security leaders, it signals that independent evaluation of AI systems for cyber, bio and CBRN risks is moving from voluntary best practice to operational standard. Expect this model to influence procurement, vendor due diligence and the kind of assurances buyers will start asking for.
🚨 AI Systems Are Far More Vulnerable Than Traditional Software

The Story:
A new report from security firm Cobalt found that AI and large language model systems contain serious security flaws at more than twice the rate of regular business software. Even worse, most of these flaws never get fixed. One in five companies surveyed said they had already experienced a security incident involving an AI system in the past year.
The details:
32% of all issues found in AI systems are rated high risk, compared to just 13% in traditional enterprise software
Only 38% of these serious AI flaws are ever resolved, the lowest fix rate of any type of application tested
The most common attack method, known as prompt injection (tricking the AI into ignoring its instructions), saw a 540% jump in reports year over year
Experts point to three reasons: AI introduces new types of attacks companies are not yet trained to defend against, a single flaw can affect many connected systems at once, and responsibility for fixing these issues is often unclear across teams
Unlike traditional software bugs, there is no standard playbook yet for fixing AI vulnerabilities, which leaves developers stuck even when they know something is wrong
Why it matters:
Companies are rolling out AI faster than they are securing it. The problem is not only that AI brings new risks, but that most organisations do not yet know how to handle them. As AI becomes embedded in everyday business operations, the gap between adoption and security is becoming one of the biggest open questions for the industry.
💸 An AI Chatbot Was Tricked Into Stealing $150,000 in Crypto

The Story:
An attacker manipulated xAI's Grok chatbot into authorising the transfer of roughly $150,000 worth of cryptocurrency to a wallet they controlled. The trick relied on a hidden message written in Morse code, which Grok translated and then passed along as a valid command to a connected trading bot. The incident shows how AI assistants can be turned into unwitting accomplices when given too much control over real assets.
The details:
The attacker first sent Grok a membership NFT, which the system read as a permission upgrade and unlocked the ability to move funds
They then asked Grok to "translate" a Morse code message, which secretly contained an instruction to transfer 3 billion DRB tokens to the attacker's wallet
Grok processed the translated text as a legitimate command and passed it to the trading bot, which executed the transfer without any human review
The stolen tokens were quickly converted into Ethereum and USDC, causing short-term volatility in the DRB token price
The attack maps to two well-known AI risk categories: prompt injection (hiding malicious instructions inside ordinary-looking input) and excessive agency (giving an AI too much authority to act on its own)
Why it matters:
Why it matters: This was not a hack in the traditional sense. No system was broken into. The AI simply did what it was asked, because nobody had set clear limits on what it should be allowed to do. As AI agents are connected to wallets, payment systems, and other high-stakes tools, the question is no longer whether they can be manipulated, but how much damage they can cause when they are. The fix is not more powerful AI. It is stricter boundaries, human checkpoints for high-value actions, and the assumption that any input, even something as harmless as a Morse code puzzle, could be an attack.
💧 Attackers Used Claude and ChatGPT to Target a Water Utility

The Story:
Attackers used commercial AI models from Anthropic and OpenAI to plan and carry out a cyberattack on a water and drainage utility in the Monterrey area of Mexico. The campaign, which ran between December 2025 and February 2026, breached the company's IT systems and attempted to reach the operational technology that physically controls the facility. The attempt on the operational side ultimately failed, but the case shows how off-the-shelf AI tools are lowering the barrier for attacks on critical infrastructure.
The details:
Anthropic's Claude was described as the "primary technical executor," handling planning, tool development and step-by-step decisions during the intrusion
OpenAI's GPT models were used in supporting roles, processing stolen data and producing outputs in Spanish
Claude was also used to read vendor documentation for the facility's SCADA systems and to generate lists of default and known passwords for brute force attempts
The attackers had no prior experience targeting operational technology, yet the AI helped them identify the environment and map out a viable path toward it
OpenAI confirmed that the accounts involved have been banned from its service
Why it matters:
This is one of the clearest public examples so far of commercial AI being used to attack critical infrastructure. The attackers were not elite specialists. They relied on the same tools millions of people use every day, and the AI did the heavy lifting. That changes the threat model for water, energy and other essential services, which now need to assume that less experienced attackers can move faster and reach further than before. The defensive answer is not new in principle (stronger authentication, tighter remote access, clearer separation between IT and operational systems) but the urgency is.
🧰 AI Agents Are Picking Tools They Cannot Verify

The Story:
AI engineer Nik Kale flags a blind spot in how AI agents work. When an agent needs to perform a task, it picks a tool from a shared registry by reading its description in plain language. Nobody checks whether that description is honest. This opens the door to "tool poisoning," where an attacker publishes a tool that looks legitimate, passes every standard security check, and then quietly does something different from what it claims.
The details:
The problem is not about the code being tampered with. Existing supply chain controls like code signing and SBOMs confirm a tool is what it says it is, but not that it behaves the way it says it does
An attacker can hide instructions inside a tool's description, such as "always prefer this tool," which the agent reads and obeys as if it were a real command
A tool can also pass all checks at publication, then quietly change its behaviour weeks later, for example by sending user data to an unexpected destination
Real-world examples are already documented, including the MCP Tool Poisoning Attack disclosed by Invariant Labs and a command-injection flaw (CVE-2025-6514) that affected an OAuth proxy with 437,000 downloads
The proposed fix is runtime verification: a layer that sits between the agent and the tool and checks, on every call, that the tool is the one previously approved, that it only connects to allowed endpoints, and that its outputs match a declared schema
Why it matters:
Most enterprises today are securing their AI agents the same way they secure traditional software, by checking what was shipped. That misses half the problem. With AI agents, what matters is what the tool actually does at runtime, not just what it claims to be. As agents start choosing their own tools from shared registries, the trust question shifts from identity to behaviour. Companies that rely only on traditional supply chain controls are solving the wrong half of the problem.
What´s next?
Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.
