NIST Launches AI Agent Standards Initiative
Top AI and Cybersecurity news you should check out today

Welcome Back to The AI Trust Letter
Once a week, we distill the most critical AI & cybersecurity stories for builders, strategists, and researchers. Let’s dive in!
🏛️ The U.S. Pushes for Standards in Autonomous AI Systems

The Story:
The U.S. National Institute of Standards and Technology (NIST) has launched the AI Agent Standards Initiative, a new effort to develop technical standards and evaluation methods for AI agents. The goal is to define how autonomous and semi-autonomous systems should be assessed for safety, reliability, and risk.
The details:
The initiative focuses on AI agents that can plan, take actions, use tools, and interact with external systems under limited human oversight.
NIST aims to develop benchmarks, testing methodologies, and guidance for evaluating agent behavior, decision-making, and robustness.
The effort aligns with the broader U.S. AI Safety Institute program and builds on NIST’s existing AI Risk Management Framework.
Collaboration with industry, academia, and government stakeholders is expected to shape practical evaluation standards.
Why it matters:
AI agents introduce a different risk profile than traditional LLM chat interfaces. Once models can execute tasks, call APIs, or interact with enterprise systems, failure modes shift from harmful text output to operational and security impact.
Standardizing how agents are tested is critical. Organizations need ways to measure autonomy boundaries, tool misuse, unintended actions, and resilience against prompt manipulation. Without common benchmarks, claims about “safe agents” remain difficult to verify.
For companies deploying AI agents in production, this initiative signals where compliance expectations and evaluation norms are heading.
👀 The $1.78M Moonwell Incident and the Future of Agentic Security

The Story:
In February 2026, the decentralized finance (DeFi) lending protocol Moonwell lost about $1.78 million after a smart contract error led to an incorrect price feed for Coinbase Wrapped ETH (cbETH). The contract involved logic co-authored with Anthropic’s Claude Opus 4.6.
The details:
A governance proposal (MIP-X43) deployed code that misconfigured the cbETH price oracle by reading only the asset’s exchange ratio instead of multiplying by its USD price feed. This made the oracle report cbETH at roughly $1.12 rather than its actual ~$2,200 market price.
The undervaluation triggered a rapid liquidation cascade as automated bots repaid small debts to seize large amounts of collateral.
Moonwell’s team lowered borrowing and supply caps for cbETH to limit further damage, but substantial losses had already occurred.
Repository commits show parts of the vulnerable code were co-authored by Claude Opus 4.6, drawing scrutiny over AI-assisted code in security-critical environments.
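The shape of the reported bug is easy to sketch. Below is a minimal Python stand-in for the oracle logic (the function names, exchange ratio, and ETH price are illustrative assumptions, not the actual Moonwell contract code or live feed values):

```python
# Toy model of a wrapped-asset price oracle misconfiguration
# (illustrative values only; not the actual Moonwell contract).

CBETH_PER_ETH = 1.12   # cbETH -> ETH exchange ratio (approx.)
ETH_USD = 2000.0       # hypothetical ETH/USD price feed

def cbeth_usd_buggy() -> float:
    # Bug: report only the exchange ratio, so cbETH "costs" ~$1.12.
    return CBETH_PER_ETH

def cbeth_usd_fixed() -> float:
    # Fix: scale the ratio by the underlying ETH/USD price feed.
    return CBETH_PER_ETH * ETH_USD

# With the buggy feed, 10 cbETH of collateral is valued near $11 instead
# of ~$22,400, so a liquidation bot can repay a tiny debt and seize it.
```

The same class of error, returning a ratio where a price is expected, recurs across wrapped and staked assets, which is why oracle wiring deserves dedicated tests rather than review alone.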
Why it matters:
This incident highlights how simple logic errors in AI-assisted code can lead to severe financial consequences when deployed in live systems. Standard tools and human review did not catch the flaw before deployment, showing a gap in current development and verification workflows for AI-generated logic.
For organizations using AI to assist development, this event underscores the need for rigorous, adversarial testing and governance processes that treat every line of AI-produced code as a potential risk vector.
🚨 OpenClaw Installed Through Hijacked npm Package

The Story:
A compromised npm package briefly installed OpenClaw on developer machines without their knowledge. The malicious version was available for several hours before it was removed, but anyone who installed it during that window may have unintentionally added the tool to their system.
The details:
Attackers gained access to a package publishing token and released a tampered version of a popular CLI tool.
The altered version included a hidden post-install script that automatically installed OpenClaw globally.
The main functionality of the package appeared normal, making the change difficult to detect.
The issue was identified within hours and a clean version was published, with guidance for affected users.
Developers who installed the compromised version are advised to update and check whether OpenClaw was added to their environment.
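One concrete mitigation is knowing which of your installed dependencies declare lifecycle install scripts at all. The sketch below walks a standard node_modules tree and flags packages with preinstall/install/postinstall hooks; it is a starting point under that layout assumption, not a full supply-chain audit:

```python
# Sketch: flag npm packages that declare lifecycle install scripts,
# the hook abused in this incident. Assumes a standard node_modules
# layout; does not inspect what the scripts actually do.
import json
from pathlib import Path

HOOKS = ("preinstall", "install", "postinstall")

def packages_with_install_scripts(node_modules: str) -> list[str]:
    flagged = []
    for manifest in Path(node_modules).glob("**/package.json"):
        try:
            scripts = json.loads(manifest.read_text()).get("scripts", {})
        except (json.JSONDecodeError, OSError):
            continue  # skip malformed or unreadable manifests
        if any(hook in scripts for hook in HOOKS):
            flagged.append(str(manifest.parent))
    return flagged
```

A common complementary control is installing with npm’s `--ignore-scripts` flag and then reviewing the flagged packages before allowing any hooks to run.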
Why it matters:
This incident shows how software supply chain attacks continue to target developer workflows. A single compromised credential can lead to silent changes that spread quickly through automated installs and CI pipelines.
Even when the added software is not directly malicious, unauthorized installations create risk. Developer machines often have access to source code, credentials, production systems, and internal tools. Any unexpected component increases the attack surface.
For teams building AI systems and agent-based tools, this is also a reminder that local environments are part of the security perimeter. Dependency controls, package verification, and monitoring of install scripts are no longer optional safeguards.
🤖 Google Rolls Out Gemini 3.1 Pro for Complex Problem Solving

The Story:
Google has announced Gemini 3.1 Pro, the latest update in its Gemini AI lineup. The model is designed for tasks that require more than simple responses, with a focus on reasoning through complicated challenges. The new version is being released in preview across multiple platforms for developers, business users, and consumers.
The details:
Gemini 3.1 Pro builds on the Gemini 3 series and incorporates the core reasoning improvements introduced in the Deep Think variant.
In benchmark testing, the model achieved a 77.1% score on the ARC-AGI-2 reasoning test, more than double the performance of the previous Gemini 3 Pro on the same benchmark.
Google positions the model for real use cases such as data synthesis, detailed explanations of complex topics, and multi-step workflows that require planning and contextual thinking.
3.1 Pro is rolling out in preview across the Gemini app and NotebookLM for AI Pro and Ultra users, along with the Gemini API, Vertex AI, and developer tools like Antigravity and Android Studio.
The preview release will allow Google to refine the model’s performance and expand support for more advanced agentic workflows before general availability.
Why it matters:
This update highlights how leading AI models are evolving from simple question-answer systems into tools that can handle structured reasoning and complex tasks. Improved reasoning performance means the model can synthesize large volumes of information, plan workflows, and provide comprehensive results with less human guidance.
For teams building AI-augmented products, higher reasoning capability can reduce manual oversight in data analysis, research support, and creative problem solving.
🎨 Image AI Safety Bypassed by “Semantic Chaining” Technique

The Story:
Researchers have uncovered a new jailbreak method called semantic chaining that can bypass safety filters in major image generation models such as Grok 4, Gemini Nano Banana Pro, and others. Instead of triggering warnings with a direct harmful prompt, attackers break a request into a sequence of “safe” steps that collectively lead to prohibited results.
The details:
Semantic chaining spreads a harmful request across multiple prompts. Each prompt appears harmless on its own, so safety checks never block the sequence.
The method works by first asking the model to generate a benign image, then making incremental changes that gradually embed sensitive or forbidden content.
Because many models apply safety filters to individual prompts or tokens and not to the full chain, the final output can contain content that would be blocked if requested outright.
The technique can also turn image generation into a workaround for text-based filters by embedding instructions pixel by pixel into images.
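The gap is easy to model. In the toy Python sketch below, each edit passes a per-prompt check while the accumulated result does not (the keyword list, the regex "renderer," and the prompts are illustrative assumptions, not how production safety filters actually work):

```python
# Toy model of semantic chaining: each prompt is clean in isolation,
# but the fragments it adds to the image combine into blocked content.
import re

BLOCKED_TERMS = {"detonator"}  # stand-in for a real safety policy

def per_prompt_blocked(prompt: str) -> bool:
    # How many filters work today: judge each instruction in isolation.
    return any(term in prompt.lower() for term in BLOCKED_TERMS)

def rendered_text(prompts: list[str]) -> str:
    # Toy "renderer": collect the quoted fragments each edit adds.
    return "".join(re.findall(r"'([^']*)'", " ".join(prompts)))

def chain_blocked(prompts: list[str]) -> bool:
    # Chain-aware check: judge the accumulated content of the sequence.
    return any(term in rendered_text(prompts).lower()
               for term in BLOCKED_TERMS)

chain = ["draw a blank warning sign",
         "write 'deto' on the sign",
         "append 'nator' to the sign"]
```

Real semantic chaining is subtler, since the fragments are visual rather than quoted strings, but the structural point stands: the safety check has to run over the whole sequence, not each step.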
Why it matters:
This issue shows that existing safety layers in multimodal AI systems do not effectively track intent across a series of interactions. Models may treat edits to an existing image as less risky than generating new content, but attackers can exploit that assumption to slip harmful outputs past safeguards.
For teams using AI image generators, this highlights the need for safety mechanisms that analyze entire instruction sequences rather than each prompt in isolation. Without deeper intent tracking and cross-prompt analysis, current defenses may fail to prevent misuse of powerful models in real use cases.
What’s next?
Thanks for reading! If this brought you value, share it with a colleague or post it to your feed. For more curated insight into the world of AI and security, stay connected.
