Topic

Agent Security

Controls and attack paths for browsing, tool use, memory, identity, and action-taking agents.

agent security, ai agents, tool security, memory poisoning, action approvals
Evergreen Overview

Agent security is about what happens when AI systems can browse, use tools, remember state, and take actions across multiple steps. The security boundary moves from a chat response to a longer workflow with identity, permissions, memory, and operational consequences.

Core agent security questions
  • What tools the agent can reach and under which identity
  • How memory, plans, and previous steps influence later actions
  • What approvals or reversibility exist when the agent gets it wrong
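
The three questions above can be made concrete as a single check that runs before every tool call. The following is a minimal sketch, not a reference implementation: the `ToolPolicy` and `ActionGate` names, the agent identities, and the tool names are all hypothetical, and a real deployment would back this with its own identity provider and approval workflow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool policy: who may call it, and whether it can be undone."""
    allowed_identities: frozenset
    reversible: bool


@dataclass
class ActionGate:
    """Decides allow / hold / deny for each tool call and records the decision."""
    policies: dict
    audit_log: list = field(default_factory=list)

    def check(self, identity, tool, approved_by=None):
        policy = self.policies.get(tool)
        if policy is None:
            # Default-deny: tools without a policy are never reachable.
            decision = "deny: unknown tool"
        elif identity not in policy.allowed_identities:
            decision = "deny: identity not permitted"
        elif not policy.reversible and approved_by is None:
            # Irreversible actions wait for a named human approver.
            decision = "hold: approval required"
        else:
            decision = "allow"
        self.audit_log.append((identity, tool, approved_by, decision))
        return decision


# Example policy table (illustrative names only).
gate = ActionGate(policies={
    "search_docs": ToolPolicy(frozenset({"support-agent", "billing-agent"}), reversible=True),
    "issue_refund": ToolPolicy(frozenset({"billing-agent"}), reversible=False),
})
```

In this sketch, `gate.check("billing-agent", "issue_refund")` is held until a call supplies `approved_by`, while any tool missing from the table is denied outright. Keeping the decision and the audit record in one place is a design choice that makes it harder for a later workflow step to widen access without leaving a trace.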
Common failure patterns
  • Unsafe tool use and hidden privilege expansion
  • Prompt injection flowing into planning and execution
  • Long-running workflows accumulating risky state or momentum
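
Two of the failure patterns above, injected content reaching execution and workflows running on unchecked, can be illustrated with a small guard. This is a rough sketch under stated assumptions: the `Content` type, the `HIGH_IMPACT_TOOLS` set, and the `plan_step_allowed` function are hypothetical names, and the trust flag is assumed to be set by whatever component fetched the content.

```python
from dataclasses import dataclass

# Hypothetical set of tools whose effects are hard to undo.
HIGH_IMPACT_TOOLS = {"send_email", "run_shell"}


@dataclass(frozen=True)
class Content:
    """A piece of text in the agent's context, tagged with where it came from."""
    text: str
    trusted: bool  # True for operator or user input, False for web pages, files, tool output


def plan_step_allowed(tool, inputs, steps_taken, max_steps=20):
    """Return True if the next planned step may run.

    Blocks the step when the workflow has exhausted its step budget, or when
    a high-impact tool would act on a plan that includes untrusted content.
    """
    if steps_taken >= max_steps:
        return False
    if tool in HIGH_IMPACT_TOOLS and any(not c.trusted for c in inputs):
        return False
    return True
```

For example, a plan built from a user request plus a fetched web page could still call a read-only summarizer, but a `send_email` step derived from the same inputs would be refused. The rule is deliberately coarse; it trades some usefulness for a boundary that injected text cannot talk its way past.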
Who this page is for
  • Teams shipping assistant-to-agent product transitions
  • Practitioners studying autonomy and tool use
  • Operators responsible for controls around high-impact actions
References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

AI Explained YouTube June 14, 2026 video

Claude Fable Blocked - 11 Quiet Details on What’s Next

Claude Fable 5 banned, but what’s the bigger story? We go through 11 under-reported details, so you have the context to see what’s coming next for your use of AI, from whether the ban will last and what the possible motives are to what the model can actually do and some wild over-extrapolations going on.

The Hacker News AI Security June 11, 2026 news

New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets

Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agen

The Hacker News AI Security June 10, 2026 news

Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards

On June 9, Anthropic released Claude Fable 5, the most capable model it has ever made, generally available. It also did something unusual: it shipped one model as two products, split not by capability but by a layer of safety classifiers. Fable 5 goes to the public. Its twin, Claude Mythos 5, the same underlying model

Microsoft Security Blog June 5, 2026 news

Securing CI/CD in an agentic world: Claude Code Github action case

Microsoft Threat Intelligence identified a prompt injection pathway in Claude Code GitHub Action that allowed access to workflow secrets under specific conditions. This research examines the attack chain, responsible disclosure process, Anthropic's mitigation, and guidance for securing AI-powered CI/CD workflows.

Microsoft Security Blog June 4, 2026 news

Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us

A surge in real-world attacks against agentic AI systems is reshaping how we think about risk. Based on 12 months of red teaming, this update introduces seven new failure modes, from supply chain compromise to goal hijacking, and the practical mitigations teams need now.

Microsoft Security Blog June 3, 2026 news

Preinstall to persistence: Inside the Red Hat npm Miasma credential-stealing campaign

A large-scale npm supply chain attack compromised over 90 versions of @redhat-cloud-services packages, silently infecting CI/CD environments and developer systems. The malicious code steals credentials from GitHub, cloud platforms, and local machines, then spreads like a worm by republishing trusted packages. Discover

AI Explained YouTube April 24, 2026 video

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies

GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.

Anthropic Frontier Red Team April 7, 2026 news

Assessing Claude Mythos Preview’s cybersecurity capabilities

Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will sh