How to audit an AI agent in 2026
Most AI security advice is theater. It tells you to 'validate your prompts' without defining a pass/fail test, or to 'monitor for prompt injection' without specifying which of the 42 documented injection families to monitor for. This piece is different. It's what I'd hand to a new hire on day one.
An AI agent has five attack surfaces: prompts, memory, tools, outputs, and the code that stitches them together. In that order of likelihood — and in that order of impact. A prompt-injection that makes your bot curse is embarrassing; a tool-misuse that wires funds to an attacker is career-ending.
Start with tools. Enumerate every function your agent can call. For each one, write down: who can invoke it, what arguments it accepts, what happens if the arguments are maximum, what happens if invoked 100x/sec, and what happens if the return value is attacker-controlled. Most teams skip step five — that's the confused-deputy vector.
Then look at memory. Long-term memory is a persistence mechanism for attackers. Anything that was true about a past user is now a starting condition for future users. Assume your memory store is a database that will eventually be queried with adversarial inputs — because it will be.
Prompts come third, and they're the loudest but rarely the most dangerous. Direct injection is mostly solved by well-scoped system prompts. Indirect injection — attacks arriving via documents your agent retrieves — is the real battlefield. If your RAG index accepts any user-provided content, you have a prompt-injection surface whether you want one or not.
Outputs are where regulated industries get hurt. An output with a fabricated URL, a hallucinated API endpoint, or an unsafe code snippet is a liability. Evaluate outputs the way you evaluate code: deterministically, with rules, against a corpus of negative examples.
Code is last because it's the most understood. SAST, SCA, secret scanning — you've done this before. The twist in 2026 is that 46% of your code is AI-generated, and AI-generated code carries 40% more vulnerabilities per line than human-written. Budget for that.