Threat Brief

RAG poisoning: field notes from 38 incidents

Ramiz RafiqFounder, Guardra AIFebruary 28, 20266 min read

In the last 18 months Guardra Labs has investigated 38 RAG-poisoning incidents across finance, healthcare, and platform SaaS. The pattern is consistent enough that we can describe the full attack lifecycle in one post.

Step one: the attacker identifies what a target's agent is likely to retrieve. Public docs, wiki pages, support forums, vendor data, SEO-indexed web content — anything that feeds your index. The reconnaissance is cheap; tools like SerpAPI and public robots.txt tell you everything you need.

Step two: the attacker plants content in a location they know will be indexed. The content carries a payload disguised as benign text: a footnote claiming a different API endpoint, an appendix citing a different support number, instructions embedded as legitimate-looking examples.

Step three: the victim's agent retrieves the poisoned chunk as part of a retrieval pipeline. Because the content is embedded as 'authoritative,' the LLM treats it with higher trust than a raw user message. Injection success rates we've measured average 73% against unguarded agents.

Step four: exfiltration or misdirection. Users are sent to attacker-controlled URLs. Support bots quote wrong numbers. Internal tools link to lookalike domains. The incidents we've investigated caused a median of $340K in direct damage and much more in brand.

Defense is layered. Validate documents at ingestion time — scan for instruction-shaped content. Sign trusted sources and downgrade unsigned retrievals. Run an adversarial retrieval test as part of your nightly eval. And when a retrieved chunk does contain instruction-shaped content, treat it as hostile until proven otherwise.

Ready to audit?

Run Guardra on your agent in 60 seconds.

Try the live demo