AgentPoison: a stealthy backdoor that poisons agent memory or RAG knowledge bases to hijack LLM agents
If agents retrieve from third-party or otherwise writable corpora, an attacker can inject a few poisoned records that trigger dangerous actions while leaving accuracy on benign queries essentially unchanged, which makes the attack a hard-to-detect safety and legal risk.
Key finding
When a query contains the attacker's trigger, AgentPoison causes the retriever to return poisoned demonstrations with high probability.
Numbers: average ASR-r (retrieval attack success rate) ≈ 81.2%
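To make the ASR-r figure concrete, here is a minimal sketch of how such a retrieval success rate could be computed. It assumes a triggered query counts as a success only when every item in its retrieved top-k is a poisoned record; that scoring rule, the function name `asr_r`, and the toy document IDs are illustrative assumptions, not taken from the source.

```python
from typing import Sequence, Set


def asr_r(retrieved: Sequence[Sequence[str]], poisoned_ids: Set[str]) -> float:
    """Fraction of triggered queries whose retrieved top-k items are all poisoned.

    Assumption: a query is a success only if *every* retrieved item is poisoned.
    Swap `all` for `any` to get the looser "at least one poisoned item" variant.
    """
    if not retrieved:
        raise ValueError("need at least one triggered query")
    hits = sum(
        1
        for topk in retrieved
        if topk and all(doc_id in poisoned_ids for doc_id in topk)
    )
    return hits / len(retrieved)


# Hypothetical toy data: top-2 retrieval results for four triggered queries.
poisoned = {"p1", "p2"}
results = [
    ["p1", "p2"],  # all poisoned: success
    ["p2", "p1"],  # all poisoned: success
    ["p1", "d7"],  # mixed: failure under the strict rule
    ["d3", "d4"],  # all benign: failure
]

score = asr_r(results, poisoned)
print(f"ASR-r = {score:.1%}")
```

Under this reading, an average ASR-r of about 81.2% would mean that roughly four out of five triggered queries pull back only poisoned demonstrations.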

